Friday, June 17, 2016

Apple’s ‘Differential Privacy’

At the keynote address of Apple’s Worldwide Developers Conference in San Francisco on Monday, the company’s senior vice president of software engineering, Craig Federighi, gave his familiar nod to privacy, emphasizing that Apple doesn’t assemble user profiles, does end-to-end encrypt iMessage and FaceTime, and tries to keep as much computation as possible that involves your private information on your personal device rather than on an Apple server. But Federighi also acknowledged the growing reality that collecting user information is crucial to making good software, especially in an age of big data analysis and machine learning. The answer, he suggested rather cryptically, is “differential privacy.”

“We believe you should have great features and great privacy,” Federighi told the developer crowd. “Differential privacy is a research topic in the areas of statistics and data analytics that uses hashing, subsampling and noise injection to enable…crowdsourced learning while keeping the data of individual users completely private. Apple has been doing some super-important work in this area to enable differential privacy to be deployed at scale.”

Differential privacy, translated from Apple-speak, is the statistical science of trying to learn as much as possible about a group while learning as little as possible about any individual in it. With differential privacy, Apple can collect and store its users’ data in a format that lets it glean useful notions about what people do, say, like and want. But it can’t extract anything about a single, specific one of those people that might represent a privacy violation. And neither, in theory, could hackers or intelligence agencies.

[---]

Federighi’s emphasis on differential privacy likely means Apple is actually sending more of your data than ever off your device to its servers for analysis, just as Google and Facebook and every other data-hungry tech firm does. But Federighi implies that Apple is only transmitting that data in a transformed, differentially private form. In fact, Federighi named three of those transformations: hashing, a cryptographic function that irreversibly turns data into a unique string of random-looking characters; subsampling, or taking only a portion of the data; and noise injection, adding random data that obscures the real, sensitive personal information. (As an example of that last method, Microsoft’s Cynthia Dwork points to the technique in which a survey asks if the respondent has ever, say, broken a law. But first, the survey asks them to flip a coin. If the result is tails, they should answer honestly. If the result is heads, they’re instructed to flip the coin again and then answer “yes” for heads or “no” for tails. The resulting random noise can be subtracted from the results with a bit of algebra, and every respondent is protected from punishment if they admitted to lawbreaking.)
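The coin-flip survey described above can be sketched in a few lines of Python. This is only an illustration of that survey technique, not Apple's actual system: the function names, the simulated population of 100,000 respondents and the 30 percent "true" rate are all assumptions made up for the example. The "bit of algebra" works out as follows: with fair coins, the chance of a "yes" answer is 0.5 × (true rate) + 0.25, so the true rate can be recovered as 2 × (observed yes rate) − 0.5.

```python
import random


def randomized_response(truth: bool, rng: random.Random) -> bool:
    """Answer one survey question using the two-coin protocol.

    First flip tails: answer honestly.
    First flip heads: flip again and answer "yes" on heads, "no" on tails.
    """
    first_is_heads = rng.random() < 0.5
    if not first_is_heads:
        return truth
    second_is_heads = rng.random() < 0.5
    return second_is_heads


def estimate_true_rate(answers: list[bool]) -> float:
    """Subtract the coin-flip noise from the aggregate.

    P(yes) = 0.5 * p + 0.25, so p = 2 * P(yes) - 0.5.
    """
    observed_yes = sum(answers) / len(answers)
    return 2 * observed_yes - 0.5


# Hypothetical population: 30% would honestly answer "yes".
rng = random.Random(0)
true_rate = 0.30
population = [rng.random() < true_rate for _ in range(100_000)]

# Each respondent reports only a noisy answer.
answers = [randomized_response(truth, rng) for truth in population]

estimate = estimate_true_rate(answers)
print(f"estimated rate of 'yes': {estimate:.3f}")
```

The estimate lands close to the real rate for the group, yet any single “yes” in the collected answers could have come from the second coin flip, which is what gives each respondent deniability.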


- More here, and a link to the paper “The Algorithmic Foundations of Differential Privacy.”

