What is Differential Privacy?
Differential privacy is a rigorous mathematical definition of privacy for statistical analysis and machine learning. In the simplest setting, consider an algorithm that analyzes a dataset and releases statistics about it (such as means and variances, cross-tabulations, or the parameters of a machine learning model). Such an algorithm is said to be differentially private if, by looking at the output, one cannot tell whether any individual's data was included in the original dataset or not. In other words, the guarantee of a differentially private algorithm is that its behavior hardly changes when a single individual joins or leaves the dataset -- anything the algorithm might output on a database containing some individual's information is almost as likely to have come from a database without that individual's information. Most notably, this guarantee holds for every individual and every dataset. Therefore, regardless of how eccentric any single individual's details are, and regardless of the details of anyone else in the database, the guarantee of differential privacy still holds. This gives a formal guarantee that individual-level information about participants in the database is not leaked. Differential privacy achieves this strong guarantee by carefully injecting random noise into the computation of the released statistics, so as to hide the effect of each individual.
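To make the idea of calibrated noise concrete, here is a minimal sketch (using only the Python standard library, not the OpenDP library's API) of the classic Laplace mechanism applied to a counting query. The function name and dataset are illustrative; the key point is that the noise scale is set to sensitivity/epsilon, where the sensitivity of a count is 1 because one person joining or leaving changes it by at most 1.

```python
import math
import random

def laplace_mechanism(true_value, sensitivity, epsilon):
    """Release true_value plus Laplace noise with scale sensitivity/epsilon.

    Illustrative sketch: larger epsilon means less noise and weaker privacy.
    """
    scale = sensitivity / epsilon
    # Sample Laplace(0, scale) by inverting the CDF at a uniform draw
    u = random.random() - 0.5
    noise = -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))
    return true_value + noise

# Counting query: how many records are in the dataset?
# One individual's presence or absence changes the count by at most 1,
# so the sensitivity is 1.
ages = [31, 45, 28, 62, 51]
private_count = laplace_mechanism(len(ages), sensitivity=1, epsilon=1.0)
```

Because the noise is symmetric around zero, repeated releases are centered on the true count, while any single release reveals little about whether one particular person's record is present.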
For more background on differential privacy and its applications, we recommend the book chapter by Alexandra Wood, Micah Altman, Kobbi Nissim, and Salil Vadhan, as well as the resources at https://differentialprivacy.org/resources/ and https://privacytools.seas.harvard.edu/courses-educational-materials.
What We Do
OpenDP is a community effort to build a trustworthy suite of open-source tools for enabling privacy-protective analysis of sensitive personal data, focused on a library of algorithms for generating differentially private statistical releases. The target use cases for OpenDP are to enable government, industry, and academic institutions to safely and confidently share sensitive data to support scientifically oriented research and exploration in the public interest. We aim for OpenDP to flexibly grow with the rapidly advancing science of differential privacy, and be a pathway to bring the newest algorithmic developments to a wide array of practitioners.
We began this project in a (still ongoing) partnership with Microsoft developing a differentially private data curator application called SmartNoise. Building on this collaboration, we are now establishing a broader community around OpenDP with stakeholders and contributors from across academia, industry, and government. Together, we will design, implement, and govern an “OpenDP Commons” that includes a library of differentially private algorithms and other general-purpose tools for use in end-to-end differential privacy systems.
OpenDP is being incubated by Harvard University’s Privacy Tools and Privacy Insights projects (at SEAS and IQSS), with the generous support of grants from the Sloan Foundation, a grant from the Harvard Data Science Initiative Trust in Science fund, and Cooperative Agreement No. CB20ADR0160001 from the US Census Bureau. Research that laid the foundation for OpenDP was done under NSF grant CNS-1237235, Cooperative Agreement No. CB16ADR0160001 from the US Census Bureau, NSF grant No. 1565387, an earlier grant from the Sloan Foundation, and a gift from Google.