Announcing OpenDP Library 0.11

Photo of people socializing at a reception at the Harvard Science and Engineering Complex

The OpenDP team is excited to bring you our latest release, OpenDP Library 0.11!

The OpenDP Library is a modular collection of algorithms for building privacy-preserving applications, with an extensible approach to tracking privacy, and a vetted implementation. It is available as binaries for Python on PyPI, for R on R-universe, for Rust on crates.io, or in source form on GitHub.

This release focuses on expanding the functionality and documentation of the OpenDP integration with Polars. New functionality includes utilities for estimating accuracy, privately releasing sensitive key-sets in queries that use grouping, and evenly distributing privacy parameters across queries.

Polars Functionality

We’ve updated the (optional) Polars dependency to Python version 0.1.1, Rust version 0.43. The following code snippet demonstrates new functionality for accuracy estimates and for releasing private key sets, using a toy survey of the species of pets owned by elementary school students.

View this gist on GitHub

You can now use query.summarize(alpha) to get a table with the mechanism and noise scale used for each statistic. When you supply a statistical significance level alpha, the table also includes the relevant accuracy estimate from which you can construct confidence intervals. A threshold is included if your grouping keys are sensitive (see next).
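To make the link between noise scale, alpha, and confidence intervals concrete, here is a minimal sketch for continuous Laplace noise. It is an illustration under that assumption, not the library's implementation: OpenDP works with discrete noise distributions, so the accuracies it reports may differ slightly. The function name laplace_accuracy and the numbers below are made up for this example.

```python
import math

def laplace_accuracy(scale: float, alpha: float) -> float:
    """Return a such that |noise| < a with probability 1 - alpha,
    assuming continuous Laplace noise with the given scale."""
    if scale <= 0 or not 0 < alpha <= 1:
        raise ValueError("scale must be positive and alpha must lie in (0, 1]")
    # P(|Laplace(scale)| >= a) = exp(-a / scale); set this equal to alpha.
    return scale * math.log(1 / alpha)

# Hypothetical release: a noisy count of 120 with noise scale 2.0.
noisy_count, scale, alpha = 120.0, 2.0, 0.05
a = laplace_accuracy(scale, alpha)

# With probability 1 - alpha, the true count lies inside this interval.
interval = (noisy_count - a, noisy_count + a)
```

A smaller alpha widens the interval, and a larger noise scale (stronger privacy) widens it proportionally.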

OpenDP can now also release queries grouped by sensitive attributes in a way that satisfies approximate differential privacy. In the above example, the OpenDP Library only releases counts for species that are common among many students. Cats and dogs are relatively common, but too few of any exotic (or illegal) pets were observed to be included in the release.

Based on the accuracy table, any given grouping key/data partition is considered “common” or “stable” if it contains more than threshold (33) records. The threshold is calibrated so that the probability of releasing a pet species that is unique to a single student is at most delta.
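The calibration idea can be sketched with the textbook formula for continuous Laplace noise. This is an assumption-laden illustration rather than the library's calibration (OpenDP uses discrete noise and its own analysis, so it will not necessarily reproduce the threshold of 33 above); the function names and parameters here are hypothetical.

```python
import math

def laplace_tail(t: float, scale: float) -> float:
    """P(Laplace(scale) > t) for t >= 0, continuous Laplace noise."""
    return 0.5 * math.exp(-t / scale)

def stability_threshold(scale: float, delta: float, max_unique: int = 1) -> float:
    """Threshold T such that a partition holding `max_unique` records gets a
    noisy count above T with probability exactly delta."""
    if scale <= 0 or not 0 < delta <= 0.5:
        raise ValueError("scale must be positive and delta must lie in (0, 0.5]")
    # Solve 0.5 * exp(-(T - max_unique) / scale) = delta for T.
    return max_unique + scale * math.log(1 / (2 * delta))

# Hypothetical parameters: noise scale 2.0 and delta = 1e-6.
scale, delta = 2.0, 1e-6
threshold = stability_threshold(scale, delta)

# Chance that a species owned by a single student slips past the threshold.
leak_probability = laplace_tail(threshold - 1, scale)
```

The threshold grows with the noise scale and with the logarithm of 1/delta, which is why rare keys need many records before they can appear in a release.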

In other, more extreme settings, grouping by credit card numbers or social security numbers results in noisy partition lengths far below the threshold, so that, with high probability, no data partitions are released at all.

“Getting Started” Documentation

We are transitioning the “Getting Started” documentation to use the OpenDP Context API and high-level OpenDP APIs that imitate popular data science libraries like Polars and scikit-learn. Two new top-level sections have been added to match: Tabular Data and Statistical Modeling.

In particular, “Tabular Data” now walks you through releasing essential statistics with the OpenDP Polars integration, as well as grouping with protected partition keys, public partition keys, and public partition lengths. Thanks to Gurman Dhaliwal for her contributions to these sections.

Randomized Response for Bit Vectors

Thank you to Abigail Gentle for contributing an implementation and proof for a randomized response mechanism that privatizes bit vectors! This primitive can be used as a building block for RAPPOR. The mechanism expects bit vectors with limited weight: at most a given number of bits may be set. You control the privacy loss via the probability of flipping each bit in the data.

The mechanism is available from Rust, Python and R, and is accompanied by a proof. A Python usage example is available at: https://gist.github.com/Shoeboxam/4fc4c1db538843b1adba7a534a8361a0
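Independently of the library's implementation, the underlying idea can be sketched in plain Python. This sketch assumes each bit is flipped independently with probability flip_prob below 1/2, which is a simplification and not necessarily how the OpenDP constructor parameterizes its probability. All function names are hypothetical.

```python
import math
import random

def flip_bits(bits, flip_prob, rng=random):
    """Independently flip each bit with probability flip_prob."""
    if not 0 < flip_prob < 0.5:
        raise ValueError("flip_prob must lie in (0, 0.5)")
    return [b ^ int(rng.random() < flip_prob) for b in bits]

def epsilon_bound(max_weight, flip_prob):
    """Privacy loss bound for inputs with at most max_weight bits set.

    Two such inputs differ in at most 2 * max_weight positions, and each
    differing position changes the output likelihood by a factor of at
    most (1 - flip_prob) / flip_prob.
    """
    return 2 * max_weight * math.log((1 - flip_prob) / flip_prob)

def debias_count(observed_ones, n, flip_prob):
    """Unbiased estimate of how many of n respondents truly had a bit set."""
    # E[observed] = true * (1 - p) + (n - true) * p, solved for true.
    return (observed_ones - n * flip_prob) / (1 - 2 * flip_prob)

# Example: privatize one respondent's vector, which has a single bit set.
rng = random.Random(0)
private_bits = flip_bits([0, 0, 1, 0, 0], flip_prob=0.25, rng=rng)
eps = epsilon_bound(max_weight=1, flip_prob=0.25)
```

Flipping probabilities closer to 1/2 give a smaller epsilon but make the debiased counts noisier, so the flip probability is the knob that trades privacy against utility.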

Breaking Changes

Changed the paths of some APIs:

Before                 After
dp.sklearn.PCA         dp.sklearn.decomposition.PCA
dp.np_array2_domain    dp.numpy.array2_domain
dp.Margin              dp.polars.Margin

OpenDP now uses a product ordering instead of a lexicographic ordering when running measurement.check under approximate-DP. As a result, binary_search_chain can no longer be used to find scale and threshold parameters for make_laplace_threshold. Updated code examples can be found in the documentation.
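The difference between the two orderings can be illustrated with plain Python tuples standing in for (epsilon, delta) pairs. This is only a conceptual sketch; the helper names are hypothetical and are not part of the OpenDP API.

```python
def within_lexicographic(loss, budget):
    """Lexicographic order: epsilon decides, delta only breaks ties.
    Python compares tuples this way by default."""
    return loss <= budget

def within_product(loss, budget):
    """Product order: every component must be within its own budget."""
    return all(l <= b for l, b in zip(loss, budget))

budget = (1.0, 1e-6)   # (epsilon, delta) the analyst is willing to spend
loss = (0.5, 1e-3)     # smaller epsilon, but a far larger delta

lexicographic_ok = within_lexicographic(loss, budget)  # True
product_ok = within_product(loss, budget)              # False
```

Under the product ordering, a loss with a small epsilon can no longer mask an oversized delta. The product ordering is only a partial order, so a search over a single knob has no one-dimensional ordering to follow when scale and threshold both affect the privacy loss.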

Getting the OpenDP Library

Further details can be found in the repository CHANGELOG. We’re excited to have you try the OpenDP Library! You can find it on PyPI, R-universe, crates.io, or GitHub.

We welcome your feedback and participation in the OpenDP Project. To learn more, please visit the OpenDP website or join our Slack workspace.