Announcing OpenDP Library 0.10

Photo of people socializing at a reception at the Harvard Science and Engineering Complex

The OpenDP team is excited to bring you our latest release, OpenDP Library 0.10!

The OpenDP Library is a modular collection of algorithms for building privacy-preserving applications, with an extensible approach to tracking privacy and a vetted implementation. It is available as binaries for Python on PyPI, for R on R-universe, and for Rust on crates.io, or in source form on GitHub.

This release focuses on improving the user experience and helping users quickly find what they need in the OpenDP Library. Features include a new integration with the Polars DataFrame library, a reorganization of the library documentation, usability improvements, and stronger linting and static analysis of our Python source code. We hear you, and we appreciate your feedback and suggestions!

 

Polars

We’ve heard from users that a significant challenge of using OpenDP is needing to learn new programming idioms, on top of whatever other tools they use for data analysis. With this release we work towards addressing this by integrating Polars and OpenDP. Polars is a data frame library implemented in Rust, with Python and R bindings. You can now use the Polars API to query your data with differentially private guarantees from Python or Rust. Here’s an example:

>>> import polars as pl
>>> import opendp.prelude as dp
>>> dp.enable_features("contrib")
>>> # set up your analysis
>>> context = dp.Context.compositor(
...     data=pl.read_csv("grade_pets.csv"),
...     privacy_unit=dp.unit_of(contributions=1),
...     privacy_loss=dp.loss_of(epsilon=1.),
...     split_evenly_over=3,
...     margins={("grade",): dp.Margin(
...         public_info="keys",
...         max_partition_length=100)
...     })
>>> # define a query to sum the number of pets per grade
>>> query = context.query() \
...     .group_by("grade") \
...     .agg(pl.col("pets").fill_null(0).dp.sum(bounds=(0, 10)))

The last two lines of this code snippet use the Polars LazyFrame API directly to construct the query.

Extensions that enable differential privacy (like the bounded sum with noise perturbation above) are accessible from the .dp namespace.
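To give a feel for what a bounded sum with noise perturbation does conceptually, here is a minimal pure-Python sketch. It is not OpenDP's implementation: the helper names `clamp` and `noisy_bounded_sum` are hypothetical, the sensitivity formula assumes one row is added or removed per individual, and the floating-point noise sampler is for illustration only and is not suitable for real privacy guarantees.

```python
import random

def clamp(value, lower, upper):
    """Force a value into the closed interval [lower, upper]."""
    return max(lower, min(upper, value))

def noisy_bounded_sum(values, bounds, epsilon, rng):
    """Illustrative only: clamp, sum, then add Laplace noise.

    Assumes each individual contributes at most one row, so adding or
    removing a row changes the sum by at most max(|lower|, |upper|).
    """
    lower, upper = bounds
    # Treat missing values as 0, mirroring fill_null(0) in the query above.
    clamped_total = sum(clamp(0 if v is None else v, lower, upper) for v in values)
    sensitivity = max(abs(lower), abs(upper))
    scale = sensitivity / epsilon
    # The difference of two exponential draws is Laplace-distributed.
    noise = rng.expovariate(1 / scale) - rng.expovariate(1 / scale)
    return clamped_total + noise, scale

# Example: pets reported by one grade, with epsilon = 1/3 for this query.
release, scale = noisy_bounded_sum([1, None, 25, 3], (0, 10), 1 / 3, random.Random())
```

The bounds matter: without clamping, a single extreme value could change the sum arbitrarily, and no finite amount of noise would hide it.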

The following code snippet releases the query, which consumes privacy budget; collect then executes the computation:

>>> release = query.release().collect()

To ensure privacy, collect may only be called once. Release and collect are separated to allow finer control over how the query is executed, and where the results end up.
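The budget consumed by each release can be pictured with a small sketch. This assumes that `split_evenly_over=3` divides the total epsilon into three equal shares under basic composition; the `EvenBudget` class is a hypothetical illustration and not part of OpenDP.

```python
class EvenBudget:
    """Hypothetical helper: split a total epsilon evenly over a fixed number of queries."""

    def __init__(self, epsilon, split_evenly_over):
        self.per_query = epsilon / split_evenly_over
        self.remaining_queries = split_evenly_over

    def spend(self):
        """Return the epsilon share for one query, or fail once the budget is gone."""
        if self.remaining_queries == 0:
            raise RuntimeError("privacy budget exhausted")
        self.remaining_queries -= 1
        return self.per_query

# Mirrors privacy_loss=dp.loss_of(epsilon=1.) with split_evenly_over=3.
budget = EvenBudget(epsilon=1.0, split_evenly_over=3)
shares = [budget.spend() for _ in range(3)]
```

Under these assumptions each of the three queries receives an epsilon of one third, and a fourth attempt raises an error instead of silently spending more.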

This is a work in progress: we would appreciate your feedback on the usability of this interface. We also anticipate expanding the set of Polars APIs that OpenDP supports. The current list of supported APIs is available in the OpenDP documentation.

 

Documentation

With this release we are publishing a reorganization of our existing documentation. We’ve identified four distinct user groups, and have set up tracks through the documentation that try to address the needs of each community:

  • New OpenDP users can start with Getting Started.
  • For experienced users, the API User Guide provides a top-down view, while the API references for Python, R, and Rust provide a bottom-up view, with details about every function in the library.
  • People more interested in the theory of differential privacy can begin with our Theory section.
  • Finally, developers on the project will find Contributing useful.

We’re also trying to make the existing documentation more useful. We’ve begun providing tabbed examples for both Python and R in the “Getting Started” section, and the API references now include usage examples. We’ve begun with the measurement functions (Python API, R API), and plan to add examples to other modules as well.

We’ve improved cross references within the documentation: For example, in the Python API reference each module now links to the corresponding section in the API User Guide to explain its role in the OpenDP Framework. If you can’t find what you’re looking for, the site search now includes snippets from matching pages, and where URLs have changed, the 404 page will now point you to the new location.

 

Usability

We hope the Polars integration will go a long way towards improving the usability of OpenDP, but we’ve made other fine-grained improvements as well. For example:

  • There is now a warning if an excessive privacy loss parameter is specified.
  • If a vector domain would work but the user provided a scalar, the error message makes a suggestion.
  • In R, if the arg parameter is missing, there is an error instead of just returning NULL.
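As an illustration of the first item, a warning on an excessive privacy loss parameter could look roughly like the sketch below. The function name `check_epsilon` and the threshold value are hypothetical and chosen only for this example; they are not taken from the OpenDP source.

```python
import warnings

# Hypothetical threshold for illustration; not OpenDP's actual value.
EPSILON_WARNING_THRESHOLD = 5.0

def check_epsilon(epsilon):
    """Validate epsilon, warning (not failing) when it is unusually large."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if epsilon > EPSILON_WARNING_THRESHOLD:
        warnings.warn(
            f"epsilon={epsilon} is large and may provide weak privacy protection",
            UserWarning,
            stacklevel=2,
        )
    return epsilon
```

A warning rather than an error keeps unusual but deliberate analyses possible while still flagging likely mistakes.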

We’ve also begun collecting usability issues in our test suite. If the behavior of OpenDP surprises you, we’d like to know about it, even if it turns out to be a user error, so that we can work towards minimizing surprises for future users.

 

Breaking Changes

Some constructors have been replaced by equivalent constructors with shorter names:

Before                          After
make_base_laplace               make_laplace
make_base_discrete_laplace      make_laplace
make_base_gaussian              make_gaussian
make_base_discrete_gaussian     make_gaussian
make_base_geometric             make_geometric
make_base_laplace_threshold     make_laplace_threshold
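If you have existing code that uses the old names, a small script along the following lines may help with the mechanical part of the migration. This is a sketch: the `migrate` helper is hypothetical, it only rewrites identifiers in text, and you should still review the result and check each constructor's arguments against the API reference.

```python
import re

# Old constructor names mapped to their replacements, per the renames above.
RENAMED = {
    "make_base_laplace": "make_laplace",
    "make_base_discrete_laplace": "make_laplace",
    "make_base_gaussian": "make_gaussian",
    "make_base_discrete_gaussian": "make_gaussian",
    "make_base_geometric": "make_geometric",
    "make_base_laplace_threshold": "make_laplace_threshold",
}

# Match whole identifiers only, trying longer names first.
_PATTERN = re.compile(
    r"\b(" + "|".join(sorted(RENAMED, key=len, reverse=True)) + r")\b"
)

def migrate(source):
    """Return source text with old constructor names replaced by new ones."""
    return _PATTERN.sub(lambda match: RENAMED[match.group(1)], source)

updated = migrate("meas = dp.m.make_base_laplace(domain, metric, scale=1.)")
```

The word-boundary pattern leaves unrelated identifiers that merely contain one of the old names untouched.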

Python Static Analysis

In the Python codebase, we’ve added more type annotations: this will both help us avoid mistakes, and help users provide the right kinds of arguments to OpenDP. In both Python and R we are using linting more extensively to help us spot problems in PRs. With this release we also drop support for Python 3.8: The backward compatibility added complexity to the code and prevented us from using newer features of the language.

 

Getting the OpenDP Library

Further details can be found in the repository CHANGELOG. We’re excited to have you try the OpenDP Library! You can find it on PyPI, R-universe, crates.io, or GitHub.

We welcome your feedback and participation in the OpenDP Project. To learn more, please visit the OpenDP website or join our Slack workspace.