OpenDP Roadmap

 

Throughout the year, we identify high-level steps that advance OpenDP’s core initiatives. These steps are detailed in the Roadmap below.

The major items the development team and the community are working on right now are shown in the In Progress section. In the Planning / Design section, you can see the strategic items that we've prioritized and are designing and testing with the community. In the Future section, you'll see the things that we'd like to work on but haven't yet prioritized.

This Roadmap is focused on high-level initiatives. We're always working on smaller bug fixes and enhancements. For a view of everything that the OpenDP Project Team and Community are working on right now, check out the Project Board on GitHub.

 

In Progress

 

OpenDP Commons Contribution Process and Guidelines

  • Establish and document the intake and vetting process for community contributions of algorithms to the OpenDP library, and of tools and packages that use the OpenDP library.

Algorithm Development

  • The OpenDP library currently supports 6 diverse privacy mechanisms and over 20 summary statistics, which include univariate statistics (e.g. means, medians, quantiles, variances), histograms of variables, covariances, and missing data imputation. However, we are always looking to add new functionality, to broaden the scope of computations that can be performed.
  • Some of the algorithms planned for implementation in the near term include: 
  • To view all the components under consideration for future work, see this GitHub query. Your feedback on which algorithms we should prioritize is welcome!
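
To make the idea of a privacy mechanism applied to a summary statistic concrete, here is a minimal sketch of a differentially private mean using the Laplace mechanism. It is written in plain Python and does not use the OpenDP API. The names sample_laplace and dp_mean are hypothetical, and the sketch assumes that the dataset size is public and that neighboring datasets differ by changing one record. It also ignores the floating-point subtleties that a production library must handle.

```python
import random


def sample_laplace(scale: float, rng: random.Random) -> float:
    """Draw from Laplace(0, scale) as the difference of two exponentials."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def dp_mean(values, lower, upper, epsilon, rng):
    """Illustrative epsilon-DP mean (hypothetical helper, not the OpenDP API).

    Assumes len(values) is public and neighbors differ in one record.
    """
    if not values:
        raise ValueError("values must be non-empty")
    if epsilon <= 0 or upper <= lower:
        raise ValueError("need epsilon > 0 and upper > lower")

    # Clamp each record so one record can move the sum by at most upper - lower.
    clamped = [min(max(v, lower), upper) for v in values]
    n = len(clamped)

    # Sensitivity of the mean under the change-one-record assumption.
    sensitivity = (upper - lower) / n
    scale = sensitivity / epsilon

    return sum(clamped) / n + sample_laplace(scale, rng)


if __name__ == "__main__":
    rng = random.Random(0)
    ages = [23.0, 35.0, 41.0, 58.0, 62.0]
    print(dp_mean(ages, lower=0.0, upper=100.0, epsilon=1.0, rng=rng))
```

Smaller values of epsilon give a larger noise scale, so the released mean is more private but less accurate.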

Dataframe Support

  • Scientific datasets are often stored in tabular format. Dataframe utilities can be convenient for manipulating these datasets. The Apache Arrow (https://arrow.apache.org) data format has emerged as a very popular dataframe representation in the data science and machine learning communities. It allows the efficient exchange of datasets across languages and architectures.
  • We're updating the OpenDP Library so that supplying datasets and obtaining results can be done using Arrow. This will require significant changes to our data marshaling architecture at the FFI boundary, as well as some tweaks to the way datasets are accessed internally in algorithm implementations.

R Language Bindings

  • The R programming language is a popular choice for statistical computing and data analysis. To complement the language bindings we have for Python, we are adding support for R.

Automated FFI Bindings Generation

  • The architecture of the OpenDP Library allows us to build the core in strongly-typed and memory-safe Rust, while still allowing use from more accessible languages such as Python. However, this entails manually writing FFI wrappers for every OpenDP function that we wish to export. This labor-intensive and highly technical process is tedious for developers and a barrier to new contributors.
  • We plan to greatly simplify this process, leveraging code annotations and Rust procedural macros to generate the FFI wrappers automatically (or at least with much less effort).

Privacy Odometers

  • Leveraging the work we have done on interactive measurements, we’re exploring support for privacy odometers. Odometers are similar to interactive measurements, in that they allow a sequence of queries to be made interactively. However, odometers provide additional flexibility in that they don’t require the privacy loss to be stated up front, but instead accumulate the loss on the fly.
  • Privacy odometers have potential as a general building block for other mechanisms in the OpenDP Library.
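
The difference between a fixed-budget interactive session and an odometer can be sketched in a few lines of Python. This is an illustration only, not the OpenDP design: the class names FixedBudgetSession and Odometer are hypothetical, the release argument stands in for any callable that performs an epsilon-DP release, and privacy loss is tracked with simple additive composition for pure epsilon-DP.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List


@dataclass
class FixedBudgetSession:
    """Interactive session whose total privacy loss is stated up front."""

    budget: float
    spent: float = 0.0

    def query(self, epsilon: float, release: Callable[[], Any]) -> Any:
        if epsilon <= 0:
            raise ValueError("epsilon must be positive")
        # Refuse any query that would push the loss past the declared budget.
        if self.spent + epsilon > self.budget:
            raise RuntimeError("privacy budget exceeded")
        self.spent += epsilon
        return release()


@dataclass
class Odometer:
    """Session with no declared budget; loss accumulates query by query."""

    spent: float = 0.0
    history: List[float] = field(default_factory=list)

    def query(self, epsilon: float, release: Callable[[], Any]) -> Any:
        if epsilon <= 0:
            raise ValueError("epsilon must be positive")
        # Record the loss of this query; nothing is rejected.
        self.spent += epsilon
        self.history.append(epsilon)
        return release()

    def privacy_loss(self) -> float:
        """Running total of the privacy loss incurred so far."""
        return self.spent


if __name__ == "__main__":
    odometer = Odometer()
    odometer.query(0.5, lambda: "first release")
    odometer.query(0.25, lambda: "second release")
    print(odometer.privacy_loss())
```

In this sketch the caller can read privacy_loss() at any point and decide whether to continue, whereas the fixed-budget session enforces a limit chosen before the first query.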

DP Creator as a Client-Side Application

  • The DP Creator is a web-based application to budget workloads of statistical queries for public release. A previous version of DP Creator integrated with the Dataverse social science repository, with the goal of being available to work with other widely-used repositories.
  • Based on feedback from user studies involving data scientists and archivists, and informed by the Dataverse data strategy for private data, we are redesigning the DP Creator to keep sensitive dataset processing on the client side. In this model, finalized, differentially private releases suitable for public consumption will be published on social science repositories such as Dataverse, with appropriate budget tracking.
  • GitHub: opendp/dpcreator repository | Issues
  • To learn more about the application, email us:  info@opendp.org

Epsilon Registry - Initial Survey Tool

  • Create a survey tool to collect data on real-world DP use cases, including information on data domain specific problems and legal requirements, selection of privacy-loss parameters, and use of differentially private releases.
  • Based on the publication Differential Privacy in Practice: Expose your Epsilons! by Cynthia Dwork, Nitin Kohli, and Deirdre Mulligan.

Planning / Design

Alternate Dataset Types

  • Most of the algorithms in the OpenDP library operate on row-oriented multisets (i.e., tabular data). However, the library was designed to accommodate any type of dataset. We plan to add support for other common types, such as graph datasets.

External Compute

  • OpenDP functions are currently limited to running on a single CPU. In order to support larger datasets and more computationally intensive operations, we need to extend the library to support multiple machines and external compute resources.

Benchmarking Suite

  • As part of a long-term effort for characterizing the utility of DP algorithms, we plan to begin collecting a suite of datasets for benchmarking. This will lead to further work in developing a framework for modeling uncertainty and performing inference over a population.

Integrate Federated Machine Learning with the OpenDP Library

  • Design a way to incorporate federation into the privacy framework, along with external computations such as those performed by PyTorch.

DP Explorer Application

  • Visual exploration tool for intuitively understanding DP releases created using the OpenDP library, including releases generated with DP Creator.

DP Creator in Analyst Mode, Learning Mode

  • Analyst Mode. Allow designated users with set privacy budgets to create DP statistics without data access. This functionality has been prototyped as a web application and is being redesigned to work as a client-side application.
  • Learning Mode. Design of a repeatable workflow with visualizations to help users better understand differential privacy through “practice” on non-sensitive datasets.

Future

Uncertainty Estimates and Utility Framework

Synthetic Data Generation

  • DP Creator. Utilize the client-side implementation of the application to create and publish synthetic datasets using existing tools such as the SmartNoise SDK, https://github.com/opendp/smartnoise-sdk.

Additional Models of Privacy