OpenDP Roadmap

 

Throughout the year, we identify high-level steps that advance OpenDP’s core initiatives; they are detailed in the Roadmap below.

The major items that the development team and the community are working on right now are shown in the In Progress section. The Planning / Design section lists the strategic items that we've prioritized and are designing and testing with the community. The Future section lists the things that we'd like to work on but haven't yet prioritized.

This Roadmap is focused on high-level initiatives. We're always working on smaller bug fixes and enhancements. For a view of everything that the OpenDP Project Team and Community are working on right now, check out the Project Board on GitHub.

 

In Progress

 

OpenDP Commons Contribution Process and Guidelines

  • Establish and document the process for intake and vetting of community contributions of algorithms to the OpenDP library, and of tools and packages that use the OpenDP library.

Algorithm Development

  • The OpenDP library currently supports 6 diverse privacy mechanisms and over 20 summary statistics, which include univariate statistics (e.g. means, medians, quantiles, variances), histograms of variables, covariances, and missing data imputation. However, we are always looking to add new functionality, to broaden the scope of computations that can be performed.

Some of the algorithms planned for implementation in the near term include: 

To view all the components under consideration for future work, see this GitHub query. Your feedback on which algorithms we should prioritize is welcome!

 

Apache Arrow Support

  • The Apache Arrow (https://arrow.apache.org) data format has emerged as a very popular approach to representing tabular data in the data science and machine learning communities. It allows the efficient exchange of datasets across languages and architectures.
  • We're updating the OpenDP Library so that supplying datasets and obtaining results can be done using Arrow. This will require significant changes to our data marshaling architecture at the FFI boundary, as well as some tweaks to the way datasets are accessed internally in algorithm implementations.

Automated FFI Bindings Generation

  • The architecture of the OpenDP Library allows us to build the core in strongly-typed and memory-safe Rust, while still allowing use from more accessible languages such as Python. However, this entails manually writing FFI wrappers for every OpenDP function that we wish to export. This is a labor-intensive and highly technical process, which is tedious for developers and a barrier to new contributors.
  • We plan to greatly simplify this process, leveraging code annotations and Rust procedural macros to generate the FFI wrappers automatically (or at least with much less effort).
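
As a rough illustration of the idea, the Python sketch below derives a marshaling wrapper from type annotations instead of writing it by hand. It is only an analogy for the annotation-driven approach described above, not the OpenDP implementation: the names `generate_binding`, `native_clamp`, `clamp`, and the `CTYPES` mapping are hypothetical, and the "native" function is a Python stand-in for a symbol that would normally come from a compiled library.

```python
import ctypes
import inspect

# Illustrative mapping from Python annotations to C-compatible types.
CTYPES = {float: ctypes.c_double, int: ctypes.c_int64, bool: ctypes.c_bool}


def native_clamp(x, lower, upper):
    """Hypothetical stand-in for a compiled function; takes and returns ctypes values."""
    return ctypes.c_double(max(lower.value, min(upper.value, x.value)))


def generate_binding(native_fn, stub):
    """Build a type-checked wrapper for native_fn from the annotations on stub."""
    sig = inspect.signature(stub)

    def wrapper(*args, **kwargs):
        bound = sig.bind(*args, **kwargs)
        marshaled = []
        for name, value in bound.arguments.items():
            expected = sig.parameters[name].annotation
            if not isinstance(value, expected):
                raise TypeError(f"{name} must be {expected.__name__}")
            # Convert the Python value into its C-compatible representation.
            marshaled.append(CTYPES[expected](value))
        # Unwrap the C-compatible result back into a plain Python value.
        return native_fn(*marshaled).value

    wrapper.__name__ = stub.__name__
    return wrapper


def clamp(x: float, lower: float, upper: float) -> float:
    """Annotated signature only; this body is never executed."""


# The annotated stub is the single source of truth for the wrapper.
clamp = generate_binding(native_clamp, clamp)
print(clamp(5.0, 0.0, 1.0))
```

The point of the sketch is that adding a new exported function only requires an annotated signature; the argument checks and conversions are produced mechanically.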

Interactive Measurements

  • One of the key concepts in the OpenDP Programming Framework is that of interactive measurements. These are measurements that go beyond one-shot functions to support a series of operations, with the sequence possibly determined on the fly. Interactive measurements can represent more flexible approaches such as adaptive composition and the sparse vector technique.
  • We have implemented a basic form of interactive measurements, but it is at a prototype level and needs integration with the rest of the OpenDP Library. Doing so will likely involve some interesting work on type signatures and the representation of internal state.
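
To make the concept concrete, here is a minimal sketch of adaptive composition in plain Python. It is not the OpenDP Library's API: the class `InteractiveMeasurement`, its method `count_in_range`, and the `BudgetExceeded` exception are hypothetical names, the accounting assumes basic sequential composition (epsilons add up), and the floating-point Laplace sampler is for illustration only and would not be safe for real releases.

```python
import random


class BudgetExceeded(Exception):
    """Raised when a query would exceed the remaining privacy budget."""


class InteractiveMeasurement:
    """Answers adaptively chosen queries while tracking a total epsilon."""

    def __init__(self, data, total_epsilon, seed=None):
        self._data = list(data)
        self.remaining_epsilon = total_epsilon  # internal state across queries
        self._rng = random.Random(seed)

    def _laplace(self, scale):
        # The difference of two i.i.d. exponentials is Laplace(0, scale).
        # Illustrative only: naive floating-point sampling is not secure.
        return self._rng.expovariate(1 / scale) - self._rng.expovariate(1 / scale)

    def count_in_range(self, lower, upper, epsilon):
        """Return a noisy count of records in [lower, upper], spending epsilon."""
        if epsilon <= 0:
            raise ValueError("epsilon must be positive")
        if epsilon > self.remaining_epsilon:
            raise BudgetExceeded("not enough privacy budget left")
        self.remaining_epsilon -= epsilon
        true_count = sum(lower <= x <= upper for x in self._data)
        # A count changes by at most 1 when one record is added or removed.
        return true_count + self._laplace(1 / epsilon)


ages = [23, 35, 41, 29, 62, 57, 33, 48]
session = InteractiveMeasurement(ages, total_epsilon=1.0, seed=0)

first = session.count_in_range(0, 40, epsilon=0.5)
# The second query is chosen based on the first answer, which is what
# distinguishes an interactive measurement from a one-shot function.
if first > 4:
    second = session.count_in_range(0, 30, epsilon=0.25)
else:
    second = session.count_in_range(31, 40, epsilon=0.25)
print(session.remaining_epsilon)
```

The object has to carry state (the remaining budget) between calls, which is the kind of internal-state representation the bullet above refers to.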

Development and Initial Release of the DP Creator web application with Dataverse integration

  • The DP Creator is a web-based application to budget workloads of statistical queries for public release.
  • Integration with Dataverse repositories will allow researchers with knowledge of their datasets to calculate DP statistics without requiring expert knowledge in programming or differential privacy.
  • GitHub: opendp/dpcreator repository | Issues
  • To learn more about the application, email us: info@opendp.org

Epsilon Registry - Initial Survey Tool

  • Create a survey tool to collect data on real-world DP use cases, including information on data domain specific problems and legal requirements, selection of privacy-loss parameters, and use of differentially private releases.
  • Based on the publication Differential Privacy in Practice: Expose your Epsilons! by Cynthia Dwork, Nitin Kohli, and Deirdre Mulligan.

Prototype of Federated Machine Learning

  • Differentially private algorithms are computed separately on datasets held at remote locations, and the results are aggregated through postprocessing.
  • GitHub: opendp/sotto-voce
  • Contact us to learn more about this project.
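
The pattern described above can be sketched in a few lines of Python. This is a simplified illustration, not code from the prototype: it releases a differentially private mean per site rather than training a model, the function names `local_dp_mean` and `aggregate` are hypothetical, each site's record count is assumed to be public, and the floating-point Laplace sampler is not suitable for real releases.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials is Laplace(0, scale).
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def local_dp_mean(values, lower, upper, epsilon, rng):
    """Runs at each site: release (noisy mean, n), treating n as public."""
    clamped = [min(max(v, lower), upper) for v in values]
    n = len(clamped)
    # Replacing one record moves the clamped mean by at most (upper - lower) / n.
    sensitivity = (upper - lower) / n
    noisy_mean = sum(clamped) / n + laplace_noise(sensitivity / epsilon, rng)
    return noisy_mean, n


def aggregate(releases):
    """Runs centrally: size-weighted average of the already-released means."""
    total = sum(n for _, n in releases)
    return sum(mean * n for mean, n in releases) / total


rng = random.Random(0)
site_data = [[31, 45, 52, 38], [27, 61, 49], [44, 36, 58, 40, 33]]
releases = [local_dp_mean(d, 18, 90, epsilon=1.0, rng=rng) for d in site_data]
print(aggregate(releases))
```

Because the aggregation step only touches values that were already released with noise, it is postprocessing and spends no additional privacy budget; raw records never leave their site.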

 

Planning / Design

Alternate Dataset Types

  • Most of the algorithms in the OpenDP library operate on row-oriented multisets (i.e., tabular data). However, the library was designed to accommodate any type of dataset. We plan to add support for other common types, such as graph datasets.

More Language Bindings

  • Once we have more infrastructure in place for automated generation of FFI bindings, we plan to add support for more language bindings beyond Python, starting with R.

External Compute

  • OpenDP functions are currently limited to running on a single CPU. In order to support larger datasets and more computationally intensive operations, we need to extend the library to support multiple machines and external compute resources.

Benchmarking Suite

  • As part of a long-term effort for characterizing the utility of DP algorithms, we plan to begin collecting a suite of datasets for benchmarking. This will lead to further work in developing a framework for modeling uncertainty and performing inference over a population.

Integrate Federated Machine Learning with the OpenDP Library

  • Design how federation can be incorporated into the privacy framework, including exterior computations such as those performed by PyTorch.
  • GitHub: opendp/sotto-voce

DP Explorer Application

  • Visual exploration tool for intuitively understanding DP releases created using the OpenDP library, including releases generated with DP Creator.

DP Creator in Analyst Mode

  • Design and user-focused studies of a workflow that allows data scientists and archivists to create differentially private releases from a dataset by intuitively allocating their budget across a workload of desired statistics.
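
One simple way to think about allocating a budget across a workload is a proportional split, sketched below in Python. This is an assumption made for illustration, not a description of how DP Creator works: the function `allocate_budget`, the statistic names, and the weights are all hypothetical, and the split assumes basic composition, under which the per-statistic epsilons add up to the total.

```python
def allocate_budget(total_epsilon, weights):
    """Split total_epsilon across statistics in proportion to their weights."""
    if total_epsilon <= 0:
        raise ValueError("total_epsilon must be positive")
    if not weights or any(w <= 0 for w in weights.values()):
        raise ValueError("weights must be non-empty and positive")
    total_weight = sum(weights.values())
    return {name: total_epsilon * w / total_weight for name, w in weights.items()}


# Hypothetical workload: the analyst cares twice as much about the mean.
workload = {"mean_age": 2, "median_income": 1, "histogram_education": 1}
allocation = allocate_budget(1.0, workload)
for name, epsilon in allocation.items():
    print(f"{name}: epsilon = {epsilon}")
```

A statistic with a larger share of epsilon receives less noise, so the weights express which results the analyst most needs to be accurate.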

 

Future

Uncertainty Estimates and Utility Framework

 

Synthetic Data Generation

 

Additional Models of Privacy