Modelling data environments within PROV to assist anonymisation decision-making


The Anonymisation Decision-making Framework (ADF) operationalises the management of the risk of data exchange between organisations and environments. Although it provides clearer theoretical underpinnings for implementing functional anonymisation, the framework is complex enough that, in general, it still needs to be operated by an expert. The medium-term goal is to automate as much of the anonymisation decision-making process as possible. In its second edition, the ADF places greater emphasis on modelling data flows, highlighting the potential value of formal provenance information in anonymisation decision-making. We provide a use case that illustrates this use of provenance information. Based on this use case, we identify the requirements that provenance information must meet to be utilised within the ADF, and identify a currently unmet requirement: the modelling of data environments. We show how data environments can be implemented using the W3C PROV standard in four different ways. We analyse the costs and benefits of each approach and check each against a second use case for completeness. We summarise our findings and suggest ways forward for representing data environments within W3C PROV to underpin the automation of the ADF.

In The 2021 UNECE/Eurostat Expert Meeting on Statistical Data Confidentiality