Provenance and Anonymisation (PROVANON)
PROVANON, a Turing Institute pilot project, is examining how provenance formalisms can capture the necessary features of dynamic data situations, to allow more principled analysis of risk, of the requirements of anonymisation, and of related topics such as data ethics and utility. Data provenance is a record of the creation and modification of data. It has many uses, including scientific reproducibility, debugging, and establishing trust in data. The insight driving this project is that the core elements of the data environment (agents with access to the data; supplementary data which might be integrated with the data; infrastructure in which the data is stored and processed; and governance of the data) have the potential to be mapped with W3C PROV [3]. W3C PROV is an interoperability standard for provenance that defines the agents, entities, and activities which together created a dataset, and the relationships between them. Using PROV, we can map where data came from and how it was processed. The objectives of this project are to:
- Analyse how data environments can be modelled, expressing the terms described in [1] using PROV concepts, including any domain-specific extensions to PROV that may be required to support this new usage.
- Identify how to use provenance not only as a record of the development of the data, but also to help create contractual agreements on how data will be used after sharing, and to support compliance with GDPR and other data privacy legislation.
- Determine what improvements can be made to the operational power of the Anonymisation Decision-making Framework (ADF) by using the PROV representation.
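As an illustration of the first objective, the sketch below shows one way the elements of a data environment might be written down as PROV-style statements. It is a minimal sketch, not the project's model: the `ProvGraph` helper, the `ex:` identifiers, and the choice to treat infrastructure as an agent and a governance document as an entity are all assumptions made for this example. Only the relation names (`used`, `wasGeneratedBy`, `wasAssociatedWith`, `wasDerivedFrom`) come from W3C PROV.

```python
from dataclasses import dataclass, field

# Relation names taken from W3C PROV; everything else here is hypothetical.
USED = "used"
WAS_GENERATED_BY = "wasGeneratedBy"
WAS_ASSOCIATED_WITH = "wasAssociatedWith"
WAS_DERIVED_FROM = "wasDerivedFrom"


@dataclass
class ProvGraph:
    """A tiny in-memory store of PROV-style statements (illustrative only)."""
    entities: set = field(default_factory=set)
    activities: set = field(default_factory=set)
    agents: set = field(default_factory=set)
    relations: list = field(default_factory=list)  # (relation, subject, object)

    def relate(self, relation, subject, obj):
        self.relations.append((relation, subject, obj))

    def lineage(self, entity):
        """Return every entity that `entity` was transitively derived from."""
        found, frontier = set(), [entity]
        while frontier:
            current = frontier.pop()
            for rel, subj, obj in self.relations:
                if rel == WAS_DERIVED_FROM and subj == current and obj not in found:
                    found.add(obj)
                    frontier.append(obj)
        return found


def build_example():
    g = ProvGraph()
    # Data and supplementary data are modelled as entities; so is the
    # governance document (an assumption of this sketch).
    g.entities |= {"ex:survey_data", "ex:electoral_register",
                   "ex:linked_data", "ex:data_sharing_agreement"}
    g.activities.add("ex:linkage")
    # People and infrastructure are both modelled as agents (an assumption).
    g.agents |= {"ex:analyst", "ex:secure_server"}

    g.relate(USED, "ex:linkage", "ex:survey_data")
    g.relate(USED, "ex:linkage", "ex:electoral_register")
    g.relate(WAS_GENERATED_BY, "ex:linked_data", "ex:linkage")
    g.relate(WAS_ASSOCIATED_WITH, "ex:linkage", "ex:analyst")
    g.relate(WAS_ASSOCIATED_WITH, "ex:linkage", "ex:secure_server")
    g.relate(WAS_DERIVED_FROM, "ex:linked_data", "ex:survey_data")
    g.relate(WAS_DERIVED_FROM, "ex:linked_data", "ex:electoral_register")
    return g
```

Calling `build_example().lineage("ex:linked_data")` returns the two source datasets, which is the kind of "where did this data come from" question that a provenance record is meant to answer.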
For further details, please refer to the project's main website.
Tasks and Responsibilities:
- Reason over provenance to analyse data situations in data environments.
- Investigate models and approaches for automating and testing compliance with data ethics requirements, GDPR, etc.
- Investigate retrospective, prospective, prescriptive and proscriptive provenance for expressing data situations and reasoning about anonymisation decisions.
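To make the idea of reasoning over provenance more concrete, the sketch below checks a retrospective record (what actually happened) against a proscriptive rule (what must not happen). It is a minimal sketch under stated assumptions: the reading of "proscriptive" as a prohibition, the `permitted_agents` rule format, the `find_violations` function, and all `ex:` identifiers are hypothetical and are not taken from the project. Only the relation names `used` and `wasAssociatedWith` come from W3C PROV.

```python
# Retrospective provenance: (relation, subject, object) triples recording
# what happened. Identifiers are made up for illustration.
retrospective = [
    ("used", "ex:linkage", "ex:survey_data"),
    ("used", "ex:linkage", "ex:electoral_register"),
    ("wasAssociatedWith", "ex:linkage", "ex:analyst"),
    ("used", "ex:export", "ex:survey_data"),
    ("wasAssociatedWith", "ex:export", "ex:contractor"),
]

# Hypothetical proscriptive rule: for each restricted entity, the only agents
# that may take part in an activity using it.
permitted_agents = {"ex:survey_data": {"ex:analyst"}}


def find_violations(records, permitted):
    """Return (agent, activity, entity) triples that break the rule."""
    used_by = {}    # activity -> entities it used
    agents_of = {}  # activity -> agents associated with it
    for rel, subj, obj in records:
        if rel == "used":
            used_by.setdefault(subj, set()).add(obj)
        elif rel == "wasAssociatedWith":
            agents_of.setdefault(subj, set()).add(obj)

    violations = []
    for activity, entities in used_by.items():
        for entity in entities:
            if entity not in permitted:
                continue  # no restriction recorded for this entity
            for agent in agents_of.get(activity, set()):
                if agent not in permitted[entity]:
                    violations.append((agent, activity, entity))
    return sorted(violations)
```

With the sample records above, `find_violations(retrospective, permitted_agents)` flags the export activity, because `ex:contractor` is not among the agents permitted to work with `ex:survey_data`. The same check could in principle be run against prospective provenance (a planned workflow) before any data is shared.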