By Prof Kostas Marias [Foundation for Research and Technology – Hellas (FORTH)], Ioannis Karatzanis [Foundation for Research and Technology – Hellas (FORTH)] and Katerina Nikiforaki [Foundation for Research and Technology – Hellas (FORTH)]
Anonymization is an irreversible processing operation that applies a set of techniques so that it becomes impossible, in practice, to identify the person by any means. This process is challenging for DICOM-formatted data: the difficulty lies in anonymizing while preserving the value of the DICOM dataset. As a result, defining the optimal data anonymization strategy, especially for data sharing purposes, has proven to be a major technical challenge.
From the beginning of the ProCAncer-I project, it became obvious that the clinical sites had varying degrees of experience and deployed a number of different technological solutions to anonymize data. In a data sharing context, the absence of a common anonymization strategy can create discrepancies and heterogeneity in the data collected. Given that one of the goals of the project is the creation of ProstateNET, a publicly accessible mpMRI data repository, it was of utmost importance to address this from the beginning.
For these reasons, a common anonymization strategy had to be defined and adopted, taking into consideration all the different data types to be collected. Legal, clinical and technical partners held a series of discussions in search of common ground. At the same time, the relevant available technological solutions were analysed exhaustively in order to explore the functionalities of each tool and evaluate its performance on the specific type of data the project is dealing with. At the end of the process, the partners agreed to adopt a ‘whitelisting’ anonymization strategy, which was considered appropriate for both retrospective and prospective data. The major decisions reached regarding the anonymization process are described below:
- Each clinical institution will use the tools that are currently used for data anonymization within the hospital site. Such tools are usually classified as ‘blacklisting’ tools, meaning that dedicated software removes or modifies selected tags from the series’ list of DICOM tags. These are usually tags that contain personal information or information that can lead to the identification of an individual. All other DICOM tags are retained under the blacklisting approach. Because each site may remove or modify a different set of tags, this process may generate heterogeneities.
- In order to produce a homogeneous dataset in terms of the metadata used for the DICOM headers, ProCAncer-I will apply a second anonymization layer within each hospital. During this phase, only the DICOM header information that is strictly necessary for deciphering quantitative information and for AI modelling will be preserved, and all other DICOM tags will be discarded. This approach, referred to as ‘whitelisting’, will retain only the non-personal health information required for AI modelling and, at the same time, will homogenise the anonymized data before final upload. The ‘whitelist’ anonymization process will be performed with the established RSNA DICOM Anonymizer tool, customized through a configuration script. This configuration script implements the DICOM whitelist that the project defined following an iterative process coordinated by Ioannis Karatzanis and Katerina Nikiforaki from FORTH.
- Once the DICOM data have gone through the whitelist anonymization, they will be uploaded to the ProCAncer-I repository as ‘anonymized data’.
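The difference between the two layers can be illustrated with a minimal Python sketch. It treats a DICOM header as a plain dictionary of tag keywords, and the tag lists, sample values and function names are hypothetical choices made for illustration only. They are not the project's actual whitelist, and this is not the configuration format of the RSNA DICOM Anonymizer.

```python
# Illustrative sketch only: both tag lists are hypothetical examples,
# not the whitelist defined by ProCAncer-I.
BLACKLIST = {"PatientName", "PatientID", "PatientBirthDate"}
WHITELIST = {"Modality", "RepetitionTime", "EchoTime", "PixelSpacing", "SliceThickness"}


def blacklist_anonymize(header, blacklist=BLACKLIST):
    """Drop the listed tags and keep everything else."""
    return {tag: value for tag, value in header.items() if tag not in blacklist}


def whitelist_anonymize(header, whitelist=WHITELIST):
    """Keep only the listed tags and drop everything else."""
    return {tag: value for tag, value in header.items() if tag in whitelist}


def two_layer_anonymize(header):
    """Site-level blacklisting followed by project-level whitelisting."""
    return whitelist_anonymize(blacklist_anonymize(header))


# Two sites exporting different extra tags (made-up values).
site_a = {
    "PatientName": "DOE^JOHN",
    "PatientID": "123",
    "Modality": "MR",
    "RepetitionTime": 4000.0,
    "EchoTime": 100.0,
    "InstitutionName": "Hospital A",
}
site_b = {
    "PatientName": "ROE^JANE",
    "Modality": "MR",
    "RepetitionTime": 3500.0,
    "EchoTime": 90.0,
    "StationName": "MR01",
    "PixelSpacing": [0.5, 0.5],
}

for name, header in (("site_a", site_a), ("site_b", site_b)):
    print(name, "after blacklisting only:", sorted(blacklist_anonymize(header)))
    print(name, "after both layers:     ", sorted(two_layer_anonymize(header)))
```

In this sketch, blacklisting alone leaves site-specific tags such as `InstitutionName` or `StationName` in place, whereas after the second layer every header contains only keys drawn from the same whitelist.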
A key decision is that all data will be anonymized within the security domains of the clinical institutions before being uploaded to the ProCAncer-I repository. In parallel, the experienced team of Prof. N. Forgo (UNIVERSITAT WIEN), which provides legal guidance to the project, together with the legal teams and DPOs of all relevant partners, will continuously monitor the risk of data re-identification throughout the project in order to eliminate any such risk. Concurrently, the technical partners of the project will provide continuous support to the clinical institutions for the efficient implementation of this very important task, in line with the common strategy agreed upon. They will also work in close collaboration with clinical partners to ensure that this process is optimally implemented in the context of a busy clinical environment. Overall, the aim of an easy-to-implement workflow that performs robust anonymization in a homogeneous manner across all partners has been achieved. At the same time, a common approach to anonymization and to the homogenization of the process is also being discussed across all the ‘AI for Health Imaging’ funded projects.