Data Anonymization: New Era

By now, professionals in the clinical trials sector are familiar with the concept of clinical trial data transparency and how both Europe and the United States are taking steps to make data accessible to the public.

Where are we now with current legislation?

In April 2014, the European Union adopted Regulation (EU) No 536/2014 on clinical trials on medicinal products for human use. Better known as the “Clinical Trials Regulation”, it aims to create a favorable environment for conducting clinical trials, with an increased focus on patient safety standards. Part of this legislation is strengthened transparency for clinical trial data. The Regulation entered into force in June 2014 and was to apply no earlier than May 2016. Its transparency provisions require that all information in the EU database submitted in the clinical trial application and during assessment procedures be publicly accessible unless the confidentiality of the information can be justified. Sponsors will be obliged to submit a summary of results to the database within one year of the end of the trial in the EU.

The United States took steps towards data transparency in 2007, when the FDA Amendments Act (FDAAA) required protocols and results to be submitted to ClinicalTrials.gov. FDAAA required all Phase II, III and IV trials to be disclosed and results to be published “no later than 12 months after the primary completion date for an approved product, or within 30 days of receiving a marketing authorization for a new product/indication” (Clinical Leader).

What are the main challenges of data sharing?

  • Data anonymization: is it a problem if an analysis is repeated on anonymized data and different results are obtained? Sponsors and researchers need to understand that additional data may be needed.
  • Informed consent: how to track changes to consent over time and any resulting data restrictions. Does anonymization remove the constraints imposed by the informed consent form (ICF)?
  • What needs to be anonymized and how?
  • How to implement anonymization and ensure quality control?
  • Which data analysis tools are appropriate?
  • How to streamline the system of data sharing to allow external researchers to submit queries and ensure data is used within the scope of the initial request?
  • How to guarantee data security and privacy?

Looking at Data Anonymization

Data anonymization can be defined as removing identifiable and traceable links to an individual: the links from the original data to the new dataset are completely destroyed, and it is no longer possible to go back to the original dataset.

Data anonymization is crucial for clinical data transparency since any patient-level data that is shared has to be anonymized to “protect personally identifiable information” (PhRMA).

To anonymize clinical datasets, a standard macro can be developed along with data definition and anonymization mode attributes. These attributes can then be passed to the anonymization macro via a SAS definition dataset, which lists all variables from the datasets to be anonymized. The definition dataset can be standardized and then maintained through a quality-controlled change management system with proper versioning and approval processes.
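The article describes a SAS macro driven by a definition dataset. As a language-neutral illustration, the Python sketch below models one row of such a definition dataset and runs a quality-control check before any anonymization happens: every variable in every dataset must have exactly one definition row with a recognised mode. The field names (`dataset`, `variable`, `mode`, `version`), the `check_definitions` function and the sample SDTM-style records are hypothetical, not the actual macro's interface.

```python
from dataclasses import dataclass


# Hypothetical structure for one row of the definition dataset. In practice
# this would live in a SAS dataset; the field names here are illustrative.
@dataclass(frozen=True)
class VariableDefinition:
    dataset: str   # e.g. "DM"
    variable: str  # e.g. "AGE"
    mode: str      # one of VALID_MODES
    version: str   # definition version under change management


VALID_MODES = {"NONE", "MISSING", "DROP", "AGEINT", "DATE", "TRANSLATE"}


def check_definitions(datasets, definitions):
    """Quality-control check to run before anonymizing.

    datasets maps a dataset name to a list of records (dicts). Returns a list
    of findings; an empty list means every variable has exactly one
    definition row with a recognised mode.
    """
    findings = []
    defined = set()
    for d in definitions:
        key = (d.dataset, d.variable)
        if d.mode not in VALID_MODES:
            findings.append(f"{d.dataset}.{d.variable}: unknown mode {d.mode!r}")
        if key in defined:
            findings.append(f"{d.dataset}.{d.variable}: defined more than once")
        defined.add(key)
    for name, records in datasets.items():
        variables = set()
        for record in records:
            variables.update(record)  # collect every variable name seen
        for var in sorted(variables):
            if (name, var) not in defined:
                findings.append(f"{name}.{var}: no definition")
    return findings


dm = [{"USUBJID": "STUDY1-001", "AGE": 92, "SEX": "F", "RFSTDTC": "2014-06-16"}]
defs = [
    VariableDefinition("DM", "USUBJID", "TRANSLATE", "1.0"),
    VariableDefinition("DM", "AGE", "AGEINT", "1.0"),
    VariableDefinition("DM", "SEX", "NONE", "1.0"),
]
print(check_definitions({"DM": dm}, defs))
```

Here the check reports that `RFSTDTC` has no definition, so the run can be stopped instead of silently copying an unreviewed variable. Keeping the version on each row is one way to tie a given anonymization run back to an approved definition.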

What are the modes of data anonymization?

  • None: straight copy of the variable
  • Missing: keeping the variable but removing its contents
  • Drop: dropping the variable
  • Ageint: grouping all ages over 89 into a single “90 or older” category (U.S. requirement under HIPAA)
  • Date
  • Translate

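The modes above can be sketched as a single dispatch over one record. The article does not define the Date and Translate modes, so this Python sketch assumes that Date shifts ISO 8601 dates by a fixed per-subject offset and that Translate replaces a value with a new code from a lookup table; both choices, the `anonymize_record` function and its parameters are hypothetical. Ageint follows the HIPAA Safe Harbor rule of collapsing all ages over 89 into one category.

```python
from datetime import date, timedelta


def anonymize_record(record, modes, date_offset_days=0, translation=None):
    """Return an anonymized copy of one record (a dict of variable -> value).

    modes maps each variable to NONE, MISSING, DROP, AGEINT, DATE or
    TRANSLATE. date_offset_days and translation are hypothetical inputs used
    only by the DATE and TRANSLATE modes in this sketch.
    """
    translation = translation or {}
    out = {}
    for var, value in record.items():
        mode = modes[var]  # raises KeyError if a variable has no definition
        if mode == "NONE":
            out[var] = value  # straight copy
        elif mode == "MISSING":
            out[var] = None  # keep the variable, remove its contents
        elif mode == "DROP":
            continue  # the variable is not written to the output
        elif mode == "AGEINT":
            # HIPAA Safe Harbor: all ages over 89 become a single category
            out[var] = ">=90" if value is not None and value > 89 else value
        elif mode == "DATE":
            # Assumption: shift ISO 8601 dates by a fixed per-subject offset
            out[var] = None if value is None else (
                date.fromisoformat(value) + timedelta(days=date_offset_days)
            ).isoformat()
        elif mode == "TRANSLATE":
            # Assumption: replace the value with a code from a lookup table
            out[var] = translation[value]
        else:
            raise ValueError(f"unknown mode {mode!r} for variable {var}")
    return out


record = {"USUBJID": "STUDY1-001", "AGE": 92, "SEX": "F",
          "RFSTDTC": "2014-06-16", "SITEID": "101", "INVNAM": "INVESTIGATOR A"}
modes = {"USUBJID": "TRANSLATE", "AGE": "AGEINT", "SEX": "NONE",
         "RFSTDTC": "DATE", "SITEID": "MISSING", "INVNAM": "DROP"}
print(anonymize_record(record, modes, date_offset_days=-30,
                       translation={"STUDY1-001": "ANON-0001"}))
```

Under the definition of anonymization given earlier, the date offset and the translation table must not be retained or released after the run; if they are, the link back to the original data survives and the result is only pseudonymized.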
In order for data to be transparent to the public, it must also be traceable. Implementing CDISC standards supports both traceability and cross-analysis of datasets. There must be clear traceability from analysis results to analysis datasets and on to SDTM datasets. There are two types of traceability: data-point traceability and metadata traceability. For data-point traceability, ADaM datasets allow the creation of variables or observations that are not directly used in the statistical analysis but support traceability. For example, data from early termination visits may be re-allocated in accordance with the Statistical Analysis Plan, with both the original visit name and the re-allocated visit name kept in the ADaM dataset. Metadata traceability covers the documentation needed to clearly describe information that already exists in the SDTM database, together with the algorithms and methods used to derive an analysis result.
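The early-termination example can be made concrete with a small derivation that keeps the original visit name alongside the re-allocated one. The variable names follow CDISC conventions (SDTM `VISIT`, ADaM `AVISIT` and `ADY`), but the visit schedule and the "nearest target study day" rule are hypothetical stand-ins for whatever the Statistical Analysis Plan actually specifies, and `derive_avisit` is an illustrative name.

```python
# Hypothetical visit schedule: analysis visit name -> target study day.
SCHEDULE = {"WEEK 4": 28, "WEEK 8": 56, "WEEK 12": 84}


def derive_avisit(records, schedule=SCHEDULE):
    """Add an analysis visit (AVISIT) to each record while keeping VISIT.

    Hypothetical SAP rule: an early termination visit is re-allocated to the
    scheduled visit whose target day is closest to its study day (ADY).
    """
    derived = []
    for rec in records:
        row = dict(rec)  # copy, so the original VISIT value is preserved
        if rec["VISIT"] == "EARLY TERMINATION":
            row["AVISIT"] = min(schedule, key=lambda v: abs(schedule[v] - rec["ADY"]))
        else:
            row["AVISIT"] = rec["VISIT"]
        derived.append(row)
    return derived


visits = [
    {"USUBJID": "ANON-0001", "VISIT": "WEEK 4", "ADY": 29},
    {"USUBJID": "ANON-0001", "VISIT": "EARLY TERMINATION", "ADY": 60},
]
for row in derive_avisit(visits):
    print(row["VISIT"], "->", row["AVISIT"])
```

Because `VISIT` is carried through unchanged, a reviewer can see that the Week 8 analysis value came from an early termination visit, which is the data-point traceability the paragraph describes.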

CROS NT – Data Anonymization and Clinical Data Transparency

CROS NT can help companies conducting trials in Europe by preparing their clinical data in a reliable and traceable way, removing patient identifiers if necessary. We can help implement CDISC standards to support both traceability and cross-analysis of datasets. Our Consultancy Team is up to date on the EU data transparency legislation and can help companies strategize on how best to prepare for making their clinical data publicly available.

Additionally, CROS NT can provide the technology tools necessary to help Sponsors make informed decisions and make sense of clinical data that could eventually be shared publicly. We can provide clinical data visualization tools that facilitate drill-down and click-through to multiple levels of detail, allowing for the analysis of specific subsets and subpopulations. CROS NT can also provide a centralized storage option so that all trial data is indexed, traceable and transportable, which makes transferring data easy.