The opening of research data is emerging thanks to the increasing possibilities of digital technology. The opening of clinical trial (CT) data is part of this process and is expected to have positive scientific, ethical, health, and economic impacts, thus contributing to research integrity. The January 2016 proposal by the International Committee of Medical Journal Editors (ICMJE) triggered ample discussion about CT data sharing and reconfirmed the need for an ongoing assessment of its dynamics. The IMProving Access to Clinical Trials data (IMPACT) Observatory aims to play such a role by assessing the data sharing culture, policies, and practices of key players and the impact of their interventions on CTs, and by contributing to a transformation of research. The objective of this paper is to present the IMPACT Observatory as well as share some of its preliminary findings.
Methods include a scoping study of research, surveys, interviews, and an environmental scan of research data repositories.
Our preliminary findings indicate that although the opening of CT data has not yet been achieved, its evolution is encouraging. Initiatives by key players are contributing to increased CT data sharing, and many barriers are shrinking or disappearing.
The major barrier is the lack of data sharing standards, from preparing data for public sharing to its curatorship, findability and access. However, experiences accumulated by sharing CT data according to “upon request” or “open” mechanisms could inform the development of such standards. The Vivli, CORBEL-ECRIN and Open Trials projects are currently working in this direction.
It has become increasingly clear that the reliable evidence needed for decision making in health, as in other fields, has to be based on all existing knowledge, and that re-analysis of the raw data from completed research is the way to obtain it.
The opening of research data is a relatively new field enabled by increasing possibilities of digital technology (
In order to achieve broader sharing and eventually opening of CT data, numerous barriers and obstacles must be overcome. The effort to find a new balance of CT data sharing has intensified, and there currently exist a number of data sharing initiatives led by various players. This process would benefit from the observation and assessment of its dynamics and an observatory or natural experiment is proposed as a methodology of choice to inform and contribute to its progress.
It is widely accepted that the level of reliability of evidence needed for evidence-informed decision-making increases from in vitro to interventional studies, as is illustrated by the famous evidence pyramid presented in
Evidence pyramid. The hierarchy of evidence and the role of individual participant data (IPD) meta-analysis in knowledge creation are presented. The reliability of the evidence needed for evidence-informed decision making in health increases as we move up the pyramid. It is expected that IPD meta-analysis would speed up knowledge creation.
The plethora of research studies that mostly build on each other from basic via observational to interventional studies produce findings that need to be critically verified. The critical appraisal of randomized controlled trials (RCT), which are considered by many to be the gold standard, was first introduced through systematic reviews (
Although Cochrane systematic reviews represent a relatively small portion of all systematic reviews, Cochrane contributed enormously to establishing standardized methodology and thus increased their quality. Along with the recognition of the benefit of a cumulative approach to the assessment of evidence, this enabled systematic reviews to become a key element to inform decision making. The interest in and importance of systematic reviews can be illustrated by the existence of registries of systematic reviews, such as PROSPERO (
It is important to emphasize that the systematic review paradigm has been changing as we are adding another level of critical appraisal and knowledge creation, the analysis of raw data, actually of the individual participant data (IPD). Meta-analysis of IPD may be included in a systematic review. In such IPD meta-analyses, published literature is used as one of many sources of information, often as a starting point for identifying studies in a given field. Additional studies and data are identified through other sources such as trial registries and research data repositories (repositories). Such analysis goes beyond synthesizing the reported data - it reanalyzes the IPDs, published or not. As illustrated in
The IMProving Access to Clinical Trial data (IMPACT) Observatory is monitoring various current initiatives aimed at making data available for further research, in order to assess changes in the paradigm of the CT enterprise (
If data are to be re-used and re-analysed, it is essential to know how to share them, where one can deposit them, and in which format, how to find them, and how one can access data for any type of re-use. While some key players engage in sharing and re-analyzing CT data, others play the catalyst-type role of triggering the process. The loss of data from studies, no matter how small, is of major concern, especially with regards to research integrity and research waste. The first and most critical step of data sharing is their preservation at the source. The overall objective of the IMPACT Observatory is to identify the impact on CTs of data sharing interventions and practices by key players (funders, regulators, journal editors, pharmaceutical industry, researchers, institutions, and consumers), to identify barriers and facilitators, inform the process, and to indicate trends and potential solutions. Once established, the IMPACT Observatory would function as a two-way street: a) it would collect, assess, analyze and host the information gathered and shared by the IMPACT Observatory network regarding changes in data sharing policies, practice and standards; and b) it would make the information available to those that aim to make changes to policies and practices or to develop new standards.
The objective of this paper is to present the IMPACT Observatory as a tool to assess changes in CT data sharing, as well as to present some of its preliminary findings regarding the dynamics of CT data sharing and ways data are shared in order to be reanalyzed. We also propose potential mechanisms that could enable the useful opening of CT data.
The IMPACT Observatory is an international study, hosted by the Department for Research in Biomedicine and Health of the University of Split School of Medicine. It started in October 2014, evolving from the IMPACT Initiative (
In our study we use the term “data sharing” in its broad sense, which includes the sharing and reuse of data. The term “data” is also used in a broad sense, denoting the cleaned, anonymized IPDs along with all other documentation generated during the lifecycle of a clinical trial that is needed to reuse the data. This includes published and unpublished documents, such as trial protocols, data management and statistical plans, informed consent and patient information sheets, regulatory and ethical documents, and clinical study reports.
We started setting up the IMPACT Observatory by building a network and choosing the methodology. A unique methodological aspect of the IMPACT Observatory is the development of a multipurpose network with a flexible interface between the network and the team, enabling people to move from one to the other according to their interest and level of engagement. As presented in
Contact, discuss, and engage interested people | Network members provide and/or use the information, join the team for specific tasks, and support it in other ways | |
Search, select, and analyze the literature and websites | Set a baseline at 2000; assess the clinical trial data sharing situation at baseline | |
Search, select, and analyze literature, websites, and contacts | Assess data sharing evolution over time until June 2016 and then update regularly* | |
SurveyMonkey used to survey clinician trialists, editors, consumers | Identify culture, positions and practices regarding data sharing and reuse; compare | |
Repeat in two years and expand to other players | Assess changes over time | |
Semi-structured interviews, convenience sample | Identify policies, positions and practices regarding data sharing and reuse of key players | |
Follow-up | Assess changes over time | |
Identify and analyze repositories that host clinical trial data | Analyze repository features regarding sharing and reusability of data | |
Internet; contacts; literature | Monitor initiatives and assess their impact on sharing and reuse of data | |
Communication and dissemination | Promote the IMPACT Observatory as a long-term tool. Ensure input and use of the IMPACT Observatory. Build sustainability of the IMPACT Observatory | |
Knowledge translation through publications, conferences, website | Inform so that key players can use the IMPACT Observatory in their policy making, in development of data sharing methods and standards, and to contribute to the sustainability of the Observatory | |
Various forms of promotion of the IMPACT Observatory; applications for sponsoring and funding | IMPACT Observatory is established as a long-term tool to inform the process of data sharing and its impact on clinical trials | |
*These tasks are anticipated in case the IMPACT Observatory continues beyond the initial fellowship. |
The scoping study included an Internet and literature search. For the latter, we performed a search in Medline, selected the literature that met our criteria, and extracted pre-defined information into an Excel file for analysis. Surveys were used to gather quantitative and qualitative information from key players. The questionnaire contains questions about the practices and perceptions of the participants with regard to data sharing and reuse. So far, we have performed a web-based survey of journal editors and clinician trialists using SurveyMonkey (SurveyMonkey Inc., Palo Alto, USA), the results of which are currently being analyzed.
Semi-structured in-depth interviews were performed with a convenience sample of key players. Once the players agreed to be interviewed, a short pre-interview questionnaire was sent to them in order to gather quantitative information (e.g. “Did you perform a trial?”, “How many?”; “Did you register the trial?”; “What did you do with the data?”) and help structure the interview questions. Environmental scans of repositories that host clinical trial data are performed by identifying relevant repositories on the Internet, especially through registries of repositories, then extracting the pre-defined information from registry and repository websites into an Excel file, and complementing this information by communicating with the repository managers.
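The scan workflow described above can be sketched as a small extraction form. The following is a minimal illustration, not the Observatory's actual tool: the field names (`name`, `url`, `access_type`, `hosts_ct_data`, `curation_notes`) and the example repository are hypothetical, since the pre-defined items are not listed here. It writes records to CSV text that a spreadsheet program can open, and reports which fields are still blank and would therefore require follow-up with the repository managers.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields
from typing import Optional


@dataclass
class ScanRecord:
    # All field names are hypothetical placeholders for the pre-defined items.
    name: str
    url: str
    access_type: Optional[str] = None
    hosts_ct_data: Optional[bool] = None
    curation_notes: Optional[str] = None


def missing_fields(record: ScanRecord) -> list[str]:
    """Return the names of fields that could not be filled from public websites."""
    return [f.name for f in fields(record) if getattr(record, f.name) is None]


def to_csv(records: list[ScanRecord]) -> str:
    """Serialize records to CSV text; unfilled (None) fields become empty cells."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=[f.name for f in fields(ScanRecord)])
    writer.writeheader()
    for record in records:
        writer.writerow(asdict(record))
    return buffer.getvalue()


# Invented example entry, for illustration only.
example = ScanRecord(
    name="Example Repository",
    url="https://repository.example",
    access_type="upon request",
)
print(missing_fields(example))
print(to_csv([example]))
```

Keeping unfilled items as explicit blanks, rather than omitting them, makes it easy to see which information still has to be requested directly.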
The IMPACT Observatory officially started in October 2014 as an international study of the IMPACT Initiative. We incorporated and continued the environmental scan of repositories, which has been performed by one of the authors since 2012 (
Having defined CTs as our area of research, we started building the network and established a core team. We identified key players that influence CT data sharing; these are journal editors, publishers, clinicians, trialists/CT researchers, academia, funders, regulators, industry, consumers, the media, and repositories. Furthermore, we chose the methodology and started implementing it. During this one-and-a-half-year period, we presented the IMPACT Observatory at several conferences to inform the scientific community and receive its feedback. As of summer 2016, the scoping study and the analyses of the two completed surveys and of the interviews are still ongoing, as are data collection and analysis for the environmental scan. Here we present some of our preliminary findings.
In our scoping study, the baseline was set to 2000 when the basic prerequisites, i.e. foundations for data sharing, were present. These included: the understanding of the need for higher transparency in clinical trials and for the sharing of raw data, the call for and establishment of initial CT registries, a defined basic methodology for systematic reviews, the launch of the Cochrane Collaboration (since 2015 called Cochrane), and the existence of IPD meta-analyses (
In the period following the year 2000, the opening of CT data experienced more rapid growth. The major trigger took place in 2004 with the landmark lawsuit brought by New York State against the pharmaceutical company GlaxoSmithKline (GSK), followed by the ICMJE and Ottawa Statements, which led to the development of international standards for trial registration by the World Health Organization (WHO) (
As shown in
Barriers preventing the public disclosure of clinical trial data. The figure presents the barriers to the opening of clinical trial data that were identified in 2013 and the dynamics of their change. They are diminishing due to initiatives by key players. The lighter part of each barrier illustrates its tendency to shrink or even be overcome. *The Culture barrier includes the balance of opportunities vs. fear: lack of appreciation of the research opportunities that data sharing provides; fear of the human and financial resources needed; lack of recognition of sharing as a good practice; and lack of incentives for academics. †The Data barrier includes the issues of data accuracy and quality, and the lack of standards for preparing data for sharing. ‡Repositories as a barrier: the lack of a domain repository and the lack of standards for data sharing via repositories (upload, hosting, maintenance, access).
The concept of intellectual property (IP) remains a barrier but this barrier is shrinking. Furthermore, even in this field burdened with IP concerns and protections, culture and perceptions are changing and various mechanisms of data sharing for reuse are being developed, some of which are presented in this paper. Finally, the lack of international standards of data sharing and of research data repositories are still major barriers that need to be addressed.
Various key players are taking initiatives, holding discussions, producing statements and declaring policies regarding research data sharing (
While the citability of data is solved by assigning a persistent identifier, finding available data is still a challenge. The ongoing bioCADDIE project of the BD2K aims to develop an index of all available research data, similar to what PubMed did for literature (
There are additional initiatives aiming to contribute to this issue, such as the EU-funded project CORBEL (Coordinated Research Infrastructures Building Enduring Life-Science Services,
Currently there are numerous mechanisms of data sharing. We have identified multiple formulas of data sharing that vary according to the type of access (from “upon request” to “open”), the data producer and user (trialist, systematic reviewer, academia, pharmacist), the key interested player (from researcher to regulator), and the CT area (any CTs; disease specific, e.g. malignant melanoma; groups of diseases, e.g. cancer, mental health; population specific).
We identified several “upon request” styles of sharing and reusing CT data:
Researcher to researcher requests such as most Cochrane IPD meta-analyses. Direct, researcher-to-researcher sharing often includes an offer to the initial data producer to become co-author of the systematic review.
Researcher-to-regulator (EMA, FDA) requests, including requests for clinical study reports (CSRs), which contain rich information including aggregate data but usually not the IPDs (
Direct requests to pharmaceutical companies, which often have strings attached and are conditional on an agreement that usually includes confidentiality, secrecy, and non-sharing (
Requests via an intermediary that organizes the processing of the application, while data owners still control the access to data.
These “upon request” data sharing styles include sharing of raw data and/or sharing of comprehensive reports such as CSRs and other information. They are increasingly performed in an organized way as a project or initiative. They all have some form of registration, application/request, and approval process followed by a signed agreement. Three data sharing projects facilitated by intermediaries are described below. The YODA project is a partnership between Yale University and three companies (
Projects facilitating access to study sponsors’ IPDs in the upon request style. The 25 companies presented partner with Project Datasphere, YODA, and ClinicalStudyDataRequest in order to share clinical trial data in the upon request style, as of June 2016. Nineteen companies partner in Project Datasphere to share and reuse data from academic and industry phase III cancer studies. ClinicalStudyDataRequest facilitates access to clinical trial data from 13 companies, and YODA from 3. Several companies share their data through more than one project, while Johnson & Johnson made its data available through all three projects.
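The counts in this caption imply how much the three projects overlap. As a rough check, and assuming that Johnson & Johnson is the only company present in all three projects (the caption names no other), the number of companies in exactly two projects follows from simple counting. The function and parameter names below are illustrative.

```python
def overlap_breakdown(memberships, unique_total, in_all_three):
    """Split companies by how many of the three projects they partner with.

    memberships:  per-project partner counts (a company in two projects counts twice)
    unique_total: number of distinct companies across all projects
    in_all_three: number of companies assumed to be in all three projects
    """
    total_memberships = sum(memberships)
    # With x1, x2, x3 companies in exactly one, two, and three projects:
    #   x1 + x2 + x3 = unique_total
    #   x1 + 2*x2 + 3*x3 = total_memberships
    # Subtracting the first equation from the second gives x2 + 2*x3.
    in_exactly_two = total_memberships - unique_total - 2 * in_all_three
    in_exactly_one = unique_total - in_exactly_two - in_all_three
    if in_exactly_two < 0 or in_exactly_one < 0:
        raise ValueError("counts are inconsistent with the stated assumption")
    return in_exactly_one, in_exactly_two, in_all_three


# Figures from the caption: 19, 13, and 3 partners; 25 distinct companies.
print(overlap_breakdown([19, 13, 3], unique_total=25, in_all_three=1))
```

Under that assumption, 16 companies share data through one project, 8 through two, and 1 through all three.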
Open science needs open data. It has been increasingly understood that the first step in data sharing and reuse is the preservation of data at the source and it must be done in a systematic way. Unfortunately, this is still not the case as can be seen in the preliminary results of our survey conducted with corresponding authors of trials published in 2013. Less than 50% of those that responded had saved their trial data in the organizational database and more than 50% had kept the data on their local computers. However, we noticed a recent trend towards the creation of repositories by academia in order to preserve data generated by their researchers. Ideally, such institutional repositories would forward data to broader national or international repositories such as Figshare and Dryad (
Research data repositories (repositories) are electronic databases that host research data and facilitate their re-use. The following types of repositories are relevant for this discussion:
CT registries that host essential elements of CT protocols, some of them including summary results. They can be accessed via the WHO portal; (
registries of systematic reviews (such as Cochrane and PROSPERO);
repositories that host CT data, and
registries of repositories such as Re3data (
It can be expected that repositories will play an essential role in increasing the accessibility and reusability of research data (
The IMPACT Observatory has been specifically focusing on publicly accessible repositories of raw data and their relevant features. In our environmental scan, we identified heterogeneity in the way data are hosted and rather dubious levels of curatorship. Repositories exist at various levels, from individual institution repositories to international ones. They can also be classified according to accessibility, from closed to fully open access; according to whether the host institution is an academic or research institution or the pharmaceutical industry; and according to data types (whether they contain any research data or data from a specific field). It is important to note that there is no domain repository hosting CT data only.
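The classification dimensions listed above lend themselves to a simple record structure. The sketch below is illustrative only: the enum members paraphrase the categories named in the text (the intermediate `NATIONAL` and `UPON_REQUEST` values are assumptions filling the range between the stated endpoints), and the two example entries are invented placeholders, not repositories from the scan.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Level(Enum):
    INSTITUTIONAL = "institutional"
    NATIONAL = "national"            # assumed intermediate level
    INTERNATIONAL = "international"


class Access(Enum):
    CLOSED = "closed"
    UPON_REQUEST = "upon request"    # assumed intermediate access type
    OPEN = "open"


class Host(Enum):
    ACADEMIC_OR_RESEARCH = "academic or research institution"
    INDUSTRY = "pharmaceutical industry"


class Scope(Enum):
    GENERAL = "any research data"
    FIELD_SPECIFIC = "data from a specific field"


@dataclass(frozen=True)
class Repository:
    name: str
    level: Level
    access: Access
    host: Host
    scope: Scope


def tally(repositories, dimension):
    """Count repositories per category along one dimension (an attribute name)."""
    return Counter(getattr(repo, dimension) for repo in repositories)


# Invented placeholder entries, for illustration only.
sample = [
    Repository("Repo A", Level.INSTITUTIONAL, Access.UPON_REQUEST,
               Host.ACADEMIC_OR_RESEARCH, Scope.GENERAL),
    Repository("Repo B", Level.INTERNATIONAL, Access.OPEN,
               Host.ACADEMIC_OR_RESEARCH, Scope.FIELD_SPECIFIC),
]
print(tally(sample, "access"))
```

Recording each dimension as a fixed set of categories makes the heterogeneity across repositories directly countable and comparable between successive scans.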
The ultimate goal of opening the CT data is to enable secondary analysis which would reduce research waste, speed knowledge creation (i.e. increase the quality and efficiency of trials), increase the reliability of evidence, and thus contribute to research integrity. All of these outcomes are interconnected and regardless of where they start, all will be impacted. Although CTs have not reached the open data stage, the evolution of CT data sharing is encouraging. Existing data sharing modalities complement each other and can inform further transition which relies heavily on the collaboration with the “producer” of the original data.
The learning curve is steep, and the rich experience gained through the various ongoing forms of data sharing could inform the development of methods and international standards. The IMPACT Observatory aims to contribute to the process by assessing the dynamics and connecting the dots.
Data sharing starts with good data management, which includes the cleaning, preservation, and curatorship of data at the origin (preferably at the institutional level), as well as anonymization and posting. There is a trend to create more repositories with constantly improving features, but there is no domain repository for clinical trials. Furthermore, we could not find any re-analysis of data across repositories and believe that such re-analysis would contribute to defining the methodology and data sharing standards. There is a lot of discussion about such standards and what they should include, but usable, internationally accepted standards do not yet exist. It is not up to repositories to develop them, but rather to the research enterprise. Based on what we have learned so far, such standards could be built on the accumulated expertise and developed by an interdisciplinary and intersectoral group. The standards development process could be coordinated by the WHO, which coordinated the development of the trial registration standards, or by an international consortium formed specifically for this purpose that would include all interested players.
Currently, the lack of data sharing standards is the most important gap preventing the transition to a new level of CTs. However, we can start developing such standards, as we have accumulated an impressive amount of information and expertise. Also, certain necessary elements have been defined, such as citability with persistent identifiers (PIDs; DOIs and others) and the increasingly used CC citation, while others are being developed. Furthermore, numerous initiatives are contributing to this process, such as the Declaration of Helsinki, IOM, the AllTrials initiative, and the bioCADDIE/BD2K, CORBEL/ECRIN, and Vivli projects. We can also build on the expertise of the Research Data Alliance and on the coordination capacity and authority of the WHO.
We thank Peter Hughes for comments on earlier versions of this manuscript. We also thank Jasmine Lefebvre for editing the manuscript and Nevena Jeric and Apropo Media for graphic design and illustrations. We thank the Committee on Publication Ethics (COPE) for the initial funding of the environmental scan of repositories through a 2012
None declared.