The soon-to-be-launched OpenTrials project aims to create an open, easy-to-use, linked database of information about the world’s clinical trials. It draws together information from multiple sources and presents it in a variety of formats. The overall aim of the project is to improve access to information about trials, improve access to research, and boost transparency in the field. OpenTrials will launch a public beta on 10th October at the World Health Summit in Berlin, preceded by an OpenTrials Hack Day.

In this interview, OpenTrials community manager Ben Meghreblian explains that the new platform will “work like a search engine, with advanced search for filtering results by criteria such as drug and disease area”, and discusses the challenges, opportunities and promises of his team’s project.

Ben Meghreblian

Who do you expect to use your data, and for what ends?

Ben Meghreblian: We expect a range of users to use OpenTrials, including researchers, doctors, and patients. A researcher could find out more about a range of trials on a drug, searching by various different features such as inclusion and exclusion criteria to match a specific population. A doctor interested in critical appraisal of research papers could see if sources of bias for specific trials have already been assessed by experts. A patient interested in participating in a trial for their condition could identify trials in their geographical area which are enrolling. We’re also interested to see how policy makers and regulators may use OpenTrials to inform their work, and how data journalists and developers will use the data to write interesting stories.

How easy will the platform be to use?

OpenTrials works like a search engine, with advanced search for filtering results by criteria such as drug and disease area. From our user testing so far we haven’t come across any major usability issues, but we’re taking all feedback into account – we want to make it easy to use for a wide range of people. No special software or IT skills will be needed.

Looking at the data, what discoveries have surprised you most so far?

So far we’ve found two interesting things. Firstly, we found a number of publications on PubMed which have an incorrect trial registry ID associated with them. We have used PubMed Commons to comment on these trials, and have already received positive responses from some authors. Secondly, we have found interesting discrepancies and problems with the data on registries, and elsewhere. Some of the errors we’ve found are widespread, and concerning. Keep an eye on the OpenTrials blog for details.

What have been the biggest challenges in developing OpenTrials?

Getting the data that we need and cleaning it. We currently extract data from several different sources, each with its own structure. For example, one source might use the location name “United States of America”, another might use “USA”, and so on. We have to keep the names consistent so the user can easily find trials. This problem gets more complicated when we consider things like drug names, condition names, company names, etc.
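
To illustrate the kind of name normalisation described in this answer, here is a minimal Python sketch. The alias table and the function name `normalise_location` are hypothetical and are not taken from the OpenTrials codebase; a real pipeline would need a far larger mapping, and similar tables for drug, condition, and company names.

```python
# Hypothetical alias table mapping the spellings used by different sources
# to one canonical name. The entries here are illustrative only.
COUNTRY_ALIASES = {
    "united states of america": "United States",
    "united states": "United States",
    "usa": "United States",
    "u.s.a.": "United States",
    "united kingdom": "United Kingdom",
    "uk": "United Kingdom",
}


def normalise_location(raw: str) -> str:
    """Return a canonical country name, or the cleaned input if it is unknown."""
    cleaned = " ".join(raw.split())  # trim and collapse internal whitespace
    return COUNTRY_ALIASES.get(cleaned.lower(), cleaned)


# Records from two imaginary sources that name the same country differently.
source_records = ["United States of America", "USA", "Germany"]
canonical = [normalise_location(name) for name in source_records]
```

Unknown names are passed through unchanged rather than dropped, so that unmatched values can still be reviewed by hand.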

Is there an automated function to detect inconsistencies and poor research practices such as primary outcome switching?

For a given trial, we automatically list discrepancies across different registries – for example trial status and number of trial participants. For poor research practices such as primary outcome switching, this is currently too difficult to do automatically, but the COMPare Trials project is doing a great job of manually assessing outcome switching in trials published in academic journals. RobotReviewer are doing interesting work on assessing these flaws using software alone.
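
As a rough illustration of listing discrepancies across registries, the sketch below compares selected fields of one trial's records and reports those on which the registries disagree. The function `find_discrepancies`, the registry labels, and the field names are all assumptions made for this example; they do not describe how OpenTrials actually implements the check.

```python
from typing import Any


def find_discrepancies(
    records: dict[str, dict[str, Any]], fields: list[str]
) -> dict[str, dict[str, Any]]:
    """For each field, return the per-registry values when registries disagree."""
    discrepancies = {}
    for field in fields:
        # Collect the value each registry reports, skipping missing ones.
        values = {
            registry: record[field]
            for registry, record in records.items()
            if record.get(field) is not None
        }
        # More than one distinct value means the registries disagree.
        if len(set(values.values())) > 1:
            discrepancies[field] = values
    return discrepancies


# One imaginary trial as recorded in two imaginary registries.
trial_records = {
    "registry_a": {"status": "Completed", "target_sample_size": 120},
    "registry_b": {"status": "Recruiting", "target_sample_size": 120},
}
report = find_discrepancies(trial_records, ["status", "target_sample_size"])
```

Here `report` contains only the `status` field, because both registries agree on the sample size.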

You plan to score the methodological rigour of trials. How will that work?

We have been given risk of bias data from the Cochrane Schizophrenia group, which grades trials on issues such as blinding and selective reporting. For those trials, we will display this information on the corresponding OpenTrials page along with other trial information. We hope that showing this data integrated will encourage other groups and companies to share their datasets with OpenTrials.

How will your data set differ from similar ones such as the Good Pharma Scorecard or that generated by the 2015 STAT investigation?

The Good Pharma Scorecard is an excellent window onto a small number of trials where results have been manually searched for. Charles Piller’s 2015 STAT investigation was a valuable static snapshot of registry data showing which institutions are best and worst for overdue trials. OpenTrials advanced search will allow users to conduct similar searches, across all clinical trials conducted. As the population of the database becomes more complete, it will facilitate similar audits, but where the results update live as trials are published (or not).

What data will be included?

We currently extract and display data from, EU CTR, HRA, WHO ICTRP, and PubMed, and include risk of bias assessments from the Cochrane Schizophrenia group. After the launch, we plan to integrate systematic review data from Epistemonikos and other sources. There are seven additional sources of data that we’ve extracted, but can’t display because of licensing issues – we’re working with them to get permission to publish. We’ll let users know when they become available via the OpenTrials blog.

You plan to populate the database manually, but only for a small number of trials. What value will that add?

It will showcase what a perfect database would look like, and the value it can give to patients, researchers, and doctors. It will also allow us to estimate how much effort would be needed to manually perfect the entire database.

You plan to allow third parties to submit data online. Is there a danger of players with vested financial interests submitting partial or tailored data to influence perceptions of drugs’ effectiveness?

Any submitted data will be manually approved by a researcher in the first instance. While it will be impossible to stop those with vested interests submitting altered data (something we don’t think will happen often), we hope the community of OpenTrials users will help flag any anomalies in the contributions we will host, which can then be reviewed by our team. As more information is contributed and data sets donated, we would expect outliers to become easier to spot by having many eyes on the data and triangulating information.

Can you explain what your transparency scoreboard will do?

We aren’t releasing the transparency scoreboard as part of our beta launch. We will release this at a later date, but meanwhile all our data will be open and accessible, along with an interface: we encourage others to run their own analyses, build applications, and find interesting patterns and stories in the data.

Does it make sense for several groups to build separate platforms? Wouldn’t it be better to agree on a universal data sharing standard first and then develop one definitive global platform?

Getting widespread agreement on standards may be ideal, but is notoriously hard to achieve. It is very unlikely that they would be imposed on all of the hundreds of thousands of trials already conducted. We can start linking, matching, and building, right now, and there is no sense in further delay. We would rather take the initiative and build a functional platform supported by the community. We’ve spoken to a number of groups in related fields and we are always keen to discuss shared interests and ways of collaborating in order to not reinvent the wheel!

Which is your favourite clinical trials registry, and why?

[…] is a leading registry both technically and in terms of the number of trials it contains. Their data is well structured and easily accessible. Currently there are quite a few registries with poor support for data collection, so this one stands out amongst them.

What can existing clinical trials registries do to make their data more accessible and useful for researchers?

Registries could provide API access to their data, along with making sure that the trial metadata is well structured and follows recommended standards (e.g. the WHO Trial Registration Data Set). Ideally, the metadata would follow known standards, for example using ISO codes for countries instead of their names and using a vocabulary such as MeSH to define terms like condition names. This would make the different registries’ databases comparable. While API access is very useful, the best way a registry can offer its entire database is as a regular download, similar to what the FDA does with its openFDA website. This makes it much simpler for researchers who need a local copy of the database.

What can other players do to make their data more accessible and useful for researchers?

Beyond adopting standards and guidelines, we’d encourage a range of players to embrace being open by default. We’re keen to talk to any organisations or companies who want to make their data more accessible via OpenTrials.

Beyond OpenTrials, ten years from now, what information do you expect researchers to be able to access? And what information do you think will remain elusive?

There is a clear need for a step change. We need structured data about all trials, describing the methods and results, instead of only free text reports. That’s the big horizon. Basic knowledge management has been sorely lacking on clinical trials, and that makes no sense. We spend millions on individual trials. We need to spend a little on making sure the information is discoverable, machine readable, and impactful.

This interview was conducted via email by AllTrials campaign manager Dr Till Bruckner.