Pajouheshnia R, Gini R, Hyde E, Swertz MA, Sturkenboom M, Margulis AV, Franzoni C, Arana A, Ehrenstein V, Gembert K, Jansen E, Herings RMC, Thurin NH, Locatelli I, Jazbar J, Zerovnik S, Kos M, Smit S, Lind S, Metspalu A, Zaccagnino S, Busto MP, Middelkoop B, Barreiro-De Acosta M, Saez FS, Rodriguez-Bernal C, Sanfelix-Gimeno G, Poblador-Plou B, Carmona-Pirez J, Gimeno-Miguel A, Gil M, Schafer W, Haug U, Simou S, Hedenmalm K, Cochino A, Alcini P, Kurz X, Gutierrez L, Perez-Gutthann S. MINERVA: Metadata for data dIscoverability aNd study rEplicability in obseRVAtional studies. Presentation to be given at the 2022 ICPE Conference; August 28, 2022. Copenhagen, Denmark.


BACKGROUND: Identification of real-world data sources (RWDS) for valid and relevant pharmacoepidemiologic research requires comprehensive assessment of their characteristics and contents. This European Medicines Agency (EMA)–commissioned project (EUPAS39322) stemmed from the joint Heads of Medicines Agencies–EMA Big Data Task Force recommendations.

OBJECTIVES: 1. Define a set of metadata and pilot its collection in a proof-of-concept catalogue. 2. Provide recommendations on a sustainable process for collecting metadata and using them to identify RWDS for specific regulatory use cases.

METHODS: MINERVA, a partnership of 18 research centers in 12 European countries, worked with 15 RWDS. A list of candidate metadata was derived from public resources and from structured interviews with external experts. The list was finalised after feedback from the EMA and from a variety of stakeholders at a technical workshop, where the preliminary metadata list and a proposed process for collecting and maintaining metadata in a catalogue were reviewed. A proof-of-concept catalogue was built according to the FAIR principles using the open-source software MOLGENIS. Population of the catalogue was piloted through two processes: i) import of metadata from a preexisting catalogue and ii) collection using an interview tool. Collection of quantitative metadata was piloted in four data sources using a script. An investigator performed quality checks on the populated metadata. The results of the pilot informed a set of recommendations.

RESULTS: The metadata list included 436 variables; the 241 labelled as priority for regulatory purposes were collected in the pilot. The proof-of-concept catalogue was divided into the following domains: Institutions, Data Sources, Data Banks, Common Data Models, Networks, and Studies. Considerable resources were required to enter and review qualitative metadata so that metadata concepts and terminology were interpreted consistently across contributors to the catalogue entries. Quantitative metadata (age and gender distribution) were retrieved from four data sources by a script supporting four common data models. The number of data banks per data source ranged from 1 to 16. Completeness of qualitative metadata varied across data sources. Recommendations were compiled in a guidance document available in the EU PAS Register.

CONCLUSIONS: The MINERVA pilot showed the value of piloting the major catalogue processes and the need for data curation. The challenges and limitations encountered should be taken into account when developing future metadata catalogues.
