Representation of study scripts to improve transparency and efficiency in multidatabase distributed vaccine studies

Gini R, Messina D, Aarts W, Paoletti O, Bartolini CC, Weibel D, Choi J, Cid-Royo A, Elbers RJK, Vaz TA, Belitser SV, Bots SH, Riefolo F, Plana E, Sturkenboom MC. Representation of study scripts to improve transparency and efficiency in multidatabase distributed vaccine studies. Poster presented at the 39th ICPE Annual Conference; August 25, 2023. Halifax, Canada. [abstract] Pharmacoepidemiol Drug Saf. 2023 Oct 12; 32(S1):600-1. doi: 10.1002/pds.5687

BACKGROUND: Tools to improve transparency in reporting study design and variable definitions have been shared in the scientific community (e.g., STaRT-RWE). Tools to improve transparency on implementation in the study script are less common. Complex study protocols for multidatabase distributed studies require study-tailored scripts, but timeliness is needed to support regulatory decisions, so efficiency is required. VAC4EU is an international association of institutions in Europe that supports robust and timely evidence generation on the effects of vaccines.

OBJECTIVES: To illustrate a methodology supporting transparent documentation of programming implementation in study scripts.

METHODS: We report on the application of the methodology in a study on safety of COVID-19 vaccines funded by the European Medicines Agency (ROC 20-readiness). A sequence of intermediate datasets (IDs) was designed to go from the common data model (CDM) to the final tables with aggregated results to be shared centrally. Each ID was documented with a) unit of observation (UoO); b) number of observations for each UoO (NxUoO), classified as 1, >=1, or >=0; c) codebook, including variable names, format, vocabulary and rules for calculation. A directed acyclic graph (DAG) was drawn representing the program tree: steps were represented as circles, datasets as boxes. Circles had incoming arrows from input datasets, and outgoing arrows to output dataset(s). Based on specifications, synthetic versions of some IDs were generated before development started. Scientific programmers (SP) and statisticians (STAT) started programming in parallel from multiple points of the DAG using synthetic IDs. When all the steps were ready, the program was released to data partners (DPs) for local execution.
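The documentation scheme described above can be sketched in a few lines of Python. This is an illustration only, not the study's actual tooling: the class names, the dataset and step names (D3_persons, build_cohort, and so on), and the helper functions are all hypothetical. It shows how an ID specification (UoO, NxUoO, codebook) and a DAG of steps could be recorded, and how synthetic IDs might allow work to start at several points of the DAG at once.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter  # standard library, Python 3.9+


@dataclass
class DatasetSpec:
    """Documentation of one intermediate dataset (ID). Hypothetical structure."""
    name: str
    unit_of_observation: str          # UoO, e.g. "person", "event", "stratum"
    n_per_unit: str                   # NxUoO: "1", ">=1" or ">=0"
    codebook: dict = field(default_factory=dict)  # variable name -> description


@dataclass
class Step:
    """One circle of the DAG: reads input datasets, writes output datasets."""
    name: str
    inputs: list
    outputs: list


def execution_order(steps):
    """Order steps so that each runs after the steps producing its inputs."""
    producer = {out: s.name for s in steps for out in s.outputs}
    graph = {s.name: {producer[i] for i in s.inputs if i in producer} for s in steps}
    return list(TopologicalSorter(graph).static_order())


def startable_steps(steps, available):
    """Steps whose inputs all exist already (CDM tables or synthetic IDs)."""
    return sorted(s.name for s in steps if set(s.inputs) <= set(available))


# Hypothetical example: all names below are assumptions, not from the study.
cohort_spec = DatasetSpec(
    name="D4_cohort",
    unit_of_observation="person",
    n_per_unit="1",
    codebook={"person_id": "identifier", "entry_date": "date, cohort entry"},
)
steps = [
    Step("create_persons", ["CDM_PERSONS"], ["D3_persons"]),
    Step("select_vaccines", ["CDM_VACCINES"], ["D3_vaccines"]),
    Step("build_cohort", ["D3_persons", "D3_vaccines"], ["D4_cohort"]),
    Step("aggregate", ["D4_cohort"], ["D5_results"]),
]
order = execution_order(steps)

# With only the CDM available, development can start at the two leaf steps.
from_cdm = startable_steps(steps, ["CDM_PERSONS", "CDM_VACCINES"])
# A synthetic D4_cohort lets the aggregation step be developed in parallel.
with_synthetic = startable_steps(steps, ["CDM_PERSONS", "CDM_VACCINES", "D4_cohort"])
```

In this sketch, a synthetic version of D4_cohort built from its specification makes the aggregate step startable before build_cohort exists, which mirrors the parallel development described in the abstract.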

RESULTS: 231 IDs were designed. Most of them (225, 97%) were selections from the instance of the CDM, based on lists of codes or strings: for these, the UoO was the original record. Out of the other 25 IDs, UoO was a person for 13 (52%), an event for 5 (20%), and a stratum of categorical variables for 7 (28%). In IDs where UoO was an event or a stratum, NxUoO was 1. Among the 13 IDs having a person as UoO, NxUoO was 1 for 7 (54%), >=0 for 4 (31%), and >=1 for 2 (15%). During the specification phase, investigators and SP/STAT could identify cases where the protocol was underspecified and add clarifications, possibly with support from DPs. During execution, bugs were traced back to steps, and DPs could access locally generated IDs and support SP/STAT in debugging.
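As an illustration of how a documented NxUoO could support debugging of a locally generated ID, the following Python sketch lists the units that violate the declared rule. The function name, the reference population argument and the reading of the three classes (exactly one row per unit, at least one row per unit, any number of rows including none) are assumptions made for this example; the abstract does not describe such a check.

```python
from collections import Counter


def nxuoo_violations(rows, unit_key, rule, population):
    """Return the units in `population` whose row count breaks the NxUoO rule.

    Assumed reading of the rules:
      "1"   -> exactly one row per unit
      ">=1" -> at least one row per unit
      ">=0" -> any number of rows, including none
    """
    counts = Counter(row[unit_key] for row in rows)  # missing units count as 0
    if rule == "1":
        return sorted(u for u in population if counts[u] != 1)
    if rule == ">=1":
        return sorted(u for u in population if counts[u] < 1)
    if rule == ">=0":
        return []
    raise ValueError(f"unknown NxUoO rule: {rule!r}")


# Hypothetical person-level ID: person 1 appears twice, person 3 is absent.
rows = [{"person_id": 1}, {"person_id": 1}, {"person_id": 2}]
population = {1, 2, 3}

exactly_one = nxuoo_violations(rows, "person_id", "1", population)
at_least_one = nxuoo_violations(rows, "person_id", ">=1", population)
any_number = nxuoo_violations(rows, "person_id", ">=0", population)
```

A non-empty result points to the step that produced the dataset, which is one way a bug could be traced back to a specific node of the DAG.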

CONCLUSIONS: We introduced and tested a tool to improve transparency and allow higher efficiency in the development and testing of study scripts.
