Background: It is hoped that linking patient claims to electronic health record (EHR) data can help address unmeasured confounding in observational comparative effectiveness research (CER). However, EHR data are themselves subject to missingness.

Objectives: To assess how much information on important clinical covariates is gained by linking administrative claims to a regional health information exchange (HIE) in a population of statin initiators.

Methods: In the HealthCore Integrated Research Database, we identified adults (≥ 18 years of age) who newly initiated a statin between July 2004 and May 2010, had ≥ 6 months of health plan enrollment, and had no statin use during the 6-month period before initiation. We then linked the subset residing in Indiana to a regional HIE and extracted demographic, clinical, and laboratory parameters for key potential confounders available in the 6-month pre-index period. We calculated the proportions of patients with missing claims-based and EHR-based data.
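The missingness calculation described in the Methods can be sketched as follows. This is a minimal illustration, not the study's code: the record layout, the covariate names (`glucose`, `triglycerides`, `ldl_hdl`), the function names, and the toy values are all hypothetical. Under the assumption made here, a covariate counts as missing after linkage only when it is absent from both the claims and the EHR, and a patient counts toward the "at least one missing" measure if any covariate of interest is absent from the EHR.

```python
# Hypothetical toy records, NOT study data. None marks a value that is
# absent from that source during the 6-month pre-index window.
COVARIATES = ["glucose", "triglycerides", "ldl_hdl"]

patients = [
    {"claims": {"glucose": None, "triglycerides": 150.0, "ldl_hdl": None},
     "ehr": {"glucose": 98.0, "triglycerides": None, "ldl_hdl": None}},
    {"claims": {"glucose": None, "triglycerides": None, "ldl_hdl": None},
     "ehr": {"glucose": None, "triglycerides": None, "ldl_hdl": 3.1}},
    {"claims": {"glucose": None, "triglycerides": None, "ldl_hdl": 2.8},
     "ehr": {"glucose": 110.0, "triglycerides": 200.0, "ldl_hdl": 2.9}},
    {"claims": {"glucose": None, "triglycerides": None, "ldl_hdl": None},
     "ehr": {"glucose": None, "triglycerides": None, "ldl_hdl": None}},
]


def proportion_missing(records, covariate, sources):
    """Share of patients with no value for `covariate` in ANY of `sources`.

    With sources=["claims"] this is claims-only missingness; with
    sources=["claims", "ehr"] a value is missing only if both lack it.
    """
    missing = sum(
        all(r[s][covariate] is None for s in sources) for r in records
    )
    return missing / len(records)


def proportion_missing_any(records, covariates, source):
    """Share of patients missing at least one of `covariates` in `source`."""
    missing = sum(
        any(r[source][c] is None for c in covariates) for r in records
    )
    return missing / len(records)


if __name__ == "__main__":
    for c in COVARIATES:
        claims_only = proportion_missing(patients, c, ["claims"])
        linked = proportion_missing(patients, c, ["claims", "ehr"])
        print(f"{c}: claims only {claims_only:.0%}, claims + EHR {linked:.0%}")
    any_missing = proportion_missing_any(patients, COVARIATES, "ehr")
    print(f"missing >= 1 EHR covariate: {any_missing:.0%}")
```

Treating "missing after linkage" as missing in both sources is what allows supplementation to lower, but never raise, the per-covariate missingness, while the "at least one missing" measure can stay high even when each individual covariate improves.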

Results: About 22,300 linked patients had non-null structured or free-text data available at any time pre- or post-index in the EHR. Of these, 51% were male and 73% were 41–64 years of age. Forty-four percent initiated therapy with simvastatin. Outpatient claims laboratory values were missing for 99%, 89%, and 90% of the linked cohort for glucose, triglycerides, and LDL/HDL, respectively. Race was systematically unavailable in claims data. Virtually all patients (98%) were missing at least one of the EHR covariates of interest. Supplementing outpatient claims with EHR data from the 6-month pre-index period reduced missingness to 13%, 63%, 81%, and 82% for race, glucose, triglycerides, and LDL/HDL, respectively. During the 6-month pre-index period, blood pressure was missing for 99% and body mass index for 98% of the linked cohort.

Conclusions: Linking claims to a regional HIE provided a modest improvement in the availability of laboratory test results and yielded mostly complete race data. Most patients in a given study are likely to be missing at least one confounder, which makes simultaneous adjustment for multiple confounders challenging.

