Unlocking Insights from Real-World Data to Help Pediatric COVID-19 Patients

The Biomedical Advanced Research and Development Authority (BARDA), in collaboration with other U.S. Department of Health and Human Services (HHS) partners, announces a total of $200,000 in awards to the two winners of the Pediatric COVID-19 Data Challenge.

April 06, 2022


Imagine giving clinicians tools that could predict which children will become so sick with COVID-19 that they need to be hospitalized or require medical interventions. The Pediatric COVID-19 Data Challenge was designed to help equip healthcare providers with the information and tools they need to identify pediatric patients at risk, implement earlier interventions, and improve patient outcomes. The goal was to unlock critical insights hidden in the healthcare data ecosystem.

The healthcare ecosystem continuously generates detailed data in electronic health records (EHRs) through the various patient touchpoints with medical providers. This real-world data represents incalculable potential to help providers triage care in overcrowded emergency departments (EDs) and clinics and provide real-time trends across the nation.

“Data captured within EHR systems offer robust and comprehensive views of a patient’s medical journey,” noted Sandeep Patel, PhD, Director of BARDA’s Division of Research, Innovation and Ventures (DRIVe). “Leveraging this commonly available healthcare data to better identify risk of disease severity, especially in the context of a pandemic like COVID-19, is one of the many ways organizations can drive innovation in care delivery.”

Dr. Patel also noted that the challenge’s focus on near real-time analysis of data could not have been timelier for its target patient population. “We launched the Pediatric COVID-19 Data Challenge during a critical time in the pandemic, when pediatric patients largely didn’t have access to vaccinations and were highly vulnerable as a result,” said Dr. Patel. “This challenge provided an opportunity to develop and test algorithms from commonly available data that could empower clinicians with better insights to predict severe outcomes and hospitalizations more accurately so they can make critical decisions to reduce hospital burden and improve pediatric patient outcomes. We are excited about the potential for the results of this challenge to create new capabilities that can be available in the future.”

Forging Multidisciplinary Partnerships to Fuel the Creation of Innovative Computational Models


The Pediatric COVID-19 Data Challenge is sponsored by BARDA, in partnership with two institutes from the National Institutes of Health – the National Center for Advancing Translational Sciences (NCATS) and the Eunice Kennedy Shriver National Institute of Child Health and Human Development (NICHD) – along with the Health Resources and Services Administration’s (HRSA) Maternal and Child Health Bureau. Administration and quantitative analysis of the challenge were managed by Sage Bionetworks.

The challenge asked participants to develop, train, and validate computational models that identify pediatric patients at risk for hospitalization, ventilation, and cardiovascular interventions. Participants worked with the de-identified electronic health record data available through the NCATS National COVID Cohort Collaborative (N3C) Data Enclave, which is the largest COVID dataset in the U.S. This de-identified data set combines diverse data types, such as demographics, diagnoses, medications, procedures, laboratory results, vital signs, and county-level social determinants of health.

“The open collaboration and enthusiasm from the different teams have been incredible. From teams joining forces to solve problems to other teams independently validating the data used in the challenge, the collaborative spirit during this challenge has been on full display. I was excited to see the wide range of participants: private businesses, university research groups, and citizen scientists all actively participated and brought insightful models to the challenge.”

- Timothy Bergquist, PhD, Sage Bionetworks, Contract Challenge Administrator for the Pediatric COVID-19 Data Challenge

Participants were asked to address two tasks, using the data provided in the N3C Data Enclave to develop their computational models. Over 200 participants joined, 88 teams were formed, and 55 models were submitted across the two tasks. Teams were both small and large and included academic institutions, large and small businesses, and citizen scientists. Submitted models were scored on model performance and generalizability, feature interpretation, method clarity, timeliness of predictions, clinical utility, and reproducibility, among other evaluation metrics. The evaluation of the computational models was also an unprecedented collaboration across government. Program officials, subject matter experts, clinicians, and data scientists from four agencies interrogated the most promising models to identify the most quantitatively and qualitatively useful model for assessing pediatric COVID-19 severity. The highest-scoring model in each task was selected as that task's winner.

A Model for Predicting Need for Hospitalization


In task 1, teams developed computational models to predict the need for hospitalization among pediatric patients who test positive for COVID-19 in an outpatient setting.

The winning team of Task 1 was the Department of Biostatistics & Medical Informatics (BMI) at the University of Wisconsin-Madison. UW-Madison will be awarded $100,000 for their high-performing gradient boosting method and handcrafted features extracted from multisite EHR data. The team tailored a widely used machine learning approach (gradient boosting), reduced the dimensionality of EHR data, and enhanced model interpretability by summarizing patients’ medical conditions and drug exposures using medically meaningful concepts such as International Classification of Diseases (ICD-10) and Anatomical Therapeutic Chemical (ATC) codes. Their model performed the best of the scored models quantitatively. The team also used a subset of COVID-19 related lab measurements, taking recent values from before the patient’s COVID-19 diagnosis, and customized the model training and tuning procedure so that the model was resistant to sample size bias, making it more generalizable across multiple sites. Post-challenge, the team is interested in refining their model to tie into therapeutic interventions for high-risk groups and to incorporate additional information, such as clinical notes, into the model.
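As a rough illustration of how grouping codes into broader medical concepts can shrink an EHR feature space, the sketch below rolls ICD-10 diagnosis codes up to their 3-character category and ATC drug codes up to their 3-character therapeutic subgroup, then counts occurrences per patient. This is not the winning team's feature pipeline, which the announcement does not describe in detail; the function names, the "dx_"/"rx_" prefixes, and the example patient are all hypothetical.

```python
from collections import Counter


def icd10_category(code: str) -> str:
    """Return the 3-character ICD-10 category, e.g. 'J45.909' -> 'J45'."""
    return code.replace(".", "").upper()[:3]


def atc_subgroup(code: str) -> str:
    """Return the ATC level-2 (therapeutic subgroup) prefix, e.g. 'R03AC02' -> 'R03'."""
    return code.upper()[:3]


def summarize_patient(icd10_codes, atc_codes):
    """Collapse raw codes into counts of broader concepts (hypothetical feature names)."""
    features = Counter()
    for code in icd10_codes:
        features["dx_" + icd10_category(code)] += 1
    for code in atc_codes:
        features["rx_" + atc_subgroup(code)] += 1
    return dict(features)


# Hypothetical patient: two asthma-related diagnoses, one diabetes diagnosis,
# and two drugs from the same therapeutic subgroup.
patient = summarize_patient(
    icd10_codes=["J45.909", "J45.20", "E11.9"],
    atc_codes=["R03AC02", "R03BA02"],
)
print(patient)
```

Five distinct raw codes become three features here. Across tens of thousands of possible codes, the same roll-up gives a model far fewer, more interpretable inputs.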

“We really appreciate all the efforts frontline workers do to protect us from COVID-19. As biostatisticians and data scientists, we also want to make a little contribution to the fight against COVID-19. We hope models like ours can be further refined and implemented in practice to improve health care delivery.”

- Guanhua Chen, PhD, University of Wisconsin-Madison BMI team, Task 1 Winner of Pediatric COVID-19 Data Challenge

A Model for Predicting Need for Respiratory and Cardiovascular Interventions


In task 2, teams developed computational models to predict the need for respiratory and cardiovascular interventions in hospitalized pediatric patients, including children with multisystem inflammatory syndrome in children (MIS-C), a life-threatening inflammation of organs and tissues.

Vir Biotechnology, Inc., the winning team of Task 2, was awarded $100,000 for their high-performing gradient boosted tree classifier, capable of extracting patterns from the complex set of EHRs. The team focused on extracting data from laboratory measurements, disease conditions, and past medical interventions, and employed manual data cleaning, creation of new aggregate variables, and further harmonization of the data model. In addition to earning the highest quantitative score, the team employed a missingness-aware classifier that learns from patterns of data availability and so avoids imputing missing data, and they evaluated their trained classifier to guard against overfitting. When the model was evaluated in a simulation of a live clinical scenario, it maintained its high performance. The team hopes to further evaluate their model in clinics and to create standards and privacy-preserving analytics that foster a new generation of decision support tools. They envision similar models, able to accurately forecast the burden of disease for patients and hospital systems, becoming critical components of pandemic preparedness and real-time response.
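One common way tree-based classifiers learn from missingness without imputation is to choose, at each split, which branch missing values should follow; gradient boosted tree libraries such as XGBoost use a default-direction idea of this kind. The sketch below shows that idea for a single split using Gini impurity. It is an assumption-laden toy, not Vir Biotechnology's classifier: the function names, the lab values, and the threshold are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels (0.0 for an empty or pure node)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def best_missing_direction(values, labels, threshold):
    """For one feature and threshold, decide whether missing values (None)
    should follow the left or right branch. Returns (direction, impurity)."""
    best = None
    n = len(labels)
    for direction in ("left", "right"):
        left, right = [], []
        for value, label in zip(values, labels):
            if value is None:
                goes_left = direction == "left"
            else:
                goes_left = value < threshold
            (left if goes_left else right).append(label)
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / n
        if best is None or impurity < best[1]:
            best = (direction, impurity)
    return best


# Hypothetical lab marker: None means the test was never ordered, which in
# this toy data coincides with mild outcomes (label 0).
lab_values = [5.0, 7.0, None, None, 40.0, 55.0, None, 60.0]
needed_intervention = [0, 0, 0, 0, 1, 1, 0, 1]

direction, impurity = best_missing_direction(lab_values, needed_intervention, threshold=20.0)
print(direction, impurity)
```

In this toy data the split sends missing values to the low-risk branch, so the fact that a test was not ordered becomes a signal in its own right and no missing value has to be guessed.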

“The massive amounts of data that are generated during the pandemic open many paths for innovation. By creating new predictive models, health systems can identify drivers of severe disease, and implement predictive tools to optimize the delivery of clinical services.”

- Amalio Telenti, MD, PhD, Vir Biotechnology team, Task 2 Winner of Pediatric COVID-19 Data Challenge

Honorable Mentions Emphasize that These Challenges Impact Everyone, but Solutions Can Come from Anyone

A team from the Oregon Health & Science University received an Honorable Mention for Feature Interpretability & Design. The team fed a common set of predictors, including demographics, laboratory values, and associated diagnosis codes, into an ensemble classifier that combined individual predictions from logistic regression, random forest, gradient boosted tree, and artificial neural network models. They used Shapley additive explanation (SHAP) values to provide individual-level and population-level explanations for model predictions. This high-performing approach provides clinicians with an outcome prediction and an individualized explanation with predictors for intervention. The team began to explore how the model could be applied to patient populations to help clinicians prioritize allocation of monoclonal antibodies. They would like to further optimize their model to address different sub-populations that may have underlying biases (e.g., racial or socioeconomic disparities), and to validate it further so that it can support early intervention for high-risk children to prevent severe outcomes.
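A simple way to combine predictions from several different model types is soft voting, which averages each model's predicted probability. The sketch below assumes that combination rule purely for illustration; the announcement does not say how the team's ensemble weighted its members. The three stand-in models, their thresholds, and the patient fields are hypothetical placeholders for trained classifiers.

```python
from statistics import mean


def soft_vote(models, patient):
    """Average the predicted probabilities of all base models."""
    return mean(model(patient) for model in models)


# Hypothetical stand-ins for trained base models; each maps a patient
# record to a probability of a severe outcome.
def lab_model(patient):
    return 0.8 if patient["crp"] > 50 else 0.1


def age_model(patient):
    return 0.6 if patient["age_years"] < 1 else 0.2


def history_model(patient):
    return min(1.0, 0.25 * patient["prior_admissions"])


patient = {"crp": 80, "age_years": 5, "prior_admissions": 2}
risk = soft_vote([lab_model, age_model, history_model], patient)
print(round(risk, 3))
```

Averaging lets a strong signal from one model be tempered by the others, which is one reason ensembles of dissimilar models are often more stable than any single member.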

“As a pediatrician and infectious disease expert, I care for children with COVID-19 in my clinical practice. I was excited to participate in this project and to develop models that could help the populations I serve as a physician.”

- Lorne Walker, MD, PhD, Oregon Health & Science University team, Honorable Mention for Feature Interpretability & Design, Pediatric COVID-19 Data Challenge

A retired physicist and electrical engineer at Wind City Applied Research in rural New Hampshire, B. L. Cragin, PhD, received an Honorable Mention for Clinical Utility. Cragin noticed that model features derived from existing electronic health record codesets, as defined in the National COVID Cohort Collaborative (N3C) Data Enclave, gave consistently better performance than those based on machine-selected codes, which allowed him to benefit from the extensive clinical expertise of that community. In developing his model, he applied XGBoost, an open-source "extreme gradient boosting" algorithm that has proven to be a top performer in earlier predictive modeling challenges. The XGBoost code also facilitated the introduction of a modern Shapley value analysis technique that generalizes a model's feature importance vector to a feature importance matrix, each row of which applies to an individual case or patient. This allows clinicians to identify specific population sub-cohorts for which a given feature is expected to be an especially good indicator of increased risk. Cragin hopes to establish an informal association with an existing academic or industry research team and join their effort to make a marketable product.
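The step from a feature importance vector to a per-patient feature importance matrix can be sketched with exact Shapley values. The code below computes them by brute force over all feature subsets, replacing "absent" features with a baseline value; this is a simplification chosen for clarity, not the tree-specific algorithm used with XGBoost models, and not Cragin's code. The toy risk function, the baseline, and the two patients are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_row(model, x, baseline):
    """Exact Shapley values for one patient; absent features take baseline values."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                with_i = [x[j] if (j == i or j in subset) else baseline[j] for j in range(n)]
                without_i = [x[j] if j in subset else baseline[j] for j in range(n)]
                phi[i] += weight * (model(with_i) - model(without_i))
    return phi


def toy_risk(f):
    """Hypothetical risk score with an interaction between features 0 and 2."""
    return 0.1 * f[0] + 0.3 * f[1] + 0.2 * f[0] * f[2]


baseline = [0, 0, 0]
patients = [[1, 1, 1], [1, 0, 0]]

# One row per patient, one column per feature: the "feature importance matrix".
matrix = [shapley_row(toy_risk, x, baseline) for x in patients]

# Averaging absolute values down each column recovers a single importance vector.
importance = [sum(abs(row[k]) for row in matrix) / len(matrix) for k in range(3)]

for row in matrix:
    print([round(v, 3) for v in row])
print([round(v, 3) for v in importance])
```

Each row sums to that patient's score minus the baseline score, so the matrix shows how much every feature moved each individual prediction. Grouping rows by sub-cohort then reveals where a given feature matters most.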

“For the past 10 years, I have spent time in my retirement furthering data models for the public good. By leveraging existing feature sets from the National COVID Cohort Consortium Data Enclave’s existing codesets, I was able to use clinical information created by others to develop a model that is both quantitatively and qualitatively useful for clinicians.”

- Bruce Cragin, Honorable Mention for Clinical Utility, Pediatric COVID-19 Data Challenge

A team from ARIScience received an Honorable Mention for Computational Methodology. The team took clinical and laboratory indicators from pre-visit and during-visit data, normalized them by age, gender, and other demographic attributes, and fed them into Random Forest, Neural Network, Regression-based, Naïve Bayes, and Neighborhood-based artificial intelligence (AI) models to create ensembles of predictions. The team hopes that a sub-model of their ensemble of ensembles can identify the highest-risk children even prior to COVID-19 infection.

“The way we designed our AI-based disease severity prediction algorithm can be applied to specific age cohorts, to unvaccinated populations pre-COVID exposure, and to other diseases due to the flexible AI architecture we created. One can think of our ensemble of ensemble approach as plug-and-play for structured clinical data.”

- Joy Alamgir, ARIScience, Honorable Mention for Computational Methodology, Pediatric COVID-19 Data Challenge

A Brighter, Collective Future for Pediatric COVID-19 Patients


Although geared toward driving innovation in care delivery for pediatric COVID-19 patients specifically, the computational models submitted by design teams have the potential to be further developed and validated for use in ED settings, and could be applicable against future public health threats. Together with our partners, DRIVe continues to build out an ecosystem of restless innovation, driven by industry and the entrepreneurial community, to address the nation's greatest health security threats. “We have a great deal to learn about how COVID-19 infection affects children,” said Alison Cernich, PhD, deputy director of the NICHD. “Our hope is that the winning computational models will allow us to prepare for the most severely ill cases so that we can refine the interventions needed to help them.”

More information about each of the winners and their concepts may be found on the Pediatric COVID-19 Data Challenge page.


The Division of Research, Innovation and Ventures (DRIVe) team is developing new approaches to catalyze innovation in the way we prevent, detect, and respond to health security threats, including through the use of prize challenge competitions. More information is available at https://drive.hhs.gov/pediatric_challenge.html.

Last Updated: April 06, 2022

