Artificial intelligence has taken center stage in transforming the future of healthcare. With breakthroughs ranging from electronic health record analysis to cancer detection in medical images, AI promises faster, more accurate, and often less invasive diagnostics. Yet a critical and often overlooked segment of the patient population is being left on the sidelines: children. Recent research is sounding alarms about a pronounced and growing “age bias” in biomedical AI, raising profound questions about equity, safety, and the future direction of AI-driven medicine.

Age Bias in Medical AI: A Systemic Blind Spot

While medical AI continues to mature, the issue of age bias—specifically the underrepresentation of children in biomedical datasets—has become starkly apparent. A 2023 review painted a troubling picture: out of 692 medical AI devices, only 22 were transparently evaluated on children and approved by the FDA for pediatric use. This means that the vast majority of AI-driven tools that are making their way into clinics, radiology suites, and hospitals are designed, tested, and approved with little to no consideration for pediatric patients.
The problem, however, runs deeper than just regulatory oversight. According to a study published by Microsoft researchers in 2024, the root cause is the alarming scarcity of pediatric data in the public medical imaging datasets that fuel AI innovation. Their exhaustive review of 181 public datasets found that “less than 1% of public medical imaging data is from children, despite children making up 25% of the global population.” This startling disparity underscores a growing age bias that is baked into the very foundations of biomedical AI.

The Dearth of Pediatric Data

Modern AI depends fundamentally on the availability of large, diverse, and well-annotated datasets. In the context of medical imaging, this means datasets that reflect the full spectrum of human biology—across age, gender, ethnicity, and health status. Yet, in practice, children rarely appear in publicly available datasets. Of the 181 datasets reviewed by the Microsoft team, only 116 reported patient ages at all. Even among those, the pediatric representation was vanishingly small and staggeringly unbalanced.
One revealing statistic from the research: out of nearly 19,000 MRI images available for cancer diagnosis model development, only five originated from children. This is not an isolated incident—children are underrepresented in imaging data for virtually every task and modality surveyed. The analysis shows that, globally, pediatric health data is either not collected, not retained, or is suppressed due to privacy concerns or perceptions of risk. The result is a biomedical landscape where AI models reflect and serve only a segment of the population, systematically excluding children.

Consequences of Age Bias in Medical AI

The implications of this oversight are far-reaching. Chief among them is the stagnation it creates in pediatric research and clinical AI tool development. Across the 2023 and 2024 editions of the prestigious Medical Imaging with Deep Learning (MIDL) conference, only one out of forty-six machine learning studies utilized pediatric data. This imbalance is evident not only in the datasets themselves, but also in the lack of peer-reviewed research focused on children’s health.
But exclusion from research is only the first domino to fall. The scarcity of pediatric data means that, in the absence of dedicated models, clinicians might resort to “off-label” use—applying AI tools trained on adult data to diagnose or manage children’s illnesses. Off-label use is not unusual in pediatric medicine, given the historical gaps in drug and device development for young patients. However, AI introduces unique risks. For conditions that present differently in children—such as cardiomegaly (an abnormally large heart)—AI models trained on adults can misinterpret pediatric images. The Microsoft study found error rates as high as 50% when applying adult-trained AI to chest x-rays of infants aged 0–1 years.
Such high error rates are not just a sign of poor performance—they represent real, tangible dangers. Misdiagnosis or missed diagnosis can have irreparable impacts on a child’s health, development, and even survival. Age bias, therefore, translates directly to clinical risk for the youngest and often most vulnerable patients.
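The kind of failure the study reports only becomes visible when model performance is broken down by age, which is exactly what missing age metadata prevents. As an illustration, here is a minimal sketch of age-stratified evaluation; the prediction records and age bins below are hypothetical, not data from the Microsoft study:

```python
from collections import defaultdict

def error_rate_by_age_group(records, bins):
    """Group (age, correct) prediction records into age bins and
    compute the error rate within each bin."""
    totals = defaultdict(int)
    errors = defaultdict(int)
    for age, correct in records:
        # Assign the record to the first bin whose [low, high) range contains it.
        for low, high in bins:
            if low <= age < high:
                totals[(low, high)] += 1
                if not correct:
                    errors[(low, high)] += 1
                break
    return {b: errors[b] / totals[b] for b in totals}

# Hypothetical records: (patient age in years, was the prediction correct?)
records = [(0.5, False), (0.8, False), (0.3, True),
           (35, True), (42, True), (50, False), (60, True)]
bins = [(0, 1), (1, 18), (18, 120)]
rates = error_rate_by_age_group(records, bins)
```

An aggregate accuracy over all seven records would look acceptable here, while the 0–1 age bin fails two times out of three; without recorded ages, that disparity is invisible.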

Root Causes: Privacy, Risk, and Dataset Aggregation

It is worth interrogating why pediatric data remains so rare in public datasets. Several key factors contribute to this deficit. First, there is often hesitation from data custodians to share age-specific data, owing to privacy laws like GDPR and HIPAA, and the heightened fear of re-identification when data pertains to minors. Age is either not collected, not recorded, or not released to public repositories—even when it is present onsite.
Second, when large public foundation datasets are assembled—particularly those intended to train so-called “generalist” or “foundation” AI models—there is a tendency to aggregate data from numerous sources. In the process, researchers frequently drop demographic annotations, such as patient age, in an attempt to homogenize and de-risk the dataset. In the review by Microsoft, 16 aggregated imaging datasets were analyzed; in all eight instances where primary data had age information, the metadata was stripped in the aggregated version.
This approach inadvertently devalues diversity and exacerbates bias. Models built on such datasets are inherently limited in their capacity to generalize to underrepresented age groups—not only failing to serve children, but also limiting utility for other minorities or clinical subgroups.

Pediatric AI Gaps: Not Just a Regulatory Issue

The lack of pediatric data, and subsequently of pediatric AI, is not simply a regulatory oversight that can be remedied with new rules or guidance. It is an upstream problem deeply embedded in research culture, data stewardship, and infrastructural planning. Without intentional efforts to collect, curate, and share pediatric medical imaging data, age bias will continue to be a major flaw in AI-driven medicine.
Recent initiatives are beginning to address this gulf. The American College of Radiology, for instance, has formed a Pediatric AI working group specifically to advocate for equal access to safe and effective AI for children. However, such groups face an uphill battle. They must challenge entrenched norms around data ownership and privacy, overcome technical obstacles in data anonymization, and secure funding for large-scale pediatric cohort building.

Critical Analysis: The Road to Inclusive Biomedical AI

The findings from this comprehensive review are both alarming and clarifying. On the one hand, they highlight a system-wide neglect that could hold back, or even endanger, an entire generation of patients. On the other, they illuminate the road to more inclusive, equitable AI in health care.

Notable Strengths

  • Exhaustive Dataset Review: The Microsoft study’s methodology stands out for its scale and transparency. By surveying 181 public imaging datasets, the authors help quantify a problem that had previously been described mostly anecdotally. The review cuts across modalities, geographies, and machine learning tasks.
  • Empirical Demonstrations of Harm: By testing AI models trained on adult data and reporting error rates on pediatric images, the study moves the debate from abstract concerns to measurable risks. This empirical approach makes a compelling case for urgent action.
  • Systemic Perspective: The analysis tracks the lifecycle of pediatric data—from initial collection, through aggregation in foundation models, to eventual loss at the point of dissemination—providing valuable insight into where interventions are most needed.

Potential Risks and Ongoing Challenges

  • Data Privacy and Re-identification: Concerns around the protection of children’s data are valid. Pediatric populations are inherently more identifiable, and breaches could have enduring consequences. Fixing age bias must not come at the expense of patient confidentiality and safety.
  • Technical Barriers to Anonymization: Effective anonymization of pediatric imaging data, particularly for rare diseases or small cohorts, remains a work in progress. There is a risk that well-meaning calls for data sharing could inadvertently increase the risk of breaches without robust technical safeguards.
  • Unintended Model Failures: Overreliance on AI models that lack pediatric training may widen health disparities if not accompanied by ongoing clinical oversight. The dangers of “off-label” AI use are compounded in resource-limited settings, where specialist review may not be readily available.
  • Lagging Policy Response: Regulatory bodies often react more slowly than technical innovation advances. Even as professional organizations raise awareness and form working groups, concrete guidance and enforcement around pediatric AI are only beginning to evolve.
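One reason anonymizing small pediatric cohorts is hard can be made concrete with the standard notion of k-anonymity: a release is k-anonymous if every combination of quasi-identifying attributes is shared by at least k records. A minimal sketch, using entirely hypothetical records and attribute names:

```python
from collections import Counter

def k_anonymity(records, quasi_identifiers):
    """Return the smallest equivalence-class size over the chosen
    quasi-identifiers; the dataset is k-anonymous for this value of k."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values())

# Hypothetical de-identified records from a small pediatric cohort.
records = [
    {"age_band": "0-1",  "sex": "F", "region": "NW"},
    {"age_band": "0-1",  "sex": "F", "region": "NW"},
    {"age_band": "2-5",  "sex": "M", "region": "NW"},  # unique combination
    {"age_band": "6-12", "sex": "F", "region": "SE"},
    {"age_band": "6-12", "sex": "F", "region": "SE"},
]
k = k_anonymity(records, ["age_band", "sex", "region"])
```

Here k comes out as 1 because one child has a unique attribute combination; in a rare-disease cohort of a few dozen patients, such singletons are the norm rather than the exception, which is why naive data sharing raises re-identification risk.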

The Case for Investment and Collaboration

If the current trajectory continues unchecked, biomedical AI will amplify existing health inequalities, leaving children behind in a field that promises health transformation for all. However, the solutions are within reach.
  • Data Collection Initiatives: Dedicated programs to collect pediatric medical imaging data—safely and ethically—are an essential first step. Such initiatives require international collaboration, adequate funding, and community engagement to balance privacy with scientific progress.
  • Standardized Data Annotation: Mandating or incentivizing the reporting of key demographic variables, including age, can improve transparency and support downstream pediatric research.
  • Open, Consent-Driven Data Sharing: New models of data governance, such as federated learning and synthetic data generation, offer ways to share the benefits of large datasets without exposing individual identities. Patient consent frameworks should evolve to promote participation and trust, especially among caregivers and young patients.
  • Multistakeholder Partnerships: Solving the age bias in biomedical AI will require the combined efforts of researchers, clinicians, technology firms, policy makers, and patient advocacy groups. Only through aligned action can the goal of equitable AI in healthcare be achieved.
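Federated learning, mentioned above as a governance model, lets institutions contribute to a shared model without raw pediatric images ever leaving their walls: each site trains locally and shares only model weights, which a coordinator averages. A minimal sketch of one federated-averaging (FedAvg-style) round, with hypothetical weight vectors and site sizes:

```python
def federated_average(site_weights, site_sizes):
    """Combine per-site model weight vectors into a global model using a
    size-weighted average; only weights, never patient data, are shared."""
    total = sum(site_sizes)
    n_params = len(site_weights[0])
    return [
        sum(w[i] * n / total for w, n in zip(site_weights, site_sizes))
        for i in range(n_params)
    ]

# Hypothetical weight vectors from three hospitals of different cohort sizes.
weights = [[0.2, 1.0], [0.4, 0.0], [0.6, 0.5]]
sizes = [100, 300, 600]
global_weights = federated_average(weights, sizes)  # size-weighted mean per parameter
```

In a real deployment the weights would be large tensors updated over many rounds, but the privacy property is the same: a children's hospital with a small, sensitive cohort can still shape the global model.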

Conclusion: Towards Age-Inclusive Medical AI

The rapid integration of AI into medical imaging and diagnostics is reshaping health care as we know it. Yet the evidence is clear: children are being left behind, disproportionately excluded from the data that underpins AI-driven medicine. This lack of pediatric representation—less than 1% of public imaging data despite representing a quarter of the global population—threatens not just the equity but the safety of these technological advances.
Addressing age bias in biomedical AI will not be simple. It demands sustained investment, creative privacy solutions, and a commitment to transparency at every level of the research process. Regulatory action and public advocacy are essential, but only the beginning. Ultimately, to realize the full promise of AI in healthcare, it must serve—and safeguard—patients of all ages, not just the adult majority.
The call to action is urgent. Equal access to cutting-edge medicine should be a right, not a privilege. For AI to fulfill its transformative potential, children must be seen, represented, and protected within the datasets and models shaping the future of medicine. Only then can biomedical AI truly claim to be for everyone.

Source: Microsoft Research, “Lack of children in public medical imaging data points to growing age bias in biomedical AI”