MIT’s new AI system, VaxSeer, promises to sharpen the blunt instrument of seasonal influenza vaccine selection by predicting which viral strains will dominate and which vaccine candidates will provide the best antigenic match months before manufacturing decisions must be locked in.

Background​

The influenza vaccine selection process is a high-stakes, time-sensitive exercise. Twice a year the World Health Organization coordinates an international review of surveillance, genetic sequencing, and laboratory antigenicity testing to recommend which influenza strains manufacturers should include in the next season’s vaccines. Those recommendations must be finalized many months ahead of the season so manufacturers can produce and distribute hundreds of millions of doses. When the recommended strains closely match the viruses that ultimately circulate, vaccine effectiveness (VE) can be substantial; when the match is poor, protection drops sharply. Historically, overall seasonal VE has varied widely, ranging from the low tens of percent in poorly matched years to roughly 40–60% in well-matched seasons. That variability and the long lead times are precisely the problem VaxSeer is designed to address.
VaxSeer is an integrated machine-learning framework developed by researchers at MIT that combines a sequence-aware protein language model with epidemiological dynamics and antigenicity prediction to produce a single, forward-looking metric called a coverage score for candidate vaccine formulations. In retrospective tests spanning a decade of influenza data, the developers report that VaxSeer would have selected more antigenically protective strains than the WHO recommendations for the A/H3N2 subtype in nine of ten seasons and matched or exceeded WHO performance for A/H1N1 in most years. Those retrospective gains, if replicated prospectively, could improve strain match, increase vaccine effectiveness in practice, and reduce the public-health and logistical costs of poor matches.

How VaxSeer works: a technical overview​

VaxSeer is built from three conceptual components that work together to recommend vaccine strains:
  • A protein language model trained on decades of viral HA (hemagglutinin) sequences to learn how combinations of amino-acid changes affect viral fitness and competitive success in the population.
  • A dominance predictor that models how likely a given viral lineage is to rise in prevalence over time, accounting for competition among co-circulating variants.
  • An antigenicity predictor that estimates, in silico, how well antibodies elicited by a candidate vaccine strain will neutralize emerging strains, approximating the laboratory hemagglutination inhibition (HI) assay.
These outputs are combined through a mathematical framework that simulates viral spread over time (using ordinary differential equation–style dynamics) and produces a coverage score for each vaccine formulation. The coverage score is intentionally forward-looking: it weights antigenic similarity by the predicted future dominance of circulating strains so that a vaccine that neutralizes a highly likely dominant lineage ranks higher than one that neutralizes only low-frequency variants.
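To make that combination concrete, here is a minimal sketch of a dominance-weighted scoring step, assuming we already have predicted end-of-season dominance probabilities for each circulating strain and predicted HI-style antigenic distances between each vaccine candidate and those strains. The strain names, numbers, and the simple weighted sum are all hypothetical; VaxSeer's actual scoring pipeline is more involved.

```python
# Toy inputs (hypothetical strains and numbers, not real surveillance data).
# Predicted probability that each circulating strain dominates next season.
predicted_dominance = {
    "strain_A": 0.55,
    "strain_B": 0.30,
    "strain_C": 0.15,
}

# Predicted antigenic distance (HI-style, e.g. log2 titer drop) between each
# vaccine candidate and each circulating strain; 0 would be a perfect match.
antigenic_distance = {
    "candidate_1": {"strain_A": 1.0, "strain_B": 3.0, "strain_C": 4.0},
    "candidate_2": {"strain_A": 2.5, "strain_B": 0.5, "strain_C": 1.5},
}

def coverage_score(candidate: str) -> float:
    """Dominance-weighted antigenic distance; values nearer zero are better."""
    return sum(
        predicted_dominance[strain] * antigenic_distance[candidate][strain]
        for strain in predicted_dominance
    )

# Rank candidates: the one that best covers the likely dominant strains wins.
for candidate in sorted(antigenic_distance, key=coverage_score):
    print(f"{candidate}: coverage score = {coverage_score(candidate):.2f}")
```

Lower totals indicate a better dominance-weighted match, consistent with the "values closer to zero are better" convention noted in the feature list below.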
Key technical features reported by the developers:
  • The protein model captures combinatorial effects of mutations rather than treating individual amino-acid substitutions independently.
  • The dominance model simulates competitive dynamics between multiple lineages over a prospective season.
  • Antigenicity is estimated against HI assay proxies so the system speaks the same practical language used by labs and public-health authorities.
  • The coverage score scale is constructed so that values closer to zero indicate better expected antigenic match against future circulating viruses.
This hybrid design—sequence-level representation learning combined with dynamical epidemiology and functional antigenicity prediction—is a deliberate departure from methods that either focus only on evolutionary trajectories or only on antigenic cartography.
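As a rough illustration of the ODE-style competitive dynamics in the dominance component, the sketch below runs a simple replicator-style simulation in which each lineage's share of circulation grows or shrinks according to its fitness relative to the population average. The fitness values, initial shares, and time scale are invented, and the equations are a generic stand-in rather than VaxSeer's actual model.

```python
import numpy as np

# Hypothetical relative fitness for three co-circulating lineages
# (in spirit, the kind of score a sequence model might supply).
fitness = np.array([1.00, 1.15, 0.90])

# Initial prevalence shares (must sum to 1).
shares = np.array([0.70, 0.20, 0.10])

dt, n_steps = 0.05, 400  # one prospective "season" in arbitrary time units

# Replicator dynamics: d x_i/dt = x_i * (f_i - mean fitness), so lineages
# fitter than the population average gain share and the others lose it.
for _ in range(n_steps):
    mean_fitness = float(shares @ fitness)
    shares = shares + dt * shares * (fitness - mean_fitness)
    shares = np.clip(shares, 0.0, None)
    shares = shares / shares.sum()  # keep the shares normalized

for name, x in zip(["lineage_A", "lineage_B", "lineage_C"], shares):
    print(f"{name}: predicted end-of-season share = {x:.2f}")
```

The end-of-season shares from a simulation like this are what get multiplied against antigenic distances in the coverage-score step above.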

What the retrospective tests show — strengths and concrete gains​

The MIT team evaluated VaxSeer against a decade of historical influenza surveillance and antigenicity data and reported several headline results:
  • For influenza A/H3N2, VaxSeer’s strain selections outperformed the WHO’s actual recommendations in nine of ten seasons, as measured by an empirical coverage score derived from observed dominance and HI-test results (a toy version of this evaluation is sketched at the end of this section).
  • For influenza A/H1N1, VaxSeer matched or outperformed WHO recommendations in the majority of test seasons.
  • The system identified at least one strain in 2016 that the WHO added to vaccine recommendations only in the following year—an instructive example of a potential early-warning signal.
  • VaxSeer’s predicted coverage scores showed strong correlation with independently measured vaccine effectiveness and public-health outcomes in multiple surveillance systems.
These results are significant for several reasons. First, H3N2 is notorious for rapid antigenic drift and has historically been the most difficult subtype to predict and to vaccinate against effectively. A model that retrospectively improves selection for H3N2 in nine out of ten seasons suggests the approach captures meaningful evolutionary and antigenic signals that conventional expert selection may miss. Second, tying antigenicity predictions to expected dominance produces a practical metric, the coverage score, that directly targets the operational decision problem WHO and national advisory groups face.
Practically, better strain choices could allow manufacturers to start production earlier with more confidence, reduce the chance of mid-cycle reformulation, and decrease the risk of seasons with very poor VE that drive hospitalizations and deaths. For health systems and employers, even modest VE improvements scale to substantial reductions in outpatient visits, hospitalizations, and absenteeism.
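To picture how such a retrospective evaluation can be wired up, the toy example below computes an empirical coverage score per season from invented observed dominance fractions and HI-derived antigenic distances for each season's chosen strain, then checks its rank correlation with equally invented observed VE estimates. None of the numbers reflect real results; a negative correlation is simply the expected direction, since lower coverage scores indicate a better match.

```python
from scipy.stats import spearmanr

# Hypothetical per-season data (illustrative only, not real results).
# observed_dominance[season][strain] = fraction of circulation at season's end.
observed_dominance = [
    {"X": 0.6, "Y": 0.4},
    {"X": 0.2, "Y": 0.8},
    {"X": 0.5, "Y": 0.5},
    {"X": 0.9, "Y": 0.1},
]
# HI-derived antigenic distance between that season's chosen vaccine strain
# and each circulating strain (0 = perfect match).
hi_distance = [
    {"X": 0.5, "Y": 2.0},
    {"X": 3.0, "Y": 1.0},
    {"X": 1.5, "Y": 1.5},
    {"X": 0.5, "Y": 4.0},
]
observed_ve = [0.52, 0.38, 0.44, 0.49]  # made-up published VE estimates

# Empirical coverage score: dominance-weighted antigenic distance, computed
# from what actually circulated rather than from predictions.
empirical_scores = [
    sum(dom[s] * dist[s] for s in dom)
    for dom, dist in zip(observed_dominance, hi_distance)
]

rho, p = spearmanr(empirical_scores, observed_ve)
print("empirical coverage scores:", [round(s, 2) for s in empirical_scores])
print(f"Spearman rho vs. observed VE: {rho:.2f} (p = {p:.2f})")
```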

Why this matters operationally: vaccine timelines and decision points​

Seasonal influenza vaccine production is governed by long lead times. Key operational realities:
  • WHO holds biannual vaccine composition consultations (typically in February for the Northern Hemisphere and in September for the Southern Hemisphere), months before the peak influenza seasons.
  • Manufacturers require months to produce, test, and distribute egg- or cell-based vaccines, with additional time needed for regulatory review and logistics.
  • Antigenic mismatches discovered late in the cycle leave manufacturers little room to pivot; changing strains late is expensive and sometimes impossible.
VaxSeer’s forward-looking coverage score is designed to plug into that workflow by providing earlier, quantitatively ranked candidate lists for consideration. In theory, this could enable:
  • Faster consensus among national advisory committees because ranked options clarify trade-offs.
  • Earlier procurement decisions with quantified risk assessments.
  • Conditional strategies where manufacturers plan alternative seed strains but only move forward when coverage thresholds are met.
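A conditional strategy of that kind could be as simple as the decision rule sketched below, which assumes the planner has predicted coverage scores and a rough uncertainty for the incumbent and an alternative seed strain; all thresholds and values are hypothetical.

```python
# Hypothetical decision rule: trigger contingency seed-strain planning only
# when the alternative's predicted coverage score beats the incumbent's by a
# clear margin (recall: scores closer to zero indicate a better match).
INCUMBENT_SCORE = 1.8      # predicted coverage score of the current seed strain
ALTERNATIVE_SCORE = 1.1    # predicted coverage score of the alternative seed
SCORE_UNCERTAINTY = 0.3    # rough +/- uncertainty on each prediction
IMPROVEMENT_MARGIN = 0.5   # minimum improvement required before acting

improvement = INCUMBENT_SCORE - ALTERNATIVE_SCORE
# Require the improvement to exceed both the margin and the combined
# uncertainty before committing resources to an alternative seed strain.
if improvement > max(IMPROVEMENT_MARGIN, 2 * SCORE_UNCERTAINTY):
    print("Trigger contingency planning for the alternative seed strain.")
else:
    print("Stay with the incumbent strain; monitor and re-evaluate.")
```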
However, operational integration requires trust, validation, and transparency. Retrospective performance is necessary but not sufficient for adoption by multilateral agencies that operate by consensus.

Critical analysis: strengths​

  • Methodological integration: Combining a protein language model with epidemiological simulation and antigenicity prediction is a principled way to address both what will circulate and whether a vaccine will neutralize it. This end-to-end thinking targets the real operational objective—population-level protection—rather than proxy goals alone.
  • Strong retrospective performance on H3N2: The reported nine-out-of-ten win rate for A/H3N2 in retrospective tests is notable because H3N2 has historically been the hardest subtype to predict and often drives poor VE seasons.
  • Practical metric design: The coverage score maps directly to the decision question vaccine planners face—expected antigenic protection against likely future viruses—enabling ranked recommendations instead of binary choices.
  • Use of established lab proxies: Anchoring antigenicity predictions to the hemagglutination inhibition assay makes results interpretable to virologists and public-health labs that already use HI as a standard metric.
  • Extensibility: The framework’s modular nature suggests it could be adapted for other fast-evolving pathogens if sufficient sequence and antigenicity data exist.

Critical analysis: limitations and risks​

  • Retrospective vs prospective performance: Retrospective success is encouraging but does not guarantee prospective reliability. Influenza dynamics are path-dependent and sensitive to rare events; models that leverage past patterns can underperform when novel, unforeseen evolutionary paths arise.
  • Data biases and surveillance gaps: The model depends on high-quality, representative viral sequences and antigenicity measurements. Surveillance and sequencing are uneven globally; lineage prevalence estimates can be skewed by sampling bias, which propagates into dominance predictions.
  • HI assay limitations: The hemagglutination inhibition assay is a pragmatic proxy for antigenic match but has recognized limitations—particularly for some H3N2 viruses and in interpreting polyclonal human responses. Antigenic cartography and neutralization assays provide richer signals that VaxSeer may not fully capture if it relies primarily on HI proxies.
  • Interpretability and trust: Protein language models are powerful but can be opaque. Public-health agencies and manufacturers may demand interpretable explanations of why a strain is predicted to dominate, especially when recommendations diverge from established expert judgment.
  • Regulatory and governance hurdles: WHO’s vaccine composition process is a global, consensus-driven exercise with built-in transparency. Integrating AI recommendations would require open validation, reproducible methods, and mechanisms to resolve disagreements between model outputs and human experts.
  • Adversarial and dual-use concerns: Predictive models that forecast viral evolution could, in theory, be misused; governance controls and responsible disclosure practices are therefore necessary.
  • Funding and perceived conflicts: The involvement of defense-related funding sources may raise questions in some circles about intent and governance, even when research is academically rigorous and ethically constrained.
  • Overfitting to past assay conventions: Training models on decades of historical HI data risks baking in quirks of assay protocols or laboratory-specific practices that might not generalize across different labs or future assay improvements.

Practical challenges to adoption and how to mitigate them​

Bringing VaxSeer from research into routine vaccine selection requires several concrete steps:
  • Rigorous, independent validation
      • Run prospective pilot studies in collaboration with national influenza centers and independent labs.
      • Compare VaxSeer outputs with ongoing surveillance in real time and publish blind validation results.
  • Transparency and reproducibility
      • Release model code, training data schemas, and evaluation scripts under appropriate licensing so external groups can reproduce results.
      • Provide model explanations and feature-attribution outputs usable by virologists.
  • Multi-stakeholder pilots
      • Start with low-stakes integrations: provide VaxSeer as an advisory input to national technical advisory groups while maintaining existing WHO protocols.
      • Organize joint workshops with WHO, national centers, manufacturers, and regulators to co-design workflows.
  • Address data equity
      • Create mechanisms to improve sequence and antigenicity data collection in under-sampled regions.
      • Use statistical corrections for sampling bias and explicitly quantify uncertainty when data coverage is sparse (a simple reweighting sketch follows this list).
  • Build regulatory confidence
      • Work with regulatory bodies and international organizations to define criteria under which a model-informed recommendation could accelerate or complement conventional processes.
  • Define risk thresholds and contingency plans
      • Use coverage-score thresholds to trigger contingency manufacturing-planning steps rather than automatic strain changes.
      • Maintain redundant vaccine seed stocks where feasible.
  • Safeguards against misuse
      • Apply ethical governance frameworks and limit access to any model components that could materially assist harmful manipulation of viral sequences.
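As a simple, hypothetical example of the sampling-bias correction mentioned under data equity, the sketch below post-stratifies sequence counts by region so that an over-sampled region does not dominate the lineage prevalence estimate. The regions, counts, and population shares are invented.

```python
from collections import Counter

# Hypothetical sequence submissions: (region, lineage). Region A is heavily
# over-sampled relative to its share of the population at risk.
sequences = (
    [("region_A", "clade_1")] * 80 + [("region_A", "clade_2")] * 20 +
    [("region_B", "clade_2")] * 10 + [("region_B", "clade_1")] * 5
)

# Assumed share of the at-risk population in each region (invented numbers).
population_share = {"region_A": 0.4, "region_B": 0.6}

# Naive prevalence: count lineages across all submissions, ignoring sampling.
raw = Counter(lineage for _, lineage in sequences)
total = sum(raw.values())
print("naive prevalence:", {k: round(v / total, 2) for k, v in raw.items()})

# Post-stratified prevalence: estimate lineage shares within each region,
# then combine them weighted by population share instead of sample size.
adjusted = Counter()
for region, weight in population_share.items():
    regional = [lineage for r, lineage in sequences if r == region]
    counts = Counter(regional)
    for lineage, c in counts.items():
        adjusted[lineage] += weight * c / len(regional)
print("post-stratified prevalence:", {k: round(v, 2) for k, v in adjusted.items()})
```

In this toy case the naive estimate overstates the lineage favored by the over-sampled region; corrections of this general kind would need to feed into any dominance predictor trained on uneven surveillance.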

Scientific caveats and unverifiable claims​

Several claims associated with VaxSeer deserve caution:
  • Any headline claim that a model will definitively “boost vaccine effectiveness by X%” is premature. Retrospective coverage-score gains must be translated into prospective, real-world VE improvements through rigorous evaluation.
  • Claims about the model’s applicability to other viruses (for example, coronaviruses) are plausible but conditional on the availability of equivalent, high-quality antigenicity datasets and the unique biology of those viruses. Extrapolation without targeted testing is speculative.
  • The degree to which the coverage score fully captures human immune history (previous infections, vaccinations, cross-reactivity) is limited. That immunological background strongly shapes population-level outcomes and is a complex variable for any predictive model.
These are not arguments against VaxSeer; rather, they are practical reminders that computational tools must be embedded in scientific, ethical, and public-health processes with appropriate humility and safeguards.

Broader implications: beyond better strain selection​

If VaxSeer or similar frameworks deliver robust prospective improvements, the implications extend beyond incremental VE gains:
  • Manufacturing confidence and supply resilience: Better forecasts could reduce last-minute production changes and the associated waste or shortages. This would strengthen supply-chain resilience and potentially lower costs for manufacturers and health systems.
  • Targeted vaccination strategies: Quantitative predictions of likely dominant lineages could inform targeted vaccine formulations for different geographies or high-risk groups when feasible.
  • Accelerated development cycles: For next-generation vaccines (e.g., universal or broadly protective candidates), predictive modeling could prioritize antigens and designs that are robust to likely evolutionary paths.
  • Cross-pathogen applications: The conceptual architecture—combining sequence-aware ML, dominance modeling, and antigenicity estimation—might be transferable to other rapidly evolving pathogens given sufficient data, creating a reusable toolkit for epidemic preparedness.
  • Policy and international coordination: Introducing AI into strain selection will necessitate new international protocols for model validation, data sharing, and decision arbitration, reshaping governance around vaccine composition decisions.

Recommended roadmap for cautious, evidence-driven deployment​

  • Publish full methods and independent code releases to enable community review and replication.
  • Run prospective blind validations with multiple surveillance partners during at least two influenza seasons.
  • Provide transparent uncertainty estimates and interpretable explanations alongside every coverage-score output (a minimal bootstrap illustration follows this list).
  • Pilot VaxSeer as a formal advisory input to national vaccine advisory groups while preserving multi-party WHO decision-making.
  • Invest in global sequencing and antigenicity surveillance, especially in under-sampled regions, to reduce bias and expand model generalizability.
  • Establish a multi-stakeholder governance body to oversee responsible use, dual-use risk assessment, and ethical release policies.
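For the uncertainty-reporting item above, one crude but workable approach is to attach a bootstrap interval to each coverage score rather than reporting a point estimate alone. The sketch below resamples a small set of hypothetical strain-level inputs; real uncertainty quantification would need to propagate model and surveillance uncertainty far more carefully.

```python
import random

random.seed(0)

# Hypothetical strain-level inputs: (predicted dominance weight, HI-style
# antigenic distance to the candidate vaccine strain). Numbers are invented.
strains = [(0.50, 1.0), (0.30, 2.5), (0.15, 3.0), (0.05, 4.0)]

def coverage(sample):
    """Dominance-weighted antigenic distance, renormalizing the weights."""
    total_w = sum(w for w, _ in sample)
    return sum(w * d for w, d in sample) / total_w

# Bootstrap: resample strains with replacement to get a rough interval
# around the coverage score instead of a single point estimate.
boot = sorted(
    coverage([random.choice(strains) for _ in strains]) for _ in range(2000)
)
point = coverage(strains)
lo, hi = boot[int(0.025 * len(boot))], boot[int(0.975 * len(boot))]
print(f"coverage score = {point:.2f} (95% bootstrap interval {lo:.2f}-{hi:.2f})")
```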

Conclusion​

VaxSeer represents a compelling example of how modern machine learning—when combined with biological domain knowledge and epidemiological modeling—can translate to actionable, operational metrics that address a longstanding bottleneck in vaccine planning. Its retrospective performance, especially on the notoriously unpredictable H3N2 subtype, suggests the approach captures meaningful signals that human experts and conventional methods sometimes miss.
That potential, however, is tempered by real-world constraints: surveillance gaps, HI-assay limitations, model interpretability, and the social and regulatory fabric of global vaccine decision-making. Moving from retrospective success to routine adoption will require independent, prospective validation, transparent methods and governance, and careful pilot integration with existing WHO-driven processes.
If those pieces come together, a tool like VaxSeer could modestly but materially improve flu vaccine match rates—reducing illness and hospitalizations year to year—and create a blueprint for applying AI to other fast-evolving threats. If the field moves forward with rigor, transparency, and appropriate safeguards, the race to outpace viral evolution could shift from reactive guesswork to informed anticipation.

Source: eWeek Flu Vaccine Accuracy Gets a Boost From MIT's New AI Model