The corridors of medical education echo with the tension and excitement of transformation as generative AI technologies, particularly models like OpenAI’s GPT-4, rapidly reshape what it means to teach, learn, and practice medicine. The promise and peril of this disruption reverberate from medical school lecture halls to the hospital bedside, sparking debate among clinicians, educators, administrators, and—perhaps most crucially—the next generation of doctors themselves.
The Sudden Arrival of AI Competence: Rethinking Expertise
When Peter Lee, Carey Goldberg, and Dr. Zak Kohane set out to chronicle the “AI Revolution in Medicine,” they found themselves straddling two timelines: the speculation-rich era as GPT-4 was developed behind closed doors, and the new reality, where the model scored over 90% on the formidable United States Medical Licensing Examination (USMLE). This performance, once unthinkable, forced a fundamental question: If an AI can outperform many students on the very exams designed to ensure clinical competence, what does competency actually mean? Can traditional measures of medical knowledge survive an era when information is always instantly accessible?

Morgan Cheatham, a recent graduate of Brown’s medical school, clinical fellow at Boston Children’s Hospital, and a partner at the influential healthcare venture firm Breyer Capital, embodies this crossroads. Having participated in the research that first demonstrated ChatGPT’s ability to pass the USMLE, Cheatham admits to a sense of humility—and even existential unease. Rigorous preparation had been his crucible, yet here was an AI model waltzing through the gates that had defined generations of medical achievement.
“It set me back. It forced me to interrogate what my role in medicine would be,” Cheatham explains. “I had to do a lot of soul searching to relinquish what I thought it meant to be a physician and how I would adapt in this new environment.”
Cheatham’s experience is not unique. Daniel Chen, a second-year student at Kaiser Permanente Bernard J. Tyson School of Medicine, describes AI tools like ChatGPT as near-daily companions—for everything from parsing the jargon-cluttered language of clinical notes to simulating second-opinion consultations about complex differentials. Both point to a world where knowledge is commoditized, and the value of a physician begins to migrate beyond recall to judgment, communication, and hands-on patient care.
Generative AI as Tutor, Scribe, Prognosticator—and Risk
For today’s students, generative AI systems are as indispensable as stethoscopes and laptops. In the preclinical years, these models serve as on-demand tutors, able to deconstruct and re-explain difficult concepts at any level of depth or simplicity. Chen recounts, “Sometimes if there’s a complex topic, I ask ChatGPT, like, can you explain this to me as if I was a 6-year-old?” When faculty mention obscure or specialty-specific abbreviations, the AI quickly deciphers them—saving both embarrassment and time.

But AI’s role doesn’t end in the classroom. When clinical rotations begin, particularly in “longitudinal integrated clerkships” where students are embedded in patient care from the outset, generative models power clinical reasoning exercises, differential diagnosis brainstorming, and even coding support for research projects. Both Cheatham and Chen highlight the emergence of tools like OpenEvidence, a large language model fine-tuned on clinical literature with built-in citation capabilities, mitigating the most commonly cited risk of generative AI: hallucinations and unsubstantiated assertions.
Crucially, while AI can rapidly synthesize evidence, the quality of that synthesis depends on how well students—and doctors—frame their questions and critically appraise the output. As Chen puts it, “We fear a lot about the hallucinations that these models might have. And it’s something I’m always checking for...being able to have the critical understanding of analyzing the actual literature. Double-checking is just something that we’ve been also getting really good at.”
The very convenience of AI stokes a legitimate fear: that reliance on machine reasoning might stunt the development of the trainee’s most essential skill—clinical thinking. “It’s very easy as a student to give these models the relevant information about the patient history and be like, ‘Give me a 10-list differential’... And it’s very easy as a student to, you know, ‘This is difficult. Let me just use what the model says, and we’ll go with that,’” Chen admits. Medical educators share this concern, wary that the uncritical adoption of AI-powered reasoning may shortcut the hard process of learning to observe, hypothesize, and decide independently.
From “Pimping” to Peer: The New Clinician Stack
Anecdotally, both Cheatham and Chen paint a vivid picture: rounds in academic medical centers now feature a new game of “defense against pimping”—using AI tools in real time during rapid-fire questioning by senior attendings. Where medical education once depended on recall and subtle cues, students now have access to what Cheatham calls “the clinician stack”: an evolving suite of AI and LLM-enabled software for clinical search, documentation, and decision support. This stack is dynamic. One week a student may use Claude, the next Perplexity.ai or a domain-specific model like OpenEvidence. The expectation is not to memorize static content, but to marshal resources and evaluate their relevance and reliability under pressure.

Yet this sea change is not without culture shock. Some senior clinicians react with skepticism, seeing digital assistance as a crutch that threatens traditional, hard-won skills. There remain, too, institutional barriers: hospital IT departments have variously blocked and then relented on giving access to public LLMs, wary of privacy breaches and regulatory uncertainty. Still, as Cheatham notes, “If there’s a will, there’s a way. And we will utilize this technology if we are seeing perceived value.”
The Curriculum Gap: Student-Led Innovation Versus Staid Accreditation
Both the power and the peril of AI in medical education stem from its status as an unstoppable but poorly integrated force. Despite experiments at forward-looking programs like Kaiser Permanente, where students receive formal training on widely used EMR tools like Epic’s HealthConnect, there is, as of now, little systematic curriculum around responsible AI use.

Most innovation, both Cheatham and Chen argue, is student-led. “Talking to other peers at other institutions, it looks like it’s something that’s very slowly being built into the curriculum, and it seems like a lot of it is actually student-led,” Chen notes. Fourth-year students might get a student-run seminar, or at some schools, students campaign for dedicated AI components—best practices, critical appraisal, prompt engineering, and clinical safety concerns. Meanwhile, faculty and administrators often lack firsthand facility with the latest tools, making them reluctant to design formalized instruction.
This educational lag is compounded by slow-moving accreditation standards. The Liaison Committee on Medical Education (LCME), the body that accredits US medical schools, finds itself in a bind: how to evaluate schools and their graduates when both the tools of care and the knowledge base are evolving at a pace never before seen in medicine.
Cheatham is blunt: “It is urgent that our medical schools do create formalized required trainings for this technology because people are already using it.” He suggests, as an imperative, that medical education “unbundle the different components of the medical appointment and think about the different functions of a human clinician.” In particular, training in rapid medical search, prompt development, and critical evaluation of AI outputs at the point of care should be as integral to medical education as anatomy or pharmacology.
The Changing Shape of Expertise: Specialties, Scribes, and the Human Element
One underappreciated effect of AI, Cheatham argues, is the collapse, or at least the destabilization, of traditional medical specialties. Specialization arose historically to manage cognitive overload—a recognition that medicine’s breadth outpaced the capacity of the generalist. But if AI can absorb and synthesize the knowledge base of multiple specialties on demand, the fixed boundaries between, say, cardiology and nephrology may blur.

This breakdown, while liberating, creates new demands: doctors will no longer need to know everything, but they will need the judgment to know what questions to ask, what answers to trust, and when to escalate. The generalist’s ability to coordinate, communicate, and comfort—the “human touch”—remains irreplaceable. As Chen points out, certain specialties, like family medicine or oncology, draw their primary value from the patient relationship, delivering bad news or supporting longitudinal care with compassion—functions for which machines, no matter how sensitive their conversational tuning, remain ill-suited.
Still, the fear of losing essential manual skills is not unfounded. The proliferation of AI-powered ambient scribing, which automatically generates clinical notes from patient interviews, threatens to short-circuit the learning that occurs in documentation. While time saved is valuable—in high-volume ambulatory settings, AI scribes are already freeing up hours for patient interaction—both students and educators warn that the mental discipline of composing the clinical narrative is central to the physician’s development. “There’s a lot of learning when you write the note,” Chen emphasizes. Automating too soon risks producing a generation of doctors unpracticed in the careful, reflective synthesis that distinguishes expert from novice.
Trust, Transparency, and the Evolving Doctor-Patient Relationship
Perhaps the most profound test for AI in medical education is the evolving nature of trust—between students and their tools, and between patients and the professionals who care for them.

Hospital administrators have, not without reason, attempted to fence off clinical systems from direct connection to publicly available LLMs, citing concerns about privacy and liability. Yet as Cheatham discovered, such blocks often fail in practice: mobile devices provide a backchannel for “off-label” use, and clinicians—like their patients—are often savvy enough to find a technological workaround when patient benefit appears at stake.
For patients, the transformation is both empowering and confusing. Increasingly, people arrive in clinics armed with ChatGPT-generated differential diagnoses, sometimes making specific requests for tests or interventions based on the AI’s suggestions. This changes the dynamic—sometimes for the better, sometimes not. As Chen puts it, “It’s all about shared decision making with the patient, right. Being able to acknowledge like, ‘Yeah, most of the stuff is very plausible, but maybe you didn’t think about this one symptom you have.’”
This transparency is generally positive, enabling more informed patients and richer discussions. But it raises uncomfortable questions: if patients arrive trusting (or believing) the AI’s analysis over that of a flesh-and-blood trainee or even attending, how does the medical profession maintain its authority and ensure safety? The risk of misplaced confidence, both in AI and in human clinicians, is real. Indeed, some worry that generative language models, prized for their confidence and fluency, may mislead patients or clinicians, especially when the model’s reasoning is mistaken or the training data incomplete.
Beyond the Hype: Critical Appraisal, Real-World Impact, and Cautious Optimism
While the narrative of AI in medicine can easily tip into techno-optimism, experienced observers like Lee, Cheatham, and Chen temper enthusiasm with caution. The recent explosion of AI pilots in clinical environments has, according to Cheatham, yielded a phenomenon dubbed “pilotitis”—an abundance of small-scale trials, but few that transition into production use. Data suggest that only a third of pilots achieve scale, underscoring the need for careful evaluation, iterative improvement, and a willingness to sunset underperforming tools.

Measurement and regulation remain pressing challenges. Should an AI, if it can pass the licensing exams used to certify doctors, be accorded a similar level of trust? Both theory and practice suggest “not yet.” Human clinical judgment, the ability to contextualize uncertainty, and above all, the capacity to connect and counsel, remain essential and unautomatable.
Moreover, reimbursement and payment models lag behind technological progress. Cheatham predicts that while AI’s back-office applications will proliferate within two years, meaningful, sustainable integration into core patient care will require new payment mechanisms—and that could take five years or more.
The Road Ahead: Integration, Inspiration, and the Next 10 Years
Looking further out, both Cheatham and Chen see the dissolution of the walls between clinical care and biomedical discovery. The long-dreamed “learning health system”—where every patient encounter generates data not only for individual care but for the improvement of medicine itself—becomes feasible with AI-powered integration, search, and analysis. The hope is that by leveraging generative models, the system can become continuously self-improving, every diagnosis and treatment informing future decisions.

For students and new graduates, engagement with generative AI is not optional—it is an everyday reality and a career-long necessity. The leaders of this movement are distinguished not simply by their tech savvy, but by a commitment to critical thinking, continuous learning, and hands-on experimentation.
The most striking takeaway from the current generation of medical students and residents is optimism, not anxiety. Far from being daunted by the possibility that AI might someday supplant their carefully honed skills, both Cheatham and Chen focus on how the physician’s role will evolve—expanding to encompass not just medical expertise, but stewardship of technology, mentorship for patients and colleagues, and a relentless focus on the human experience of illness and healing.
Key Takeaways and Recommendations
- Medical education is grappling with the implications of AI models achieving—and often exceeding—human-level performance on knowledge assessments like the USMLE. This necessitates a shift from memorization to judgment, communication, and critical thinking as defining physician skills.
- Generative AI serves as a personalized tutor, scribe, and second set of eyes for current trainees, accelerating learning but raising real-world questions about information reliability and the risk of diminished critical thinking.
- Integration of AI into medical school curricula remains ad hoc and often student-led; formal accreditation bodies and faculty development lag well behind student adoption and tech innovation.
- The boundaries between traditional medical specialties may blur as AI enables individual clinicians to access and synthesize an ever-expanding clinical knowledge base.
- Widespread use of AI for notetaking and patient interaction calls for a deliberate re-examination of which skills are developed by hand and which can safely be augmented or replaced by technology.
- New payment and reimbursement models are necessary for widespread AI adoption beyond well-funded academic centers.
- Trust—between clinicians, patients, and algorithms—is both newly fragile and critically important. Transparency about the limitations and strengths of generative AI is essential.
- The most successful future doctors will not be those who resist AI, but those who harness it wisely while preserving what is most human in medicine: context, compassion, and care.
Conclusion: The Student-Driven Revolution
The era of generative AI in medicine is not coming—it is here. Medical education, slow to change by design, risks irrelevance unless it adapts quickly to a world where students and patients expect, and demand, instant, AI-powered expertise and support. The loudest call for change, however, does not arise from administrators or even from the tech sector, but from students themselves—driven by both necessity and curiosity to redefine what it means to be a doctor.

The next decade will belong to those who blend mastery of technology with deep, persistent humanity. As Cheatham, Chen, and their peers demonstrate, the true revolution is happening from the ground up, led by a generation unafraid to question tradition, unashamed to experiment, and unwilling to outsource the moral heart of medicine to silicon.
Navigating the future, students stand poised not merely to use generative AI, but to shape its legacy in healthcare. If medicine is to realize the full value of its AI revolution, it will do so by learning, as these pioneers already have: to trust, to verify, and above all, to adapt.
Source: Microsoft Research, “Navigating medical education in the era of generative AI”