
Understanding AI in Medicine: A Clinician's Guide

Op-Med is a collection of original articles contributed by Doximity members.

Last week, during our noon conference, a pharmacist gave a talk about a relatively new class of migraine medications, the CGRP receptor antagonists. She showed us a schematic of the in vivo pathway, taught us the biochemical anatomy, and described the downstream mechanisms and how they contribute to the pathophysiology of migraine. Despite not being a pharmacist or biochemical researcher, I came away with a good understanding of the drug class's underlying mechanisms and felt confident about prescribing it. I am not an expert who will be expanding the body of knowledge in this space, but I have a full, nuanced, and useful model for how these medications work.

In contrast, a few weeks ago, my institution launched an internal, HIPAA-compliant, GPT-powered chatbot. There was no training or literature offered to better understand how this tool works or how to use it.

As someone interested in health tech, I’ve been frustrated by the lack of useful primers to help physicians understand how this new technology works. The materials I did find were vague and general, lacking the depth to be truly useful. Doctors are intelligent people but have a very narrow domain focus. Most of us don’t have the specific technical skillset to understand the complex programming and processes behind artificial intelligence (AI) and machine learning — but we are used to learning complicated topics and thinking critically.

This essay is an overview of what generative AI is and how large language models (LLMs) work, aimed at an audience that does not have the technical background but is smart, thoughtful, and intellectually curious. The goal is to provide a primer that is fairly granular but still accessible. I believe that by having a framework for how this technology works, clinicians can better appreciate the strengths and limitations of current AI tools.

The Evolution of Computing in Medicine

Historically, computers have been straightforward tools — complex in capability but fundamentally simple in how they operate. They take an input, process it according to predefined rules, and generate an output. Computer programming involves providing explicit instructions to tell a computer exactly what to do: how to process data, respond to user inputs, make displays, etc. When I was an undergraduate learning a programming language for the first time, my code would often break in unexpected ways. I was convinced there was an issue or a bug with the code or system I was using. However, careful debugging always brought to light that I had made the mistake. If the computer did not behave as I wanted it to, it was because I made an error in the instructions.

Now, however, the advent of generative AI represents a paradigm shift. Computers can create new content without being explicitly programmed to do so. This has incredible applications in all industries and especially in medicine.

What Is Generative AI?

AI is a broad, overarching term that encompasses everything from machine-learning algorithms to advanced neural networks. AI, in various incarnations, has actually been around for a long time. For example, anytime you get a recommendation for a new YouTube video to watch or when a credit card transaction is flagged for fraud, that’s AI in action. Specifically, these are predictive models; such models are also used in health care. For example, CHA2DS2-VASc is a simple point-based score used to predict stroke risk in atrial fibrillation. More complex models predict readmission rates or the onset of sepsis based on clinical parameters.
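To make the idea of a simple predictive model concrete, here is a minimal Python sketch of the point-based CHA2DS2-VASc calculation mentioned above. The function and argument names are my own; a real clinical calculator would need validation and guardrails.

```python
def cha2ds2_vasc(chf, htn, age, diabetes, stroke_tia, vascular, female):
    """Return the CHA2DS2-VASc score from individual risk factors."""
    score = 0
    score += 1 if chf else 0         # Congestive heart failure
    score += 1 if htn else 0         # Hypertension
    score += 2 if age >= 75 else (1 if age >= 65 else 0)  # Age category
    score += 1 if diabetes else 0    # Diabetes mellitus
    score += 2 if stroke_tia else 0  # Prior stroke/TIA/thromboembolism
    score += 1 if vascular else 0    # Vascular disease
    score += 1 if female else 0      # Sex category (female)
    return score

# A 70-year-old woman with hypertension: 1 (age) + 1 (HTN) + 1 (sex) = 3
print(cha2ds2_vasc(chf=False, htn=True, age=70, diabetes=False,
                   stroke_tia=False, vascular=False, female=True))  # prints 3
```

Note that this is just a fixed set of rules: the "model" is an additive score whose weights were derived once from population data, in contrast to the learned, flexible models discussed below.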

Generative AI, a subset of AI, goes beyond categorizing or organizing information; it creates new content.

LLMs are one type of generative AI focused on text. That is, they can generate new content that is text based, whether it’s long-form essays, poetry, code, or clinical notes. ChatGPT started as an LLM. Other LLMs include Anthropic’s Claude and Meta’s Llama. These models take input in the form of text from a user and output new text in response. Of course, ChatGPT has now expanded beyond the original text-based platform, but it is useful to consider it as an LLM.

How Large Language Models Work

LLMs are at the core of many AI applications, so it’s helpful to have a more in-depth understanding. In their simplest form, LLMs are word prediction machines. For example, given the phrase “Dogs are,” the model might predict “man’s,” then “best,” and finally “friend,” resulting in “Dogs are man’s best friend.” The inputs can be increasingly complex, and the LLM guesses each next word based on everything that has been said before. It stands to reason that certain words are associated with each other at increased frequency. For example, we can assume that the words “woman” and “queen” are more likely to be associated than “woman” and “king.” LLMs essentially create maps of these probabilities and use them to guess what word comes next. Different models will have different predictions based on how they are created. Not all models are the same.
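As an illustration of next-word prediction, here is a toy “bigram” model in Python: a drastically simplified stand-in for a real LLM. The tiny corpus and function names are invented for this sketch; real models use neural networks over vast amounts of text, not bare counts.

```python
from collections import Counter, defaultdict

# A tiny "training corpus" (invented for illustration)
corpus = "dogs are man's best friend . cats are independent . dogs are loyal".split()

# Count which word follows which -- the essence of a bigram model
following = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    following[current][nxt] += 1

def predict_next(word):
    """Return the word most frequently observed after `word`."""
    return following[word].most_common(1)[0][0]

# Generate text by repeatedly predicting the next word
word, sentence = "dogs", ["dogs"]
for _ in range(4):
    word = predict_next(word)
    sentence.append(word)
print(" ".join(sentence))  # prints: dogs are man's best friend
```

A real LLM does the same basic thing, predict the next token, but conditions each prediction on the entire preceding context rather than just the single previous word.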

It is important to understand that LLMs are not datasets or data repositories in a classical sense. Consider an example that was shared a few months ago. GPT-4 was asked, “Who is Tom Cruise’s mother?” It accurately responded that Tom Cruise’s mother is Mary Lee Pfeiffer. However, when it was asked, “Who is Mary Lee Pfeiffer’s son?” it responded that it didn’t know. This highlights the fact that these models don’t so much store data as leverage learned associations, which are often unidirectional and can be accessed only through certain paths.
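A loose analogy for this asymmetry, not how LLMs actually store associations internally, is a plain dictionary whose lookups only work in one direction:

```python
# A toy forward-only association map (keys invented for illustration).
# The forward question can be answered; the reverse question cannot,
# even though the same fact is "in there."
associations = {"Tom Cruise -> mother": "Mary Lee Pfeiffer"}

print(associations.get("Tom Cruise -> mother", "I don't know"))      # Mary Lee Pfeiffer
print(associations.get("Mary Lee Pfeiffer -> son", "I don't know"))  # I don't know
```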

The Training Process

The power behind LLMs lies in their parameters: the model’s weights, or knowledge, encoded as numerical values, often numbering in the tens or hundreds of billions. Parameters are not manually programmed; that would be prohibitively labor intensive and practically impossible. The general idea behind creating LLMs and their parameters involves two key steps: pretraining and fine-tuning.

Pretraining: Consider how a child learns language. It’s not through explicit instruction. It’s through absorbing and interpreting linguistic patterns in their environment through hours and hours of human interaction. Pretraining is somewhat analogous to this. It’s a very computationally intensive process that essentially quantifies patterns in our language. Very large datasets are used in pretraining. For example, GPT-3 was trained on over 500 GB of text scraped from the internet, including books, Wikipedia, online forums, blog posts, and various text-based websites. This process quantifies the associations between different words and encodes them in a neural network, for example, capturing how the words “queen,” “woman,” “king,” and “man” relate to one another.
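A heavily simplified sketch of the counting intuition behind pretraining, using a three-sentence “corpus” I made up. Real pretraining learns far richer statistics with neural networks, not bare co-occurrence counts:

```python
from collections import Counter
from itertools import combinations

# Toy "pretraining": count how often word pairs co-occur in a sentence.
sentences = [
    "the queen is a woman",
    "the king is a man",
    "the woman spoke to the queen",
]

cooccur = Counter()
for s in sentences:
    words = set(s.split())
    for a, b in combinations(sorted(words), 2):
        cooccur[(a, b)] += 1

# "queen" and "woman" co-occur more often than "king" and "woman"
print(cooccur[("queen", "woman")], cooccur[("king", "woman")])  # prints: 2 0
```

Scaled up to hundreds of gigabytes of text and billions of parameters, statistical regularities like these are what give the model its “map” of language.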

Fine-tuning: This labor-intensive process focuses on quality over quantity. It involves training the model on accurate datasets, often with human input to evaluate and improve responses. For example, a dataset might include a list of questions that a patient may ask and provide accurate responses to those questions. Human evaluators might choose the best answer from multiple outputs or provide their own answers, creating a “gold standard” for the LLM to learn from.
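A sketch of what fine-tuning data might look like, with prompts and responses invented for illustration: a curated prompt/response pair, plus a human preference label choosing the better of two candidate answers.

```python
# Curated prompt/response pairs for supervised fine-tuning (illustrative only)
sft_examples = [
    {"prompt": "What should I do if I miss a dose of my blood pressure medication?",
     "response": "Take it as soon as you remember, unless it is almost time for "
                 "your next dose; do not double up. Ask your pharmacist if unsure."},
]

# A human evaluator picks the better of two model outputs,
# creating a "gold standard" preference for the model to learn from
preference_example = {
    "prompt": "Can I take ibuprofen with my blood thinner?",
    "candidates": ["Yes, it's always fine.",
                   "Check with your clinician first; NSAIDs can raise bleeding risk."],
    "chosen": 1,  # index of the safer, more accurate answer
}

print(preference_example["candidates"][preference_example["chosen"]])
```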

The pretraining stage involves an enormous amount of data of variable quality, as it is essentially a random subset of the internet. The fine-tuning stage uses far less data but emphasizes accuracy and quality.

Conclusion

To summarize, generative AI is a subset of AI that deals with the creation of new, original content. LLMs are a subset of generative AI that are text based. LLMs are trained through two major steps: pretraining and fine-tuning. LLMs have incredible potential in medicine.

It’s important for doctors and clinicians who are on the front lines of medicine, and who take care of patients, to be at the forefront of this new technology in order to help shape its applications in medicine. Ultimately, to have the greatest impact, we want to balance the strengths and weaknesses of both AI and humans to develop tools that promote human health, improve patient outcomes, and support clinicians.

What applications of AI in medicine have excited you the most? Share your experiences below.

Corinne Carland is an internal medicine resident at the University of Pennsylvania in Philadelphia. She is interested in health tech, clinical informatics, cardiology, and genomics/proteomics. Outside of work she enjoys walking/running along the river, trying new ice cream spots, and exploring museums. Dr. Carland was a 2023–2024 Doximity Op-Med Fellow.

Animation by Diana Connolly

All opinions published on Op-Med are the author’s and do not reflect the official position of Doximity or its editors. Op-Med is a safe space for free expression and diverse perspectives. For more information, or to submit your own opinion, please see our submission guidelines or email opmed@doximity.com.
