ChatGPT is a generative AI chatbot that exploded onto the tech scene in November 2022, soon turning heads in healthcare with headlines like “Is ChatGPT Becoming ChatMD?” in Forbes after researchers from MGH, Brown, and Ansible Health found ChatGPT performing “at or near the passing threshold” on all three USMLE licensing exams.
Seeking to upstage itself in the frenetic 2023 news landscape of generative AI, ChatGPT’s developer OpenAI released GPT-4 in March: a “2.0” version of its previous GPT-3.5 model. Compared to GPT-3.5, GPT-4 can handle longer and more complex prompts and promises support for visual inputs.
As cancer care enters the era of precision oncology, we sought to test ChatGPT’s ability to process hypothetical complex patient cases and molecular testing results to generate precision oncology reports and treatment recommendations. Our goal was twofold: first, to evaluate the relevance of ChatGPT-generated next-generation sequencing (NGS) reports, and second, to assess their accuracy in providing first-line treatment recommendations. These reports focused specifically on patients with non-small cell lung cancer (NSCLC) with targetable driver oncogenes.
Upon opening ChatGPT, users are faced with a simple text box in which they can input conversational free text (a “new chat” window). Our research prompt was “Create a next-generation sequencing report with a list of first-line treatment options for a patient with stage IV non-small cell lung cancer with an [oncogenic driver].”
After sending the message, the AI begins generating a detailed and comprehensive text response within seconds, which can only be described as an enthralling and uncanny experience. Here is an example of the text generated for the prompt for a theoretical patient with an EGFR exon 19 deletion:
In this example, GPT-3.5 correctly included osimertinib and the TKIs afatinib and gefitinib. It also judiciously caveated that the decision should be made with an oncologist. When scoring relevance, we awarded 1 point for every NCCN preferred option and 0.5 points for each “other recommended” treatment listed in the AI-generated output, then divided by the maximum possible score for that driver oncogene (yielding 0.57 in this case).
However, it showcases some of the technology’s pitfalls as well. For accuracy, we calculated the proportion of treatment options in a report that are listed in the NCCN guidelines out of the total number of treatments in that report. In this example, ChatGPT listed six options, three of which are not recommended by NCCN (e.g., immunotherapy). Such inaccuracies and made-up information (i.e., AI “hallucinations”) are not uncommon and are important limitations of the technology in its current form. Further, ChatGPT is only trained on data up to its cutoff of September 2021, which is considerably dated in the dynamic and rapidly evolving treatment landscape of NSCLC and other malignancies. For example, the output for our prompt for EGFR S768I is riddled with hallucinations, since guidelines for that biomarker emerged after September 2021.
Overall, we ran ten prompts for each of the eight NSCLC driver oncogenes with FDA-approved therapies as of the GPT data cutoff, yielding an accuracy score of 68.8% and a relevance score of 0.59. While accuracy scores reflect the percentage of a report’s treatment options that appear in the NCCN guidelines, relevance scores assess ChatGPT’s ability to provide treatment recommendations, with more weight given to NCCN preferred options.
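For readers who want the scoring scheme made concrete, here is a minimal Python sketch of the two metrics as described above. The drug lists are hypothetical illustrations loosely mirroring the EGFR exon 19 deletion example, not the actual NCCN categories or study data.

```python
# Sketch of the relevance and accuracy scoring described in the text.
# Drug names and NCCN category assignments below are illustrative assumptions.

def relevance_score(report_drugs, preferred, other_recommended):
    """1 point per NCCN 'preferred' option and 0.5 per 'other recommended'
    option found in the report, divided by the maximum possible score
    for that driver oncogene."""
    max_score = 1.0 * len(preferred) + 0.5 * len(other_recommended)
    score = sum(1.0 for d in report_drugs if d in preferred)
    score += sum(0.5 for d in report_drugs if d in other_recommended)
    return score / max_score

def accuracy_score(report_drugs, nccn_listed):
    """Fraction of the report's treatment options that appear in NCCN guidelines."""
    return sum(1 for d in report_drugs if d in nccn_listed) / len(report_drugs)

# Hypothetical example: one preferred option, five "other recommended" options,
# and a ChatGPT report listing six treatments (three not NCCN-recommended).
preferred = {"osimertinib"}
other_recommended = {"afatinib", "gefitinib", "erlotinib",
                     "dacomitinib", "erlotinib+ramucirumab"}
report = ["osimertinib", "afatinib", "gefitinib",
          "chemotherapy", "immunotherapy", "radiation"]

rel = relevance_score(report, preferred, other_recommended)
acc = accuracy_score(report, preferred | other_recommended)
print(round(rel, 2))  # (1 + 0.5 + 0.5) / 3.5 -> 0.57
print(acc)            # 3 of 6 listed options are NCCN-listed -> 0.5
```

With these assumed drug lists, the relevance score reproduces the 0.57 mentioned for the EGFR exon 19 deletion example, and the accuracy score of 0.5 reflects three non-recommended options among six listed.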
Our findings and those of others show that in their current form, AI models like ChatGPT, while promising, are not yet capable of effectively and safely automating clinical tasks using their current knowledge base.
However, an emerging alternative application is steadily gaining momentum: utilizing GPT-4's capacity to synthesize vast quantities of data and information to enhance human performance. Oncology, in particular, could greatly benefit from this approach.
The field of oncology, along with its related molecular targeted therapies, is dynamic and constantly evolving, presenting a substantial challenge when it comes to integrating extensive genomic data into standard cancer care. Biomarker and genomic testing are becoming increasingly commonplace across various malignancies, leaving oncologists to grapple with an overwhelming volume of data that demands complex interpretation. Consequently, many institutions, including our University of Illinois at Chicago Precision Oncology Tumor Board, have created genomic/precision oncology "tumor boards," where multidisciplinary teams of clinicians collaboratively review patient records, genomic test results, and relevant literature to facilitate clinical decision-making and address this challenge. Emerging technologies such as ChatGPT could help clinicians process the sheer volume of genomic data to make more effective decisions about personalized treatment options.
Certainly, it will be interesting to see how AI continues to be integrated into oncology care, and the ASCO annual conference is an opportunity not only to showcase the work that researchers have put in over the past year but also to signal future trends. When asked why you should stop by our presentation, ChatGPT said, “Stop by to learn about my potential in generating NGS reports for oncogene-driven NSCLC. See how I can quickly create treatment options and improve patient accessibility and join the conversation on the role of AI language models in shaping the future of oncology.”
ASCO Abstract #412696: Relevance and accuracy of ChatGPT-generated NGS reports with treatment recommendations for oncogene-driven NSCLC.
Mr. Hamilton, Dr. Jain, and Dr. Nguyen have no conflicts of interest to report.
Image by GoodStudio / Shutterstock