An AI-Based Suicide Risk Screening Algorithm for Your EHR

Op-Med is a collection of original articles contributed by Doximity members.

Most doctors know how difficult it is to determine suicide risk. They also know how hard it is to get beyond, “Do you have suicidal thoughts?” or “Do you have a plan?”

However, thanks to the recently published study, Predicting Suicide Behavior From Longitudinal Electronic Health Records, it appears we will soon have a suicide risk assessment tool for our EHR dashboard that can identify high-risk patients up to 3.5 years before a suicide attempt.

How? It does this with an artificial intelligence (AI) model: machine learning that applies an atheoretical naive Bayesian algorithm to structured data!

Uh, what?

Yep, that’s a mouthful, but let’s parse it out.

Let’s start with something every doctor already knows: “Bayesian algorithms,” which have appeared on every test since the second year of medical school, and which we use clinically every day.

This is the conditional probability theory we all know about: the calculations that combine a screening test’s sensitivity and specificity with the pretest probability to tell us how likely a positive result is to be a true positive.

For example, we all know a positive rapid strep screen is more likely to be a true positive if we are swabbing a group of 50 kids with bad sore throats and fevers than if we are swabbing 50 kids with ankle strains. That’s a Bayesian analysis.
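To make the strep example concrete, here is a minimal Python sketch of the underlying Bayes calculation. The sensitivity, specificity, and the two pretest probabilities are illustrative assumptions, not figures from the article or from any study.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """P(disease | positive test), computed with Bayes' theorem."""
    true_positives = sensitivity * prevalence
    false_positives = (1.0 - specificity) * (1.0 - prevalence)
    return true_positives / (true_positives + false_positives)

# Assumed test characteristics (hypothetical numbers for illustration).
SENSITIVITY = 0.85
SPECIFICITY = 0.95

# Assumed pretest probabilities for the two groups of 50 kids.
ppv_sore_throat = positive_predictive_value(SENSITIVITY, SPECIFICITY, prevalence=0.30)
ppv_ankle_strain = positive_predictive_value(SENSITIVITY, SPECIFICITY, prevalence=0.01)

print(f"Sore throat and fever: {ppv_sore_throat:.0%} of positives are true positives")
print(f"Ankle strain:          {ppv_ankle_strain:.0%} of positives are true positives")
```

The test itself is identical in both groups; only the pretest probability changes, and that alone is enough to move a positive result from probably real to probably a false alarm.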

Next buzzword: “Machine learning.” In the strep screen case above, we are using two inputs: fever and sore throat.

But what if instead we wanted to look at a condition with a much lower prevalence in the community, such as suicide, and then consider not just two, but hundreds of contributing factors?

To do this, the calculations wouldn’t be much more difficult, but there would be a lot more of them: millions more, in fact, in the suicide risk study above, which used over 100 different input data elements per chart from over 1.7 million charts.

In this case, that’s what is meant by machine learning: repetitive probability calculations, done millions of times. It’s also what computers do best.
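As a rough illustration of “repetitive probability calculations,” the Python sketch below estimates, for every data element, how often it appears in charts with and without the outcome. The four toy charts, the feature names, and the add-one smoothing are all assumptions made for this example; the study’s actual procedure is not described here.

```python
from collections import Counter

# Hypothetical toy charts: (set of structured data elements, outcome flag).
# Feature names and outcomes are invented for illustration only.
charts = [
    ({"dx_depression", "rx_paroxetine"}, True),
    ({"dx_depression"}, False),
    ({"dx_bronchitis"}, False),
    ({"dx_bronchitis", "dx_ankle_sprain"}, False),
]

def estimate_feature_rates(charts):
    """Return {feature: (P(feature | case), P(feature | non-case))}.

    Add-one (Laplace) smoothing keeps estimates away from exactly 0 or 1.
    """
    n_cases = sum(1 for _, is_case in charts if is_case)
    n_noncases = len(charts) - n_cases
    case_counts, noncase_counts = Counter(), Counter()
    for features, is_case in charts:
        # Tally each element under the outcome group this chart belongs to.
        (case_counts if is_case else noncase_counts).update(features)
    all_features = set(case_counts) | set(noncase_counts)
    return {
        f: ((case_counts[f] + 1) / (n_cases + 2),
            (noncase_counts[f] + 1) / (n_noncases + 2))
        for f in all_features
    }

rates = estimate_feature_rates(charts)
for feature, (p_case, p_noncase) in sorted(rates.items()):
    print(f"{feature}: P(f|case)={p_case:.2f}  P(f|non-case)={p_noncase:.2f}")
```

Nothing here is harder than counting and dividing. The scale is what changes: the same tally repeated across more than 100 elements and more than a million charts.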

The “structured data” part is simple. It refers to readily identifiable data within any EHR, including data elements such as diagnosis codes, weight, height, and medications. This is to be distinguished from unstructured or semi-structured data, such as speech or text, which require more advanced AI techniques such as natural language processing and deep learning.

The relationship of data elements to each other is also where the buzzword “naive” comes in. It’s a first-order assumption that the data elements are independent parameters, as would be the case for, say, age and gender in the suicide risk assessment algorithm. However, if you are using over 100 chart data elements, it’s unlikely they are all independent. Consider the relationship between a diagnosis and a medicine, which are obviously dependent parameters. You can imagine how Bayesian calculations would become much more complicated if you tried to account for dependent data element pairs, such as a diagnosis of depression and the medication paroxetine.
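A minimal Python sketch of what the “naive” assumption buys: each data element contributes its own likelihood ratio, and the contributions are simply multiplied (added, in log-odds form). The prior and the per-element probabilities below are hypothetical numbers chosen for illustration.

```python
import math

def naive_bayes_posterior(prior, likelihoods, present):
    """Posterior probability of the outcome under the naive independence assumption.

    likelihoods: {element: (P(element | case), P(element | non-case))}
    present:     set of elements found in this patient's chart
    """
    log_odds = math.log(prior / (1.0 - prior))
    for element, (p_case, p_noncase) in likelihoods.items():
        if element in present:
            log_odds += math.log(p_case / p_noncase)
        else:
            log_odds += math.log((1.0 - p_case) / (1.0 - p_noncase))
    return 1.0 / (1.0 + math.exp(-log_odds))

PRIOR = 0.01  # assumed baseline risk, hypothetical
chart = {"dx_depression", "rx_paroxetine"}

# Model that only knows about the diagnosis.
diagnosis_only = {"dx_depression": (0.60, 0.10)}
# Model that treats diagnosis and medication as if they were independent.
diagnosis_and_drug = {"dx_depression": (0.60, 0.10), "rx_paroxetine": (0.50, 0.08)}

p_one = naive_bayes_posterior(PRIOR, diagnosis_only, chart)
p_two = naive_bayes_posterior(PRIOR, diagnosis_and_drug, chart)
print(f"Diagnosis only:           {p_one:.3f}")
print(f"Diagnosis and medication: {p_two:.3f}")
```

Because paroxetine is usually prescribed for depression, the second model counts much of the same evidence twice and the estimate jumps. That is the price of the independence assumption. The payoff is that each element needs only its own pair of probabilities instead of a table covering every combination.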

OK, finally the term “atheoretical.”

This is the biggie! This is what makes most AI approaches in healthcare fundamentally different.

An atheoretical approach means we are not using any particular theory to select relevant data elements.

Put another way, atheoretical AI techniques have an output we are interested in preventing, such as suicide attempts, but no a priori theory for which elements are causal for suicide.

Instead, the algorithm evaluates a diagnosis code of depression the same way it looks at a diagnosis code for bronchitis or the measurement of height, but then weights each differently based on the information gathered from the charts of hundreds of thousands of patients, most of whom didn’t commit suicide, but some of whom did.
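One way to picture this weighting is the Python sketch below: every element goes through the same formula, and only the data decide which ones end up mattering. The counts are invented for illustration, and the log likelihood ratio is used here as one plausible form of weight in a naive Bayesian model, not necessarily the exact quantity the study reports.

```python
import math

# Hypothetical counts: how many charts contain each data element,
# split by outcome. All numbers are invented for illustration.
N_CASES, N_NONCASES = 1_000, 99_000
element_counts = {
    "dx_depression": (600, 9_900),
    "dx_bronchitis": (100, 9_900),
    "height_above_median": (500, 49_500),
}

def element_weight(count_cases, count_noncases):
    """Log likelihood ratio: above 0 raises estimated risk, near 0 is uninformative."""
    p_case = count_cases / N_CASES
    p_noncase = count_noncases / N_NONCASES
    return math.log(p_case / p_noncase)

# No element is singled out in advance; each gets the identical calculation.
weights = {name: element_weight(*counts) for name, counts in element_counts.items()}
for name, weight in sorted(weights.items(), key=lambda item: -item[1]):
    print(f"{name}: {weight:+.2f}")
```

In this toy data, depression earns a large positive weight while bronchitis and height land near zero, without anyone having told the algorithm which of the three to care about.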

Bottom line: Atheoretical AI doesn’t need a theory about the contributing factors; it just needs a whole bunch of data.

OK, so we know our buzzwords, and we will soon have a tool that is statistically effective at predicting suicide risk up to 3.5 years before an attempt, certainly better than any current tool.

Now what?

Well, here is when things get tricky.

For example, who do we apply the algorithm to? Do we run it only on people for whom we have a concern about suicidal intent? Well, that’s going to miss a lot of people.

Or instead, do we run it on everyone who comes into your office? Let’s face it, it’s hard enough explaining to people why their URI doesn’t need an antibiotic; it’s going to be much harder when you add, “Oh, by the way, it appears you are at high risk for suicide in the next three years.”

These are some of the very tough questions AI within our EHR is going to bring.

So, for those of us who have been waiting for the EHR miracle (you know, the one that says EHR will lower costs and make all our patients healthier), it may be just around the corner thanks to AI. The Bayesian algorithms are here, and the more advanced algorithms are just a little further out.

But don’t kid yourself; the technical “how” of AI integration within your EHR is going to be the least challenging thing.

The much harder part is going to be figuring out the who, when, and why of AI.

Dr. Matthew Rehrl is a physician who has served in a C-Suite advisory role on social media within healthcare for over a decade. His current focus is the ethics of AI in healthcare. He reports no conflict of interest.

He can be found at @matthewrehrl.

All opinions published on Op-Med are the author’s and do not reflect the official position of Doximity or its editors. Op-Med is a safe space for free expression and diverse perspectives. For more information, or to submit your own opinion, please see our submission guidelines.
