Clerkship Evaluation Scores Are Useless

You showed up early. You left late. You not only pre-rounded on your patients, you pre-pre-rounded on them. You didn’t just skim UpToDate for superficial knowledge of your patient’s condition; you delved into the primary literature, developing an in-depth understanding of her disease that culminated in a meticulous presentation to your entire treatment team. At the end of your rotation, the intern bid you farewell with a tear in his eye, knowing he would never have a student so capable ever again. Your classmates worshiped you. Your patients loved you. In short, you rocked.

Several weeks following the end of your rotation, you receive an email with your clerkship evaluation. You excitedly click the link, anticipating at least a “High Pass,” but secretly hoping that your impressive performance earned you that elusive “Honors.” The page finally loads … you scroll anxiously down to view your course grade. You didn’t get Honors. You didn’t even get a High Pass. Your buddy Jake, learning of your dismay, helpfully comments, “Oh, Dr. Smith always gives out average grades. Don’t take it personally.”

Welcome to medical school clerkship evaluations. Although the above example is fictional, it portrays a very real problem for students in the current medical education system. As residency programs have become increasingly selective, stratification of students at medical schools has become more and more imperative. Board exams that used to be little more than a formality have become a screening tool used by residency directors to eliminate at least some of the hundreds of applications they receive for each position. Competitive residency programs require ever-increasing numbers of publications, preferably ones related to the residency field. Clerkship preceptor evaluations sort students into Pass, High Pass, or Honors, and have become yet another marker of student evaluation.

In theory, clerkship evaluations represent an invaluable source of insight into whether a student can apply theoretical knowledge in a practical setting, act professionally on an integrated care team, and accurately document patient encounters. In practice, however, clerkship scores are often assigned arbitrarily by a broad array of evaluators with various inclinations and grading habits, randomly punishing some students and rewarding others based on the whims, leniency, or laziness of the evaluator rather than the quality of the student.

Some evaluators thoughtfully fill out end-of-clerkship evaluation forms, carefully assigning a numerical value to each case where the student performed admirably or where the student did not meet expectations. Other evaluators carelessly click through the feedback form, assigning the same score to every student, regardless of performance. This inherent variability renders such evaluations meaningless as a tool for comparison.

Unfortunately, the consequences of these evaluations are far from meaningless. In a 2016 National Resident Matching Program survey of program directors, grades in required clerkships were the fifth most commonly cited factor, out of a list of 32, in selecting applicants to interview, and were weighted just as heavily as performance on Step 2 of the United States Medical Licensing Examination. (1) Advocates of this system argue that it is not designed to be a stand-alone measure of a student’s performance but rather should be taken as a small piece of a more comprehensive whole. This argument, however, fails to address the quality of the metric, which may be unreliable enough to actually obscure the big picture.

So what can be done? Clerkships are a vital aspect of medical education and surely students should be held accountable for their performance. If scores assigned by preceptors are unreliable, what is the alternative? Some form of standardization? More rigorous training? Discard them altogether?

Fortunately, there could be a relatively simple solution to this quandary. A 2013 review of the numerical clerkship ratings system conducted by researchers at the University of Colorado found that while the literature suggests serious problems with the validity and reliability of ratings of clinical performance based on numerical scores, the critical issue is not that judgments about what is observed vary from rater to rater but that these judgments are lost when translated into numbers on a scale. (2) In short, while quantitative measures of clerkship performance are all over the map, qualitative measures continue to paint a relatively accurate picture of a student’s performance.

I recently experienced this phenomenon myself while on my Pediatrics rotation and, as I reached out to classmates at my school and at other medical schools around the country, I quickly discovered that my experience was far from anomalous. One of my friends in the year above me recounted an experience with an attending who commented that he was “one of the best students I have ever worked with” and subsequently assigned him an exceptionally lackluster numerical grade. This student stated, “The only way to help ensure good evaluations is to find out who tends to give them and try and get yourself on their team.” In other words, rather than accurately reflecting performance on the wards, the system only rewards those canny enough to game it.

Of course, simply abolishing numerical clerkship scores won’t completely solve the problem. Some attendings will still be meticulous in their evaluations while others will summarily copy and paste a template praising the ability of “the medical student” to be “present and participate appropriately” (yes, that is a direct quotation from a real evaluation). However, eliminating numerical grades will diminish the impact of these unavoidable distortions, allowing for a more complete and subjective assessment commensurate with the inherent subjectivity of the system. Numbers and quantitative benchmarks are easy to overemphasize and falsely offer a sense of objectivity, even when no objectivity was employed in their assignment.

Stratification is necessary in a system wherein many qualified applicants compete for limited opportunities. The measures we set to ensure stratification—test scores, research projects, letters of recommendation—all have their limitations. But generally, they allow for fair comparisons between students and allow each to be judged by the same standard. With numerical clerkship scores, this is simply not the case. When it comes to personal and subjective descriptions of performance, let’s let the evaluations speak for themselves.


  1. “Results of the 2016 NRMP Program Director Survey.” National Resident Matching Program, June 2016.
  2. Hanson, Janice L., et al. “Narrative Descriptions Should Replace Grades and Numerical Ratings for Clinical Performance in Medical Education in the United States.” Current Neurology and Neuroscience Reports, U.S. National Library of Medicine, 23 Nov. 2013.

Image: Luciano Lozano / gettyimages

Ian Christensen is a third-year medical student at the University of Utah School of Medicine in Salt Lake City.
