Over the past few months, I’ve been working on a little side project that requires me to look at the websites of every medical school in the country. One point of similarity among some schools was equal parts intriguing and suspicious. Schools that require applicants to submit CASPer scores use the exact same description of CASPer and why they choose to require it:

CASPer is an online, video-scenario based test which assesses for non-cognitive skills and interpersonal characteristics that we believe are important for success in our program and will complement the other tools that we use for applicant selection.

A surprising number of schools use this exact verbiage: Baylor, Temple, Howard, Michigan State, and Wisconsin. I could keep going through the list, but it seems like 75% of schools that require CASPer use it. I find this strange because it seems like some type of affirmative defense. We all understand why the MCAT is used for med school admissions, and the schools don’t feel the need to justify why it is used: it tests all of the content typically required of medical school prerequisite courses in a standardized way. CASPer is a bit different though.

Altus describes CASPer as an effective screening tool for “people skills” backed by more than 12 years of academic research. The exam consists of word and video scenarios and test-takers must write an open-ended response to a series of questions after the scenario is presented. The responses are graded by human graders, and a percentile-based score is sent to every school the test-taker designates. The exam costs $12 to take, and $12 to send the results to each school that requires CASPer results. Altus started out marketing CASPer to medical schools but has expanded its market to all manner of healthcare-adjacent graduate programs.

The published evidence does not support any of Altus’s claims about CASPer. The test lacks transparency about its scoring or content. Based on published evidence, CASPer may be biased against certain groups. CASPer should not be incorporated into the medical school admissions process, and should be considered the standardized testing equivalent of homeopathy. Let’s look at why.

The rules are made up and the points don’t matter

Altus wants adcoms (admissions committees) to see CASPer in the same light as the MCAT – as a selection factor for admission. Historically the MCAT has consistently been one of the most important selection factors for medical school adcoms because adcoms can simply look at the score and understand how well an applicant knows the prerequisite content areas. That’s ultimately meant to be the appeal of CASPer as well – a simple score they can make a decision about, only for soft skills. But while the design of the MCAT allows adcoms to do this, the design of CASPer doesn’t.

While Altus doesn’t market CASPer as a standardized test, it’s a standardized test. The scores of a standardized test reflect how well someone performs against a standard. To understand what the scores mean, you have to understand the standard for the test. The AAMC publishes a book-length document about what is on the MCAT: the format of the exam, nature of questions, and what areas are being assessed. It specifically defines what skills like “Data-Based Statistical Reasoning” mean and how they might be assessed in the exam’s content sections. Without being able to contextualize what the score means from looking at the standard in this way, the MCAT score would be meaningless.

CASPer provides no documented standard for how the skills are defined and assessed in the exam. On CASPer’s website, Altus claims to assess for collaboration, communication, empathy, equity, ethics, motivation, problem solving, professionalism, resilience, and self-awareness. For a score to mean anything, Altus would actually have to define what “professionalism” and “problem solving” mean in the context of the exam, and explain how they will be assessed. Altus provides no documentation for how the open-ended responses are scored, other than stating that spelling is not part of the score. As a result, test-takers have no idea how to structure their responses, or what constitutes a good response. But Altus markets this issue as one of CASPer’s killer features.

Altus states in its FAQ that no preparation is required:

The general literature suggests that situational judgment tests (SJTs), like the CASPer test, are relatively immune to test preparation (i.e. that coaching is unlikely to provide benefit).

This is simply untrue. While there are many papers claiming to be able to minimize coaching effects in situational judgment tests, most of these papers focus on the effects of coaching in multiple-choice or other discrete-answer situational judgment tests. Their findings do not necessarily apply to CASPer, which uses open-ended written responses. Furthermore, many of these papers have serious financial conflicts of interest. There are papers like this one that state that situational judgment tests are very prone to coaching. If test-takers knew how the test was being scored, the test would probably be a joke.

There are multiple companies offering CASPer test prep courses. Altus claims “We have heard students directly recommending to other students that they don’t think it helped them.” I have no idea how a company that effectively boasts ‘our test is a scientific way to assess what a good person you are’ can justify anything with: ‘dude, trust me’. Altus does not provide scores to test-takers, with the exception of a few schools participating in their revolutionary program where they actually send scores back to test-takers. However, that doesn’t mean that companies couldn’t get the scores by simply asking someone with access to them. The only reason there hasn’t been a flood of CASPer prep products is that it’s unclear how adcoms actually use these scores, so nobody knows if reverse-engineering the test is worth the trouble. If test-takers have no idea what advantage they get from scoring highly on the exam, they don’t have any real incentive to prepare for it. As soon as it becomes clear how much weight adcoms place on CASPer scores, the test’s entire scoring system will be leaked or reverse-engineered and strategies will be developed for prep courses and shared online.

The marketing research

There’s very little research on CASPer, and there’s no evidence that CASPer is effective at identifying people more likely to succeed in medical school. The research that does exist only correlates CASPer scores with secondary endpoints or simulated outcomes, not actual medical school admissions outcomes. The only paper that attempts to show that CASPer scores correlate with success in medical school is a 2016 study published in Advances in Health Sciences Education titled “CASPer, an online pre-interview screen for personal/professional characteristics: prediction of national licensure scores”. In the study, the authors administered the CASPer video exam to Canadian medical school applicants and followed the board scores of those who were accepted. The only real correlation was a moderate one between CASPer scores and scores on the Considerations of Legal, Ethical and Organizational Aspects of the Practice of Medicine (CLEO) section of the MCCQE Part II. The authors conclude:

CASPer is able to predict for personal/professional characteristics with similar magnitude that GPA and MCAT predict for cognitive outcomes.

They then invalidate their own conclusion with:

Additionally, CASPer demonstrated no correlation with cognitive outcomes and measures of medical expertise.

The CLEO of the MCCQE, in addition to all other sections of the MCCQE, is an assessment of medical knowledge, clinical skills and knowledge of professional behaviors and attitudes at a level expected of a physician in independent practice in Canada. The CLEO standards define professional characteristics and behaviors and test for them. Professionalism, medical ethics, and principles of health law are part of the body of knowledge that medical students are expected to learn and know for the CLEO exam as part of their education. CLEO scores are not a reflection of “personal attributes”. CASPer does not measure professional characteristics because professionalism is an individualized body of knowledge defined by each profession. While there might be a mathematical correlation, it doesn’t logically follow that CASPer measures those characteristics. You should also be relieved to know that most students score very well on the CLEO section of the MCCQE exam.

The other major study looking at CASPer, published in Academic Medicine, is titled “Addressing the Diversity–Validity Dilemma Using Situational Judgment Tests”. This study also features the two co-creators of CASPer as authors. The authors collected the data of all applicants to the New York Medical College School of Medicine, including GPA, Multiple Mini-Interview score (MMI, used to assess nonacademic competencies), MCAT, and CASPer score. CASPer was not used in decision-making at any point in the admissions process. The authors then performed simulations using applicant CASPer, MMI, GPA, and MCAT scores, giving different weights to each to predict how many applicants of different demographics would be admitted as those metrics were changed. They found that increasing the weight of CASPer scores in the simulations increased the number of female, and racial and ethnic minority, applicants who are invited to interview.

This study is fundamentally flawed both in its methods and its conclusions. Simulations can be incredibly powerful tools in predicting future outcomes, but they’re only useful if realistic assumptions are made – and in this case the weights used were laughably unrealistic. The four simulations included:

  • MCAT/GPA only
  • One-third weight given to CASPer
  • One-half weight given to CASPer
  • Only CASPer
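To make the four weighting conditions above concrete, here is a minimal sketch of how a weighted composite could be used to rank applicants for interview invitations. The paper’s actual combination method is not described here, so everything in this block is an assumption for illustration: the z-score fields, the `composite` and `invite` helpers, the equal blend of GPA and MCAT, and the randomly generated applicants (which are not the study’s data).

```python
import random

# Hypothetical weighting conditions mirroring the four simulations listed above.
CASPER_WEIGHTS = {
    "MCAT/GPA only": 0.0,
    "One-third weight given to CASPer": 1 / 3,
    "One-half weight given to CASPer": 0.5,
    "Only CASPer": 1.0,
}


def composite(applicant, casper_weight):
    """Blend an academic score with a CASPer score (all assumed to be z-scores)."""
    # Assumption: GPA and MCAT contribute equally to the academic component.
    academic = (applicant["gpa_z"] + applicant["mcat_z"]) / 2
    return (1 - casper_weight) * academic + casper_weight * applicant["casper_z"]


def invite(applicants, casper_weight, n_invites):
    """Return the top n_invites applicants ranked by composite score."""
    ranked = sorted(
        applicants, key=lambda a: composite(a, casper_weight), reverse=True
    )
    return ranked[:n_invites]


# Synthetic applicants, for illustration only.
rng = random.Random(0)
applicants = [
    {
        "id": i,
        "gpa_z": rng.gauss(0, 1),
        "mcat_z": rng.gauss(0, 1),
        "casper_z": rng.gauss(0, 1),
    }
    for i in range(1000)
]

baseline_ids = {a["id"] for a in invite(applicants, 0.0, 100)}
for label, weight in CASPER_WEIGHTS.items():
    invited_ids = {a["id"] for a in invite(applicants, weight, 100)}
    overlap = len(invited_ids & baseline_ids)
    print(f"{label}: {overlap} of 100 invitees also invited under MCAT/GPA only")
```

The point of the sketch is that the invited pool only changes as much as the weight allows: who gets an interview under each condition depends entirely on a weight that the simulation chooses, not one that admissions committees are known to use.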

When CASPer score was given 1/2 weight in comparison to the MCAT and GPA, it only resulted in three additional African-American, and seven Hispanic, applicants being invited for an interview. The problem is that adcoms would never give that amount of weight to CASPer scores, as evidenced by published data.

GPA and MCAT are of the highest importance to adcoms when deciding who to invite for an interview. The assumption that CASPer scores would be given half weight against GPA and MCAT is just absurd. It is well established in the literature that lower GPAs and MCAT scores from black applicants are due to economic status. Anyone from a lower-income background is likely to spend less money on MCAT prep products like third-party practice tests, review books, or entire preparation courses, and may also be working part- or full-time, resulting in less time to study for classes and the MCAT. Naturally, this tends to lower scores, and adcoms account for this when looking at applications from traditionally underrepresented groups.

At the time this paper was written, there were likely no preparation products for CASPer and little reason to purchase any such products if they existed in the first place. As a result, scores across groups would likely not reflect the same sort of racial and economic bias as other standardized testing scores. While the CASPer exam can be prepared for, there is little incentive to do so based on the (lack of) weight the admission process puts on it. The reason that CASPer doesn’t display the same sort of racial and economic biases in outcomes is that the circumstances that lead to that bias do not yet exist. Further, all of this is based on the assumption that the test isn’t biased against certain groups in the first place.

The data in the second study on CASPer show that the exam is biased against men. Men made up 54% of the applicants in the study. In the simulation where CASPer was not considered at all, a near-equal number of men and women were invited for an interview. As soon as CASPer was included in the simulation, the gender split shifted toward women: in the one-third CASPer weight condition, 55% of applicants invited to an interview were women, rising to 64% when only CASPer was considered (predictably, with a 50% CASPer weight the result is 58%). Simply using GPA and MCAT with no consideration of CASPer scores resulted in a simulated interview pool whose gender breakdown was closest to the applicant pool and society.
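The arithmetic behind that comparison can be laid out as a short sketch. Two things here are assumptions rather than figures from the study: that the applicant pool is split only between men and women (so 54% men implies 46% women), and that “near-equal” in the MCAT/GPA-only condition means roughly 50% women. The other percentages are the ones quoted above.

```python
# Share of women in the applicant pool, assuming a binary split (100 - 54).
WOMEN_IN_APPLICANT_POOL_PCT = 46

# Percentage of simulated interview invitees who were women, per condition.
women_invited_pct = {
    "MCAT/GPA only": 50,  # assumption: "near-equal" taken as about 50%
    "One-third weight given to CASPer": 55,
    "One-half weight given to CASPer": 58,
    "Only CASPer": 64,
}

# Gap, in percentage points, between each invited pool and the applicant pool.
gaps = {
    condition: pct - WOMEN_IN_APPLICANT_POOL_PCT
    for condition, pct in women_invited_pct.items()
}

# The condition whose invited pool most closely matches the applicant pool.
closest = min(gaps, key=lambda condition: abs(gaps[condition]))

for condition, gap in gaps.items():
    print(f"{condition}: {gap:+d} percentage points vs. applicant pool")
print(f"Closest to applicant pool: {closest}")
```

Under these assumptions, every step up in CASPer weight moves the invited pool further from the applicant pool’s gender breakdown.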

It’s unclear how much this gender bias in scoring could affect actual admissions outcomes because nobody knows how CASPer is weighted in the admissions process. It’s also unclear if CASPer exhibits racial and ethnic bias because no such study has been done. To summarize, the use of CASPer wouldn’t likely benefit minority applicants under realistic conditions. This study also doesn’t show that the CASPer exam isn’t biased against racial and ethnic minorities. The primary demographic, if any, that would benefit from CASPer being weighted a realistic amount by admissions committees would likely be women – who have made up the majority of medical school applicants and graduates since 2017.

Conflicts of interest in CASPer and MMI research

There are significant issues with the conflicts of interest disclosed in the research on CASPer. These conflicts of interest are so apparent that they make it incredibly difficult to take the conclusions of these studies seriously, even if I were to ignore their unacceptable methodological flaws. The first two authors of the Advances in Health Sciences Education article were Kelly Dore and Harold I. Reiter, who are the creators of the CASPer exam and shareholders in Altus Assessments, the private company that administers and profits from CASPer (Dore is also the vice president of the company). The fourth author, Geoffrey Norman, is the founding editor of the journal that published this article. This is the only article that could possibly be cited as evidence that CASPer scores can be used to predict professional behavior, which it didn’t show in the slightest. Every article pertaining to CASPer being used for med school admissions currently features the co-creators of CASPer as authors.

In the Academic Medicine article, the same conflicts of interest exist. However, there is a statement in the discussion section that is really intriguing and I think warrants a closer look:

The differences in performance between UIMs and “traditional” applicants tended to be smaller or reversed with assessments of nonacademic competency (CASPer, MMI) compared with academic assessments. For example, when comparing low SES relative to no economic disadvantage, African Americans relative to whites, and Hispanics/Latinos relative to non-Hispanics/non-Latinos, CASPer demonstrated smaller but still evident group effect sizes, whereas the MMI demonstrated the absence of any significant subgroup differences with African Americans. However, this beneficial result has not been found with MMI use at other schools, so caution must be taken in the extrapolation of these MMI results.

This seems like a subtle implication that the multiple mini-interviews at schools that have implemented them differently might be biased. Potentially. Maybe. Who knows? However, I feel like it might push readers to look into what kind of MMI structure would lead to minority groups not being disadvantaged. Perhaps an interested reader might find structured MMI programs that fit those criteria available for purchase at no small expense. This may include an “off-the-shelf” MMI interview program from a company called ProfitHR or ProspectHR, of which Harold Reiter is a shareholder. The ProspectHR site states that they have consulted with over 75 universities that have a lot more name recognition than Slippery Rock. Harold Reiter was also the primary author of the landmark study on the concept of the multiple mini-interview. Reiter doesn’t disclose this in the conflicts of interest section of this Academic Medicine article, despite this being an obvious material conflict of interest.

To be fair, Reiter discloses this fact on other papers that directly discuss multiple mini-interviews. However, there is a reason that we have conflicts of interest disclosure sections. If positive results from a study would benefit your personal business, your positive results should be viewed with more skepticism. The only studies that I can find looking at CASPer and actual admissions outcomes were authored by the two creators of CASPer. You can produce bad papers, and you can have serious conflicts of interest, but if you publish papers so fundamentally flawed with results that would only encourage people to purchase your product, I think there should be consequences for that. The consequence should be that I’m not going to listen to anything you have to say about that research topic ever again. I don’t think anyone else should either.

Conclusion: Many qualified applicants

Graduate school admissions are competitive, and no graduate admissions process is more competitive than medical school. Med school admissions have been growing more competitive with increasing GPAs and MCAT scores of matriculants. It’s not uncommon for medical schools to receive more than 100 applications for every seat. Medical schools are frequently adding new prerequisite courses or increasing their minimum requirements to reduce the number of applications. It makes sense that admissions committees would want a fast way to identify the applicants who are compassionate, ethical, and wouldn’t go on to become the next Mark Hyman. Sure, it would be nice if there were a test that could tell you how virtuous a person is, but no such test exists.

CASPer is the multivitamin of med school admissions criteria. Little to no evidence suggests that using it will actually provide any long-term benefits. It sounds like it might be a fair way to select applicants for their soft skills, and it’s also easy to add as a requirement so what’s the harm? But like multivitamins, admissions committees are being sold the idea of an objective tool that will increase the fairness of the admissions process – not the reality. There is nothing objective or fair about a computerized test that uses human graders for open-ended responses, doesn’t outline how responses are evaluated, and doesn’t provide feedback. If anything, CASPer is the least objective or fair metric ever introduced to the admissions process. Apparently test-takers are just supposed to be born with the knowledge of what profiteering medical school faculty would do in response to an ethical dilemma.

To medical school admissions committee members who might be reading this: I’m not telling you how to do your jobs. I’m just saying if you want physicians who are compassionate and empathetic, I think you’re working against your own goals by requiring more things like CASPer and VITA. How many times can premed students be asked to spend their time and money on some type of performance for medical school admissions before it starts to affect them as a person? There are dozens of articles in medical journals discussing “premed culture” and not in a good way. A 2019 letter to the editor in Academic Medicine said this about the effects of premed culture:

This generation of students has been trained to clear obstacles, but they have not been taught to think, to be skeptics of knowledge, or to learn, not as a means but as an end.

I view CASPer as just one more obstacle that serves little more than to perpetuate this way of thinking. It’s natural to want to select the best applicants, and that becomes near impossible when you might have 10 applicants who all look identical. This problem is only going to get worse, as schools have experienced an explosion in the number of applications within the past year. We don’t need more quick metrics to compare applicants by. No arms race in history has ever been solved by starting new weapon development programs.

CASPer is the symptom of a growing problem in medical school admissions, but that’s a topic for another time. The evidence does not support Altus’s claims about the use of CASPer in medical school admissions. From a premed to admissions committees: if you’re looking for a reason to choose me over another seemingly identical applicant, please just flip a coin. I understand that you’re flooded with applications and care a lot about choosing the best applicant in a fair and unbiased manner. That’s not a sustainable solution though. How could I complain? A coin-flip is much more unbiased and transparent than CASPer.

Posted by Braden MacBeth

I'm a software engineer in Pennsylvania.