Moving Beyond Medical Exams: A Clinician-Annotated Fairness Dataset of Real-World Tasks and Ambiguity in Mental Healthcare

Unlike exam-style benchmarks, MENTAT captures the real ambiguities psychiatrists face daily across five critical tasks: diagnosis, treatment, monitoring, triage, and documentation. Each question has five answer options. We remove all non-decision-relevant patient demographic information from the questions, which allows detailed studies of how demographic attributes (age, gender, ethnicity, nationality, …) affect model performance.
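As a rough illustration of the kind of study this enables, the sketch below fills a demographically neutral question template with different attribute values and then compares accuracy per group. It is not the authors' pipeline: the template text, the placeholder names, and the functions `demographic_variants` and `accuracy_by_group` are all hypothetical.

```python
from collections import defaultdict
from itertools import product

# Hypothetical template; the placeholders are illustrative and do not
# reflect the dataset's actual schema.
TEMPLATE = (
    "A {age}-year-old {ethnicity} {gender} presents with "
    "two weeks of low mood and insomnia."
)


def demographic_variants(template, attributes):
    """Return (demographics, question_text) for every attribute combination."""
    keys = list(attributes)
    variants = []
    for values in product(*(attributes[k] for k in keys)):
        demo = dict(zip(keys, values))
        variants.append((demo, template.format(**demo)))
    return variants


def accuracy_by_group(records, attribute):
    """Compute accuracy per value of one demographic attribute.

    records: iterable of (demographics dict, is_correct bool) pairs.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for demo, is_correct in records:
        group = demo[attribute]
        totals[group] += 1
        hits[group] += int(is_correct)
    return {group: hits[group] / totals[group] for group in totals}


if __name__ == "__main__":
    attrs = {
        "age": [25, 70],
        "ethnicity": ["Asian", "Hispanic"],
        "gender": ["woman", "man"],
    }
    for demo, text in demographic_variants(TEMPLATE, attrs):
        print(demo, "->", text)
```

In such a setup, each variant would be sent to the model under test, and the resulting correctness flags passed to `accuracy_by_group` to see whether accuracy shifts with an attribute that should not matter for the decision.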

The questions in the triage and documentation categories are designed to be ambiguous, reflecting the challenges and nuances of these tasks. For these questions, we collect annotations and build a preference dataset with a hierarchical Bradley-Terry model, which enables more nuanced analysis with soft labels and uncertainties.
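To make the idea of preference-based soft labels concrete, here is a minimal sketch of a plain (non-hierarchical) Bradley-Terry fit over pairwise annotator preferences. It is a simplification of the hierarchical model described above, and it omits uncertainty estimates. The function name `fit_bradley_terry` and the input format of `(winner, loser)` index pairs are assumptions made for illustration.

```python
from collections import defaultdict


def fit_bradley_terry(comparisons, n_items, n_iter=200, tol=1e-9):
    """Fit Bradley-Terry strengths with the classic MM update.

    comparisons: list of (winner, loser) index pairs, one per annotation.
    Returns strengths normalized to sum to 1, usable as soft labels.
    Note: an item that never wins ends up with strength 0.
    """
    wins = [0.0] * n_items
    pair_counts = defaultdict(float)
    for winner, loser in comparisons:
        wins[winner] += 1.0
        pair_counts[(min(winner, loser), max(winner, loser))] += 1.0

    p = [1.0 / n_items] * n_items
    for _ in range(n_iter):
        # denom[i] = sum over opponents j of n_ij / (p_i + p_j)
        denom = [0.0] * n_items
        for (i, j), n in pair_counts.items():
            d = n / (p[i] + p[j])
            denom[i] += d
            denom[j] += d
        new_p = [
            wins[i] / denom[i] if denom[i] > 0 else p[i]
            for i in range(n_items)
        ]
        total = sum(new_p)
        new_p = [x / total for x in new_p]
        converged = max(abs(a - b) for a, b in zip(new_p, p)) < tol
        p = new_p
        if converged:
            break
    return p


if __name__ == "__main__":
    # Option 0 is preferred over option 1 in three of four annotations.
    print(fit_bradley_terry([(0, 1), (0, 1), (0, 1), (1, 0)], n_items=2))
```

The normalized strengths can be read as a soft label distribution over the answer options, so a question on which annotators disagree yields a flatter distribution than one with a clear consensus.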

We find that:
- models show significant bias based on patient demographics (gender, ethnicity, age)
- high multiple-choice accuracy does not imply consistent free-form responses
- even top models struggle with ambiguous real-world scenarios

MENTAT was created by nine practicing psychiatrists without LLM assistance. Its expert-annotated questions are designed to expose fairness issues that only surface at scale.

Check out the paper and code here: https://openreview.net/forum?id=tSy7OtONsg

Cite the paper [APA]:

Lamparth, M., Grabb, D., Franks, A., Gershan, S., Kunstman, K. N., Lulla, A., Roots, M. D., Sharma, M., Shrivastava, A., Vasan, N., & Waickman, C. (2026). Moving beyond medical exams: A clinician-annotated fairness dataset of real-world tasks and ambiguity in mental healthcare. In Proceedings of the Fourteenth International Conference on Learning Representations (ICLR 2026).
