Rehab Bench Clinical LLM Stress Tests
About

Built by a rehab clinician who got tired of vague AI answers.

Rehab Bench is part of a larger body of work on clinical AI tools for physiotherapy and rehabilitation. The point is simple: before a model gets anywhere near a real workflow, it should be forced through cases where sounding confident is not enough.

Why I made this

Rehab is full of edge cases. A patient asks for exercises, but the symptoms point to urgent referral. A model gives a neat plan, but it ignores irritability, dosage, or missing safety information. Those mistakes are easy to miss if we judge only how well the answer is written.

Rehab Bench Flash is a quick stress test for that problem. It does not prove a model is safe. It shows where the model starts to break.

What I build around

My work keeps circling the same question: can AI help rehab professionals without flattening clinical judgment? That work includes model evaluation, clinical rules, patient-facing education, and rehab-focused product experiments.

Ability Labs is the broader direction behind this work. Rehab Bench is one small, public piece of that larger effort.

How I want people to read the scores

The leaderboard is useful, but it is not the whole story. Open the full review. Read the actual answer. Look at the prompt score. Check the safety concerns. A model with a decent overall score can still be weak in exactly the area you care about.

That is the honest use of this benchmark: not hype, not dismissal, just a clearer look at what the model did when the case got clinical.