Bioethics Bench is a structured evaluation corpus for testing, benchmarking, and improving AI performance in normative reasoning across clinical, research, public health, and other domains where ethical judgment matters most.
The Bench spans the full breadth of bioethics — the ethics of living systems in their clinical, research, policy, environmental, and technological dimensions.
"Bioethics Bench: a shared corpus for investigating, validating, and improving normative computation across important bioethical domains. Building the Bench properly will require careful scenario construction, expert consultation, institutional collaboration, and iterative development in line with the aim of fostering a community of use."
— Doing Ethics with AI (2026) · Ghose, Rasaee, Singer, Savulescu
Bioethics Bench is designed to benchmark normative reasoning under three evaluation conditions, so that AI performance can be compared directly with human judgment.
Human experts evaluate bioethical scenarios without AI assistance, establishing a human baseline for normative reasoning performance across domains and scenario types.
Language models are evaluated directly on structured scenario-based normative tasks, measuring autonomous performance in bioethical reasoning and policy selection against the benchmark.
Practitioners are evaluated while using AI normative guidance, to test whether computer-aided ethics measurably improves the quality of bioethical judgment over unaided performance.
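One way to compare the three conditions is to measure how often each one selects the same policy as the benchmark, then look at the gap between aided and unaided practitioners. The sketch below is illustrative only: the condition labels, the `agreement_rates` helper, and the tuple layout are assumptions rather than part of the Bench, and the data are placeholders, not results.

```python
from collections import defaultdict

# Hypothetical labels for the three evaluation conditions described above.
CONDITIONS = ("human_only", "ai_only", "human_with_ai")


def agreement_rates(responses, reference):
    """Return, per condition, the fraction of responses whose chosen policy
    matches the benchmark's selected policy for that scenario.

    responses: iterable of (condition, scenario_id, chosen_policy) tuples.
    reference: dict mapping scenario_id to the benchmark's selected policy.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for condition, scenario_id, chosen_policy in responses:
        totals[condition] += 1
        if reference[scenario_id] == chosen_policy:
            hits[condition] += 1
    # Only report conditions that actually have responses.
    return {c: hits[c] / totals[c] for c in CONDITIONS if totals[c]}


# Placeholder data for demonstration only; not drawn from any real evaluation.
reference = {"s1": "A", "s2": "B"}
responses = [
    ("human_only", "s1", "A"),
    ("human_only", "s2", "A"),
    ("ai_only", "s1", "A"),
    ("ai_only", "s2", "B"),
    ("human_with_ai", "s1", "A"),
    ("human_with_ai", "s2", "B"),
]

rates = agreement_rates(responses, reference)
# Difference between aided and unaided practitioners on the same scenarios.
uplift = rates["human_with_ai"] - rates["human_only"]
print(rates, uplift)
```

Agreement with a single reference policy is a deliberately crude metric. A fuller evaluation would presumably also score the quality of the reasoning, which this sketch does not attempt.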
Bioethics Bench is the evaluation corpus at the end of a normative computation pipeline — extending the structured outputs of SACRE evaluations into benchmarking and fine-tuning infrastructure for the research community.
Beyond any single benchmark or fine-tuning run, the Bench is built on a few design commitments that make it a durable resource for the field.
Bioethical scenarios at scale in a consistent, structured format — built for systematic study rather than isolated cases.
Each scenario weighs policy candidates from public preferences, expert judgment, and ethical frameworks — never a single viewpoint.
Every record carries its SACRE convergence scores and justification traces, so results stay inspectable rather than black-box.
A shared base for benchmarking models, comparing them against human judgment, and fine-tuning domain-adapted systems.
Each entry pairs a bioethical scenario with candidate policies — drawn from public preferences, expert judgment, and ethical frameworks — and the full SACRE evaluation that scores their normative convergence and selects the most justified position.
Illustrative example — scenario text and scores shown for demonstration, not from the released corpus.
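A minimal sketch of how such an entry might be represented in code. The class names, field names, the 0 to 1 score scale, and the rule that the selected position is the candidate with the highest convergence score are all assumptions for illustration, not the released schema; the scenario text and scores are placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class PolicyCandidate:
    policy_id: str
    text: str
    # Assumed source labels: "public_preferences", "expert_judgment",
    # or "ethical_framework".
    source: str
    # Assumed to lie in [0, 1]; the real SACRE scale may differ.
    convergence_score: float
    justification_trace: list[str] = field(default_factory=list)


@dataclass
class BenchEntry:
    scenario_id: str
    domain: str
    scenario_text: str
    candidates: list[PolicyCandidate]

    def selected(self) -> PolicyCandidate:
        # Assumption: "most justified" means the highest convergence score.
        return max(self.candidates, key=lambda c: c.convergence_score)


# Placeholder entry for demonstration; not from the released corpus.
entry = BenchEntry(
    scenario_id="demo-001",
    domain="clinical",
    scenario_text="<placeholder scenario text>",
    candidates=[
        PolicyCandidate("p1", "<policy text>", "public_preferences", 0.4),
        PolicyCandidate("p2", "<policy text>", "expert_judgment", 0.7,
                        ["<justification step>"]),
        PolicyCandidate("p3", "<policy text>", "ethical_framework", 0.6),
    ],
)

print(entry.selected().policy_id)
```

Keeping the justification trace on each candidate, rather than only on the winner, means a reader can inspect why the rejected policies scored lower.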
Bioethics Bench is being developed through careful scenario construction, expert consultation, and institutional collaboration. We invite researchers, practitioners, and institutions to engage with the project.
Questions or collaboration: research@alethic.ai
If you use Bioethics Bench or the SACRE methodology in your research, please cite the introducing paper.
@article{ghose2026doingethics,
  title  = {Doing Ethics with AI: Practical Ethics Engineering,
            Product-Led Philosophy, and Computer-Aided Ethics},
  author = {Ghose, Sankalpa and Rasaee, Kasra and
            Singer, Peter and Savulescu, Julian},
  year   = {2026}
}