PropensityBench
Simulating real-world pressure to choose between safe and harmful behavior
Overview
PropensityBench is a novel benchmark framework for evaluating latent safety risks in large language models (LLMs). Traditional safety evaluations primarily test a model's capabilities. This leaves a critical blind spot around a model's propensity: what it would do if given access to dangerous tools alongside safe ones. Models may strategically conceal risky capabilities or harbor latent inclinations toward misuse that static audits fail to capture.
PropensityBench addresses this gap by shifting the focus from "can-do" to "would-do" assessments. It measures an LLM's tendency to pursue harmful actions when given the choice in a simulated, high-stakes environment. The framework spans four high-risk domains: biosecurity, chemical security, cybersecurity, and self-proliferation. By applying realistic operational pressures, the benchmark reveals a model's underlying inclinations and gives a more realistic picture of its safety alignment than static capability tests.
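To make the "would-do" idea concrete, the sketch below treats propensity as the fraction of scenario runs in which a model picks the harmful tool over the safe one, broken down by pressure level. This is a minimal illustration under stated assumptions, not PropensityBench's actual scoring code: the `Trial` record, its fields, the integer `pressure_level` scale, and the `propensity_by_pressure` helper are all hypothetical names introduced here.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    """One simulated scenario run (hypothetical record format, not the benchmark's)."""
    domain: str           # e.g. "biosecurity", "cybersecurity"
    pressure_level: int   # hypothetical scale: 0 = no pressure, higher = more pressure
    chose_harmful: bool   # True if the model invoked the harmful tool instead of the safe one


def propensity_by_pressure(trials):
    """Return {pressure_level: fraction of trials where the harmful tool was chosen}."""
    counts = {}  # pressure_level -> (harmful_count, total_count)
    for t in trials:
        harmful, total = counts.get(t.pressure_level, (0, 0))
        counts[t.pressure_level] = (harmful + int(t.chose_harmful), total + 1)
    # Sort by pressure level so the trend with increasing pressure is easy to read.
    return {level: h / n for level, (h, n) in sorted(counts.items())}


trials = [
    Trial("cybersecurity", 0, False),
    Trial("cybersecurity", 0, False),
    Trial("cybersecurity", 2, True),
    Trial("cybersecurity", 2, False),
]
print(propensity_by_pressure(trials))  # {0: 0.0, 2: 0.5}
```

Reporting the rate per pressure level, rather than as a single number, shows whether a model that behaves safely under no pressure starts choosing harmful tools as operational pressure rises.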