Scale Labs
Agents · Safety, Evaluation and Alignment · Reasoning · 10/28/2025

Remote Labor Index: Measuring AI Automation of Remote Work

Mantas Mazeika, Alice Gatti, Cristina Menghini, Udari Madhushani Sehwag, Shivam Singhal, Yury Orlovskiy, Steven Basart, Manasi Sharma, Denis Peskoff, Elaine Lau, Sumana Basu, Jaehyuk Lim, Lachlan Carroll, Alice Blair, Vinaya Sivakumar, Brad Kenstler, Yuntao Ma, Julian Michael, Xiaoke Li, Oliver Ingebretsen, Aditya Mehta, Jean Mottola, John Teichmann, Kevin Yu, Zaina Shaik, Adam Khoja, Richard Ren, Jason Hausenloy, Long Phan, Connor Smith, Ye Htet, Ankit Aich, Tahseen Rabbani, Vivswan Shah, Andriy Novykov, Felix Binder, Kirill Chugunov, Luis Ramirez, Matias Geralnik, Hernán Mesura, Dean Lee, Ed-Yeremai Hernández Cardona, Annette Diamond


Scale AI and the Center for AI Safety are proud to introduce the Remote Labor Index (RLI), a broadly multi-sector benchmark comprising real-world, economically valuable remote-work tasks designed to evaluate end-to-end agent performance in practical settings. Across evaluated frontier AI agent frameworks, performance sits near the floor, with a maximum automation rate of 2.5% on RLI tasks. These results help ground discussions of AI automation in empirical evidence, setting a common basis for tracking progress and enabling stakeholders to proactively navigate the onset and risks of AI-driven labor automation.

The potential for AIs to automate human labor is a topic of significant interest and concern. While AIs have made rapid progress on research-oriented benchmarks of knowledge and reasoning, it remains unclear how these gains translate into real economic value and actual automation. To address this gap, we introduce the Remote Labor Index (RLI), a broadly multi-sector benchmark comprising real-world, economically valuable remote-work tasks designed to evaluate end-to-end agent performance in practical settings. Across evaluated frontier AI agent frameworks, performance sits near the floor, with a maximum automation rate of 2.5% on RLI tasks. These results help ground discussions of AI automation in empirical evidence, setting a common basis for tracking progress and enabling stakeholders to proactively navigate the onset and risks of AI-driven labor automation.

Check out the leaderboard: https://scale.com/leaderboard/rli

