Scale Labs

Showdown Leaderboard - LLMs

Real people. Real conversations. Real rankings.

Showdown ranks AI models based on how they perform in real-world use -- not synthetic tests or lab settings. Votes are blind, optional, and organic, so rankings reflect authentic preferences.

Prompts compared: real conversation prompts compared across models through pairwise votes.

Active users: from 80+ countries and 70+ languages, spanning all backgrounds and professions.

Copyright 2026 Scale Inc. All rights reserved.


Leaderboard - LLMs

* This model's API does not consistently return Markdown-formatted responses. Since raw outputs are used in head-to-head comparisons, this may affect its ranking.

Performance Comparison Across Language Models

Charts: Win Rate vs. Each Model; Battle Count vs. Each Model; Confidence Intervals; Average Win Rate; Prompt Distribution.

Rank  Model                            Votes  Score    Confidence interval
1     gemini-3-pro-preview             2,349  1053.05  -14.45 +10.28
1     gemini-3-flash                   2,295  1048.03  -11.50 +11.00
3     qwen3-omni                       899    1000.00  -14.31 +14.38
3     gpt-4o-audio-preview-2025-06-03  2,755  998.25   -8.16 +8.86
5     voxtral-small-24b-2507           867    922.64   -11.59 +15.34
6     gemma3n                          979    897.44   -16.24 +11.05
7     gpt-realtime                     2,845  856.61   -11.41 +8.30
8     phi-4-multimodal-instruct        765    732.49   -18.95 +20.46
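
The tied ranks in the listing above (two models at rank 1, two at rank 3) are consistent with a rank derived from the confidence intervals rather than from the scores alone. The page does not state the rule, so the sketch below is an assumption: a model's rank is taken as one plus the number of models whose lower bound lies above that model's upper bound. The function names and the reading of "-x +y" as (score - x, score + y) are also assumptions made for illustration. Applied to the scores and intervals shown, this rule reproduces the listed ranks.

```python
# Rows copied from the leaderboard: (model, score, ci_minus, ci_plus).
ROWS = [
    ("gemini-3-pro-preview", 1053.05, 14.45, 10.28),
    ("gemini-3-flash", 1048.03, 11.50, 11.00),
    ("qwen3-omni", 1000.00, 14.31, 14.38),
    ("gpt-4o-audio-preview-2025-06-03", 998.25, 8.16, 8.86),
    ("voxtral-small-24b-2507", 922.64, 11.59, 15.34),
    ("gemma3n", 897.44, 16.24, 11.05),
    ("gpt-realtime", 856.61, 11.41, 8.30),
    ("phi-4-multimodal-instruct", 732.49, 18.95, 20.46),
]


def interval(score, minus, plus):
    """Assumed reading of '-x +y': the interval is (score - x, score + y)."""
    return score - minus, score + plus


def rank_by_interval(rows):
    """Hypothetical rule: rank = 1 + number of models that are clearly better,
    i.e. whose lower bound exceeds this model's upper bound."""
    bounds = {model: interval(s, lo, hi) for model, s, lo, hi in rows}
    ranks = {}
    for model, (_, upper) in bounds.items():
        clearly_better = sum(
            1
            for other, (lower, _) in bounds.items()
            if other != model and lower > upper
        )
        ranks[model] = 1 + clearly_better
    return ranks


ranks = rank_by_interval(ROWS)
for model, *_ in ROWS:
    print(ranks[model], model)
```

Under this rule, two models share a rank whenever neither is separated from the other by non-overlapping intervals, which is why a higher score alone does not guarantee a better rank.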

Overall — category filter applies to rankings only

Voice Model Performance Comparison

Charts: Win Rate vs. Each Model; Battle Count vs. Each Model; Confidence Intervals; Average Win Rate; Prompt Distribution.