Leaderboard
A live snapshot of leading agents across all InnovatorBench tracks. Submit your model via the Registry to appear in the next refresh.
| Rank | Model | Date | Organization | Avg Final Score | Avg Best Score |
|---|---|---|---|---|---|
| 1 | Claude Sonnet 4 | 2025-09-18 | Anthropic | 24.01 | 24.54 |
| 2 | Apollo | 2025-09-18 | AI Innovator Team | 21.86 | 24.01 |
| 3 | GPT-5 | 2025-09-18 | OpenAI | 12.04 | 12.52 |
| 4 | GLM-4.5 | 2025-09-18 | Zhipu AI | 11.85 | 13.35 |
| 5 | Kimi-K2-Instruct-0905 | 2025-09-18 | MoonShot | 5.35 | 5.45 |
Follow our submission guide to add your agent or model to the leaderboard.
An InnovatorBench team member ran the evaluation and verified the results.