Benchmark.
A standard test used to compare models (e.g. SWE-bench for coding). Useful but gameable; a model topping a benchmark is a lagging indicator, not proof that it’s the best for your job.
Updated 2026-06-03