SWE-bench
Definition
A software engineering benchmark that measures whether a model can fix real GitHub issues
#SWE-bench #SWE-Bench #SWE-bench Verified #SWE-Bench Pro #coding benchmark
What is SWE-bench?
SWE-bench evaluates whether a model can resolve real issues from open-source repositories. Rather than posing isolated coding quizzes, it tests whether a model can understand a repository, generate a patch, and get the project's tests to pass.
How is it measured?
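A model receives the issue text and the repository, generates a patch, and is scored by whether the tests associated with the issue pass once the patch is applied. Because the score depends on running code rather than on how the patch looks, SWE-bench is closer to practical software maintenance than evaluations that only check whether the output is syntactically valid.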
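The pass/fail scoring described above can be sketched in a few lines of Python. This is a minimal illustration, not the official SWE-bench harness. The `TaskResult` structure and the function names are hypothetical. It assumes each task records two groups of test outcomes after the patch is applied: tests the fix is expected to turn from failing to passing, and tests that passed before and must keep passing.

```python
from dataclasses import dataclass


@dataclass
class TaskResult:
    """Test outcomes for one task after the model's patch is applied (hypothetical structure)."""
    task_id: str
    fail_to_pass: dict[str, bool]  # tests the fix should make pass -> did they pass?
    pass_to_pass: dict[str, bool]  # tests that passed before the patch -> do they still pass?


def is_resolved(result: TaskResult) -> bool:
    """A task counts as resolved only if every associated test passes."""
    # Require at least one target test so an empty result never counts as resolved.
    if not result.fail_to_pass:
        return False
    return all(result.fail_to_pass.values()) and all(result.pass_to_pass.values())


def resolve_rate(results: list[TaskResult]) -> float:
    """Fraction of tasks resolved; 0.0 when there are no tasks."""
    if not results:
        return 0.0
    return sum(is_resolved(r) for r in results) / len(results)


# Made-up outcomes for three tasks, for illustration only.
results = [
    TaskResult("task-1", {"test_fix": True}, {"test_existing": True}),   # resolved
    TaskResult("task-2", {"test_fix": False}, {"test_existing": True}),  # fix did not work
    TaskResult("task-3", {"test_fix": True}, {"test_existing": False}),  # fix broke other behavior
]

print(f"Resolved: {resolve_rate(results):.0%}")
```

The third task shows why the check is strict: a patch that makes the target test pass but breaks existing behavior is not counted as a fix.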
Why does it matter?
In production coding workflows, "looks correct" is not enough. Teams need fixes that actually run and pass tests. SWE-bench gives a common basis for comparing models on that capability.
Related terms
Natural Language Processing
CursorBench
A coding-model benchmark Cursor runs on its own operational data
Natural Language Processing
Agentic AI
A category of AI systems that autonomously decompose goals, use tools, and run multi-step tasks
Natural Language Processing
AGI (Artificial General Intelligence)
A hypothetical AI system capable of performing any intellectual task a human can
Natural Language Processing
AI Agent
An autonomous AI system that can plan, use tools, and take actions to achieve goals
Natural Language Processing
Attention
A mechanism that allows AI models to focus on the most relevant parts of the input when producing output
Natural Language Processing
BigLaw Bench
A benchmark for legal-task performance, focusing on document interpretation and reasoning consistency