SWE-Polybench Results

Overview

This document summarizes the results of a SWE-Polybench evaluation run: overall statistics, pass rates broken down by programming language and task category, and file retrieval metrics.

Key Results (Summary)

Total resolved: 129 / 382
Total pass rate: 33.77%

Pass Rate by Language

Python: 41 / 113 (36.3%)
JavaScript: 30 / 100 (30.0%)
TypeScript: 35 / 100 (35.0%)
Java: 23 / 69 (33.3%)

Pass Rate by Task Category

Bug Fix: 112 / 299 (37.5%)
Feature: 13 / 70 (18.6%)
Refactoring: 4 / 13 (30.8%)
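Each pass rate above is the number of resolved instances divided by the number of attempted instances in that group. The short Python sketch below recomputes the reported percentages from the counts in the two breakdowns and checks that both breakdowns cover the same 129 resolved out of 382 instances. The dictionaries and helper names are illustrative only, not part of the evaluation harness.

```python
# Resolved/total counts copied from the breakdowns in this document.
BY_LANGUAGE: dict[str, tuple[int, int]] = {
    "Python": (41, 113),
    "JavaScript": (30, 100),
    "TypeScript": (35, 100),
    "Java": (23, 69),
}
BY_CATEGORY: dict[str, tuple[int, int]] = {
    "Bug Fix": (112, 299),
    "Feature": (13, 70),
    "Refactoring": (4, 13),
}


def pass_rate(resolved: int, total: int) -> float:
    """Percentage of instances resolved."""
    return 100.0 * resolved / total


def totals(groups: dict[str, tuple[int, int]]) -> tuple[int, int]:
    """Sum resolved and total counts over every group of one breakdown."""
    resolved = sum(r for r, _ in groups.values())
    total = sum(t for _, t in groups.values())
    return resolved, total


if __name__ == "__main__":
    for name, groups in (("language", BY_LANGUAGE), ("task category", BY_CATEGORY)):
        resolved, total = totals(groups)
        print(f"By {name}: {resolved}/{total} ({pass_rate(resolved, total):.2f}%)")
        for group, (r, t) in groups.items():
            print(f"  {group}: {r}/{t} ({pass_rate(r, t):.1f}%)")
```

Both breakdowns sum to 129 / 382, which reproduces the 33.77% total pass rate.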

File Retrieval Metrics (by language)

Language    file_recall  file_precision  file_f1
Java               0.63            0.68     0.60
JavaScript         0.58            0.62     0.55
Python             0.69            0.68     0.65
TypeScript         0.55            0.69     0.55
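This document does not define the file retrieval metrics. A common definition, assumed here, compares the set of files modified by the generated patch with the set modified by the reference (gold) patch for each instance, then averages each metric over instances. The sketch below is hypothetical: `file_metrics`, `average_metrics`, and the rule that an empty prediction scores 0.0 are assumptions used only to illustrate that definition.

```python
from statistics import mean


def file_metrics(predicted: set[str], gold: set[str]) -> tuple[float, float, float]:
    """Hypothetical per-instance file precision, recall and F1.

    `predicted` holds the files modified by the generated patch and `gold` the
    files modified by the reference patch. When there is no overlap (including
    an empty prediction) all three metrics are 0.0; treating an empty
    prediction this way is an assumption of this sketch.
    """
    hits = len(predicted & gold)
    if hits == 0:
        return 0.0, 0.0, 0.0
    precision = hits / len(predicted)
    recall = hits / len(gold)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def average_metrics(
    instances: list[tuple[set[str], set[str]]],
) -> tuple[float, float, float]:
    """Score each instance first, then take the mean of each metric."""
    scores = [file_metrics(predicted, gold) for predicted, gold in instances]
    precisions, recalls, f1s = zip(*scores)
    return mean(precisions), mean(recalls), mean(f1s)


if __name__ == "__main__":
    # Two toy instances: one misses a gold file, one edits an extra file.
    toy = [
        ({"a.py"}, {"a.py", "b.py"}),  # precision 1.0, recall 0.5
        ({"c.py", "d.py"}, {"c.py"}),  # precision 0.5, recall 1.0
    ]
    precision, recall, f1 = average_metrics(toy)
    print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

With these two toy instances, mean precision and mean recall are both 0.75 while mean F1 is about 0.667. Averaging F1 per instance can therefore leave it below both averages. The Java row (F1 0.60 against precision 0.68 and recall 0.63) is consistent with this kind of aggregation, since a harmonic mean of the two averaged values could not fall below 0.63.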

File Retrieval Metrics (overall)

file_recall: 0.61
file_precision: 0.66
file_f1: 0.59
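Assuming the overall figures are means over all 382 instances, they should match the per-language values weighted by each language's instance count (Java 69, JavaScript 100, Python 113, TypeScript 100). The sketch below checks this. Because the per-language values are rounded to two decimals, the recomputed means can differ from the reported overall values by up to about 0.01; for example, precision comes out near 0.667 against the reported 0.66. `PER_LANGUAGE` and `weighted_overall` are illustrative names.

```python
# Instance count and per-language (recall, precision, F1) copied from this document.
PER_LANGUAGE: dict[str, tuple[int, float, float, float]] = {
    "Java": (69, 0.63, 0.68, 0.60),
    "JavaScript": (100, 0.58, 0.62, 0.55),
    "Python": (113, 0.69, 0.68, 0.65),
    "TypeScript": (100, 0.55, 0.69, 0.55),
}


def weighted_overall(
    per_language: dict[str, tuple[int, float, float, float]],
) -> tuple[float, float, float]:
    """Instance-weighted mean of recall, precision and F1 across languages."""
    total = sum(row[0] for row in per_language.values())

    def weighted_mean(index: int) -> float:
        # Weight each language's value by its number of instances.
        return sum(row[0] * row[index] for row in per_language.values()) / total

    return weighted_mean(1), weighted_mean(2), weighted_mean(3)


if __name__ == "__main__":
    recall, precision, f1 = weighted_overall(PER_LANGUAGE)
    print(f"file_recall {recall:.3f}  file_precision {precision:.3f}  file_f1 {f1:.3f}")
```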

Conclusions & Highlights

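The overall pass rate is 33.77% (129 of 382 instances resolved).
Python has the highest pass rate (36.3%), the highest file retrieval F1 (0.65), and the highest file recall (0.69).
TypeScript has the second-highest pass rate (35.0%) despite the lowest file recall (0.55) of the four languages.
Feature tasks are the most challenging, with the lowest pass rate (18.6%), roughly half the Bug Fix rate (37.5%).
File retrieval precision exceeds recall overall (0.66 vs. 0.61) and for every language except Python (0.68 vs. 0.69), indicating that retrieved files tend to be relevant but some target files are missed.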

Contact

For more details, see the test configuration and logs in the EuniAI/Polybench-experiment repository.
