
Reviewing Agents

LLM-powered scientific paper review and error detection

Python 3.12+ · Code style: ruff

This repository supports:

  • To Err Is Human: Systematic Quantification of Errors in Published AI Papers via LLM Analysis

    We developed an LLM-based Paper Correctness Checker that identifies objective mistakes (in formulas, derivations, figures, and tables) in papers published at top AI venues. Our analysis shows that mistakes per paper have increased over time, from 3.8 in NeurIPS 2021 to 5.9 in NeurIPS 2025 (+55%). Human experts confirmed 83.2% precision on 316 reviewed mistakes, and the checker can also propose correct fixes for 75.8% of identified issues.

  • Agents4Science: LLM reviewers for the Agents4Science Conference

    Agents4Science 2025 was the first conference where AI systems served as both primary authors and reviewers of research papers. Organized by TogetherAI and Stanford University, it received 315 submissions, with 48 papers accepted after combined AI and human peer review.


Installation

uv sync

Configuration

Set up API keys in .env (copy from env.dev):

cp env.dev .env

Configure modules in config.yaml.
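The schema of config.yaml is defined by the repository itself; as a rough, hypothetical sketch of what a per-module toggle might look like (every key below is an assumption, not the documented schema):

# Hypothetical sketch only; see config.yaml in the repository for the real schema.
modules:
  simple_reviewer:
    enabled: true
    model: gpt-4o        # assumed key: which LLM backs the module
  reference_check_light:
    enabled: false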


Modules

Module                   Description
-----------------------  --------------------------------------------
SimpleReviewer           General paper reviewing
LLMCorrectnessDetector   Methodological correctness evaluation
LLMCriticalityVerifier   Verifies criticality of correctness findings
LLMFormatDetector        Format compliance checking
JailbreakingChecker      Detects adversarial instructions in papers
ReferenceCheckLight      Reference hallucination detection
ReferenceCheckHeavy      Full reference + author verification
ArxivTaxonomyClassifier  arXiv category classification
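The Quick Start below runs three of these modules by instantiating the class and passing raw PDF bytes. Assuming the remaining modules follow the same pattern, a minimal sketch for ReferenceCheckLight might look like this (the method name check_references is a guess by analogy, not a documented API):

from reviewing_agents.modules import ReferenceCheckLight

# Load the paper as raw PDF bytes, as in the Quick Start examples below.
with open("paper.pdf", "rb") as f:
    pdf_bytes = f.read()

# Hypothetical entry point; check the module's docstring for the real method name.
checker = ReferenceCheckLight()
report = checker.check_references(pdf_bytes)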

Quick Start

Agents4Science reviewers can be used to review papers:

from reviewing_agents.modules import SimpleReviewer

# Load the paper as raw PDF bytes.
with open("paper.pdf", "rb") as f:
    pdf_bytes = f.read()

# Run the general-purpose reviewer over the PDF.
reviewer = SimpleReviewer()
result = reviewer.review_paper(pdf_bytes)

LLMCorrectnessDetector and LLMCriticalityVerifier, the modules used in our To Err Is Human paper, can be chained to evaluate the correctness of a paper:

from reviewing_agents.modules import LLMCorrectnessDetector, LLMCriticalityVerifier

# Load the paper as raw PDF bytes.
with open("paper.pdf", "rb") as f:
    pdf_bytes = f.read()

# Stage 1: flag potential objective mistakes in the paper.
detector = LLMCorrectnessDetector()
correctness = detector.check_correctness(pdf_bytes)

# Stage 2: verify how critical the flagged issues actually are.
verifier = LLMCriticalityVerifier()
findings = {
    "score": correctness.score,
    "reasoning": correctness.reasoning,
    "key_issues": correctness.key_issues,
}
verified = verifier.verify_criticality(pdf_bytes, findings)
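To screen more than one paper, the same two-stage pipeline can be looped over a directory of PDFs. A minimal sketch, assuming the papers live in a local papers/ directory (the directory name is an example, not part of the library):

from pathlib import Path

from reviewing_agents.modules import LLMCorrectnessDetector, LLMCriticalityVerifier

detector = LLMCorrectnessDetector()
verifier = LLMCriticalityVerifier()

# Run the detector + verifier pipeline over every PDF in papers/.
for pdf_path in sorted(Path("papers").glob("*.pdf")):
    pdf_bytes = pdf_path.read_bytes()
    correctness = detector.check_correctness(pdf_bytes)
    findings = {
        "score": correctness.score,
        "reasoning": correctness.reasoning,
        "key_issues": correctness.key_issues,
    }
    verified = verifier.verify_criticality(pdf_bytes, findings)
    print(pdf_path.name, correctness.score)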

Citation

@article{bianchi2025toerr,
  title={To Err Is Human: Systematic Quantification of Errors in Published AI Papers via LLM Analysis},
  author={Bianchi, Federico and Kwon, Yongchan and Izzo, Zachary and Zhang, Linjun and Zou, James},
  journal={arXiv preprint arXiv:2512.05925},
  year={2025}
}

@article{bianchi2025agents4science,
  title={Exploring the use of AI authors and reviewers at Agents4Science},
  author={Bianchi, Federico and Queen, Owen and Thakkar, Nitya and Sun, Eric and Zou, James},
  journal={arXiv preprint arXiv:2511.15534},
  year={2025}
}
