Training verifiers via RLVR on self-synthesized critique data for accurate and honest test-time scaling.
```bash
pip install -e ./verl
pip install packaging
pip install ninja
pip install flash-attn --no-build-isolation
pip install -e .
```
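After installing, a quick import check catches the most common build failures (flash-attn in particular). A minimal sketch, assuming a CUDA-capable environment; nothing here is repo-specific:

```python
# Sanity check: verify the core dependencies built and import correctly.
import torch
import flash_attn  # fails here if the --no-build-isolation build broke
import vllm
import verl

print(torch.__version__, "CUDA available:", torch.cuda.is_available())
```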
Download the data from `yangzhch6/Mirror-Critique`:

```bash
huggingface-cli download yangzhch6/Mirror-Critique --local-dir ./data
```
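If you prefer to fetch the data from Python, `huggingface_hub.snapshot_download` retrieves the same files (a sketch; `huggingface_hub` is already installed as a dependency of the stack above):

```python
# Sketch: the same download via the huggingface_hub Python API.
from huggingface_hub import snapshot_download

snapshot_download(repo_id="yangzhch6/Mirror-Critique", local_dir="./data")
```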
We have already provided the trajectories (with redundancy filtering) from the training procedure of the Zero-RL Solver in `./data/to_critique/Qwen2.5-Math-{1.5/7}B-L-openr1-f3/to_critique.parquet`. The test-time outputs of the Zero-RL Solver are provided in `./data/rlvr-critique/Qwen2.5-Math-{1.5/7}B-L-openr1-f3/test_n16_full.parquet`. A quick way to inspect these files is shown below.
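A minimal sketch for inspecting the provided parquet files, assuming `pandas` and `pyarrow` are installed (substitute `7B` for `1.5B` in the path as needed):

```python
# Sketch: peek at the provided solver trajectories before running the pipeline.
import pandas as pd

df = pd.read_parquet(
    "./data/to_critique/Qwen2.5-Math-1.5B-L-openr1-f3/to_critique.parquet"
)
print(df.shape)    # number of trajectories x number of fields
print(df.columns)  # field names vary; check them before downstream processing
print(df.head(2))
```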
```bash
# 1. Generate critiques with Qwen2.5-7B-Instruct for the solver trajectories
bash ./experiments/gen_critique/Qwen2.5-7B-Instruct-to-Qwen2.5-Math-{1.5/7}B.sh

# 2. SFT the verifier on the synthesized critique data
bash ./experiments/sft_critique/Qwen2.5-Math-{1.5/7}B-L.sh

# 3. RLVR-train the verifier starting from the SFT checkpoint
bash ./experiments/rlvr-verify/Qwen2.5-Math-{1.5/7}B-L-sft-ckpt-balance-bsz1k.sh
```
The performance of test-time scaling can be evaluated with:

```bash
python ./test-time-eval.py
```
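For intuition, verifier-based test-time scaling typically reduces to scoring each of the n sampled solutions and aggregating. Below is a minimal sketch of verifier-weighted voting; the data format, function names, and aggregation rule are illustrative assumptions, not the interface of `test-time-eval.py`:

```python
# Hypothetical sketch of verifier-weighted majority voting over n samples.
# `samples` pairs each candidate final answer with a verifier score in [0, 1];
# neither the format nor the aggregation here is taken from this repo.
from collections import defaultdict

def weighted_vote(samples: list[tuple[str, float]]) -> str:
    """Return the answer whose candidates carry the most verifier mass."""
    mass: dict[str, float] = defaultdict(float)
    for answer, score in samples:
        mass[answer] += score
    return max(mass, key=mass.get)

# Example: "42" wins because its candidates carry the highest total score.
print(weighted_vote([("42", 0.9), ("41", 0.3), ("42", 0.8), ("40", 0.2)]))
```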
| Model | Hugging Face | Base Model |
|---|---|---|
| Zero-Solver-Qwen2.5-Math-1.5B-L | [yangzhch6/Zero-Solver-Qwen2.5-Math-1.5B-L](https://huggingface.co/yangzhch6/Zero-Solver-Qwen2.5-Math-1.5B-L) | Qwen2.5-Math-1.5B |
| Zero-Solver-Qwen2.5-Math-7B-L | [yangzhch6/Zero-Solver-Qwen2.5-Math-7B-L](https://huggingface.co/yangzhch6/Zero-Solver-Qwen2.5-Math-7B-L) | Qwen2.5-Math-7B |
| Mirror-Verifier-1.5B | [yangzhch6/Mirror-Verifier-1.5B](https://huggingface.co/yangzhch6/Mirror-Verifier-1.5B) | Qwen2.5-Math-1.5B |
| Mirror-Verifier-7B | [yangzhch6/Mirror-Verifier-7B](https://huggingface.co/yangzhch6/Mirror-Verifier-7B) | Qwen2.5-Math-7B |
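The released checkpoints load like any Hugging Face causal LM. A minimal sketch, assuming `transformers` is installed and there is enough GPU memory for the 7B weights:

```python
# Sketch: load a released verifier checkpoint with transformers.
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "yangzhch6/Mirror-Verifier-7B"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(
    name, torch_dtype="auto", device_map="auto"
)
```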
This repo builds upon veRL and DeepScaleR, uses vLLM for inference, and uses Math-Verify for math reasoning evaluation. We thank the open-source community for the datasets and backbones this work relies on, including OpenR1-Math-220k, Qwen2.5-Math, and the DeepSeek-R1 model.
For questions, feedback, or collaboration opportunities, feel free to reach out:
- Zhicheng Yang: yangzhch6@gmail.com
If you find our models or code useful, please cite our paper:
```bibtex
@misc{yang2025critiqueverifyaccuratehonest,
  title={Critique to Verify: Accurate and Honest Test-Time Scaling with RL-Trained Verifiers},
  author={Zhicheng Yang and Zhijiang Guo and Yinya Huang and Yongxin Wang and Yiwei Wang and Xiaodan Liang and Jing Tang},
  year={2025},
  eprint={2509.23152},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2509.23152},
}
```