Training verifiers via RLVR on self-synthesized critique data for accurate and honest test-time scaling.
```bash
pip install -e ./verl
pip install packaging
pip install ninja
pip install flash-attn --no-build-isolation
pip install -e .
```
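After installing, a quick import check catches the most common build failures (flash-attn in particular). A minimal sketch, assuming a CUDA-capable environment; nothing here is repo-specific:

```python
# Sanity check: verify the core dependencies built and import correctly.
import torch
import flash_attn  # fails here if the --no-build-isolation build broke
import vllm
import verl

print(torch.__version__, "CUDA available:", torch.cuda.is_available())
```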
Download the data from `yangzhch6/Mirror-Critique`:

```bash
huggingface-cli download yangzhch6/Mirror-Critique --local-dir ./data
```
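If you prefer to fetch the data from Python, `huggingface_hub.snapshot_download` retrieves the same files (a sketch; `huggingface_hub` is already installed as a dependency of the stack above):

```python
# Sketch: the same download via the huggingface_hub Python API.
from huggingface_hub import snapshot_download

snapshot_download(repo_id="yangzhch6/Mirror-Critique", local_dir="./data")
```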
We have already provided the trajectories (with redundancy filtering) from the training procedure of the Zero-RL Solver in `./data/to_critique/Qwen2.5-Math-{1.5/7}B-L-openr1-f3/to_critique.parquet`. The test-time outputs of the Zero-RL Solver are provided in `./data/rlvr-critique/Qwen2.5-Math-{1.5/7}B-L-openr1-f3/test_n16_full.parquet`. A quick way to inspect these files is shown below.
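A minimal sketch for inspecting the provided parquet files, assuming `pandas` and `pyarrow` are installed (substitute `7B` for `1.5B` in the path as needed):

```python
# Sketch: peek at the provided solver trajectories before running the pipeline.
import pandas as pd

df = pd.read_parquet(
    "./data/to_critique/Qwen2.5-Math-1.5B-L-openr1-f3/to_critique.parquet"
)
print(df.shape)    # number of trajectories x number of fields
print(df.columns)  # field names vary; check them before downstream processing
print(df.head(2))
```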
```bash
# 1. Generate critiques with Qwen2.5-7B-Instruct for the solver trajectories
bash ./experiments/gen_critique/Qwen2.5-7B-Instruct-to-Qwen2.5-Math-{1.5/7}B.sh

# 2. SFT the verifier on the synthesized critique data
bash ./experiments/sft_critique/Qwen2.5-Math-{1.5/7}B-L.sh

# 3. RLVR-train the verifier starting from the SFT checkpoint
bash ./experiments/rlvr-verify/Qwen2.5-Math-{1.5/7}B-L-sft-ckpt-balance-bsz1k.sh
```
The performance of test-time scaling can be evaluated with:

```bash
python ./test-time-eval.py
```
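For intuition, verifier-based test-time scaling typically reduces to scoring each of the n sampled solutions and aggregating. Below is a minimal sketch of verifier-weighted voting; the data format, function names, and aggregation rule are illustrative assumptions, not the interface of `test-time-eval.py`:

```python
# Hypothetical sketch of verifier-weighted majority voting over n samples.
# `samples` pairs each candidate final answer with a verifier score in [0, 1];
# neither the format nor the aggregation here is taken from this repo.
from collections import defaultdict

def weighted_vote(samples: list[tuple[str, float]]) -> str:
    """Return the answer whose candidates carry the most verifier mass."""
    mass: dict[str, float] = defaultdict(float)
    for answer, score in samples:
        mass[answer] += score
    return max(mass, key=mass.get)

# Example: "42" wins because its candidates carry the highest total score.
print(weighted_vote([("42", 0.9), ("41", 0.3), ("42", 0.8), ("40", 0.2)]))
```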
| Model | Hugging Face | Base Model |
|---|---|---|
| Zero-Solver-Qwen2.5-Math-1.5B-L | [yangzhch6/Zero-Solver-Qwen2.5-Math-1.5B-L](https://huggingface.co/yangzhch6/Zero-Solver-Qwen2.5-Math-1.5B-L) | Qwen2.5-Math-1.5B |
| Zero-Solver-Qwen2.5-Math-7B-L | [yangzhch6/Zero-Solver-Qwen2.5-Math-7B-L](https://huggingface.co/yangzhch6/Zero-Solver-Qwen2.5-Math-7B-L) | Qwen2.5-Math-7B |
| Mirror-Verifier-1.5B | [yangzhch6/Mirror-Verifier-1.5B](https://huggingface.co/yangzhch6/Mirror-Verifier-1.5B) | Qwen2.5-Math-1.5B |
| Mirror-Verifier-7B | [yangzhch6/Mirror-Verifier-7B](https://huggingface.co/yangzhch6/Mirror-Verifier-7B) | Qwen2.5-Math-7B |
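The released checkpoints load like any Hugging Face causal LM. A minimal sketch, assuming `transformers` is installed and there is enough GPU memory for the 7B weights:

```python
# Sketch: load a released verifier checkpoint with transformers.
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "yangzhch6/Mirror-Verifier-7B"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(
    name, torch_dtype="auto", device_map="auto"
)
```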
This repo builds upon veRL and DeepScaleR, uses vLLM for inference, and uses Math-Verify for math reasoning evaluation. We thank the open-source community for the datasets and backbones this work relies on, including OpenR1-Math-220k, Qwen2.5-Math, and the DeepSeek-R1 model.
For questions, feedback, or collaboration opportunities, feel free to reach out:
- Zhicheng Yang: yangzhch6@gmail.com
If you find our models or code useful, please cite our paper:
```bibtex
@misc{yang2025critiqueverifyaccuratehonest,
  title={Critique to Verify: Accurate and Honest Test-Time Scaling with RL-Trained Verifiers},
  author={Zhicheng Yang and Zhijiang Guo and Yinya Huang and Yongxin Wang and Yiwei Wang and Xiaodan Liang and Jing Tang},
  year={2025},
  eprint={2509.23152},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2509.23152},
}
```