🎯 Vision-Zero

Scalable VLM Self-Improvement via Strategic Gamified Self-Play


A domain-agnostic framework enabling VLM self-improvement through competitive visual games

📋 Table of Contents

  • 🎯 Overview
  • ✨ Key Features
  • 🎉 Current Release Status
  • 🚀 Quick Start
  • 🤖 Models & Dataset
  • 🛠️ Setup
  • 💪 Training
  • 📄 Citation

🎯 Overview

Reinforcement learning (RL) can effectively enhance the reasoning capabilities of vision–language models (VLMs). Current methods, however, remain heavily dependent on labor-intensive datasets that require extensive manual construction and verification, which drives up training costs and constrains the practical deployment of VLMs.

To address this challenge, we propose Vision-Zero, a domain-agnostic framework enabling VLM self-improvement through competitive visual games generated from arbitrary image pairs.

✨ Key Features

🎮 Strategic Self-Play Framework

Vision-Zero trains VLMs in "Who Is the Spy"-style games, in which models take on multiple roles and must reason and act strategically. Through interactive gameplay, the models autonomously generate their own training data without human annotation.
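
To make the game structure concrete, here is a minimal sketch of one clue-and-vote round in Python. This is an illustration only, not the repo's actual API: the Player class, the prompts, and the two-phase round structure are assumptions based on the description above, and generate stands for any VLM inference callable taking an image path and a prompt.

# Illustrative sketch of one "Who Is the Spy"-style round (not the repo's API).
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class Player:
    name: str
    image_path: str                     # civilians share one image; the spy sees another
    is_spy: bool = False
    clues: List[str] = field(default_factory=list)

def play_round(players: List[Player], generate: Callable[[str, str], str]) -> dict:
    # Clue phase: each player describes a detail of its own image,
    # conditioned on the clues given so far (the spy must blend in).
    history: List[str] = []
    for p in players:
        clue = generate(p.image_path,
                        f"Previous clues: {history}. Give one new clue about your image.")
        p.clues.append(clue)
        history.append(f"{p.name}: {clue}")
    # Decision phase: every player votes on who the spy is; the outcome
    # provides a label-free, verifiable training signal.
    return {p.name: generate(p.image_path,
                             f"Clues so far: {history}. Name the player you think is the spy.")
            for p in players}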

🖼️ Gameplay from Arbitrary Images

Unlike existing gamified frameworks, Vision-Zero can generate games from arbitrary images, which strengthens the model's reasoning across diverse domains and generalizes well to different tasks. We demonstrate this versatility on three distinct types of image datasets: CLEVR-based synthetic scenes, charts, and real-world images.
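
Reusing the hypothetical Player class from the sketch above, instantiating a game from any image pair could look like the following. Again, this is an illustrative helper, not the repo's API: civilians share one image while a randomly chosen spy receives the other.

# Hypothetical helper: build a game from any image pair.
import random

def make_game(image_a: str, image_b: str, n_players: int = 4) -> List[Player]:
    spy_idx = random.randrange(n_players)          # pick the spy at random
    return [Player(name=f"player_{i}",
                   image_path=image_b if i == spy_idx else image_a,
                   is_spy=(i == spy_idx))
            for i in range(n_players)]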

📈 Sustainable Performance Gain

We introduce Iterative Self-Play Policy Optimization (Iterative-SPO), a novel training algorithm that alternates between self-play and reinforcement learning with verifiable rewards (RLVR), mitigating the performance plateau often seen in self-play-only training and achieving sustained long-term improvements.
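
The alternation can be summarized with a short schematic. The phase functions are deliberately left as user-supplied callables, since the actual optimization steps live in the training pipeline; every name here is a placeholder, not the repo's API.

# Schematic of the Iterative-SPO loop: alternate self-play with RLVR.
def iterative_spo(model, image_pairs, selfplay_phase, rlvr_phase, n_iter=10):
    for _ in range(n_iter):
        # Self-play phase: play spy games, collect trajectories, update the policy.
        model, trajectories = selfplay_phase(model, image_pairs)
        # RLVR phase: reinforcement learning with verifiable rewards
        # (e.g. whether a vote correctly identified the spy).
        model = rlvr_phase(model, trajectories)
    return model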

🏆 Achievement: Despite using label-free data, Vision-Zero achieves state-of-the-art performance on reasoning, chart question answering, and vision-centric understanding tasks, surpassing other annotation-based methods.

🎉 Current Release Status

| Component | Status | Description |
| --- | --- | --- |
| 🤖 Models | ✅ Available | Models trained from Qwen2.5-VL-7B, InternVL3-8B, and InternVL3-14B |
| 📊 CLEVR Dataset | ✅ Available | Complete CLEVR-based training dataset |
| 🛠️ Training Code | ✅ Available | Full open-source training pipeline |
| 📈 Chart Dataset | 🚧 Coming Soon | Chart-based dataset for enhanced reasoning |
| 🌍 Real-World Dataset | 🚧 Coming Soon | Real-world image dataset for diverse scenarios |

🚀 Quick Start

# 1. Clone the repository
git clone https://github.com/your-repo/vision-zero.git
cd vision-zero

# 2. Set up environment
conda create -n vision-zero python=3.10
conda activate vision-zero
bash setup.sh

# 3. Download a trained model
# Choose from available models in the table below

# 4. Start training or inference
bash run_scripts/run_grpo_vision_zero.sh

🤖 Models & Dataset

🔬 Trained Models

| Model Family | Size | Dataset | HuggingFace Link |
| --- | --- | --- | --- |
| Qwen2.5-VL | 7B | CLEVR | Model |
| Qwen2.5-VL | 7B | Chart | Model |
| Qwen2.5-VL | 7B | Real-World | Model |
| InternVL3 | 8B | CLEVR | Model |
| InternVL3 | 14B | CLEVR | Model |
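
As a hedged sketch of how a released Qwen2.5-VL checkpoint might be loaded with Hugging Face transformers (the Qwen2_5_VLForConditionalGeneration class requires transformers >= 4.49; the repo id below is a placeholder, substitute the actual link from the table):

# Load a released checkpoint; the repo id is a PLACEHOLDER, use the
# actual HuggingFace link from the table above.
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration

model_id = "your-org/vision-zero-qwen2.5-vl-7b-clevr"  # placeholder id
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)
# See the checkpoint's model card for the full inference recipe.

For the InternVL3 checkpoints the loading class differs; follow the instructions on their model cards.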

📊 Datasets

| Dataset Type | Description | Link |
| --- | --- | --- |
| CLEVR-based | Synthetic scenes for logical reasoning | Dataset |

🛠️ Setup

📢 Acknowledgment: This repo is based on vlm-r1. Thanks for their contribution!

Prerequisites

  • Python 3.10+
  • CUDA-compatible GPU (recommended)
  • Conda or similar environment manager

Installation

# Create and activate environment
conda create -n vision-zero python=3.10
conda activate vision-zero

# Install dependencies
bash setup.sh

💪 Training

📋 Training Pipeline

Step 1: 📁 Prepare Dataset and Model

Download one of the available datasets or prepare your own (a quick sanity check of the expected layout is sketched after the list):

  • CLEVR-based: Available now ✅
  • Chart-based: Coming soon 🚧
  • Real-World: Coming soon 🚧
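
Before training, it can help to verify that images and scene descriptions line up. The sketch below assumes one JSON scene file per image, named after the image stem; this pairing convention is an assumption, so adjust it to however your dataset is actually organized.

# Minimal sanity check; ASSUMES a <stem>.json scene file per <stem>.png image.
from pathlib import Path

def check_pairs(images_dir: str, scenes_dir: str) -> None:
    images = sorted(Path(images_dir).glob("*.png"))
    missing = [p.name for p in images
               if not (Path(scenes_dir) / f"{p.stem}.json").exists()]
    print(f"{len(images)} images, {len(missing)} without scene files")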

Configure your training setup in run_scripts/run_grpo_vision_zero.sh:

# Configuration variables
IMAGES_DIR=$IMAGES_DIR          # Path to your images
SCENES_DIR=$SCENES_DIR          # Path to scene descriptions  
MODEL=$MODEL                    # Base model to fine-tune
OUTPUT_BASE_DIR=$OUTPUT_DIR     # Output directory for checkpoints
RUN_NAME="your_run_name"        # Experiment name

Step 2: 🚀 Start Training

Launch the training process with customizable hyperparameters:

bash run_scripts/run_grpo_vision_zero.sh

💡 Tip: All hyperparameters can be modified directly in the script file.

Step 3: 📊 Evaluation

Evaluate your trained model on out-of-distribution tasks using VLMEvalKit, which provides comprehensive evaluation across a wide range of benchmarks:

# After training completes and a checkpoint is saved, run VLMEvalKit,
# e.g. (see VLMEvalKit's docs for registering a custom checkpoint):
# python run.py --data MMStar --model <your_model_config>



📄 Citation

If you find Vision-Zero useful in your research, please consider citing our paper:

@misc{wang2025visionzeroscalablevlmselfimprovement,
    title={Vision-Zero: Scalable VLM Self-Improvement via Strategic Gamified Self-Play}, 
    author={Qinsi Wang and Bo Liu and Tianyi Zhou and Jing Shi and Yueqian Lin and Yiran Chen and Hai Helen Li and Kun Wan and Wentian Zhao},
    year={2025},
    eprint={2509.25541},
    archivePrefix={arXiv},
    primaryClass={cs.CV},
    url={https://arxiv.org/abs/2509.25541}
}

🌟 Star this repo if you find it helpful!

Made with ❤️ by the Vision-Zero team
