A computer-use agent built on a multi-agent architecture, running in a Docker container with VNC access.
- Python 3.13+
- uv package manager
- xdotool (for mouse/keyboard control)
- wmctrl (for window management)
- X11 display server
- scrot
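Before running the agent, it can help to confirm that the system tools listed above are on `PATH`. The following is a minimal sketch of such a check; `missing_tools` and `REQUIRED_TOOLS` are hypothetical names used for illustration and are not part of this repository.

```python
import shutil
from typing import Callable, Iterable, List, Optional

# System tools named in the requirements list above.
REQUIRED_TOOLS = ("xdotool", "wmctrl", "scrot")


def missing_tools(
    tools: Iterable[str] = REQUIRED_TOOLS,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return the tools that cannot be found on PATH.

    `which` is injectable so the check can be exercised without
    the tools actually being installed.
    """
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing tools:", ", ".join(missing))
    else:
        print("All required tools found.")
```

This only checks that the executables exist; it does not verify that an X11 display is reachable.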
sudo apt-get install xdotool wmctrl

- Clone the repository
- Install dependencies:
uv sync
- Set up environment variables:
cp .env.example .env
# Edit .env with your InternLM API key
- Install Playwright dependencies

Run the agent with a task description:
uv run main.py "Your task description here"

- --api-key: InternLM API key (or set INTERNLM_API_KEY env var)
- --base-url: InternLM API base URL (or set INTERNLM_BASE_URL env var)
- --max-iterations: Maximum number of iterations (default: 20)

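The options above, with their environment-variable fallbacks, could be parsed roughly as follows. This is a sketch using the standard `argparse` module, not the actual contents of `main.py`; the function name `parse_args` and the positional argument name `task` are assumptions.

```python
import argparse
import os
from typing import Optional, Sequence


def parse_args(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    """Parse the task and options; env vars supply defaults for the API settings."""
    parser = argparse.ArgumentParser(description="Run the agent on a task.")
    parser.add_argument("task", help="Task description")
    # A command-line value wins; otherwise fall back to the environment variable.
    parser.add_argument("--api-key", default=os.environ.get("INTERNLM_API_KEY"))
    parser.add_argument("--base-url", default=os.environ.get("INTERNLM_BASE_URL"))
    parser.add_argument("--max-iterations", type=int, default=20)
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_args(["Open Firefox", "--max-iterations", "10"])
    print(args.task, args.max_iterations)
```

Reading the environment inside the function, rather than at import time, means values loaded from `.env` before the call are picked up.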
uv run main.py "Open Firefox and navigate to github.com" --max-iterations 10

The agent follows a feedback loop:
- Gather Context: Collects messages, actions, observations, screenshots, and active windows
- Generate Action: Uses InternLM API to generate executable Python code calling tools
- Execute: Runs the generated code
- Verify: Compares before/after screenshots to verify success
- Repeat: Continues until task completion or exit
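The loop above can be sketched in a few lines. This is an illustration under stated assumptions, not the project's implementation: `LoopState`, `run_loop`, and the callables passed in are hypothetical, the model call is replaced by a plain function, and verification is reduced to a byte-for-byte comparison of two screenshots.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class LoopState:
    """History carried between iterations (a simplified stand-in for the agent's context)."""
    actions: List[str] = field(default_factory=list)
    observations: List[str] = field(default_factory=list)
    done: bool = False


def run_loop(
    task: str,
    generate_action: Callable[[str, LoopState], str],
    execute: Callable[[str, LoopState], None],
    take_screenshot: Callable[[], bytes],
    max_iterations: int = 20,
) -> LoopState:
    state = LoopState()
    for _ in range(max_iterations):
        before = take_screenshot()             # gather context
        action = generate_action(task, state)  # generate action (a model call in the real agent)
        execute(action, state)                 # execute; may set state.done to request exit
        after = take_screenshot()              # verify: crude before/after comparison
        changed = before != after
        state.actions.append(action)
        state.observations.append("screen changed" if changed else "no visible change")
        if state.done:                         # repeat until completion or exit
            break
    return state
```

Passing the steps in as callables keeps the loop testable without a display or an API key; a real verifier would likely need something more tolerant than exact equality, since a blinking cursor alone changes the image.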

- tools.click(x, y): Click at relative coordinates (0-1 range)
- tools.type(text): Type text
- tools.hotkey(keys): Press a hotkey combination (e.g., 'super+r')
- tools.bash(cmd): Execute a bash command
- tools.exit(message, exitCode): Exit the agent loop

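Because `tools.click` takes relative coordinates, something has to map them onto screen pixels before the click is issued. The sketch below shows one way that mapping could work and how the result could be turned into an `xdotool` invocation; `to_pixels` and `xdotool_click_command` are hypothetical helpers, and the agent's actual conversion and rounding may differ.

```python
from typing import List, Tuple


def to_pixels(x: float, y: float, width: int, height: int) -> Tuple[int, int]:
    """Map relative coordinates in the 0-1 range onto a width x height screen."""
    if not (0.0 <= x <= 1.0 and 0.0 <= y <= 1.0):
        raise ValueError("coordinates must be in the 0-1 range")
    # Clamp so that 1.0 maps to the last valid pixel rather than one past it.
    px = min(int(x * width), width - 1)
    py = min(int(y * height), height - 1)
    return px, py


def xdotool_click_command(x: float, y: float, width: int, height: int) -> List[str]:
    """Build an argument list that moves the pointer and left-clicks."""
    px, py = to_pixels(x, y, width, height)
    return ["xdotool", "mousemove", str(px), str(py), "click", "1"]


if __name__ == "__main__":
    print(xdotool_click_command(0.5, 0.5, 1920, 1080))
```

On a machine with X11 and xdotool installed, the returned list can be passed to `subprocess.run(cmd, check=True)`.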
docker build -t kyros-agent .

Note: The first build may take 10-15 minutes due to desktop environment installation.

docker-compose up -d
# or
docker run -d -p 5901:5901 --name kyros-agent kyros-agent

- Install a VNC client (e.g., TigerVNC, RealVNC, TightVNC Viewer)
- Connect to: localhost:5901
- Password: password

docker exec -it kyros-agent bash

To change the VNC password, edit the Dockerfile, modify the password in the CMD line, then rebuild the image.

The agent-workspace directory is mounted into the container at /home/dockeruser/workspace for persistent storage.
docker-compose down
# or
docker stop kyros-agent && docker rm kyros-agent

License: MIT
