Claude generated files based on (bytecode):
https://gist.github.com/jlia0/db0a9695b3ca7609c9b1a08dcbf872c9
This repository implements a proxy server that allows users to interact with both terminal environments and web browsers programmatically. It offers a REST API for managing file operations, text editing, and browser actions, as well as WebSocket support for real-time terminal interaction. This setup is ideal for automation scenarios where you need to execute shell commands, interact with web pages, or manage files remotely.
- Terminal Access via WebSocket:
  - Establish persistent WebSocket connections for interactive terminal sessions.
  - Supports various terminal commands and operations like reset, kill process, and viewing history.
- Browser Automation via REST API:
  - Control browser actions such as navigation, clicking elements, inputting text, taking screenshots, and more through a REST API.
  - Provides browser status checks and restart capabilities.
- File Management API:
  - Upload files to presigned URLs, including support for multipart uploads for large files.
  - Download files and batch download multiple files to specified folders.
  - Zip and upload entire project directories.
- Text Editor API:
  - Perform text editor operations like viewing file content, creating new files, writing to files, string replacement, and content searching.
- Sandbox Environment Initialization:
  - Initialize a sandbox environment by setting up secrets securely.
- Health Check Endpoint:
  - Provides a `/healthz` endpoint for monitoring server availability.
- Customizable Logging:
  - Configurable logging level to suit different operational needs (debug, info, warning, error, critical).
The project is structured into two main directories that handle different aspects of the functionality: app and browser_use.
The app directory contains the core server-side logic and API definitions. It's built using FastAPI and is responsible for:
- API Routing (`app/router.py`): Defines custom API route handling, including request timing and logging.
- Server Logic (`app/server.py`): Implements the FastAPI application, defines API endpoints, and orchestrates interactions with terminal and browser components.
- WebSocket Server (`app/terminal_socket_server.py`): Manages WebSocket connections for terminal sessions, handling message parsing and response sending.
- Tools (`app/tools/`): Contains modules for different tools:
  - `base.py`: Base classes and utilities for tools.
  - `browser/`: Manages browser interactions, actions, and browser management.
  - `terminal/`: Manages terminal sessions, command execution, and terminal history.
  - `text_editor.py`: Implements text editor functionalities.
- Helpers (`app/helpers/`): Utility modules for common tasks:
  - `tool_helpers.py`: Utilities for running shell commands.
  - `utils.py`: General utility functions like file upload, text truncation, etc.
- Models (`app/models.py`): Defines Pydantic models for request and response data structures.
- Logger (`app/logger.py`): Configures logging for the application.
- Types (`app/types/`): Defines type hints and Pydantic models for browser and message types.
The browser_use library is based on https://github.com/browser-use/browser-use
However, it has been modified to use the Claude API (see `browser_use/agent/service.py`).
The browser_use directory houses the browser automation library, which is designed to be reusable and independent of the main app server. It provides:
- Agent (`browser_use/agent/`): Implements the agent logic for browser automation, message management, and prompts.
- Browser (`browser_use/browser/`): Manages browser instances, contexts, and pages using Playwright.
- Controller (`browser_use/controller/`): Defines actions and action registry for browser automation.
- DOM (`browser_use/dom/`): Handles Document Object Model (DOM) processing and element interaction.
- Telemetry (`browser_use/telemetry/`): Implements telemetry collection for usage metrics.
- Utils (`browser_use/utils.py`): Utility functions for the `browser_use` library.
- Logging Configuration (`browser_use/logging_config.py`): Configures logging specifically for the `browser_use` library.
Relationship between app and browser_use: The app directory leverages the browser_use library for browser automation functionalities. app/server.py and app/terminal_socket_server.py act as the entry points, using the tools and libraries from both app and browser_use to provide the API and WebSocket interfaces. browser_use is designed as a modular library that app integrates with, keeping the browser automation logic separate and reusable.
Currently, app_data contains a single subdirectory:
- `js/`: This directory specifically stores JavaScript files.
Inside the js/ directory, you can find the following files based on the provided documentation:
- `getViewport.js`:
  - Content: Contains JavaScript code that, when executed in a browser, returns the current viewport dimensions (width and height) of the browser window.
  - Usage: This script is likely used by the browser automation tools to determine the visible area of a webpage, which can be important for actions like scrolling, element visibility checks, and responsive design considerations.
- `runExtensionAction.js`:
  - Content: Contains JavaScript code designed to interact with browser extensions. It likely provides a mechanism to send messages to browser extensions and handle responses.
  - Usage: This script suggests that the browser automation is capable of interacting with browser extensions programmatically. This could be used for tasks like:
    - Triggering actions within browser extensions.
    - Retrieving data from browser extensions.
    - Controlling extension behavior as part of an automation workflow.
- `selectOption.js`:
  - Content: Contains JavaScript code that helps in selecting an option from a dropdown (`<select>`) element on a webpage.
  - Usage: This script is used to programmatically interact with dropdown menus in web forms. It takes parameters (likely a CSS selector and option index) to locate and select a specific option within a dropdown, simulating user interaction with form elements.
The JavaScript files in app_data/js are not directly executed as part of the backend server code (Python). Instead, they are designed to be:
- Read by the Python Backend: The Python code in the `app` and `browser_use` directories (likely within `browser_use/browser/context.py` and `browser_use/dom/service.py`) reads the content of these `.js` files.
- Injected into the Browser Context: The content of these JavaScript files is then injected into the browser context managed by Playwright. This is typically done using Playwright's `page.evaluate()` or `context.add_init_script()` methods.
- Executed in the Browser: Once injected, the JavaScript code runs within the security context of the webpage loaded in the browser. This allows the server to:
  - Execute complex browser-side logic that is difficult or inefficient to perform from the server-side Python code.
  - Interact directly with the DOM, browser APIs, and potentially browser extensions.
  - Retrieve structured data from the webpage (like viewport dimensions or dropdown options).
Example Usage Scenario (Hypothetical):
When the server needs to get the viewport dimensions of a webpage during a browser automation task, it might:
- Read the content of `app_data/js/getViewport.js`.
- Use Playwright to execute this JavaScript code within the current browser page using `page.evaluate(getViewport_js_content)`.
- Receive the JSON response from the JavaScript execution, containing the viewport width and height.
The API is structured to provide clear separation of concerns, with endpoints categorized by functionality.
I didn't actually see the `x-sandbox-token` header being used for the calls. However, in a real-world implementation an API token is used so that only valid API calls are allowed. The API key is set in `$HOME/.secrets/sandbox_api_token`.
`data_api.py` is used as an API client. The original proxy service is located at https://api.manus.im/apiproxy.v1.ApiProxyService/CallApi, but you can point it at localhost instead.
To work with the API you need to create a key (assuming authentication is actually enforced):
`curl -X GET http://localhost:8330/healthz -H "x-sandbox-token: dummy_api_key"`
However, I don't see the token being used anywhere in the code, so it's possible that it is only checked on the proxy, but that's just a guess.
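The same call can be sketched in Python with the standard library. The helper names (`load_token`, `build_health_request`) and the fallback to `dummy_api_key` when the token file is missing are assumptions made for illustration; as noted above, it is not confirmed that the server checks this header at all. The request is built but not sent, so the sketch runs without a server.

```python
import urllib.request
from pathlib import Path

# Location named in this README; the file may not exist on your machine.
TOKEN_FILE = Path.home() / ".secrets" / "sandbox_api_token"


def load_token(path=TOKEN_FILE, default="dummy_api_key"):
    """Return the token stored at `path`, or `default` if it cannot be read."""
    try:
        return Path(path).read_text(encoding="utf-8").strip()
    except OSError:
        return default


def build_health_request(base_url, token):
    """Build (but do not send) a GET /healthz request carrying the token."""
    return urllib.request.Request(
        f"{base_url}/healthz",
        headers={"x-sandbox-token": token},
        method="GET",
    )


request = build_health_request("http://localhost:8330", "dummy_api_key")
print(request.get_method(), request.full_url)
# To actually send it against a running server:
#   urllib.request.urlopen(request)
```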
- `GET /terminal/{terminal_id}`: Retrieves the content of a specific terminal session.
- `POST /terminal/{terminal_id}/reset`: Resets a specific terminal session, clearing its history and restarting the shell.
- `POST /terminal/reset_all`: Resets all active terminal sessions.
- `POST /terminal/{terminal_id}/kill`: Kills the current process running in a specific terminal session.
- `POST /terminal/{terminal_id}/write`: Writes text input to a specific terminal session.
- `WebSocket /terminal`: Establishes a WebSocket connection for real-time, bidirectional communication with a terminal session.
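As a rough sketch of how a client might address the REST terminal endpoints, the helper below builds the requests without sending them. The terminal id `main`, the helper name `terminal_request`, and especially the `text` field in the write body are assumptions; the request schema is not documented here, so check `app/models.py` before relying on it.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8330"


def terminal_request(terminal_id, action=None, payload=None):
    """Build a request for one of the /terminal/{terminal_id} endpoints.

    With no action this is the GET that retrieves the session content;
    with an action ("reset", "kill", "write") it is a POST.
    """
    url = f"{BASE_URL}/terminal/{terminal_id}"
    if action is None:
        return urllib.request.Request(url, method="GET")
    body = json.dumps(payload or {}).encode("utf-8")
    return urllib.request.Request(
        f"{url}/{action}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


view = terminal_request("main")
# ASSUMPTION: the write endpoint accepts a JSON body with a "text" field.
write = terminal_request("main", "write", {"text": "ls -la\n"})

print(view.get_method(), view.full_url)
print(write.get_method(), write.full_url)
```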
- `GET /browser/status`: Checks the status of the browser automation service, indicating if it's running or stopped.
- `POST /browser/action`: Executes a browser action. Accepts a JSON payload defining the action to be performed, such as navigation, clicking, inputting text, etc.
- `POST /upload_file`: Uploads a single file to a pre-signed URL. Requires `file_path` and `presigned_url` in the request body.
- `POST /multipart_upload`: Handles multipart uploads for large files, using pre-signed URLs for each part. Requires `file_path`, `presigned_urls` (list of pre-signed URLs for each part), and `part_size` in the request body.
- `GET /get_file/{path:path}`: Serves a file for download from the server's filesystem. The file path is specified in the URL path.
- `POST /batch_download`: Downloads multiple files from URLs to a specified folder on the server. Accepts a JSON payload with a list of files to download and an optional folder path.
- `POST /zip_and_upload`: Zips a specified directory and uploads the archive to a pre-signed URL. Requires `directory`, `upload_url`, and `project_type` in the request body.
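The request bodies for the two upload endpoints can be sketched from the field names listed above. The helper names, the example paths and URLs, and the check that the number of pre-signed URLs matches `ceil(file_size / part_size)` are assumptions added for illustration; the server may validate this differently or not at all.

```python
import math


def upload_file_body(file_path, presigned_url):
    """Body for POST /upload_file, using the documented field names."""
    return {"file_path": file_path, "presigned_url": presigned_url}


def multipart_upload_body(file_path, file_size, part_size, presigned_urls):
    """Body for POST /multipart_upload.

    ASSUMPTION: one pre-signed URL is needed per part, so the list length
    should equal ceil(file_size / part_size).
    """
    expected_parts = math.ceil(file_size / part_size)
    if len(presigned_urls) != expected_parts:
        raise ValueError(
            f"expected {expected_parts} URLs, got {len(presigned_urls)}"
        )
    return {
        "file_path": file_path,
        "presigned_urls": list(presigned_urls),
        "part_size": part_size,
    }


MIB = 1024 * 1024
# Placeholder URLs; real ones come from your storage provider.
urls = [f"https://example.com/upload/part{i}" for i in range(1, 4)]
# A 12 MiB file split into 5 MiB parts needs 3 parts (5 + 5 + 2).
body = multipart_upload_body("/tmp/archive.zip", 12 * MIB, 5 * MIB, urls)
print(len(body["presigned_urls"]), body["part_size"])
```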
- `POST /text_editor`: Executes text editor actions. Accepts a JSON payload defining the text editor command (`view`, `create`, `write`, `str_replace`, `find_content`, `find_file`) and associated parameters like `path`, `file_text`, `old_str`, `new_str`, etc.
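A payload for the text editor endpoint might be assembled as below. The command names and the `path`, `old_str`, and `new_str` parameters come from the description above; the `command` key used to carry the command name, the helper name, and the example path are guesses, so treat this as a sketch rather than the actual schema.

```python
import json

# Command names as listed for POST /text_editor.
COMMANDS = {"view", "create", "write", "str_replace", "find_content", "find_file"}


def text_editor_payload(command, path, **params):
    """Build a JSON-serializable payload for POST /text_editor."""
    if command not in COMMANDS:
        raise ValueError(f"unknown text editor command: {command}")
    # ASSUMPTION: the command name is sent under a "command" key.
    return {"command": command, "path": path, **params}


payload = text_editor_payload(
    "str_replace", "/tmp/notes.txt", old_str="foo", new_str="bar"
)
print(json.dumps(payload, indent=2))
```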
- `POST /init_sandbox`: Initializes the sandbox environment with secrets. Accepts a JSON payload containing secrets as key-value pairs.
- `GET /healthz`: Provides a health check endpoint to verify if the server is running.
To start the server, navigate to the repository directory and run:
`python start_server.py`
You can customize the server startup using command-line arguments:
- `--port <port>`: Specify the port number for the server (default: 8330).
- `--host <host>`: Specify the host address to bind to (default: `0.0.0.0` for all interfaces).
- `--log-level <level>`: Set the logging level (`debug`, `info`, `warning`, `error`, `critical`; default: `info`).
- `--chrome-path <path>`: Provide a custom path to the Chrome browser executable.
Example with custom port and log level:
`python start_server.py --port 8080 --log-level debug`
Once the server is running, you can interact with the API endpoints using tools like curl, httpie, or any HTTP client. For WebSocket interactions, you can use tools like wscat or implement a WebSocket client in your preferred programming language.
Refer to the "API Structure" section for details on each endpoint and the required request formats.
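For reference, the documented command-line options and their defaults could be expressed with `argparse` roughly like this. It mirrors only the flags described in this README; how `start_server.py` actually parses its arguments is not shown here, so `build_parser` is a hypothetical reconstruction.

```python
import argparse

LOG_LEVELS = ["debug", "info", "warning", "error", "critical"]


def build_parser():
    """Parser mirroring the documented flags and defaults."""
    parser = argparse.ArgumentParser(prog="start_server.py")
    parser.add_argument("--port", type=int, default=8330)
    parser.add_argument("--host", default="0.0.0.0")
    parser.add_argument("--log-level", choices=LOG_LEVELS, default="info")
    parser.add_argument("--chrome-path", default=None)
    return parser


# Same flags as the custom port and log level example.
args = build_parser().parse_args(["--port", "8080", "--log-level", "debug"])
print(args.port, args.host, args.log_level, args.chrome_path)
```

Flags that are omitted fall back to the defaults, so the host stays `0.0.0.0` and no Chrome path is set in this example.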
- `CHROME_INSTANCE_PATH`: (Optional) Specifies the path to a Chrome browser instance. If set, the server will use this instance for browser automation. If not set, the server will attempt to manage its own browser instance.
- `RUNTIME_API_HOST`: (Optional) Specifies the host URL for the internal API aggregation platform (used by `data_api.py`). Defaults to `https://api.manus.im`.
- `BROWSER_USE_LOGGING_LEVEL`: (Optional) Specifies the logging level for the `browser_use` library. Defaults to `info`.
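A minimal sketch of reading these variables with their documented defaults is shown below. The function name `load_settings` and the dictionary keys are illustrative; the repository may read the variables in a different place or form.

```python
import os


def load_settings(environ=os.environ):
    """Collect the documented environment variables with their defaults."""
    return {
        # No default: the server manages its own browser when unset.
        "chrome_instance_path": environ.get("CHROME_INSTANCE_PATH"),
        "runtime_api_host": environ.get(
            "RUNTIME_API_HOST", "https://api.manus.im"
        ),
        "browser_use_logging_level": environ.get(
            "BROWSER_USE_LOGGING_LEVEL", "info"
        ),
    }


defaults = load_settings({})
overridden = load_settings({"RUNTIME_API_HOST": "http://localhost:8330"})
print(defaults["runtime_api_host"])
print(overridden["runtime_api_host"])
```

Passing a plain dictionary instead of `os.environ` makes it easy to see the effect of overriding `RUNTIME_API_HOST`, for example to point the client at a local server.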