# Oxylabs AI Studio Python SDK Agentic Code Guide

## Installation

```bash
pip install oxylabs-ai-studio
```

## Best Practices for Implementation

- Install the latest version of oxylabs-ai-studio.
- Incorporate Rate Limiting: Ensure your implementation respects the rate limits of your
  purchased plan to prevent service disruptions or overuse.
- Implement a Robust Retry Mechanism: Introduce retry logic for failed requests, but cap the
  number of retries to avoid infinite loops or excessive API calls (see the sketch after this list).

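A minimal sketch of the retry idea, using a generic `AiScraper` call. The helper name, retry count, and delay values are illustrative choices, not SDK features; a proper rate limiter would additionally pace requests to match your plan.

```python
import time

from oxylabs_ai_studio.apps.ai_scraper import AiScraper

scraper = AiScraper(api_key="<API_KEY>")


def scrape_with_retries(url: str, max_retries: int = 3, delay_seconds: float = 2.0):
    """Illustrative helper: retry a scrape a bounded number of times.

    `max_retries` and `delay_seconds` are hypothetical knobs tuned to your plan's
    rate limits; they are not SDK parameters.
    """
    last_error: Exception | None = None
    for attempt in range(1, max_retries + 1):
        try:
            return scraper.scrape(url=url, output_format="markdown")
        except Exception as exc:  # narrow this to the SDK's exception types in real code
            last_error = exc
            if attempt < max_retries:
                time.sleep(delay_seconds * attempt)  # simple backoff between attempts
    raise RuntimeError(f"Scrape failed after {max_retries} attempts") from last_error
```
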
## Browser-Agent app

### What It Is Good For

A browser automation tool capable of controlling a browser to perform actions such as
clicking, scrolling, and navigation. The tool takes a textual prompt as input to execute
these actions.

### Python interface

#### Sync interface

```python
from oxylabs_ai_studio.apps.browser_agent import BrowserAgent

browser_agent = BrowserAgent(api_key="<API_KEY>")

prompt = "Find if there is game 'super mario odyssey' in the store."
url = "https://sandbox.oxylabs.io/"
result = browser_agent.run(
    url=url,
    user_prompt=prompt,
    output_format="json",
    schema={"type": "object", "properties": {"page_url": {"type": "string"}}, "required": []},
)
print(result.data)
```

#### Async interface

```python
import asyncio
from oxylabs_ai_studio.apps.browser_agent import BrowserAgent

browser_agent = BrowserAgent(api_key="<API_KEY>")

async def main():
    prompt = "Find if there is game 'super mario odyssey' in the store."
    url = "https://sandbox.oxylabs.io/"
    result = await browser_agent.run_async(
        url=url,
        user_prompt=prompt,
        output_format="json",
        schema={"type": "object", "properties": {"page_url": {"type": "string"}}, "required": []},
    )
    print(result.data)

if __name__ == "__main__":
    asyncio.run(main())
```

Parameters:

- url (str): Target URL the agent opens (required).
- user_prompt (str): Prompt describing the task or actions the agent should perform, rather than what you would like to extract (required).
- output_format (Literal["json", "markdown"]): Output format (default: "markdown").
- schema (dict | None): OpenAPI schema for structured extraction (required if output_format is "json").

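As a small sketch of the defaults above, a run with the default "markdown" output needs no schema; the URL and prompt here are only examples.

```python
from oxylabs_ai_studio.apps.browser_agent import BrowserAgent

browser_agent = BrowserAgent(api_key="<API_KEY>")

# With the default "markdown" output format, no schema is required.
result = browser_agent.run(
    url="https://sandbox.oxylabs.io/",
    user_prompt="Open the first product in the store and check its availability.",
)
print(result.data)
```
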
Output (result):

- Python classes:

  ```python
  class DataModel(BaseModel):
      type: Literal["json", "markdown", "html", "screenshot"]
      content: dict[str, Any] | str | None

  class BrowserAgentJob(BaseModel):
      run_id: str
      message: str | None = None
      data: DataModel | None = None
  ```

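Based on the fields above, a sketch of reading a structured result, reusing the JSON schema from the sync example:

```python
from oxylabs_ai_studio.apps.browser_agent import BrowserAgent

browser_agent = BrowserAgent(api_key="<API_KEY>")

result = browser_agent.run(
    url="https://sandbox.oxylabs.io/",
    user_prompt="Find if there is game 'super mario odyssey' in the store.",
    output_format="json",
    schema={"type": "object", "properties": {"page_url": {"type": "string"}}, "required": []},
)

print(result.run_id)
# For JSON output, `content` is a dict matching the requested schema.
if result.data and isinstance(result.data.content, dict):
    print(result.data.content.get("page_url"))
else:
    print(result.message)
```
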
## AI-Scraper app

### What It Is Good For

A tool designed to scrape website content and return it either as Markdown or structured JSON.
When opting for JSON output, the user must provide a valid JSON schema for the expected structure.

### Python interface

#### Sync interface

```python
from oxylabs_ai_studio.apps.ai_scraper import AiScraper

scraper = AiScraper(api_key="<API_KEY>")

url = "https://sandbox.oxylabs.io/products/3"
result = scraper.scrape(
    url=url,
    output_format="json",
    schema={"type": "object", "properties": {"price": {"type": "string"}}, "required": []},
    render_javascript=False,
)
print(result)
```

#### Async interface

```python
import asyncio
from oxylabs_ai_studio.apps.ai_scraper import AiScraper

scraper = AiScraper(api_key="<API_KEY>")

async def main():
    url = "https://sandbox.oxylabs.io/products/3"
    result = await scraper.scrape_async(
        url=url,
        output_format="json",
        schema={"type": "object", "properties": {"price": {"type": "string"}}, "required": []},
        render_javascript=False,
    )
    print(result)

if __name__ == "__main__":
    asyncio.run(main())
```

Parameters:

- url (str): Target URL to scrape (required).
- output_format (Literal["json", "markdown"]): Output format (default: "markdown").
- schema (dict | None): OpenAPI schema for structured extraction (required if output_format is "json").
- render_javascript (bool): Render JavaScript before extracting content (default: False).
- geo_location (str): Proxy location as an ISO2 (two-letter) country code.

Output (result):

- Python classes:

  ```python
  class AiScraperJob(BaseModel):
      run_id: str
      message: str | None = None
      data: str | dict | None
  ```

  If output_format is "json", data will be a dictionary.
  If output_format is "markdown", data will be a string.

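A sketch that ties the parameters and output together; the geo_location value is only an example, and markdown output needs no schema.

```python
from oxylabs_ai_studio.apps.ai_scraper import AiScraper

scraper = AiScraper(api_key="<API_KEY>")

# Markdown output needs no schema; render_javascript and geo_location are optional.
result = scraper.scrape(
    url="https://sandbox.oxylabs.io/products/3",
    output_format="markdown",
    render_javascript=True,
    geo_location="US",
)

if isinstance(result.data, dict):    # "json" output
    print(result.data.get("price"))
elif isinstance(result.data, str):   # "markdown" output
    print(result.data[:200])
else:
    print(result.message)
```
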
## Use Case Examples

### E-commerce Product Scraping

- Task: Locate the category page of a specific domain, extract all product data from the category, and gather detailed information from each product page.
- Proposed Workflow (an end-to-end sketch follows this list):
  - Use the Browser-Agent app to identify the category page URL and all pagination URLs within that category in a single action.
    Define a JSON schema to return the pagination URLs. Example:
    ```json
    {
      "type": "object",
      "properties": {
        "paginationUrls": {
          "type": "array",
          "description": "Return all URLs from the first to the last page of the category pagination. If some URLs are missing because the category page does not list them all, construct them to match the pattern of the existing ones.",
          "items": {
            "type": "string"
          }
        }
      },
      "required": []
    }
    ```
  - Use the AI-Scraper app to extract all product URLs from the pagination pages in the category.
  - Use the AI-Scraper app again to extract detailed data from each product page by defining an appropriate JSON schema.
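
A condensed sketch of this workflow. The domain, prompts, and schemas below are placeholders to adapt to the target site; only the documented `run` and `scrape` calls shown above are used.

```python
from oxylabs_ai_studio.apps.ai_scraper import AiScraper
from oxylabs_ai_studio.apps.browser_agent import BrowserAgent

browser_agent = BrowserAgent(api_key="<API_KEY>")
scraper = AiScraper(api_key="<API_KEY>")

# Step 1: Browser-Agent finds the category page and returns pagination URLs.
pagination_schema = {
    "type": "object",
    "properties": {
        "paginationUrls": {
            "type": "array",
            "description": "Return all URLs from the first to the last page of the category pagination.",
            "items": {"type": "string"},
        }
    },
    "required": [],
}
agent_result = browser_agent.run(
    url="https://example.com/",  # placeholder domain
    user_prompt="Navigate to the 'laptops' category page.",  # placeholder task
    output_format="json",
    schema=pagination_schema,
)
pagination_urls = []
if agent_result.data and isinstance(agent_result.data.content, dict):
    pagination_urls = agent_result.data.content.get("paginationUrls", [])

# Step 2: AI-Scraper extracts product URLs from every pagination page.
product_urls_schema = {
    "type": "object",
    "properties": {"productUrls": {"type": "array", "items": {"type": "string"}}},
    "required": [],
}
product_urls: list[str] = []
for page_url in pagination_urls:
    page_result = scraper.scrape(url=page_url, output_format="json", schema=product_urls_schema)
    if isinstance(page_result.data, dict):
        product_urls.extend(page_result.data.get("productUrls", []))

# Step 3: AI-Scraper extracts detailed data from each product page.
product_schema = {
    "type": "object",
    "properties": {"title": {"type": "string"}, "price": {"type": "string"}},
    "required": [],
}
for product_url in product_urls:
    product_result = scraper.scrape(url=product_url, output_format="json", schema=product_schema)
    print(product_result.data)
```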