A Python tool to scrape Amazon product data while bypassing anti-bot measures and Cloudflare protection.
- Bypasses Amazon's anti-bot systems
- Handles Cloudflare protection
- Rotates user agents
- Supports proxy rotation
- Uses multiple request methods (cloudscraper, tls_client, undetected_chromedriver)
- Extracts product details and search results
- Handles cookies for session persistence
- Includes retry mechanisms with exponential backoff (see the sketch after this list)
- Command-line interface for easy usage
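
To illustrate how these pieces typically combine, here is a minimal, hypothetical sketch using the real cloudscraper and fake-useragent libraries. `fetch_with_retries` is an invented helper for illustration, not the tool's actual internals:

```python
import random
import time

import cloudscraper
from fake_useragent import UserAgent

def fetch_with_retries(url, max_retries=3, base_delay=2.0):
    """Fetch a URL with a rotating user agent and exponential backoff."""
    ua = UserAgent()
    scraper = cloudscraper.create_scraper()  # handles Cloudflare JS challenges
    for attempt in range(max_retries):
        try:
            # New random User-Agent header on every attempt
            response = scraper.get(url, headers={"User-Agent": ua.random}, timeout=30)
            response.raise_for_status()
            return response
        except Exception:
            if attempt == max_retries - 1:
                raise
            # Exponential backoff with jitter: ~2s, ~4s, ~8s, ...
            time.sleep(base_delay * 2 ** attempt + random.uniform(0, 1))
```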
Requirements:

- Python 3.7+
- Chrome browser (for browser automation)
Install dependencies:

```bash
pip install requests beautifulsoup4 cloudscraper fake-useragent undetected-chromedriver selenium tls-client
```

Search for products:

```bash
python amazon_scraper.py search --query "mechanical keyboard" --country "com" --output results.json --pretty
```

Get product details:

```bash
python amazon_scraper.py product --asin "B08JCQCPN6" --country "com" --output product.json --pretty
```

Use the scraper programmatically:

```python
from amazon_scraper import AmazonScraper

# Initialize the scraper
scraper = AmazonScraper(
    country="com",      # Amazon.com (US)
    use_browser=True,   # Use browser automation
    headless=True,      # Run the browser in headless mode
    use_proxies=False,  # Don't use proxies
    max_retries=3       # Maximum retry attempts
)

try:
    # Search for products
    products = scraper.search_products(
        query="mechanical keyboard",
        page=1
    )

    # Get product details
    details = scraper.get_product_details("B08JCQCPN6")
finally:
    # Always close the scraper to clean up resources
    scraper.close()
```

See amazon_scraper_example.py for more detailed examples.
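
The return type of search_products is not documented here; assuming it yields JSON-serializable product dicts (in line with the CLI's JSON --output option), results can be persisted like so:

```python
import json

# Save the search results collected above; assumes `products` is
# a JSON-serializable list of product dicts.
with open("results.json", "w", encoding="utf-8") as f:
    json.dump(products, f, indent=2, ensure_ascii=False)
```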
Common arguments:

- action: Either `search` or `product`
- -c, --country: Amazon country domain (e.g., "com", "co.uk")
- -o, --output: Output file path (JSON format)
- --pretty: Pretty-print JSON output
- -v, --verbose: Enable verbose logging

Search options:

- -q, --query: Search query for product search
- -p, --page: Page number for search results
- -d, --department: Department to search in

Product options:

- -a, --asin: Amazon ASIN (product ID) for product details

Browser and network options:

- --no-browser: Do not use browser automation
- --no-headless: Do not run the browser in headless mode (show the browser window)
- --use-proxies: Use proxies for requests
- --proxy-file: Path to a file containing proxy URLs (one per line)
- --cookie-file: Path to a cookie file (for loading/saving cookies)
- --retries: Maximum number of retry attempts
- --delay: Delay between retries in seconds
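
Putting several of these options together, a cautious large-scale run might look like this (proxies.txt and cookies.json are placeholder paths):

```bash
python amazon_scraper.py search --query "mechanical keyboard" --country "com" \
    --use-proxies --proxy-file proxies.txt --cookie-file cookies.json \
    --retries 5 --delay 2 --output results.json --pretty --verbose
```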
Notes:

- This tool is for educational purposes only
- Use responsibly and respect Amazon's terms of service
- Using proxies is recommended for large-scale scraping
- Amazon may block your IP if you make too many requests
- Consider adding delays between requests to avoid detection (a simple pacing loop is sketched below)
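
For example, a pacing loop with randomized delays, reusing the `scraper` instance from the Python example above (the 3-8 second range is an arbitrary choice):

```python
import random
import time

# `scraper` is the AmazonScraper instance created earlier
queries = ["mechanical keyboard", "ergonomic mouse", "usb hub"]
for query in queries:
    products = scraper.search_products(query=query, page=1)
    # Randomized delay between requests to look less like a bot
    time.sleep(random.uniform(3, 8))
```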
Troubleshooting:

- CAPTCHA Challenges: If you encounter frequent CAPTCHA challenges, try:
  - Using proxies
  - Reducing request frequency
  - Using browser automation mode
- IP Blocking: If your IP gets blocked:
  - Use a different IP or proxy
  - Wait before making more requests
  - Try using a VPN
- Import Errors: Make sure all required packages are installed:

  ```bash
  pip install requests beautifulsoup4 cloudscraper fake-useragent undetected-chromedriver selenium tls-client
  ```

- Browser Automation Issues: If browser automation fails:
  - Make sure Chrome is installed
  - Update Chrome to the latest version
  - Try running without headless mode for debugging (a standalone launch check is sketched below)
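
To isolate browser problems from the scraper itself, a minimal standalone launch check with undetected_chromedriver (visible window, no headless mode) might look like this:

```python
import undetected_chromedriver as uc

# Launch a visible Chrome window; if this fails, the issue is with
# Chrome/undetected_chromedriver rather than the scraper.
driver = uc.Chrome(options=uc.ChromeOptions())
try:
    driver.get("https://www.amazon.com")
    print(driver.title)  # should print the Amazon page title
finally:
    driver.quit()
```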