MediaURLScraper is a Python script that allows you to easily scrape media URLs from webpages. Whether you're looking to extract video URLs, audio URLs, or source URLs, this script has got you covered!
- Scrapes media URLs from webpages
- Supports extraction of video URLs, audio URLs, and source URLs
- Uses Python and the Selenium library for web scraping and automation
- Python 3.x installed
- Selenium library installed (use
pip install seleniumto install) - Validators library installed (use
pip install validatorsto install) - Chrome WebDriver installed
Note: This script utilizes the Selenium library with the Chrome WebDriver to automate web interactions. The Chrome WebDriver is a separate executable that allows Selenium to control the Chrome browser. It is necessary to have the Chrome WebDriver installed on your system.
To install the Chrome WebDriver, follow these steps:
-Check your Chrome browser version by navigating to chrome://settings/help in the address bar. -Visit the official Chrome WebDriver downloads page: Chrome WebDriver Download Link -Download the appropriate Chrome WebDriver version that matches your Chrome browser version. -Extract the downloaded file and place the WebDriver executable (chromedriver) in a directory accessible to your Python script. -Ensure that the Chrome WebDriver is properly installed and available in your system's PATH so that the script can locate and use it during execution.
Please make sure to have the Chrome WebDriver installed and configured correctly before running the script.
-
Clone or download the repository to your local machine.
-
Install the required Python packages:
-
Set up the Python environment:
- Create a virtual environment (optional but recommended)
- Activate the virtual environment (refer to your platform-specific instructions)
To further simplify the setup process, I've included a setup.bat file in the repository. This batch file handles the environment setup and activation for you. -To automate the environment setup, simply double-click the setup.bat file. It will open a command prompt, navigate to the script directory, and activate the virtual environment for you.
4.To make it work, follow these steps:
-Open the script in your favorite text editor or Python IDE.
5.Modify the following parts of the code to match your setup:
-Set the correct file path to the Chrome WebDriver executable:
Service('your_chromedriver_path/chromedriver.exe')
-Adjust the desired file path for exporting the media URLs:
'your_desired_file_path/media_urls.txt'
6.Save the modified script.
-
Run the script using the following command:
python script_name.py
-
Follow the prompts to enter the URLs you want to scrape media URLs from. The script will scrape the URLs and display the results.
-
The extracted media URLs will be saved in a file named
media_urls.txtin the project directory.
This project is licensed under the MIT License.
Feel free to contribute, open issues, or submit pull requests if you have any suggestions or improvements.
Happy scraping!