This Python script efficiently scrapes product information from Amazon India based on user input.
- Python 3.7+
- uv (recommended) or pip
- Clone the repository and navigate to the Python scraper directory:

  ```bash
  git clone https://github.com/CLoaKY233/WebScraper.git
  cd WebScraper/Scrape_with_Python
  ```
- Install uv (if not already installed):

  ```bash
  pip install uv
  ```
- Create and activate a virtual environment:

  ```bash
  uv venv
  source .venv/bin/activate  # On Unix or macOS
  .venv\Scripts\activate     # On Windows
  ```
- Install dependencies:

  ```bash
  uv pip install -r requirements.txt
  ```
- Run the script:

  ```bash
  python scraper.py
  ```
- Follow the prompts to enter the product name and the number of pages to scrape.
- The script will save the results in a CSV file named after the search term.
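One simple way to derive such a filename is sketched below; the exact sanitization used by scraper.py may differ, and `csv_filename` is a hypothetical helper:

```python
import re

def csv_filename(search_term: str) -> str:
    """Turn a raw search term into a safe CSV filename (illustrative only)."""
    # Collapse anything that isn't a word character or hyphen into underscores.
    safe = re.sub(r"[^\w-]+", "_", search_term.strip()).strip("_")
    return f"{safe.lower()}.csv"

print(csv_filename("gaming laptop"))  # gaming_laptop.csv
```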
- User-friendly command-line interface
- Efficient HTML parsing with BeautifulSoup
- CSV output for easy data analysis
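The parse-and-save flow behind these features can be sketched as follows. The tag names and CSS classes below are illustrative stand-ins, not Amazon's actual markup:

```python
# Minimal sketch: parse product fields with BeautifulSoup, write them as CSV.
import csv
import io

from bs4 import BeautifulSoup

html = """
<div class="product">
  <span class="title">Example Widget</span>
  <span class="price">499</span>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
rows = [
    {
        "title": item.select_one("span.title").get_text(strip=True),
        "price": item.select_one("span.price").get_text(strip=True),
    }
    for item in soup.select("div.product")
]

# In the real script this buffer would be a file named after the search term.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["title", "price"])
writer.writeheader()
writer.writerows(rows)
```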
For a comprehensive breakdown of the code and web scraping concepts, see Scrape_with_Python/info.md.
- Respect Amazon's robots.txt and terms of service.
- Use responsibly to avoid IP blocking.
- Consider implementing delays between requests for ethical scraping.
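The delay suggestion above can be sketched as a small pagination loop; `fetch` is a stand-in for the real request function, not part of the script:

```python
import time

def scrape_pages(fetch, num_pages, delay_seconds=2.0):
    """Call fetch(page) for each page, pausing between requests."""
    results = []
    for page in range(1, num_pages + 1):
        results.append(fetch(page))
        if page < num_pages:  # no need to sleep after the last page
            time.sleep(delay_seconds)
    return results

# Stubbed example with no network access and a short delay:
pages = scrape_pages(lambda p: f"page {p}", 3, delay_seconds=0.1)
```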
This high-performance Rust program concurrently scrapes product information from Amazon India using asynchronous programming.
- Rust (latest stable version)
- Cargo (Rust's package manager, typically installed with Rust)
- Install Rust:
  - Visit https://www.rust-lang.org/tools/install
  - Follow the instructions for your operating system
- Clone the repository and navigate to the Rust scraper directory:

  ```bash
  git clone https://github.com/CLoaKY233/WebScraper.git
  cd WebScraper/Scrape_with_Rust
  ```
- Build the project:

  ```bash
  cargo build --release
  ```
- Run the program:

  ```bash
  cargo run --release
  ```
- Follow the prompts to:
  - Enter the product name
  - Specify the number of pages to scrape
  - Choose whether to print output to the console
- The program will save the results in a CSV file named after the search term.
- Concurrent scraping with Tokio for optimal performance
- User-friendly command-line interface
- CSV output for seamless data analysis
- Built-in rate limiting to respect server resources
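The fan-out-and-collect pattern behind the concurrent design can be illustrated with a minimal sketch. For brevity it uses std::thread in place of Tokio's async tasks, and `fetch_page` is a stub standing in for a real HTTP call:

```rust
use std::thread;
use std::time::Duration;

// Stand-in for an HTTP request; the real program drives an async
// client on the Tokio runtime instead.
fn fetch_page(page: u32) -> String {
    thread::sleep(Duration::from_millis(50)); // simulated network latency
    format!("results for page {page}")
}

fn main() {
    // Spawn one task per page so pages are fetched concurrently...
    let handles: Vec<_> = (1..=3)
        .map(|page| thread::spawn(move || fetch_page(page)))
        .collect();

    // ...then join in spawn order so results come back in page order.
    for handle in handles {
        println!("{}", handle.join().unwrap());
    }
}
```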
For an in-depth analysis of the Rust implementation and its performance benefits, see Scrape_with_Rust/info.md.
- Adhere to Amazon's robots.txt and terms of service.
- Use responsibly to prevent IP blocking.
- The program includes built-in delays to avoid overwhelming the server.
The Rust version typically outperforms the Python version, especially for larger scraping tasks, due to its concurrent design and compiled nature. For specific benchmarks, refer to the detailed explanation document.