
This project provides two web scrapers for Amazon India: a Python script and a Rust program. Both extract product information from search results for a user-supplied search term. The Python version is user-friendly and suited to quick tasks, while the Rust version fetches pages concurrently and is better suited to larger scraping jobs.

CLoaKY233/WebScraper

Amazon Product Scraper (Python)

This Python script scrapes product information from Amazon India search results for a search term you enter.

Prerequisites

  • Python 3.7+
  • uv (recommended) or pip

Installation

  1. Clone the repository and navigate to the Python project directory:

    git clone https://github.com/CLoaKY233/WebScraper.git
    cd WebScraper/Scrape_with_Python
    
  2. Install uv (if not already installed):

    pip install uv
    
  3. Create and activate a virtual environment:

    uv venv
    source .venv/bin/activate  # On Linux or macOS
    .venv\Scripts\activate  # On Windows
    
  4. Install dependencies:

    uv pip install -r requirements.txt
    

Usage

  1. Run the script:

    python scraper.py
    
  2. Follow the prompts to enter the product name and number of pages to scrape.

  3. The script will save the results in a CSV file named after the search term.
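The search and output steps above can be sketched with the Python standard library. This is a minimal sketch, not code from scraper.py. The helper names build_search_urls and csv_filename are hypothetical. The URL pattern (https://www.amazon.in/s with k and page query parameters) is an assumption about how result pages are addressed.

```python
import re
from typing import List
from urllib.parse import urlencode

# Assumption: Amazon India search results are addressed as
# https://www.amazon.in/s?k=<term>&page=<n>; the real scraper may build URLs differently.
SEARCH_BASE = "https://www.amazon.in/s"


def build_search_urls(term: str, pages: int) -> List[str]:
    """Return one search-results URL per page, numbering pages from 1."""
    if pages < 1:
        raise ValueError("pages must be at least 1")
    # urlencode quotes the term, turning spaces into '+'.
    return [
        f"{SEARCH_BASE}?{urlencode({'k': term, 'page': n})}"
        for n in range(1, pages + 1)
    ]


def csv_filename(term: str) -> str:
    """Derive a filesystem-safe CSV name from the search term (hypothetical scheme)."""
    # Collapse every run of non-alphanumeric characters into a single underscore.
    slug = re.sub(r"[^A-Za-z0-9]+", "_", term.strip()).strip("_").lower()
    return f"{slug or 'results'}.csv"
```

For example, build_search_urls("usb c cable", 2) yields two URLs ending in k=usb+c+cable&page=1 and k=usb+c+cable&page=2. Likewise, csv_filename("usb c cable") returns usb_c_cable.csv.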

Features

  • User-friendly command-line interface
  • Efficient HTML parsing with BeautifulSoup
  • CSV output for easy data analysis
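The parsing and CSV steps can be illustrated with a standard-library sketch. The project itself uses BeautifulSoup. This example swaps in Python's built-in html.parser so it runs without extra packages. The markup it looks for is an assumption for illustration: a div with data-component-type="s-search-result", an h2 title, and a span with class a-price-whole. Amazon's real markup changes over time and may differ. ResultParser and to_csv_text are hypothetical names.

```python
import csv
import io
from html.parser import HTMLParser
from typing import Dict, List, Optional

# Assumed markup (illustrative only): each result is a
# <div data-component-type="s-search-result"> containing an <h2> title and a
# <span class="a-price-whole"> price. Real Amazon pages may not match this.


class ResultParser(HTMLParser):
    """Collect {"title", "price"} rows from search-result blocks."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: List[Dict[str, str]] = []
        self._row: Optional[Dict[str, str]] = None  # row being filled, None outside a result
        self._div_depth = 0                         # open <div>s inside the current result
        self._field: Optional[str] = None           # "title" or "price" while capturing text
        self._field_tag = ""                        # tag that opened the current field
        self._field_depth = 0                       # nesting of _field_tag inside the field

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if self._row is None:
            if tag == "div" and attr.get("data-component-type") == "s-search-result":
                self._row = {"title": "", "price": ""}
                self._div_depth = 1
            return
        if tag == "div":
            self._div_depth += 1
        if self._field is not None:
            if tag == self._field_tag:
                self._field_depth += 1
        elif tag == "h2":
            self._field, self._field_tag, self._field_depth = "title", "h2", 1
        elif tag == "span" and "a-price-whole" in (attr.get("class") or "").split():
            self._field, self._field_tag, self._field_depth = "price", "span", 1

    def handle_endtag(self, tag):
        if self._row is None:
            return
        if self._field is not None and tag == self._field_tag:
            self._field_depth -= 1
            if self._field_depth == 0:
                self._field = None
        if tag == "div":
            self._div_depth -= 1
            if self._div_depth == 0:
                # Result block closed: normalise whitespace and store the row.
                self.rows.append({k: " ".join(v.split()) for k, v in self._row.items()})
                self._row, self._field = None, None

    def handle_data(self, data):
        if self._row is not None and self._field is not None:
            self._row[self._field] += data


def to_csv_text(rows: List[Dict[str, str]]) -> str:
    """Serialise rows as CSV text with a header; fields containing commas get quoted."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["title", "price"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

To save the rows, write to_csv_text(parser.rows) to a file opened with newline="". The csv module documentation recommends this so the \r\n row endings are not translated.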

Detailed Explanation

For a comprehensive breakdown of the code and web scraping concepts, see Scrape_with_Python/info.md.

Ethical Considerations

  • Respect Amazon's robots.txt and terms of service.
  • Use responsibly to avoid IP blocking.
  • Consider implementing delays between requests for ethical scraping.
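One way to act on these guidelines is sketched below under stated assumptions. The standard-library urllib.robotparser filters URLs against robots.txt rules. A loop then sleeps a random 2 to 5 seconds between requests. The delay range and the helper names allowed_urls and polite_fetch_all are hypothetical choices, not values taken from the project. To check the live file instead of a string, call RobotFileParser.set_url() followed by read(), which downloads and parses it.

```python
import random
import time
from typing import Callable, Iterable, List, TypeVar
from urllib.robotparser import RobotFileParser

T = TypeVar("T")


def allowed_urls(robots_txt: str, urls: Iterable[str], user_agent: str = "*") -> List[str]:
    """Keep only the URLs that the given robots.txt rules allow for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if parser.can_fetch(user_agent, url)]


def polite_fetch_all(
    urls: Iterable[str],
    fetch: Callable[[str], T],
    min_delay: float = 2.0,   # hypothetical lower bound of the pause, in seconds
    max_delay: float = 5.0,   # hypothetical upper bound of the pause, in seconds
    sleep: Callable[[float], None] = time.sleep,
) -> List[T]:
    """Call fetch(url) for each URL in turn, pausing a random interval between requests."""
    results: List[T] = []
    for i, url in enumerate(urls):
        if i > 0:
            # Randomised pause so requests are spread out rather than fired back to back.
            sleep(random.uniform(min_delay, max_delay))
        results.append(fetch(url))
    return results
```

Passing sleep and fetch as parameters keeps the throttling logic separate from the HTTP client, so it can be tested without network access.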

Amazon Product Scraper (Rust)

This high-performance Rust program concurrently scrapes product information from Amazon India using asynchronous programming.

Prerequisites

  • Rust (latest stable version)
  • Cargo (Rust's package manager, typically installed with Rust)

Installation

  1. Install Rust using rustup, the official installer (https://rustup.rs).

  2. Clone the repository and navigate to the Rust project directory:

    git clone https://github.com/CLoaKY233/WebScraper.git
    cd WebScraper/Scrape_with_Rust
    
  3. Build the project:

    cargo build --release
    

Usage

  1. Run the program:

    cargo run --release
    
  2. Follow the prompts to:

    • Enter the product name
    • Specify the number of pages to scrape
    • Choose whether to print output to console
  3. The program will save results in a CSV file named after the search term.
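The prompt flow in these usage steps can be sketched as follows. The example is written in Python to match the other sketches in this README. The prompt wording, the accepted answers and the name ask_options are assumptions, not the Rust program's actual behavior. The input function is passed in as a parameter so the flow can be tested without a terminal.

```python
from typing import Callable, Tuple


def ask_options(ask: Callable[[str], str] = input) -> Tuple[str, int, bool]:
    """Prompt for search term, page count and console output, re-asking on invalid input.

    Returns (search_term, pages, print_to_console).
    """
    term = ""
    while not term:  # an empty product name is rejected
        term = ask("Enter the product name: ").strip()

    while True:
        raw = ask("Number of pages to scrape: ").strip()
        try:
            pages = int(raw)
        except ValueError:
            pages = 0  # non-numeric input is treated as invalid
        if pages > 0:
            break
        print("Please enter a positive whole number.")

    while True:
        answer = ask("Print results to the console? [y/n]: ").strip().lower()
        if answer in ("y", "yes", "n", "no"):
            return term, pages, answer.startswith("y")
        print("Please answer y or n.")
```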

Features

  • Concurrent page fetching with the Tokio async runtime
  • User-friendly command-line interface
  • CSV output for seamless data analysis
  • Built-in rate limiting to respect server resources
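The concurrency and rate-limiting features can be illustrated with a Python asyncio analogue of the Tokio design. This sketches the general technique: a semaphore caps in-flight requests, and a lock spaces out request start times. It is not a port of the Rust source. The function fetch_pages_concurrently and its max_concurrent and min_interval parameters are hypothetical.

```python
import asyncio
from typing import Awaitable, Callable, List


async def fetch_pages_concurrently(
    urls: List[str],
    fetch: Callable[[str], Awaitable[str]],
    max_concurrent: int = 3,     # hypothetical cap on requests in flight
    min_interval: float = 0.5,   # hypothetical minimum gap between request starts, seconds
) -> List[str]:
    """Fetch every URL concurrently, with at most max_concurrent requests in flight
    and request start times spaced at least min_interval seconds apart.
    Results are returned in the same order as urls."""
    loop = asyncio.get_running_loop()
    semaphore = asyncio.Semaphore(max_concurrent)  # caps requests in flight
    start_lock = asyncio.Lock()                     # serialises the start-time check
    next_start = loop.time()

    async def worker(url: str) -> str:
        nonlocal next_start
        async with semaphore:
            async with start_lock:
                wait = next_start - loop.time()
                if wait > 0:
                    await asyncio.sleep(wait)
                next_start = loop.time() + min_interval
            return await fetch(url)

    # gather runs the workers concurrently and preserves input order.
    return list(await asyncio.gather(*(worker(url) for url in urls)))
```

A semaphore alone would cap concurrency but still allow bursts of simultaneous requests. Serialising start times through the lock spreads those requests out.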

Detailed Explanation

For an in-depth analysis of the Rust implementation and its performance benefits, see Scrape_with_Rust/info.md.

Ethical Considerations

  • Adhere to Amazon's robots.txt and terms of service.
  • Use responsibly to prevent IP blocking.
  • The program includes built-in delays to avoid overwhelming the server.

Performance Comparison

The Rust version typically outperforms the Python version, especially for larger scraping tasks, because it fetches pages concurrently and is compiled to native code. For specific benchmarks, refer to the detailed explanation in Scrape_with_Rust/info.md.
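The concurrency part of this comparison can be demonstrated without network access. Take N pages with a per-request latency of L seconds. Fetching one page at a time takes about N × L, while fetching all pages concurrently takes roughly L plus scheduling overhead. The sketch below simulates latency with asyncio.sleep. Its timings illustrate the principle and are not benchmarks of either tool. The function names are hypothetical.

```python
import asyncio
import time
from typing import Any, Awaitable, Callable, List, Tuple


async def simulated_fetch(url: str, latency: float) -> str:
    """Stand-in for an HTTP request: waits `latency` seconds, then returns the URL."""
    await asyncio.sleep(latency)
    return url


async def run_sequential(urls: List[str], latency: float) -> List[str]:
    """One request at a time: total time grows roughly as len(urls) * latency."""
    return [await simulated_fetch(url, latency) for url in urls]


async def run_concurrent(urls: List[str], latency: float) -> List[str]:
    """All requests in flight together: total time stays close to a single latency."""
    return list(await asyncio.gather(*(simulated_fetch(url, latency) for url in urls)))


def timed(strategy: Callable[..., Awaitable[Any]], *args: Any) -> Tuple[Any, float]:
    """Run one strategy on a fresh event loop and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = asyncio.run(strategy(*args))
    return result, time.perf_counter() - start
```

For five pages at 0.1 s each, timed(run_sequential, urls, 0.1) takes about 0.5 s. By contrast, timed(run_concurrent, urls, 0.1) takes about 0.1 s.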
