This project scrapes PDF files from the OIC website and downloads them to your local machine.
- Python 3.x
- pip (Python package installer)
- Create a Python virtual environment
python -m venv env
- Activate the virtual environment
- On Windows:
.\env\Scripts\activate
- On macOS and Linux:
source env/bin/activate
- Install the required dependencies
pip install -r requirements.txt
- Run the spider crawler
The following command runs the spider defined in scraper.py and writes its output to pdf_files.json; the -O option overwrites that file if it already exists.
scrapy runspider -O pdf_files.json scraper.py
- Download the PDFs
After running the spider, you can download the PDFs using the download-pdf.py script.
python download-pdf.py
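
For orientation, the download step can be sketched roughly as follows. This is not the actual contents of download-pdf.py, only a minimal illustration using the standard library. It assumes that pdf_files.json is a JSON list of objects and that each object stores its link under a "url" key; that key name, the "pdfs" output folder, and the helper names target_filename, load_urls, and download_all are all hypothetical.

```python
import json
import os
import urllib.request
from urllib.parse import unquote, urlparse


def target_filename(url, index):
    """Derive a local file name from a URL, falling back to a numbered name."""
    name = os.path.basename(unquote(urlparse(url).path))
    return name if name else f"file_{index}.pdf"


def load_urls(json_path):
    """Read the spider output; the "url" key is an assumed field name."""
    with open(json_path, encoding="utf-8") as fh:
        items = json.load(fh)
    # Skip entries that have no usable link.
    return [item["url"] for item in items if item.get("url")]


def download_all(json_path="pdf_files.json", out_dir="pdfs"):
    """Download every listed PDF into out_dir, skipping files already present."""
    os.makedirs(out_dir, exist_ok=True)
    for index, url in enumerate(load_urls(json_path)):
        dest = os.path.join(out_dir, target_filename(url, index))
        if os.path.exists(dest):
            continue  # already downloaded on a previous run
        urllib.request.urlretrieve(url, dest)
```

Calling download_all() from the project folder after the spider has finished would fetch each listed file into the pdfs directory. Skipping files that already exist makes it safe to rerun after an interrupted download.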