medss19/web-scraping-with-beautiful-soup

web-scraping-with-beautiful-soup

https://www.linkedin.com/posts/medha-agarwal-01b33725a_internship-pythonprogramming-webscraping-activity-7214991432367976448-8iL1?utm_source=share&utm_medium=member_desktop

๐——๐—ฒ๐˜€๐—ฐ๐—ฟ๐—ถ๐—ฝ๐˜๐—ถ๐—ผ๐—ป: ๐—ช๐—ฒ๐—ฏ๐˜€๐—ถ๐˜๐—ฒ ๐—ฆ๐—ฒ๐—น๐—ฒ๐—ฐ๐˜๐—ถ๐—ผ๐—ป: Chose BigBasket, a website with publicly accessible product listings. ๐——๐—ฎ๐˜๐—ฎ ๐—˜๐˜…๐˜๐—ฟ๐—ฎ๐—ฐ๐˜๐—ถ๐—ผ๐—ป: Used the Beautiful Soup library to scrape HTML content and extract relevant information such as product titles, prices, quantities, and discounts. ๐——๐—ฎ๐˜๐—ฎ ๐—ฆ๐˜๐—ผ๐—ฟ๐—ฎ๐—ด๐—ฒ: Stored the extracted data in a structured format (CSV file) for further analysis and use. ๐—ง๐—ฒ๐—ฐ๐—ต๐—ป๐—ถ๐—ฐ๐—ฎ๐—น ๐—–๐—ต๐—ฎ๐—น๐—น๐—ฒ๐—ป๐—ด๐—ฒ๐˜€: Handled issues like dynamic content loading, ensuring accurate and complete data extraction.

๐—œ๐—บ๐—ฝ๐—น๐—ฒ๐—บ๐—ฒ๐—ป๐˜๐—ฎ๐˜๐—ถ๐—ผ๐—ป ๐—›๐—ถ๐—ด๐—ต๐—น๐—ถ๐—ด๐—ต๐˜๐˜€:

  • Utilized Selenium for navigating and interacting with the dynamic website.
  • Leveraged Beautiful Soup for parsing HTML content and extracting product details.
  • Implemented a scrolling mechanism to handle infinite scrolling and ensure all products were captured.
  • Ensured data integrity by handling missing or unavailable data gracefully.
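
One common way to implement the scrolling mechanism mentioned above is to keep scrolling to the bottom until the page height stops growing. The sketch below assumes that approach; the function name, the pause length, and the round limit are illustrative choices, not values taken from this repository. It only relies on the WebDriver `execute_script` method, so any Selenium driver can be passed in.

```python
import time


def scroll_until_stable(driver, pause=2.0, max_rounds=50):
    """Scroll to the bottom until the page height stops growing.

    `driver` is expected to be a Selenium WebDriver (or any object with a
    compatible `execute_script` method). Returns the number of scrolls done.
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    rounds = 0
    while rounds < max_rounds:
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # give lazily loaded products time to render
        rounds += 1
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break  # nothing new loaded, so assume the end of the listing
        last_height = new_height
    return rounds


# Usage sketch (needs the selenium package and a browser driver installed;
# LISTING_URL is a placeholder for the page being scraped):
#
#   from selenium import webdriver
#   driver = webdriver.Chrome()
#   driver.get(LISTING_URL)
#   scroll_until_stable(driver)
#   html = driver.page_source  # hand this to Beautiful Soup
#   driver.quit()
```

The `max_rounds` cap is a safety net so that a page which never stops loading cannot keep the scraper running forever.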

๐—–๐—ต๐—ฎ๐—น๐—น๐—ฒ๐—ป๐—ด๐—ฒ๐˜€ ๐—™๐—ฎ๐—ฐ๐—ฒ๐—ฑ:

  • Managing dynamic content loading and ensuring the scraper captures all products as the page scrolls.
  • Handling website structure changes and ensuring the scraper adapts accordingly.
  • Optimizing the scraper to efficiently process and store large amounts of data.
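
This README does not say how the scraper was optimized. One possible approach, sketched here under that caveat, is to write rows to the CSV as each batch is parsed and to skip products already seen, since re-parsing a page after every scroll returns earlier products again. The class name and the choice of `title` plus `quantity` as the duplicate key are assumptions made for illustration.

```python
import csv
import io


class IncrementalCsvStore:
    """Append unique product rows to a CSV handle as they are scraped."""

    def __init__(self, handle, fieldnames, key_fields=("title", "quantity")):
        self._writer = csv.DictWriter(handle, fieldnames=fieldnames)
        self._writer.writeheader()
        self._key_fields = key_fields
        self._seen = set()  # keys of rows already written

    def add_batch(self, rows):
        """Write rows not seen before and return how many were written."""
        written = 0
        for row in rows:
            key = tuple(row.get(field) for field in self._key_fields)
            if key in self._seen:
                continue  # duplicate from an earlier parse of the page
            self._seen.add(key)
            self._writer.writerow(row)
            written += 1
        return written


# Made-up sample rows, only to show the behaviour.
onion = {"title": "Fresh Onion", "quantity": "1 kg", "price": "Rs 35"}
rice = {"title": "Basmati Rice", "quantity": "5 kg", "price": "Rs 499"}
milk = {"title": "Toned Milk", "quantity": "1 L", "price": "Rs 56"}

buffer = io.StringIO()  # stands in for a file opened with newline=""
store = IncrementalCsvStore(buffer, ["title", "quantity", "price"])
first = store.add_batch([onion, rice])
second = store.add_batch([rice, milk])  # rice is skipped as a duplicate
print(first, second)
```

Only the set of keys is held in memory, so the full list of rows never has to be kept around before saving.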
