Setting Up WebDriver:
chrome_options = Options(): Creates an instance of ChromeOptions used to configure the Chrome browser.
chrome_options.add_argument("_tt_enable_cookie=1"): Adds a custom argument ("_tt_enable_cookie=1") to the Chrome options, intended to enable that cookie.
driver = webdriver.Chrome(): Initializes a Chrome WebDriver, which lets the script drive a Chrome browser.
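A minimal sketch of this setup (assuming Selenium 4; passing options= into webdriver.Chrome is an assumption, since the description above creates the driver without it):

```python
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

# Configure Chrome before launching the browser.
chrome_options = Options()
chrome_options.add_argument("_tt_enable_cookie=1")  # custom flag used by the original script

# Launch a Chrome session controlled by Selenium.
# Passing options= here is an assumption about how the two pieces are wired together.
driver = webdriver.Chrome(options=chrome_options)
```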
Iterating Through Cars:
The function iterates over the list of cars supplied in the cars parameter.
Constructing URL:
Builds the AutoTrader search URL from the search criteria and the current car in the iteration.
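The real query parameters are not shown in the description, so the URL layout and parameter names in this sketch (make, model, postcode, radius, year-from) are purely illustrative assumptions:

```python
from urllib.parse import urlencode

def build_search_url(car, criteria, page=1):
    # Hypothetical parameter names; the real function may build the URL differently.
    params = {
        "make": car["make"],
        "model": car["model"],
        "postcode": criteria.get("postcode", ""),
        "radius": criteria.get("radius", 50),
        "year-from": criteria.get("year_from", 2015),
        "page": page,
    }
    return "https://www.autotrader.co.uk/car-search?" + urlencode(params)

cars = [{"make": "Ford", "model": "Fiesta"}, {"make": "Audi", "model": "A3"}]
criteria = {"postcode": "SW1A 1AA", "radius": 25, "year_from": 2016}

for car in cars:
    url = build_search_url(car, criteria)
    print(url)
```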
Opening the URL in Browser:
driver.get(url): Opens the constructed URL in the Chrome browser.
Waiting:
time.sleep(5): Pauses the script for 5 seconds to allow the page to load and to avoid problems with dynamically loaded content.
Page Source and BeautifulSoup:
source = driver.page_source: Retrieves the HTML source of the current page.
content = BeautifulSoup(source, "html.parser"): Creates a BeautifulSoup object to parse that HTML.
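Putting the navigation, wait, and parsing steps together, one possible helper might look like this sketch (the fixed sleep mirrors the description; an explicit WebDriverWait would be a more robust alternative):

```python
import time

from bs4 import BeautifulSoup
from selenium import webdriver

def fetch_page(driver: webdriver.Chrome, url: str, wait_seconds: int = 5) -> BeautifulSoup:
    """Open a URL in the Selenium-driven browser and return the parsed HTML."""
    driver.get(url)              # navigate to the search results page
    time.sleep(wait_seconds)     # crude wait for dynamically rendered content
    source = driver.page_source  # full HTML after JavaScript has run
    return BeautifulSoup(source, "html.parser")
```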
Checking for Results:
Attempts to determine the number of result pages by searching for a specific pattern in the page content. If the pattern is not found, the function prints "No results found" and moves on to the next car.
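The exact pattern is not shown, so the regular expression below is only an assumed example of how a page count could be pulled out of the parsed content:

```python
import re

from bs4 import BeautifulSoup

def find_page_count(content: BeautifulSoup):
    # Hypothetical pattern: many result pages display text such as "Page 1 of 12";
    # the pattern the real script searches for is not shown.
    match = re.search(r"Page\s+\d+\s+of\s+(\d+)", content.get_text())
    return int(match.group(1)) if match else None

sample = BeautifulSoup("<p>Page 1 of 12</p>", "html.parser")
print(find_page_count(sample))  # -> 12
```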
Pagination Loop:
Iterates through the pages of search results.
Scraping Car Details:
articles = content.findAll("section", attrs={"data-testid": "trader-seller-listing"}): Finds every HTML section on the current page that contains an individual car listing.
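A sketch of the pagination loop, assuming the page number is appended as a query parameter and reusing the hypothetical fetch_page helper and find_page_count result from the earlier sketches:

```python
all_articles = []
for page in range(1, page_count + 1):
    page_url = f"{url}&page={page}"          # assumed query-parameter pagination
    content = fetch_page(driver, page_url)   # hypothetical helper sketched earlier
    articles = content.findAll("section", attrs={"data-testid": "trader-seller-listing"})
    all_articles.extend(articles)
    print(f"Page {page}: found {len(articles)} listings")
```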
Extracting Details:
For each article (car listing), the function extracts details such as name, price, year, and mileage, and stores them in a dictionary (details).
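The selectors for each field are not given, so those in this sketch are assumptions about the page markup; the point is the shape of the details dictionary:

```python
def parse_listing(article) -> dict:
    """Pull the headline fields out of one listing section (selectors are assumptions)."""
    details = {}
    name_tag = article.find("h3")
    price_tag = article.find("span", attrs={"data-testid": "price"})
    details["name"] = name_tag.get_text(strip=True) if name_tag else None
    details["price"] = price_tag.get_text(strip=True) if price_tag else None
    return details
```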
Handling Seller Information:
Attempts to extract the seller's name and location; if successful, it updates the details dictionary.
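Because not every listing exposes seller details, this step is typically wrapped in try/except so a missing element does not abort the scrape; the tag names below are assumptions, continuing inside the hypothetical parse_listing sketch above:

```python
# Continuing inside parse_listing: seller details are optional on some listings.
try:
    seller = article.find("p", attrs={"data-testid": "seller-name"})      # assumed tag
    location = article.find("span", attrs={"data-testid": "seller-town"})  # assumed tag
    details.update({
        "seller": seller.get_text(strip=True),
        "location": location.get_text(strip=True),
    })
except AttributeError:
    # Listing has no seller block; leave the details dictionary unchanged.
    pass
```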
Handling Car Specifications:
Iterates through the list of car specifications and extracts the year, mileage, transmission, engine type, fuel type, and number of owners.
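Specification values typically arrive as a flat list of strings (for example "2019 (69 reg)", "32,000 miles", "Automatic"), so one hedged way to map them onto named fields is to keyword-match each entry; the checks below are assumptions:

```python
import re

def parse_specs(spec_strings, details):
    """Map a flat list of spec strings onto named fields (keyword checks are assumptions)."""
    for spec in spec_strings:
        text = spec.lower()
        if re.fullmatch(r"\d{4}(\s*\(\d+ reg\))?", text):
            details["year"] = spec
        elif "miles" in text:
            details["mileage"] = spec
        elif text in ("manual", "automatic"):
            details["transmission"] = spec
        elif re.search(r"\d\.\d\s*l", text):          # e.g. "1.5L" engine size
            details["engine"] = spec
        elif text in ("petrol", "diesel", "hybrid", "electric"):
            details["fuel type"] = spec
        elif "owner" in text:
            details["owners"] = spec
    return details

print(parse_specs(["2019 (69 reg)", "32,000 miles", "Automatic", "1.5L", "Petrol", "2 owners"], {}))
```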
Appending Details to Data List:
Appends the details dictionary to the data list.
Printing Progress:
Prints messages indicating scraping progress, including the current car, the current page, and the number of articles scraped so far.
Data Processing:
Converts the list of dictionaries (data) into a pandas DataFrame (df).
Returning the DataFrame:
Returns the DataFrame containing all the scraped data.
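A sketch of the tail of the function; the two rows below are made-up placeholders standing in for the real scraped dictionaries:

```python
import pandas as pd

# data is the list of per-listing dictionaries built up during scraping;
# these rows are illustrative placeholders, not real scraped values.
data = [
    {"name": "Ford Fiesta 1.0 EcoBoost", "price": "£9,500", "year": "2019 (69 reg)", "mileage": "32,000 miles"},
    {"name": "Audi A3 Sportback", "price": "£15,250", "year": "2020 (70 reg)", "mileage": "18,000 miles"},
]

df = pd.DataFrame(data)  # one row per listing, one column per detail key
print(df.head())
# return df              # inside the scraping function, this DataFrame is what gets returned
```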
Final Output:
The main block (not shown in this snippet) would call this function and receive the DataFrame, allowing further processing or analysis.
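Although the main block is not shown, a typical caller, assuming the function is named scrape_autotrader and takes the cars list plus a criteria dictionary (both names are assumptions), might look like:

```python
if __name__ == "__main__":
    cars = [{"make": "Ford", "model": "Fiesta"}, {"make": "Audi", "model": "A3"}]
    criteria = {"postcode": "SW1A 1AA", "radius": 25}

    df = scrape_autotrader(cars, criteria)             # hypothetical function name
    print(df.shape)                                    # quick sanity check on the result
    df.to_csv("autotrader_listings.csv", index=False)  # persist for later analysis
```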