DavidWells

Sponsors

@nathanchapman

Organizations

@inboundnow

Stars

Scraping

21 repositories

Extracts content information from known URL patterns.

TypeScript · 18 stars · 3 forks · Updated Aug 20, 2024

Download high-resolution images from Fine Art America, Conde Nast Store, Photos.com, and Pixels.com. "the current reverse engineering approach is non-functional."

TypeScript · 27 stars · 2 forks · Updated Dec 20, 2023

A standalone version of the readability lib

JavaScript · 9,454 stars · 627 forks · Updated Feb 14, 2025

⬛️ CLI tool for saving complete web pages as a single HTML file

Rust · 12,853 stars · 363 forks · Updated Dec 2, 2024

TypeScript · 1 star · Updated Oct 14, 2024

Pre-built Chromium binaries for AWS Lambda, compatible with Playwright and Puppeteer.

TypeScript · 47 stars · 3 forks · Updated Feb 24, 2025

Chromium (x86-64) for Serverless Platforms

TypeScript · 1,148 stars · 76 forks · Updated Feb 25, 2025

An AI web browsing framework focused on simplicity and extensibility.

TypeScript · 7,994 stars · 383 forks · Updated Feb 26, 2025

IGN news site scraper

JavaScript · 1 star · Updated Jun 30, 2022

Turn any website into an API in a few clicks (serverless, with SPA support!)

JavaScript · 2,163 stars · 137 forks · Updated Jan 7, 2023

Automagically reverse-engineer REST APIs by capturing traffic

HTML · 8,677 stars · 309 forks · Updated Feb 24, 2025

Web Extension for saving a faithful copy of a complete web page in a single HTML file

JavaScript · 16,935 stars · 1,070 forks · Updated Feb 19, 2025

🗃 Open source self-hosted web archiving. Takes URLs/browser history/bookmarks/Pocket/Pinboard/etc., saves HTML, JS, PDFs, media, and more...

Python · 23,311 stars · 1,237 forks · Updated Feb 13, 2025

Self-hosted bookmark manager that is designed to be minimal, fast, and easy to set up using Docker.

Python · 7,596 stars · 357 forks · Updated Feb 23, 2025

A browser extension for saving web documents locally, allowing you to access them offline and quickly search webpage content without an internet connection, while also reducing browser memory usage.

TypeScript · 653 stars · 25 forks · Updated May 7, 2024

Fetch an entire site and save it as a text file (to be used with AI models).

TypeScript · 1,146 stars · 71 forks · Updated Jan 18, 2025

A lightweight RSS parser, for Node and the browser

JavaScript · 1,417 stars · 213 forks · Updated Nov 13, 2024

Framework for orchestrating role-playing, autonomous AI agents. By fostering collaborative intelligence, CrewAI empowers agents to work together seamlessly, tackling complex tasks.

Python · 27,174 stars · 3,700 forks · Updated Feb 26, 2025

Get unified metadata from websites using Open Graph, Microdata, RDFa, Twitter Cards, JSON-LD, HTML, and more.

HTML · 2,404 stars · 170 forks · Updated Feb 23, 2025

Like jq, but for HTML.

Rust · 7,227 stars · 116 forks · Updated May 29, 2024

serverless-pdf-generator is a lightweight package that simplifies the process of generating PDFs from web pages in a serverless environment like Vercel. It utilizes Puppeteer and Chromium to render…

TypeScript · 4 stars · 1 fork · Updated Feb 19, 2025