4CDownloader

Tool for downloading files from 4chan.org. Optionally avoids duplicate downloads by using the MD5 hashes that 4chan provides.

Download

Download the source code and extract it into a folder where you want the downloaded files to be stored. Then start a shell in that directory and execute:

> pip install -r requirements.txt

This will install all required 3rd-party packages.

Usage

Start the script using:

> fourcdl-runner.py watch

It watches your clipboard for copied 4chan thread URLs. Once you copy a thread URL, you need to supply the name of the folder the thread will be downloaded to. You'll also be asked whether you want to download only unique files (don't worry, you can still download duplicates later).

> fourcdl-runner.py watch
Watching clipboard for 4chan thread urls...
14:15:54 - INFO - Viewing thread "B/W Miniseries" No. 3603904. OP:
[Removed output of OP message]
Input the folder name the thread is going to be downloaded to (e.g. "gif_cute", subfolders work too "gif_model/Emily Rudd"):
> p_bw-photos
Only copy unique files? y/n:
> y
Watching clipboard for 4chan file urls...
Copy cmds are: rename_thread, reset_filename, remove_file !

Now you can either press Ctrl+C and answer y to download the whole thread, or only its unique files if you selected that option earlier.

Or you can copy the link addresses of individual 4chan files in this thread (e.g. in Chrome, right-click a thumbnail and press e). The original filename and the MD5 hash will be printed. You are now in append mode: copied text that isn't a 4chan file URL is appended to the download file name of the last copied file.

14:22:00 - INFO - Found file url of file: "p/1579558670830.jpg" Total of 1 files
Orig-fn: 2020-01-20-0008 | MD5: IzisM5jU8PGq+PDZ7n1Wkw==
Not a file URL -> clipboard was appended to filename:
1579558670830_Fomapan 200_ shot on Canonet QL17_ developed in Cinestill Monobath
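
The append mode described above can be sketched roughly as follows. This is not the tool's actual code: the URL pattern (`i.4cdn.org/<board>/<number>.<ext>`), the set of replaced characters, and the function name `handle_clipboard` are assumptions made for illustration.

```python
import re

# Assumed shape of a 4chan file URL, e.g. https://i.4cdn.org/p/1579558670830.jpg
# (the pattern used by the tool itself may differ).
FILE_URL_RE = re.compile(r"^https?://i\.4cdn\.org/(\w+)/(\d+)\.(\w+)$")

# Hypothetical choice: replace characters that are invalid in Windows
# file names, plus commas, with an underscore.
UNSAFE_CHARS_RE = re.compile(r'[<>:"/\\|?*,]')


def handle_clipboard(text, current_name):
    """Return (kind, new_name) for one clipboard change.

    kind is "file" when text is a file URL, otherwise "append".
    """
    text = text.strip()
    match = FILE_URL_RE.match(text)
    if match:
        # A new file starts with its server-side name (without extension).
        return "file", match.group(2)
    # Anything else is treated as a description and appended to the name.
    safe = UNSAFE_CHARS_RE.sub("_", text)
    return "append", f"{current_name}_{safe}"
```

Feeding it a file URL first and a free-text description second yields a name of the form `<server name>_<description>`, similar to the output shown above.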

If the file has already been downloaded, the following message appears:

Files with matching md5s:
  p_bw-photos\1579558670830_Fomapan 200_ shot on Canonet QL17_ developed in Cinestill Monobath.jpg
14:33:16 - INFO - ALERT!! File with url p/1579558670830.jpg has been downloaded before!
    Copy add_anyway to add file to downloads!

You can decide to move on to the next file or copy add_anyway to add the file to the download list.
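
A minimal sketch of this duplicate check, assuming the known hashes are held in an in-memory index keyed by MD5. The names `DownloadIndex` and `queue_file` are hypothetical and not taken from the tool.

```python
from collections import defaultdict


class DownloadIndex:
    """Maps an MD5 hash to the paths of files already downloaded with it."""

    def __init__(self):
        self._paths_by_md5 = defaultdict(list)

    def add(self, md5, path):
        self._paths_by_md5[md5].append(path)

    def matches(self, md5):
        # .get avoids creating empty entries for unknown hashes.
        return list(self._paths_by_md5.get(md5, []))


def queue_file(index, queue, url, md5, add_anyway=False):
    """Append url to queue unless its hash is known; return True if queued."""
    duplicates = index.matches(md5)
    if duplicates and not add_anyway:
        print("Files with matching md5s:")
        for path in duplicates:
            print(f"  {path}")
        return False
    queue.append(url)
    return True
```

Calling `queue_file` again with `add_anyway=True` corresponds to copying add_anyway after the alert.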

There are three special commands that trigger an action when copied:

  • rename_thread: You will be asked to input a new thread folder name.
  • reset_filename: File name will be reset to the file name on the 4chan server.
  • remove_file: Last copied file will be removed from download list.
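
One way to picture how these commands act on the download list is the sketch below. The `state` layout (a `folder` string and a `files` list whose entries carry `name` and `server_name`) and the function `apply_command` are assumptions for illustration only.

```python
def apply_command(command, state, ask=input):
    """Apply a copied command to state; return False if it is not a command."""
    files = state["files"]
    if command == "rename_thread":
        # Prompt for the new folder name (ask is injectable for testing).
        state["folder"] = ask("New thread folder name: ")
    elif command == "reset_filename":
        if files:
            # Drop any appended text and go back to the server-side name.
            files[-1]["name"] = files[-1]["server_name"]
    elif command == "remove_file":
        if files:
            # Remove the most recently copied file from the download list.
            files.pop()
    else:
        return False
    return True
```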

When you want to stop adding files for this thread, press Ctrl+C once. You can then copy the URL of another thread, or press Ctrl+C again to start downloading.

Stopped watching clipboard for 4chan file URLs!
Stopped watching clipboard for 4chan thread URLs!
14:30:40 - INFO - Downloading thread No. 3603904: "B/W Miniseries"
[...]
14:30:42 - INFO - CRC-Check successful!
14:30:42 - INFO - Writing thread export file "p_bw-photos_2020-01-23.txt"
14:30:42 - INFO - Appending md5s!

The script automatically verifies downloaded files and saves their MD5 hashes and names in a file named 4chan_dl.md5 in the root directory, for convenient use with tools such as md5sum. The hashes and file names are also saved in a file named downloaded_files_info.pickle for internal use. So don't delete these files!
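
The hashes printed in the log (such as `IzisM5jU8PGq+PDZ7n1Wkw==`) are base64-encoded, while md5sum works with hexadecimal digests. The following sketch shows how a base64 hash can be converted and checked against a file. The helper names are hypothetical, and the exact layout of 4chan_dl.md5 is assumed here to be the standard md5sum binary-mode format.

```python
import base64
import hashlib


def md5_b64_to_hex(md5_b64):
    """Convert a base64-encoded MD5 digest to the hex form md5sum uses."""
    return base64.b64decode(md5_b64).hex()


def file_matches_md5(path, md5_b64):
    """Hash the file in chunks and compare with the expected digest."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.digest() == base64.b64decode(md5_b64)


def md5sum_line(md5_b64, filename):
    """Format one line as md5sum writes it in binary mode."""
    return f"{md5_b64_to_hex(md5_b64)} *{filename}"
```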

Before the download starts, the file auto-backup.json is created. It contains the program state, so you can resume from where you left off should the download crash unexpectedly. If the script crashes, a file named crash-exp.json is additionally written for the same reason.

To resume after a crash, start the script with fourcdl-runner.py resume. If there is no crash-exp.json file and the script crashed while downloading, you can use fourcdl-runner.py resume auto-backup.json instead.
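
The backup-and-resume idea can be sketched as below. The structure of the saved state and the helper names are assumptions. Only the file names auto-backup.json and crash-exp.json come from the description above.

```python
import json
import os


def save_state(state, path="auto-backup.json"):
    """Write the state to a temp file first, then swap it into place,
    so a crash while writing cannot leave a half-written backup."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as f:
        json.dump(state, f, indent=2)
    os.replace(tmp_path, path)


def load_state(path):
    """Read a previously saved state back from disk."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def resume_path(explicit=None):
    """Use the file named on the command line, else the crash export."""
    return explicit if explicit is not None else "crash-exp.json"
```

With this sketch, a plain resume would load `resume_path()`, while passing auto-backup.json explicitly would load the pre-download backup.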