# "papertrail-archives" archive file downloader
Right now, there's no way to bulk-download many archive files at once. The current method is a shell script. This is to create a new `papertrail-archives` CLI program which downloads either the newest N archive files or all archives before/after/between date(s)/time(s).
Papertrail has an API endpoint at `/api/v1/archives.json` which returns this response. It uses the same authentication as the other API endpoint.
See `bin/papertrail-add-group` and `lib/papertrail/cli_add_group.rb` for examples of how to structure a new standalone command and supporting library. This will be `bin/papertrail-archives` and something like `lib/papertrail/cli_archives.rb`.

## Arguments

It requires 1 of these 3 arguments: `newest`, `min-time`, or `max-time` (combinations are covered under Behavior below).

## Date parsing
Use the same behavior as the `min-time` and `max-time` arguments to the main `papertrail` command. That is, call `parse_time` on the CLI input and then pass those return values as API query params.
The client doesn't need to do any time comparisons or anything beyond what's already in the codebase.
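A minimal sketch of that wiring, assuming `parse_time` is the existing helper the main command uses (the option names here are hypothetical, not settled):

```ruby
# Sketch only: convert CLI time strings into epoch-seconds query params,
# the same way the main command handles min-time/max-time. Assumes
# parse_time is the helper already in the codebase.
params = {}
params[:min_time] = parse_time(options[:min_time]).to_i if options[:min_time]
params[:max_time] = parse_time(options[:max_time]).to_i if options[:max_time]
```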
## API client
This will entail a new `lib/archives.rb` for doing the API query (very similar to `lib/search_query.rb`), and probably a new `lib/archive.rb` model (very similar to `lib/event.rb`).
Note: The server doesn't currently honor `min_time` and `max_time` query parameters. Those will be added before this is released.
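By analogy with `lib/papertrail/search_query.rb`, the query class might look something like this. The `@connection.get` call and its parsed-JSON return value are assumptions modeled on how the search endpoint is wrapped, not confirmed behavior:

```ruby
# lib/papertrail/archives.rb (sketch). One class owns the query params and
# asks the shared connection for /api/v1/archives.json, mirroring how
# search_query.rb wraps the search endpoint.
module Papertrail
  class Archives
    def initialize(connection, params = {})
      @connection = connection
      @params     = params   # e.g. { :min_time => ..., :max_time => ... }
    end

    # Wraps each element of the response array in an Archive model
    # (sketched later in this issue).
    def list
      @connection.get('/api/v1/archives.json', @params).map do |attrs|
        Papertrail::Archive.new(attrs)
      end
    end
  end
end
```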
## Behavior
- If `newest` is given, hit the API target with no params and download the first `n` elements/files in the response array.
- If `min-time` and/or `max-time` are given, pass `min_time` and/or `max_time` to the server and download all resulting elements/files. The server will handle the time constraints.
- If `newest` and either of the other args are given, or no arguments are given, refuse to run (a rough sketch of this check follows the list).
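Sketch of that mutual-exclusion check (flag names are hypothetical):

```ruby
# `newest` cannot be combined with the time arguments, and at least one
# of the three is required.
if options[:newest] && (options[:min_time] || options[:max_time])
  abort 'Pass either --newest or --min-time/--max-time, not both'
elsif !options[:newest] && !options[:min_time] && !options[:max_time]
  abort 'One of --newest, --min-time, or --max-time is required'
end

archives = Papertrail::Archives.new(connection, params).list
archives = archives.first(options[:newest].to_i) if options[:newest]
```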
In the API response, the URL to download a given archive file is in the `download` `href` hypermedia URL - example.
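Given that layout, a thin model along the lines of `lib/event.rb` could expose the two attributes the downloader needs. The key names below are read off the hypermedia description above and should be checked against a real response:

```ruby
# lib/papertrail/archive.rb (sketch). Wraps one element of the archives
# response array; the key names are assumptions based on the hypermedia
# layout described in this issue.
module Papertrail
  class Archive
    attr_reader :data

    def initialize(data)
      @data = data
    end

    def filename
      data['filename']
    end

    def download_url
      data['_links']['download']['href']
    end
  end
end
```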
## Downloads
Files may be large. I propose adding something like this as a new `HttpClient.download` method, since it only uses `Net::HTTP` but writes the files gradually.
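Since the linked example isn't reproduced here, a sketch of what such a method might look like: it streams the body to disk in chunks rather than buffering it, and follows redirects because every archive URL redirects at least once (see Notes):

```ruby
require 'net/http'
require 'uri'

# Sketch for HttpClient#download: stream `url` to `filename`, following
# redirects up to a limit.
def download(url, filename, redirect_limit = 5)
  raise 'Too many redirects' if redirect_limit.zero?

  uri = URI.parse(url)
  Net::HTTP.start(uri.host, uri.port, :use_ssl => uri.scheme == 'https') do |http|
    http.request(Net::HTTP::Get.new(uri.request_uri)) do |response|
      case response
      when Net::HTTPRedirection
        return download(response['location'], filename, redirect_limit - 1)
      when Net::HTTPSuccess
        # Write each chunk as it arrives instead of holding the whole
        # archive in memory.
        File.open(filename, 'wb') do |file|
          response.read_body { |chunk| file.write(chunk) }
        end
      else
        raise "Download failed: #{response.code}"
      end
    end
  end
end
```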
## Writing files
Write to the current directory using the `filename` API response attribute. The `filename` values are unique.
Output a line each time a file completes. We'll figure out what it says later, but it'll probably just be the datestamp.
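Tying the pieces together, the write loop might be as small as this (the accessor and method names are the sketched ones from above, not final):

```ruby
# Sketch: save each archive under its API-provided filename in the
# current directory, printing a line as each file completes.
archives.each do |archive|
  connection.download(archive.download_url, archive.filename)
  puts "Saved #{archive.filename}"   # exact wording TBD per this issue
end
```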
## Notes
- All of the download URLs use at least 1 redirect.