Mirror one S3 bucket to another S3 bucket.

winzig/s3s3mirror

s3s3mirror

A utility for mirroring content from one S3 bucket to another.

Designed to be lightning-fast and highly concurrent, with modest CPU and memory requirements.

An object will be copied if and only if at least one of the following holds true:

  • The object does not exist in the destination bucket.
  • The size or ETag of the object in the destination bucket differs from the size or ETag in the source bucket.
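
The two conditions above can be sketched as a single predicate. This is only an illustration of the stated rule, not the actual s3s3mirror source: the `ObjectSummary` and `CopyDecision` classes are hypothetical names, and a missing destination object is represented here by `null`.

```java
import java.util.Objects;

// Hypothetical summary of an S3 object (not a class from s3s3mirror or the AWS SDK).
final class ObjectSummary {
    final long size;
    final String etag;

    ObjectSummary(long size, String etag) {
        this.size = size;
        this.etag = etag;
    }
}

final class CopyDecision {
    // Returns true when the object should be copied: the destination is
    // missing (null), or its size or ETag differs from the source.
    static boolean shouldCopy(ObjectSummary source, ObjectSummary destination) {
        if (destination == null) {
            return true;
        }
        return source.size != destination.size
                || !Objects.equals(source.etag, destination.etag);
    }

    public static void main(String[] args) {
        ObjectSummary src = new ObjectSummary(1024L, "abc123");
        // Missing destination, identical destination, changed ETag.
        System.out.println(shouldCopy(src, null));
        System.out.println(shouldCopy(src, new ObjectSummary(1024L, "abc123")));
        System.out.println(shouldCopy(src, new ObjectSummary(1024L, "def456")));
    }
}
```

Comparing size first is cheap and catches most changes; the ETag comparison covers objects whose content changed without changing length.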

When copying, the source object's metadata and ACLs are also copied to the destination object.

Motivation

I started with "s3cmd sync" but found that with buckets containing many thousands of objects, it was incredibly slow to start and consumed massive amounts of memory. So I designed s3s3mirror to start copying immediately with an intelligently chosen "chunk size" and to operate in a highly threaded, streaming fashion, so memory requirements are much lower.

Running with 100 threads, I found the gating factor to be how fast I could list items from the source bucket (!?!), which makes me wonder if there is any way to do this faster. I'm sure there must be, but this is pretty damn fast.

AWS Credentials

  • s3s3mirror will first look for credentials in your system environment. If variables named AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY are defined, then these will be used.
  • Next, it checks for a ~/.s3cfg file (which you might have for using s3cmd). If present, the access key and secret key are read from there.
  • If neither of the above is found, it will error out and refuse to run.
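
The lookup order above might look roughly like the following. This is a sketch under assumptions, not the tool's real code: the `CredentialLookup` class is a hypothetical name, the environment and the file contents are passed in as plain collections so the logic is easy to try out, and the `access_key` / `secret_key` entries are the key names s3cmd writes to `~/.s3cfg`.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper illustrating the credential lookup order.
final class CredentialLookup {
    // Returns {accessKey, secretKey}. Checks the environment first, then the
    // lines of an s3cfg file (null if the file is absent); throws if neither
    // source supplies both values.
    static String[] resolve(Map<String, String> env, List<String> s3cfgLines) {
        String access = env.get("AWS_ACCESS_KEY_ID");
        String secret = env.get("AWS_SECRET_ACCESS_KEY");
        if (access != null && secret != null) {
            return new String[] {access, secret};
        }
        if (s3cfgLines != null) {
            access = null;
            secret = null;
            for (String line : s3cfgLines) {
                int eq = line.indexOf('=');
                if (eq < 0) {
                    continue; // section headers such as [default], blank lines
                }
                String key = line.substring(0, eq).trim();
                String value = line.substring(eq + 1).trim();
                if (key.equals("access_key")) {
                    access = value;
                } else if (key.equals("secret_key")) {
                    secret = value;
                }
            }
            if (access != null && secret != null) {
                return new String[] {access, secret};
            }
        }
        throw new IllegalStateException("No AWS credentials found");
    }

    public static void main(String[] args) {
        // Placeholder values only; no real credentials.
        Map<String, String> env = new HashMap<String, String>();
        List<String> cfg = Arrays.asList(
                "[default]",
                "access_key = EXAMPLEKEY",
                "secret_key = EXAMPLESECRET");
        String[] creds = resolve(env, cfg);
        System.out.println("access key: " + creds[0]);
    }
}
```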

System Requirements

  • Java 6
  • Maven 3

Building

mvn package

Usage

s3s3mirror.sh [options] <source-bucket> <destination-bucket>

Options

-c (--ctime) N           : Only copy objects whose Last-Modified date is younger than this many days
-m (--max-connections) N : Maximum number of connections to S3 (default 100)
-n (--dry-run)           : Do not actually do anything, but show what would be done (default false)
-r (--max-retries) N     : Maximum number of retries for S3 requests (default is 5)
-p (--prefix) VAL        : Only copy objects whose keys start with this prefix
-d (--dest-prefix) VAL   : Destination prefix (replacing the one specified in --prefix, if any)
-t (--max-threads) N     : Maximum number of threads (default is same as --max-connections)
-v (--verbose)           : Verbose output (default false)

Examples

Copy everything from a bucket named "source" to another bucket named "dest"

s3s3mirror.sh source dest

Copy everything from "source" to "dest", but only copy objects last modified within the past week

s3s3mirror.sh -c 7 source dest
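
For reference, the age check behind --ctime could be expressed as below. This is a minimal sketch, not the tool's implementation: `AgeFilter` and `isYoungerThan` are hypothetical names, and treating an object exactly N days old as too old is an assumption.

```java
import java.util.Date;
import java.util.concurrent.TimeUnit;

// Hypothetical helper illustrating a Last-Modified age filter.
final class AgeFilter {
    // True when lastModified is strictly less than `days` days before `now`.
    static boolean isYoungerThan(Date lastModified, int days, Date now) {
        long cutoff = now.getTime() - TimeUnit.DAYS.toMillis(days);
        return lastModified.getTime() > cutoff;
    }

    public static void main(String[] args) {
        Date now = new Date();
        Date threeDaysAgo = new Date(now.getTime() - TimeUnit.DAYS.toMillis(3));
        Date tenDaysAgo = new Date(now.getTime() - TimeUnit.DAYS.toMillis(10));
        System.out.println(isYoungerThan(threeDaysAgo, 7, now));
        System.out.println(isYoungerThan(tenDaysAgo, 7, now));
    }
}
```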

Copy everything from "source/foo" to "dest/bar"

s3s3mirror.sh -p foo -d bar source dest
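
One way to picture how --prefix and --dest-prefix combine is as a rewrite of each object key. The sketch below is an assumption about the behaviour described in the options, not code from s3s3mirror; `KeyMapper` and `destinationKey` are hypothetical names, and `null` stands for an option that was not given.

```java
// Hypothetical helper illustrating the key rewrite implied by -p and -d.
final class KeyMapper {
    // Replaces the leading source prefix of a key with the destination prefix.
    // Without a destination prefix the key is kept unchanged; without a source
    // prefix the destination prefix is simply prepended.
    static String destinationKey(String sourceKey, String prefix, String destPrefix) {
        if (destPrefix == null) {
            return sourceKey;
        }
        String p = (prefix == null) ? "" : prefix;
        if (!sourceKey.startsWith(p)) {
            throw new IllegalArgumentException(
                    "key '" + sourceKey + "' does not start with prefix '" + p + "'");
        }
        return destPrefix + sourceKey.substring(p.length());
    }

    public static void main(String[] args) {
        // With -p foo -d bar, "foo/a.txt" maps to "bar/a.txt".
        System.out.println(destinationKey("foo/a.txt", "foo", "bar"));
    }
}
```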

Copy within a single bucket -- copy everything from "source/foo" to "source/bar"

s3s3mirror.sh -p foo -d bar source source

BAD IDEA: If copying within a single bucket, do not put the destination below the source

s3s3mirror.sh -p foo -d foo/subfolder source source

This is likely to cause infinite recursion and send your AWS bill into the stratosphere!
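
A wrapper script or calling program could guard against this case before launching a copy. The following is a hypothetical pre-flight check, not something s3s3mirror is documented to do; `PrefixGuard` and `isRecursiveCopy` are illustrative names. It relies only on the fact that --prefix matches keys by their leading characters.

```java
// Hypothetical pre-flight check for the "destination below the source" case.
final class PrefixGuard {
    // True when source and destination are the same bucket and every key
    // written under destPrefix would itself match the source prefix.
    static boolean isRecursiveCopy(String sourceBucket, String destBucket,
                                   String prefix, String destPrefix) {
        if (!sourceBucket.equals(destBucket)) {
            return false;
        }
        String p = (prefix == null) ? "" : prefix;
        String d = (destPrefix == null) ? "" : destPrefix;
        // Identical prefixes copy objects onto themselves, which is a no-op
        // rather than a runaway copy, so only a strictly longer match counts.
        return !d.equals(p) && d.startsWith(p);
    }

    public static void main(String[] args) {
        System.out.println(isRecursiveCopy("source", "source", "foo", "foo/subfolder"));
        System.out.println(isRecursiveCopy("source", "source", "foo", "bar"));
    }
}
```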

