
(BigCAT) Spark Downsampler

Command-line tool for downsampling label data (stored as LabelMultisetTypes in an N5 dataset) with Spark.

Compile

Because this repository depends on a branch of BigCAT that is not yet merged into master, compiling from source requires checking out and building that branch first (which in turn requires the latest version of N5, which must also be compiled).

  • Clone N5 to any location.

  • Use Maven to install N5 1.1.4-SNAPSHOT into your local repository:

mvn clean install

  • Then, clone shrucis1/bigcat to a location of your choice.

  • Switch to the n5cacheloader branch, which contains the necessary changes:

git checkout n5cacheloader

  • Use Maven to install this branch of BigCAT into your local repository:

mvn clean install

  • Finally, clone this repository; it should now compile. (A consolidated sketch of the whole sequence follows this list.)

  • To make a "fat jar" with all dependencies added, run:

mvn clean compile assembly:single
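
For convenience, the whole sequence can be run as the following shell sketch. The saalfeldlab/n5 repository URL is an assumption about where N5 lives; the fork and branch names are the ones given above.

# Build and install N5 first (repository location assumed)
git clone https://github.com/saalfeldlab/n5.git
cd n5
mvn clean install
cd ..

# Build and install the n5cacheloader branch of the BigCAT fork
git clone https://github.com/shrucis1/bigcat.git
cd bigcat
git checkout n5cacheloader
mvn clean install
cd ..

# Build this repository, producing the fat jar under target/
git clone https://github.com/nthistle/bigcat-spark-downsampler.git
cd bigcat-spark-downsampler
mvn clean compile assembly:single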

Downsampling

Downsampling Operation

Downsampling works by merging groups of LabelMultisetEntryLists:

For instance, consider a block of pixels where each list has a single entry with a count of 1 and a label of either 83 or 64. The block is downsampled into a single list with two entries: one with label 83 and the total count of the pixels labeled 83, the other with label 64 and its corresponding count.

The motivation for this method of downsampling is that it loses no information except where each label occurs within a downsampled block.

When converting to a color for visualization purposes, this method also allows a weighted color to be calculated from the colors of each label within a downsampled pixel.
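
To make the merging rule concrete, here is a minimal, self-contained Java sketch. It models each pixel as a plain Map from label ID to count instead of the actual LabelMultisetEntryList API, so the class and method names below are illustrative, not the tool's real code.

import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MergeSketch {
    // Merge the per-pixel label counts of a downsampling block into a
    // single multiset: counts for the same label are summed.
    static Map<Long, Integer> merge(List<Map<Long, Integer>> block) {
        Map<Long, Integer> merged = new HashMap<>();
        for (Map<Long, Integer> pixel : block) {
            pixel.forEach((label, count) -> merged.merge(label, count, Integer::sum));
        }
        return merged;
    }

    public static void main(String[] args) {
        // The example from the text: every pixel holds one entry with
        // count 1 and a label of either 83 or 64.
        List<Map<Long, Integer>> block = List.of(
                Map.of(83L, 1), Map.of(83L, 1), Map.of(64L, 1), Map.of(83L, 1));
        System.out.println(merge(block)); // prints {64=1, 83=3} (order may vary)
    }
}

The result is a single list with two entries, labels 83 and 64, each carrying its total count, exactly as described above.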

Usage

java -Dspark.master=[spark_master] -jar target/bigcat-spark-downsampler-0.0.1-SNAPSHOT-jar-with-dependencies.jar [args]

Arguments

  • --compression, -c Compression type to use in output N5 dataset. Default: RAW

  • --factor, -f Factor by which to downscale the input image (Required)

  • --idatasetname, --idata, -id Input dataset name (N5 relative path from group) (Required)

  • --igroupname, --igroup, -ig Input group name (N5 group) (Required)

  • --odatasetname, --odata, -od Output dataset name (N5 relative path from group) (Required)

  • --ogroupname, --ogroup, -og Output group name (N5 group). Defaults to input group name

  • --parallelblocks, -pb Size of the blocks (in cells) to parallelize with Spark. Defaults to [16, 16, ... 16]

Note that the spark.master property must be set when running as well. See the Spark documentation on master URLs (https://spark.apache.org/docs/latest/submitting-applications.html#master-urls) for more information.

Example

java -Dspark.master=local[*] -jar target/bigcat-spark-downsampler-0.0.1-SNAPSHOT-jar-with-dependencies.jar -ig ~/cremi-n5/ -id sampleA-fullres -od sampleA-8x8x2 -f 8,8,2 -c GZIP -pb 4,4,4

This would downsample the N5 label dataset at ~/cremi-n5/sampleA-fullres by a factor of 8x8x2 and write it to an N5 dataset (with GZIP compression) at ~/cremi-n5/sampleA-8x8x2.

Note that the output group name is not specified, and defaults to the same as the input group name.

Also, spark.master is set to local[*], which, according to Spark documentation, will

Run Spark locally with as many worker threads as logical cores on your machine.

The parallel block size determines the size of each unit of work Spark parallelizes over; 4x4x4 yields blocks of 64 cells each. It has no effect on the output dataset, but if it is set too high relative to the input dataset size, there may not be enough blocks to give every worker thread something to work on (wasting the available parallelism).
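
As a rough illustration (numbers are hypothetical): an input of 64x64x64 cells split with -pb 16,16,16 yields 4 x 4 x 4 = 64 parallel blocks, enough to keep the 8 threads of a typical local[*] run busy, whereas -pb 64,64,64 would yield a single block and use only one thread.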
