This is a simple Snakemake wrapper around the Arima Genomics Capture Hi-C (CHiC) workflow. It allows the Arima workflow to be executed in parallel and within a reproducible Conda environment. The software versions used by this wrapper differ from those listed in the Arima workflow, because the Arima versions are quite old and require manual installation. The versions used here were chosen to be compatible with those validated by Arima.
- Install the Miniconda software distribution at a convenient location.
- Add the Bioconda software channel, as described in the Bioconda documentation.
- Install Snakemake, e.g. in a new conda environment:
conda create -n snakemake snakemake
- Check out the Arima workflow repository:
git clone https://github.com/ArimaGenomics/CHiC.git arima-chic
- In the `arima-chic` directory above, uncompress the `chicagoTools.tar.gz` file, resulting in an `arima-chic/chicagoTools` directory:
tar xvf chicagoTools.tar.gz
- Check out the `snake-chic` repository:
git clone https://github.com/insilicoconsulting/snake-chic snake-chic
The workflow expects paired-end FASTQ files in the directory `fastq`, relative to the `snake-chic` directory containing the workflow. The files must be named in the format `samplename_[R1|R2].fastq.gz`, e.g. `sample1_R1.fastq.gz` and `sample1_R2.fastq.gz`. It's probably easiest to create this naming format using symbolic links, e.g.:
ln -s /datadir/sample1_S1_L001_R1_001.fastq.gz fastq/sample1_R1.fastq.gz
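With many samples, the symbolic links can be created in a loop. The following is a minimal sketch, not part of `snake-chic`: it assumes Illumina-style input names of the form `<sample>_S<n>_L<lane>_R<1|2>_001.fastq.gz` with a single lane per sample, and the function name `link_fastqs` is hypothetical. Files from several lanes of one sample would map to the same link name and overwrite each other.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of snake-chic): link Illumina-style FASTQ names,
# e.g. sample1_S1_L001_R1_001.fastq.gz, into the workflow's naming scheme,
# e.g. fastq/sample1_R1.fastq.gz. Assumes one lane per sample: several lanes
# of the same sample map to the same link name, and the last one wins (ln -f).
link_fastqs() {
  local src_dir="$1" dest_dir="${2:-fastq}" abs_src f base new
  abs_src=$(cd "$src_dir" && pwd) || return 1   # absolute path for link targets
  mkdir -p "$dest_dir"
  for f in "$abs_src"/*_R[12]_001.fastq.gz; do
    [ -e "$f" ] || continue                     # glob matched nothing
    base=$(basename "$f")
    # sample1_S1_L001_R1_001.fastq.gz -> sample1_R1.fastq.gz
    new=$(printf '%s\n' "$base" |
      sed -E 's/_S[0-9]+_L[0-9]{3}_(R[12])_001\.fastq\.gz$/_\1.fastq.gz/')
    [ "$new" != "$base" ] || continue           # not an Illumina-style name
    ln -sf "$f" "$dest_dir/$new"
  done
}
```

For example, `link_fastqs /datadir fastq` run from the `snake-chic` directory (with `/datadir` standing in for the real data location). The link targets are made absolute because a relative symlink target is resolved relative to the directory containing the link, not the directory it was created from.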
- Adapt the parameters in `config/config.yaml` to your requirements. Set the `arima_dir` parameter to the location of the `arima-chic` workflow directory above, and the `chicago_dir` parameter to the location of the `chicagoTools` directory above. The other parameters are explained in the README file of the Arima repository.
- Adapt the sample metadata file `config/samples.tsv` to use the sample names corresponding to the FASTQ files, and the appropriate capture BED file for each sample.
- Activate the `snakemake` Conda environment:
conda activate snakemake
- Execute the workflow:
snakemake --use-conda -p --cores 16
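Before a long run, it can help to check the configuration and inputs. The sketch below is not part of `snake-chic`, and all function names in it are hypothetical. It assumes that `config/config.yaml` stores `arima_dir` and `chicago_dir` as simple top-level `key: value` lines, and that `config/samples.tsv` is tab-separated with a header row and the sample name in the first column; adjust it if the actual files differ. Run it from the `snake-chic` directory.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight checks (not part of snake-chic). Assumes simple
# top-level "key: value" lines in config.yaml and a tab-separated samples.tsv
# with a header row and the sample name in the first column.

# Print the value of top-level key $2 in file $1, without quotes or comments.
# This is a line-based match, not a YAML parser.
config_value() {
  local file="$1" key="$2"
  sed -nE "s/^${key}:[[:space:]]*//p" "$file" | head -n 1 |
    sed -E 's/[[:space:]]+#.*$//; s/[[:space:]]+$//' | tr -d "\"'"
}

# Fail if arima_dir or chicago_dir is missing or not an existing directory.
check_tool_dirs() {
  local config="${1:-config/config.yaml}" key dir status=0
  for key in arima_dir chicago_dir; do
    dir=$(config_value "$config" "$key")
    if [ -z "$dir" ]; then
      echo "missing key in $config: $key" >&2; status=1
    elif [ ! -d "$dir" ]; then
      echo "$key is not a directory: $dir" >&2; status=1
    fi
  done
  return "$status"
}

# Fail if any sample in samples.tsv lacks fastq/<sample>_R1/_R2.fastq.gz.
# [ -e ] follows symlinks, so broken links are reported as missing too.
check_sample_fastqs() {
  local samples="${1:-config/samples.tsv}" fastq_dir="${2:-fastq}"
  [ -r "$samples" ] || { echo "cannot read $samples" >&2; return 1; }
  # The loop runs in a subshell (right side of a pipe), so it reports
  # failure through its exit status rather than a shared variable.
  tail -n +2 "$samples" | cut -f1 | (
    status=0
    while IFS= read -r sample; do
      [ -n "$sample" ] || continue
      for mate in R1 R2; do
        if [ ! -e "$fastq_dir/${sample}_${mate}.fastq.gz" ]; then
          echo "missing: $fastq_dir/${sample}_${mate}.fastq.gz" >&2
          status=1
        fi
      done
    done
    exit "$status"
  )
}
```

After sourcing these functions, a run can be gated on them, e.g. `check_tool_dirs && check_sample_fastqs && snakemake --use-conda -p --cores 16`. Snakemake's own dry run (`snakemake -n --use-conda`) complements these checks by listing the jobs it would execute without running them.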