- Set up a flexible configuration file able to assemble:
  - one genome
  - multiple genomes
- Set the necessary adapter FASTA files depending on the technology (NextSeq or MiSeq) or allow detection from filename
- The adapter files can be downloaded from the trimmomatic repository, and the PhiX genome (`NC_001422.1`) can be downloaded as well, to enable full reproducibility
- SPAdes provides an alternative flag to `--careful`, namely `--isolate` (introduced in 3.14.0), that could be used for high-coverage (100x) isolate genomes. Note that, as always, there is no one-size-fits-all setting
- Same for recycler, with `-i True` for isolates
- Adjust the maximum k-mer length required by recycler based on the SPAdes output
- Count the contigs shorter than 1 kb and remove them with seqkit
- Assess completeness and contamination with checkM, but remove the plasmid check. Available in bioconda (1.1.3)
- Annotate genome with Bakta (5S extraction)
- Extract the LSU (23S) and SSU (16S) with metaxa2
- Assess contamination with MDMcleaner
- Compute base-pair statistics and coverage with seqkit
- Compute assembly statistics with QUAST
- Generate MD5 checksums of the gzipped raw reads and of the final genome for deposition on Coscine
- Extract the above statistics to produce a standard-compliant table based on the sample table
- Include a report rule
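
Two of the steps above (dropping contigs below 1 kb and generating MD5 checksums) are simple enough to sketch in plain Python. This is only an illustration of the intended logic, not the pipeline's implementation: the function names are hypothetical, the length filter is a stand-in for what seqkit would do, and the 1000 bp threshold is the only value taken from the list.

```python
import hashlib


def md5sum(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file (e.g. a .fastq.gz), read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def split_by_length(records, min_length=1000):
    """Separate contigs into kept (>= min_length) and dropped (< min_length)."""
    kept, dropped = [], []
    for header, seq in records:
        if len(seq) >= min_length:
            kept.append((header, seq))
        else:
            dropped.append((header, seq))
    return kept, dropped
```

The number of removed contigs is then `len(dropped)`, which could be written to the statistics table alongside the checksums.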
The steps are listed in reverse order because that is easier for the Snakemake design. The subsections could serve as the basis for separate Snakefiles to be included.
- Generate genome FASTA file only
- Generate genome FASTA with plasmids if present (consider snakemake checkpoints for evaluation of condition)
- Assemble with spades (v3.13.1). Snakemake wrapper only for metaspades. Available in bioconda (3.15.3)
- Remove plasmid contigs from reads with bbduk, included in bbmap (v38.84). Snakemake wrapper available (38.90)
- Extract plasmid sequences with recycler (version unknown) from the de novo assembly graph and alignment. Available in bioconda (v0.7)
- BAM/SAM management with samtools (v0.1.19). Snakemake wrapper available (1.10)
- Alignment of reads on the assembly graph with bwa mem (v0.7.5). Snakemake wrapper available (0.7.17)
- Indexing of the assembly graph with bwa (v0.7.5). Snakemake wrapper available (0.7.17)
- Convert the assembly graph to FASTA with `make_fasta_from_fastg` from Recycler (0.62). Available in bioconda (0.7-3)
- Plasmid reconstruction with plasmidspades (v3.13.1). Snakemake wrapper only for metaspades. Available in bioconda (3.15.3)
- Remove phiX sequences from reads with bbduk, included in bbmap (v38.84). Snakemake wrapper available
- Remove adapters and filter by length with trimmomatic (v0.39). Snakemake wrapper available but older (0.36), so use bioconda.
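
The reverse ordering mirrors how Snakemake resolves a workflow: it starts from the requested target and works backward through the inputs each rule needs. The sketch below walks such a dependency map to recover the execution order. The step names are hypothetical labels for the bullets above, and the strictly linear chain is an assumption made for illustration (the optional plasmid branch is left out).

```python
# Hypothetical step labels; each step maps to the steps it depends on.
# Assumes a simple linear chain, which the real workflow may not follow.
DEPENDS_ON = {
    "genome_fasta": ["spades_assembly"],
    "spades_assembly": ["remove_plasmid_reads"],
    "remove_plasmid_reads": ["recycler_plasmids"],
    "recycler_plasmids": ["samtools_bam"],
    "samtools_bam": ["bwa_mem"],
    "bwa_mem": ["bwa_index"],
    "bwa_index": ["fastg_to_fasta"],
    "fastg_to_fasta": ["plasmidspades"],
    "plasmidspades": ["remove_phix"],
    "remove_phix": ["trim_adapters"],
    "trim_adapters": [],
}


def execution_order(target, depends_on):
    """Return the steps needed for `target`, dependencies first."""
    order = []
    seen = set()

    def visit(step):
        if step in seen:
            return
        seen.add(step)
        # Resolve everything this step needs before the step itself.
        for dependency in depends_on[step]:
            visit(dependency)
        order.append(step)

    visit(target)
    return order


if __name__ == "__main__":
    for number, step in enumerate(execution_order("genome_fasta", DEPENDS_ON), 1):
        print(number, step)
```

Requesting only an intermediate step, for example `execution_order("remove_phix", DEPENDS_ON)`, returns just the steps needed to reach it, which is the behaviour that makes splitting the workflow into separate Snakefiles practical.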