New module: Kraken2/Bracken on Unaligned Sequences for Contamination Detection #1351

Closed
wants to merge 242 commits into from
Commits
0620d21
stub for align_star + a single snapshot match per test
maxulysse Jul 3, 2024
573ca90
add stubs + code polish + Harshil align
maxulysse Jul 3, 2024
3efcb84
add stub
maxulysse Jul 3, 2024
345d1f8
update all nf-core modules and subworkflows and do not include tags
maxulysse Jul 3, 2024
eb08f2f
simplify snapshot for stub
maxulysse Jul 3, 2024
c956f6f
add stubs for subworkflow local quantify_rsem
maxulysse Jul 3, 2024
bf5797f
update default snapshot
maxulysse Jul 3, 2024
37780ad
add plugin
maxulysse Jul 4, 2024
150ac0b
update all subworkflows
maxulysse Jul 4, 2024
9fd07da
do not snapshot fastq in stub
maxulysse Jul 4, 2024
2cdeea1
pipeline level stub
maxulysse Jul 4, 2024
3c78a6a
fix stub test for prepare_genome
maxulysse Jul 4, 2024
c572fc0
update all subworkflows
maxulysse Jul 4, 2024
df9efba
correctly align
maxulysse Jul 5, 2024
bc7d7a5
update all modules
maxulysse Jul 9, 2024
b78a653
use seqera containers
maxulysse Jul 11, 2024
98cf1fa
Merge branch 'dev' into stubs_everywhere
maxulysse Jul 12, 2024
8759ed7
update all modules/subworkflows + remove tags
maxulysse Jul 12, 2024
5c1db07
update snapshot
maxulysse Jul 12, 2024
3c9250d
Merge branch 'dev' into stubs_everywhere
maxulysse Jul 16, 2024
8052b6b
Merge branch 'dev' into stubs_everywhere
maxulysse Jul 22, 2024
8b28fee
update nf-test to 0.9.0
maxulysse Jul 22, 2024
3eaf7a4
sort
maxulysse Jul 22, 2024
86bac63
update all modules/subworkflows + remove tags
maxulysse Jul 22, 2024
ee285c8
remove tags
maxulysse Jul 22, 2024
88286cd
keep card 3.1.1 that does not break CI due to weird RG
maxulysse Jul 22, 2024
f5adb7a
update snapshot
maxulysse Jul 22, 2024
d1c19be
update snapshot
maxulysse Jul 22, 2024
f94beb3
update all modules + subworkflows / remove tags + path markduplicates
maxulysse Jul 22, 2024
784aeb2
update snapshots
maxulysse Jul 22, 2024
30afc0e
update stringtie_stringtie
maxulysse Jul 22, 2024
58cee58
update snapshots for fastq_align_hisat2
maxulysse Jul 22, 2024
586058b
add proper stubs and tests for local star_*_igenomes modules
maxulysse Jul 22, 2024
3e0793f
Install kraken2 and bracken
egreenberg7 Jul 30, 2024
a7281e5
Change nextflow configurations
egreenberg7 Jul 30, 2024
734919e
Initial addition of kraken and bracken to pipeline
egreenberg7 Jul 30, 2024
8695f50
Fix include statements
egreenberg7 Jul 31, 2024
44ba282
Fix include statements
egreenberg7 Jul 31, 2024
3657e3e
Include module config files
egreenberg7 Aug 5, 2024
7ab90f6
Use presence of kraken_db to determine bracken run, add save params
egreenberg7 Aug 5, 2024
39cfdef
Fix multiqc for Bracken/Kraken
egreenberg7 Aug 5, 2024
2e9c243
Add Bracken/Kraken citations
egreenberg7 Aug 5, 2024
03314c9
Fix bracken config
egreenberg7 Aug 6, 2024
7913cbc
Update bracken
egreenberg7 Aug 7, 2024
063b4ac
Adjust multiqc configs for updated bracken
egreenberg7 Aug 7, 2024
075bca8
Add input validation and update scheme
egreenberg7 Aug 7, 2024
a0a225c
Change default Bracken precision to species
egreenberg7 Aug 7, 2024
bd4ea1c
Debug input validation
egreenberg7 Aug 7, 2024
7789827
Documentation/output image
egreenberg7 Aug 7, 2024
336a7b0
Update changelog
egreenberg7 Aug 7, 2024
5c0adce
Try multiqc fixes
pinin4fjords Aug 7, 2024
97a4530
Fix rename patterns
pinin4fjords Aug 8, 2024
25f3628
Import tsv func from nf-core subworkflow
pinin4fjords Aug 8, 2024
5663692
Better define ordering
pinin4fjords Aug 8, 2024
7dc68d6
Add new header, define some sections, reinstate trimming status repor…
pinin4fjords Aug 8, 2024
a1e1d99
Update Kraken2 module
egreenberg7 Aug 9, 2024
33c9483
Linting
egreenberg7 Aug 9, 2024
56a8d77
Fix up methods and versions passed to MultiQC
pinin4fjords Aug 9, 2024
be5825c
Attempt a rational fixed ordering of sections
pinin4fjords Aug 9, 2024
f87b645
Temp fix pending suworkflow update
pinin4fjords Aug 9, 2024
e5c0207
update subworkflow from nf-core
pinin4fjords Aug 9, 2024
d858577
update changelog
pinin4fjords Aug 9, 2024
102152e
Change to --contaminant_screening param
egreenberg7 Aug 15, 2024
6dd7521
Fixing save unaligned default
egreenberg7 Aug 15, 2024
fdd85ad
Debugging
egreenberg7 Aug 15, 2024
c0099d4
Update usage
egreenberg7 Aug 15, 2024
c25aa50
Provide motivation for Kraken2 parameters
egreenberg7 Aug 15, 2024
32a6440
Merge branch 'dev' into stubs_everywhere
maxulysse Aug 16, 2024
f77e714
proper sha for module bedtools/genomecov
maxulysse Aug 16, 2024
200d4bd
update samtools/stats module + rm tags
maxulysse Aug 16, 2024
addb735
Merge branch 'dev' into harmonize_new_for_multiqc
pinin4fjords Aug 16, 2024
0063c2d
Fix typo
egreenberg7 Aug 16, 2024
53f46b2
Update metro map
egreenberg7 Aug 16, 2024
2dc5746
Merge branch 'dev' into dev
Shaun-Regenbaum Aug 18, 2024
2a322c4
Update schema
egreenberg7 Aug 19, 2024
75a10f7
Update docs/usage.md
egreenberg7 Aug 19, 2024
cdbfcda
Update multiqc for prefix usage
pinin4fjords Aug 19, 2024
cfc8945
Change output directory for kraken2/bracken
egreenberg7 Aug 19, 2024
23717f3
tiny updates on subway map
maxulysse Aug 19, 2024
07c81e8
realign
maxulysse Aug 19, 2024
49fb5a8
tiny fixes
maxulysse Aug 19, 2024
fd5ba8c
update CHANGELOG
maxulysse Aug 19, 2024
bd83a81
Update CHANGELOG.md
maxulysse Aug 19, 2024
cbbe64a
Bump multiqc
pinin4fjords Aug 20, 2024
03daccd
Unset multiqc prefix for testing
pinin4fjords Aug 20, 2024
ad67c51
Merge pull request #1355 from maxulysse/subway_map_one_more_time
maxulysse Aug 20, 2024
dabbdaa
Merge branch 'dev' into harmonize_new_for_multiqc
pinin4fjords Aug 20, 2024
10a73f5
Update workflows/rnaseq/main.nf
pinin4fjords Aug 20, 2024
cb53ef3
Merge pull request #1352 from nf-core/harmonize_new_for_multiqc
pinin4fjords Aug 20, 2024
34fecc9
Merge branch 'dev' into stubs_everywhere
maxulysse Aug 20, 2024
3088bde
clean up stub tests
maxulysse Aug 20, 2024
93bb601
Fix anchor issue in multiqc
pinin4fjords Aug 20, 2024
9a00dbe
Update CHANGELOG.md
pinin4fjords Aug 20, 2024
75995bf
Merge pull request #1357 from nf-core/fix_anchor_issue
pinin4fjords Aug 20, 2024
a0499c8
Merge branch 'dev' into stubs_everywhere
maxulysse Aug 20, 2024
56539ac
update CHANGELOG
maxulysse Aug 20, 2024
7c189de
Animate subway map
maxulysse Aug 20, 2024
a68cb69
update snapshot
maxulysse Aug 20, 2024
1b34d1e
update snapshot
maxulysse Aug 21, 2024
ad43736
Merge pull request #1335 from maxulysse/stubs_everywhere
maxulysse Aug 21, 2024
4028290
Update test_full.config to restore a static URI for megatests
pinin4fjords Aug 21, 2024
d5da4bc
Update CHANGELOG.md
pinin4fjords Aug 21, 2024
bef25b6
use permalink
pinin4fjords Aug 21, 2024
065f5ad
Same for minimal tests
pinin4fjords Aug 21, 2024
04b147b
Update CHANGELOG.md
pinin4fjords Aug 21, 2024
8d11ea1
fix double reference
pinin4fjords Aug 21, 2024
9c52b08
Merge branch 'static_uri_megatests' of github.com:nf-core/rnaseq into…
pinin4fjords Aug 21, 2024
eff73a9
Fix input
pinin4fjords Aug 21, 2024
4a01a63
Merge pull request #1358 from nf-core/static_uri_megatests
pinin4fjords Aug 21, 2024
4464227
Revert multiqc workaround due to fix
pinin4fjords Aug 21, 2024
9130eea
Update changelog
pinin4fjords Aug 21, 2024
27d8091
Update changelog
pinin4fjords Aug 21, 2024
852dae1
snapshot all files contents by default
maxulysse Aug 21, 2024
1087b26
animate metro map
maxulysse Aug 21, 2024
7339ac8
Merge branch 'dev' into moar_snapshots
maxulysse Aug 21, 2024
b622978
Merge branch 'dev' into animate_subway
maxulysse Aug 21, 2024
8eb1311
fix MultiQC paths
maxulysse Aug 22, 2024
52de9f6
Install properly
pinin4fjords Aug 22, 2024
f6af6d9
Merge pull request #1359 from nf-core/revert_multiqc_workaround
pinin4fjords Aug 22, 2024
5708f7a
Merge branch 'dev' into animate_subway
maxulysse Aug 22, 2024
9e8f6cb
Tidy up multiqc test config
pinin4fjords Aug 22, 2024
8553a92
Add module config
pinin4fjords Aug 22, 2024
1632f4e
Update changelog
pinin4fjords Aug 22, 2024
ee0ad86
Install changes from modules
pinin4fjords Aug 22, 2024
6588b06
Merge pull request #1362 from nf-core/multiqc_test_config
maxulysse Aug 22, 2024
2265fac
update test and snapshot
maxulysse Aug 22, 2024
bf92d59
Merge branch 'dev' into moar_snapshots
maxulysse Aug 22, 2024
434a440
update MultiQC paths
maxulysse Aug 22, 2024
b7c780f
sort files
maxulysse Aug 22, 2024
aa1e6a9
update test and snapshot
maxulysse Aug 22, 2024
60ca773
update test and snapshot
maxulysse Aug 22, 2024
f8b1d3f
more snapshots
maxulysse Aug 22, 2024
c1d2f6a
update test and snapshot
maxulysse Aug 22, 2024
d03451e
update CHANGELOG
maxulysse Aug 22, 2024
c4a3dde
update test and snapshot
maxulysse Aug 22, 2024
e8cd2ac
update test and snapshot
maxulysse Aug 22, 2024
861c084
revert update CHANGELOG
maxulysse Aug 22, 2024
46e75d7
update test and snapshot
maxulysse Aug 22, 2024
914565f
Merge branch 'dev' into animate_subway
maxulysse Aug 22, 2024
09922ea
update CHANGELOG
maxulysse Aug 22, 2024
eedeb4f
text to path
maxulysse Aug 22, 2024
162c8b2
typo
maxulysse Aug 22, 2024
9a04059
more snapshots
maxulysse Aug 22, 2024
ab00caa
text to path
maxulysse Aug 22, 2024
f1867ad
update snapshots
maxulysse Aug 23, 2024
8802e2b
fastqc html are not stable
maxulysse Aug 23, 2024
85cf31e
update tests and snapshots
maxulysse Aug 23, 2024
12dfaed
update tests and snapshots
maxulysse Aug 23, 2024
01fadc1
update tests and snapshots
maxulysse Aug 23, 2024
f0f13fe
update tests and snapshots
maxulysse Aug 23, 2024
93cb05e
kallisto
maxulysse Aug 23, 2024
3c8bc75
Add link to static version
maxulysse Aug 23, 2024
c64c033
kallisto with snapshots
maxulysse Aug 23, 2024
0c296e5
kallisto updated snapshots
maxulysse Aug 23, 2024
792c809
min_mapped_reads snapshots
maxulysse Aug 23, 2024
8c7aa24
min_mapped_reads better snapshots
maxulysse Aug 23, 2024
33b34e7
fix stub snap
maxulysse Aug 23, 2024
8dd8c01
remove_ribo_rna snapshots
maxulysse Aug 23, 2024
8f9f874
update remove_ribo_rna snapshots
maxulysse Aug 26, 2024
715adf1
Merge pull request #1361 from maxulysse/animate_subway
maxulysse Aug 26, 2024
35c78e7
Merge branch 'dev' into moar_snapshots
maxulysse Aug 26, 2024
51b1b0f
salmon snapshots + update moar snapshots
maxulysse Aug 26, 2024
1426616
skip_qc snapshots
maxulysse Aug 26, 2024
3cf398a
update tests and snapshots
maxulysse Aug 26, 2024
439ae75
fix tests and snapshots
maxulysse Aug 26, 2024
c5f26a9
skip_trimming test and snapshots
maxulysse Aug 26, 2024
96b4369
update test and snapshots
maxulysse Aug 26, 2024
515f182
fix linting + star_rsem tests and snapshots
maxulysse Aug 26, 2024
14c5118
update CHANGELOG
maxulysse Aug 26, 2024
1ad7865
fix merge conflicts
maxulysse Aug 26, 2024
056ce81
update tests
maxulysse Aug 26, 2024
e4cfa09
update tests and snapshots
maxulysse Aug 26, 2024
8679eda
Merge pull request #1360 from maxulysse/moar_snapshots
maxulysse Aug 27, 2024
faabb90
update fastqc module
maxulysse Aug 28, 2024
96e3034
update utils subworkflows
maxulysse Aug 28, 2024
d513df5
use proper script for dupradar
maxulysse Aug 28, 2024
4f35611
no need for tags
maxulysse Aug 28, 2024
6688d46
no need for tags
maxulysse Aug 28, 2024
73504a6
update CHANGELOG
maxulysse Aug 28, 2024
b59f27f
Merge pull request #1363 from maxulysse/last_fix_before_3-15
maxulysse Aug 28, 2024
0c4dc96
Clarify docs on different tximport count files
pmoris Aug 29, 2024
d00cd97
Clarify design formula and blind dispersion estimation
pmoris Aug 29, 2024
d78a5d6
Update docs/output.md
pinin4fjords Sep 3, 2024
4f0aa71
Update docs/output.md
pinin4fjords Sep 3, 2024
803e02d
Update output.md
pmoris Sep 3, 2024
144c11f
Update CHANGELOG.md
pinin4fjords Sep 3, 2024
6450fbf
Remove restatement of params defaults
pinin4fjords Sep 3, 2024
760e83f
Update CHANGELOG.md
pinin4fjords Sep 3, 2024
dff00b6
Apply suggestions from code review
pinin4fjords Sep 3, 2024
1190e6c
Merge pull request #1367 from pmoris/clarify-deseq2-qc
pinin4fjords Sep 3, 2024
3587b18
Merge branch 'dev' into improve-docs-count-files
pinin4fjords Sep 3, 2024
0d93da5
Merge pull request #1366 from pmoris/improve-docs-count-files
pinin4fjords Sep 3, 2024
a266448
Bump versions
pinin4fjords Sep 3, 2024
65162d4
Bump modules
pinin4fjords Sep 3, 2024
b4d3735
Remove tags from tests
pinin4fjords Sep 3, 2024
9b1c08a
Fix version in snapshot
pinin4fjords Sep 3, 2024
fb38827
Merge branch 'prerelease_3.15.0_fixes' of github.com:nf-core/rnaseq i…
pinin4fjords Sep 3, 2024
349ad86
Merge pull request #1370 from nf-core/prerelease_3.15.0_fixes
pinin4fjords Sep 4, 2024
9fc9a35
Apply Maxime's suggestions to changelog
pinin4fjords Sep 4, 2024
90a1e83
Add new PRs to changelog
pinin4fjords Sep 4, 2024
2b32127
Bump CI
pinin4fjords Sep 4, 2024
e73c741
Merge pull request #1371 from nf-core/maxime_changelog_changes
pinin4fjords Sep 4, 2024
11f0c73
Bump tximeta/tximport for gene table rownames fix
pinin4fjords Sep 4, 2024
59988f8
Update changelog
pinin4fjords Sep 4, 2024
8446cbc
Merge pull request #1372 from nf-core/bump_tximport_gene_names
pinin4fjords Sep 4, 2024
4e34945
Merge pull request #1258 from nf-core/dev
pinin4fjords Sep 5, 2024
dfdd765
Bump versions for next release
pinin4fjords Sep 5, 2024
ad85ab2
Bump CI
pinin4fjords Sep 5, 2024
169318d
Update CHANGELOG.md
pinin4fjords Sep 5, 2024
4896bd0
Update changelog
pinin4fjords Sep 5, 2024
085ecfe
Merge pull request #1374 from nf-core/postrelese_3.15.0
pinin4fjords Sep 5, 2024
69f2d0a
Update Changelog
egreenberg7 Sep 10, 2024
68b9a21
Another changelog fix
egreenberg7 Sep 10, 2024
b4332d9
Linting fix
egreenberg7 Sep 10, 2024
457bdd9
Fix R module name-mangling issues
pinin4fjords Sep 13, 2024
765e67d
Bump changelog
pinin4fjords Sep 13, 2024
ffb223e
Merge pull request #1380 from nf-core/fix_tximport_summarizedexperiment
pinin4fjords Sep 13, 2024
7534d14
update all modules
maxulysse Sep 13, 2024
d1ca557
update CHANGELOG
maxulysse Sep 13, 2024
48077f8
better patch of picard md
maxulysse Sep 13, 2024
f6425c5
include conda changes to local modules too
maxulysse Sep 13, 2024
262672c
slight fixes for rnaseq prepro
pinin4fjords Sep 13, 2024
087a3a7
Merge pull request #1381 from maxulysse/massive_conda_update
maxulysse Sep 13, 2024
3701544
Remove chunk duplicated from subworkflow
pinin4fjords Sep 13, 2024
5c6982c
Try from branch
pinin4fjords Sep 13, 2024
8244d50
Merge branch 'dev' into rnaseq_propro_trimfail_fix
pinin4fjords Sep 16, 2024
3e9af61
install subworkflow from master
pinin4fjords Sep 16, 2024
5703690
Update changelog
pinin4fjords Sep 16, 2024
8def328
Merge pull request #1382 from nf-core/rnaseq_propro_trimfail_fix
pinin4fjords Sep 16, 2024
ec187a8
Bump versions
pinin4fjords Sep 16, 2024
ed31c6f
update changelog
pinin4fjords Sep 16, 2024
53023cd
Merge pull request #1383 from nf-core/prerelease_3.15.1_fixes
pinin4fjords Sep 16, 2024
18ce347
Merge branch 'master' into master_merge
pinin4fjords Sep 16, 2024
b91e759
Merge pull request #1385 from nf-core/master_merge
pinin4fjords Sep 16, 2024
6def003
nf-core pipelines bump-version 3.16.0dev
maxulysse Sep 17, 2024
85b320b
update CHANGELOG
maxulysse Sep 17, 2024
6ea73a6
Update CHANGELOG.md
maxulysse Sep 17, 2024
0b4125d
Merge pull request #1386 from maxulysse/3.16.0dev
maxulysse Sep 17, 2024
bc193df
Merge conflcits
egreenberg7 Sep 19, 2024
fd9b449
Merge branch 'dev' of github.com:egreenberg7/rnaseq into dev
egreenberg7 Sep 19, 2024
41bcd9c
Update hisat2 patch
egreenberg7 Sep 19, 2024
3e0b3e9
(Hopefully) final linting fix
egreenberg7 Sep 19, 2024
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -24,6 +24,7 @@ Thank you to everyone else that has contributed by reporting bugs, enhancements

### Enhancements & fixes

- [PR #1351](https://github.com/nf-core/rnaseq/pull/1351) - Adding Kraken2/Bracken on unaligned reads as an additional quality control step to detect sample contamination
- [PR #1186](https://github.com/nf-core/rnaseq/pull/1186) - Properly update qualimap/rnaseq module (ie not patch)
- [PR #1197](https://github.com/nf-core/rnaseq/pull/1197) - Delete lib directory and replace with utils\_\* subworkflows
- [PR #1199](https://github.com/nf-core/rnaseq/pull/1199) - Replace modules.config with more modular config files per module/subworkflow/workflow
8 changes: 8 additions & 0 deletions CITATIONS.md
@@ -16,6 +16,10 @@

> Quinlan AR, Hall IM. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics. 2010 Mar 15;26(6):841-2. doi: 10.1093/bioinformatics/btq033. Epub 2010 Jan 28. PubMed PMID: 20110278; PubMed Central PMCID: PMC2832824.

- [Bracken](https://doi.org/10.7717/peerj-cs.104)

> Lu, J., Breitwieser, F. P., Thielen, P., & Salzberg, S. L. (2017). Bracken: estimating species abundance in metagenomics data. PeerJ. Computer Science, 3(e104), e104. https://doi.org/10.7717/peerj-cs.104

- [fastp](https://www.ncbi.nlm.nih.gov/pubmed/30423086/)

> Chen S, Zhou Y, Chen Y, Gu J. fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics. 2018 Sep 1;34(17):i884-i890. doi: 10.1093/bioinformatics/bty560. PubMed PMID: 30423086; PubMed Central PMCID: PMC6129281.
@@ -38,6 +42,10 @@

> Kim D, Paggi JM, Park C, Bennett C, Salzberg SL. Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype. Nat Biotechnol. 2019 Aug;37(8):907-915. doi: 10.1038/s41587-019-0201-4. Epub 2019 Aug 2. PubMed PMID: 31375807.

- [Kraken2](https://doi.org/10.1186/s13059-019-1891-0)

> Wood, D. E., Lu, J., & Langmead, B. (2019). Improved metagenomic analysis with Kraken 2. Genome Biology, 20(1), 257. https://doi.org/10.1186/s13059-019-1891-0

- [MultiQC](https://pubmed.ncbi.nlm.nih.gov/27312411/)

> Ewels P, Magnusson M, Lundin S, Käller M. MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics. 2016 Oct 1;32(19):3047-8. doi: 10.1093/bioinformatics/btw354. Epub 2016 Jun 16. PubMed PMID: 27312411; PubMed Central PMCID: PMC5039924.
1 change: 1 addition & 0 deletions README.md
@@ -44,6 +44,7 @@
3. [`dupRadar`](https://bioconductor.org/packages/release/bioc/html/dupRadar.html)
4. [`Preseq`](http://smithlabresearch.org/software/preseq/)
5. [`DESeq2`](https://bioconductor.org/packages/release/bioc/html/DESeq2.html)
6. [`Kraken2`](https://ccb.jhu.edu/software/kraken2/) -> [`Bracken`](https://ccb.jhu.edu/software/bracken/) on unaligned sequences; _optional_
15. Pseudoalignment and quantification ([`Salmon`](https://combine-lab.github.io/salmon/) or ['Kallisto'](https://pachterlab.github.io/kallisto/); _optional_)
16. Present QC for raw read, alignment, gene biotype, sample similarity, and strand-specificity checks ([`MultiQC`](http://multiqc.info/), [`R`](https://www.r-project.org/))

Binary file added docs/images/bracken-top-n-plot.png
Binary file modified docs/images/nf-core-rnaseq_metro_map_grey.png
189 changes: 132 additions & 57 deletions docs/images/nf-core-rnaseq_metro_map_grey.svg
22 changes: 21 additions & 1 deletion docs/output.md
@@ -40,6 +40,7 @@ The pipeline is built using [Nextflow](https://www.nextflow.io/) and processes d
- [Preseq](#preseq) - Estimation of library complexity
- [featureCounts](#featurecounts) - Read counting relative to gene biotype
- [DESeq2](#deseq2) - PCA plot and sample pairwise distance heatmap and dendrogram
- [Kraken2/Bracken](#kraken2bracken) - Taxonomic classification of unaligned reads
- [MultiQC](#multiqc) - Present QC for raw reads, alignment, read counting and sample similarity
- [Pseudoalignment and quantification](#pseudoalignment-and-quantification)
- [Salmon](#pseudoalignment) - Wicked fast gene and isoform quantification relative to the transcriptome
@@ -654,6 +655,25 @@ The plot on the left hand side shows the standard PC plot - notice the variable

<p align="center"><img src="images/mqc_deseq2_clustering.png" alt="MultiQC - DESeq2 sample similarity plot" width="600"></p>

### Kraken2/Bracken

<details markdown="1">
<summary>Output files</summary>

- `<ALIGNER>/contaminants/kraken2/kraken_reports`
  - `*.kraken2.report.txt`: Classification of unaligned reads in the Kraken report format. See the [kraken2 manual](https://github.com/DerrickWood/kraken2/wiki/Manual#output-formats) for more details.
  - `*.classified*.fastq.gz`: If `--save_kraken_alignments` is set, a FASTQ file for each sample in which each classified read is annotated with its taxonomic identification from Kraken2.
  - `*.unclassified*.fastq.gz`: If `--save_kraken_unassigned` is set, a FASTQ file with all reads that were not classified by Kraken2.
- `<ALIGNER>/contaminants/bracken/`
  - `*.kraken2.report_bracken.txt`: Kraken-style reports of the Bracken abundance estimate results. See the [kraken2 manual](https://github.com/DerrickWood/kraken2/wiki/Manual#output-formats) for more details.
  - `*.tsv`: Summary of the estimated reads for each taxon at the given classification level, including the corrections Bracken made to the Kraken2 assignments.

</details>

[Kraken2](https://ccb.jhu.edu/software/kraken2/) is a taxonomic classification tool that uses k-mer matches paired with a lowest common ancestor (LCA) algorithm to classify reads. [Bracken](https://ccb.jhu.edu/software/bracken/) is a statistical method that generates abundance estimates from the Kraken2 output. Both tools are run on unaligned sequences to detect potential sample contamination. MultiQC reports the top 5 taxa detected at the classification level used for Bracken, with toggles available for higher taxonomic levels. If Bracken is skipped, MultiQC reports the top 5 species detected by Kraken2.

![MultiQC - Bracken top species plot](images/bracken-top-n-plot.png)
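
For readers who want to inspect a report outside MultiQC, the following is a minimal sketch of how a "top N" list could be derived from a Kraken-style report. It assumes the standard tab-separated report layout (percentage, clade reads, direct reads, rank code, NCBI taxonomy ID, indented name) with no header line. The function name `top_taxa` and the sample data are hypothetical, and this is not how MultiQC itself computes its plot.

```python
def top_taxa(report_text, rank_code="S", n=5):
    """Return up to n (name, clade_reads, percent) tuples at the given rank.

    Assumes a Kraken-style report: tab-separated, no header, with the
    percentage and clade read count in the first two columns and the rank
    code, taxonomy ID and name in the last three columns.
    """
    taxa = []
    for line in report_text.splitlines():
        fields = line.split("\t")
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        percent = float(fields[0])
        clade_reads = int(fields[1])
        rank = fields[-3]
        name = fields[-1].strip()  # names are indented by taxonomic depth
        if rank == rank_code:
            taxa.append((name, clade_reads, percent))
    # Rank taxa by the number of reads assigned to the clade, largest first
    taxa.sort(key=lambda t: t[1], reverse=True)
    return taxa[:n]


# Made-up report used purely for illustration
SAMPLE_REPORT = "\n".join([
    " 50.00\t500\t500\tU\t0\tunclassified",
    " 50.00\t500\t0\tR\t1\troot",
    " 40.00\t400\t0\tD\t2\t    Bacteria",
    " 10.00\t100\t100\tS\t1280\t          Staphylococcus aureus",
    " 30.00\t300\t300\tS\t562\t          Escherichia coli",
])

if __name__ == "__main__":
    for name, reads, percent in top_taxa(SAMPLE_REPORT):
        print(f"{name}\t{reads}\t{percent:.2f}%")
```

Reading the rank code and name from the end of each row keeps the sketch usable if a report carries extra columns between the read counts and the rank code.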

### MultiQC

<details markdown="1">
@@ -673,7 +693,7 @@ Results generated by MultiQC collate pipeline QC from supported tools i.e. FastQ

### Pseudoalignment

The principal output files are the same between Salmon and Kallsto:
The principal output files are the same between Salmon and Kallisto:

<details markdown="1">
<summary>Output files</summary>
8 changes: 8 additions & 0 deletions docs/usage.md
@@ -296,6 +296,14 @@ Notes:

By default, the input GTF file will be filtered to ensure that sequence names correspond to those in the genome fasta file, and to remove rows with empty transcript identifiers. Filtering can be bypassed completely where you are confident it is not necessary, using the `--skip_gtf_filter` parameter. If you just want to skip the 'transcript_id' checking component of the GTF filtering script used in the pipeline this can be disabled specifically using the `--skip_gtf_transcript_filter` parameter.

## Contamination screening options

The pipeline provides the option to scan unaligned reads for contamination from other species using [Kraken2](https://ccb.jhu.edu/software/kraken2/), with the possibility of applying corrections from [Bracken](https://ccb.jhu.edu/software/bracken/). Since running Bracken is not computationally expensive, we recommend always using it to refine the abundance estimates generated by Kraken2.

It is important to note that the accuracy of Kraken2 is [highly dependent on the database](https://doi.org/10.1099/mgen.0.000949) used. Specifically, it is [crucial](https://doi.org/10.1128/mbio.01607-23) to ensure that the host genome is included in the database. If you are particularly concerned about certain contaminants, it may be beneficial to use a smaller, more focused database containing primarily those contaminants instead of the full standard database. Various pre-built databases [are available for download](https://benlangmead.github.io/aws-indexes/k2), and instructions for building a custom database can be found in the [Kraken2 documentation](https://github.com/DerrickWood/kraken2/blob/master/docs/MANUAL.markdown). Additionally, genomes of contaminants detected in previous sequencing experiments are available on the [OpenContami website](https://openlooper.hgc.jp/opencontami/help/help_oct.php).

While Kraken2 is capable of detecting low-abundance contaminants in a sample, false positives can occur. Therefore, if only a very small number of reads from a contaminating species are detected, these results should be interpreted with caution.
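
The caution about low read counts can be applied mechanically when reviewing results. The sketch below separates taxa in a Bracken-style TSV into well-supported and weakly supported groups. It assumes the table has `name` and `new_est_reads` columns, and the threshold of 10 reads is an arbitrary illustration rather than a recommendation from the pipeline. The function name `split_by_support` and the sample rows are hypothetical.

```python
import csv
import io


def split_by_support(bracken_tsv, min_reads=10):
    """Split taxa into (supported, weak) lists by estimated read count.

    Assumes a tab-separated table with a header containing at least the
    columns 'name' and 'new_est_reads'. min_reads is an illustrative
    cut-off only; choose a value appropriate to your sequencing depth.
    """
    supported, weak = [], []
    reader = csv.DictReader(io.StringIO(bracken_tsv), delimiter="\t")
    for row in reader:
        reads = int(row["new_est_reads"])
        target = supported if reads >= min_reads else weak
        target.append((row["name"], reads))
    return supported, weak


# Made-up table used purely for illustration
SAMPLE_TSV = "\n".join([
    "\t".join(["name", "taxonomy_id", "taxonomy_lvl", "kraken_assigned_reads",
               "added_reads", "new_est_reads", "fraction_total_reads"]),
    "\t".join(["Homo sapiens", "9606", "S", "9000", "500", "9500", "0.99969"]),
    "\t".join(["Cutibacterium acnes", "1747", "S", "2", "1", "3", "0.00031"]),
])

if __name__ == "__main__":
    supported, weak = split_by_support(SAMPLE_TSV)
    print("supported:", supported)
    print("interpret with caution:", weak)
```

Taxa in the second list are not necessarily false positives; the split only marks which detections deserve a closer look before drawing conclusions.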

## Running the pipeline

The typical command for running the pipeline is as follows:
13 changes: 12 additions & 1 deletion modules.json
@@ -15,6 +15,11 @@
"git_sha": "9ba6b02bbcb322ff4265cc51fca0ee5d8420b929",
"installed_by": ["modules"]
},
"bracken/bracken": {
"branch": "master",
"git_sha": "c214fad97b328eb6d6233f779be9ba44814a9136",
"installed_by": ["modules"]
},
"cat/fastq": {
"branch": "master",
"git_sha": "4fc983ad0b30e6e32696fa7d980c76c7bfe1c03e",
@@ -68,7 +73,8 @@
"hisat2/align": {
"branch": "master",
"git_sha": "2c6b1144ed58b6184ad58fc4e6b6a90219b4bf4f",
"installed_by": ["fastq_align_hisat2"]
"installed_by": ["fastq_align_hisat2"],
"patch": "modules/nf-core/hisat2/align/hisat2-align.diff"
},
"hisat2/build": {
"branch": "master",
@@ -90,6 +96,11 @@
"git_sha": "de5811dd9ca15af1e131806001bcaae909e42021",
"installed_by": ["modules", "quantify_pseudo_alignment"]
},
"kraken2/kraken2": {
"branch": "master",
"git_sha": "a13d5d945742a60bbef6e5c177e81cda540f75dc",
"installed_by": ["modules"]
},
"multiqc": {
"branch": "master",
"git_sha": "b80f5fd12ff7c43938f424dd76392a2704fa2396",
7 changes: 7 additions & 0 deletions modules/nf-core/bracken/bracken/environment.yml

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

55 changes: 55 additions & 0 deletions modules/nf-core/bracken/bracken/main.nf

51 changes: 51 additions & 0 deletions modules/nf-core/bracken/bracken/meta.yml

13 changes: 13 additions & 0 deletions modules/nf-core/bracken/bracken/nextflow.config

5 changes: 5 additions & 0 deletions modules/nf-core/bracken/bracken/tests/genus_test.config
