Pipeline unable to recognise samples processed across multiple lanes #351

Closed
jma1991 opened this issue Oct 22, 2023 · 3 comments
Labels
bug Something isn't working

Comments

jma1991 commented Oct 22, 2023

Description of the bug

I've identified a potential issue in the recent pipeline release (v2.5.0). It seems the groupTuple operator is called twice during input channel creation and branching of FASTQ files. As a result, the pipeline is unable to recognise samples processed across multiple lanes, because of an additional layer of file nesting. See here:

.groupTuple()
.map {
    meta, fastq ->
        def meta_clone = meta.clone()
        meta_clone.id = meta_clone.id.split('_')[0..-2].join('_')
        [ meta_clone, fastq ]
}
.groupTuple(by: [0])
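
The nesting effect described above can be sketched in plain Python. This is an emulation of the grouping step only, not Nextflow code: `group_tuple` and `strip_suffix` are hypothetical helpers, the sample ids and file names are invented, and a string id stands in for the `meta` map.

```python
from collections import defaultdict


def group_tuple(items):
    """Group (key, value) pairs by key, collecting the values into a list.

    A plain-Python stand-in for the grouping behaviour, not Nextflow itself.
    """
    grouped = defaultdict(list)
    for key, value in items:
        grouped[key].append(value)
    return list(grouped.items())


def strip_suffix(sample_id):
    # Drop the last underscore-separated token, as the .map step does for
    # ids that contain at least one underscore.
    return "_".join(sample_id.split("_")[:-1])


# Hypothetical input: one sample sequenced on two lanes, one (R1, R2) pair per row.
rows = [
    ("sample1_L1", ("s1_L1_R1.fastq.gz", "s1_L1_R2.fastq.gz")),
    ("sample1_L2", ("s1_L2_R1.fastq.gz", "s1_L2_R2.fastq.gz")),
]

# Grouping twice: the first pass wraps every pair in a one-element list,
# so the second pass collects lists of lists.
first = group_tuple(rows)
twice = group_tuple([(strip_suffix(key), value) for key, value in first])

# Grouping once, after renaming: the pairs are collected directly.
once = group_tuple([(strip_suffix(key), value) for key, value in rows])

print(twice)
print(once)
```

In this sketch `twice` holds one list per lane, each wrapping a single pair, while `once` holds the pairs directly. Any downstream step that inspects the size or shape of the grouped files would see a different structure in the two cases.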

Command used and terminal output

No response

Relevant files

No response

System information

No response

jma1991 added the bug label Oct 22, 2023

mz448 commented Nov 14, 2023

I believe this is an issue with the contents of samplesheet.csv.

Discussion reference

See the discussion between @FelixKrueger and @bioinfoMMS in the Slack channel (conversation).

How to "solve" it:

I was able to run the pipeline with bismark by adding an underscore "_" to the sample name (column 1) in samplesheet.csv,
e.g. use sample1_rep1 instead of sample1.
Make sure the header has four columns instead of three, the last one being genome. (This isn't very clear, because the current documentation at https://nf-co.re/methylseq does not mention it, but Felix says so in the conversation.)

e.g.:

# use this:
sample, fastq_1, fastq_2, genome
sample1_rep1, bla/sample1_R1.fastq.gz, bla/sample1_R2.fastq.gz,
sample2_rep1, bla/sample2_R1.fastq.gz, bla/sample2_R2.fastq.gz,

# instead of:
sample,fastq_1,fastq_2
sample1, bla/sample1_R1.fastq.gz, bla/sample1_R2.fastq.gz
sample2, bla/sample2_R1.fastq.gz, bla/sample2_R2.fastq.gz

To add multiple lanes of the same sample, repeat the sample name on a separate row for each lane; the rows are merged during processing.
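
As a rough illustration of what "repeat the sample name" means in practice, the Python sketch below reads a four-column sheet and collects the FASTQ pairs that share a sample name. It does not reproduce the pipeline's own parsing: the helper `lanes_per_sample` and the file paths are hypothetical, and the sheet is written without spaces after the commas.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample sheet: sample1_rep1 appears twice, once per lane.
SAMPLESHEET = """\
sample,fastq_1,fastq_2,genome
sample1_rep1,lane1/sample1_R1.fastq.gz,lane1/sample1_R2.fastq.gz,
sample1_rep1,lane2/sample1_R1.fastq.gz,lane2/sample1_R2.fastq.gz,
sample2_rep1,lane1/sample2_R1.fastq.gz,lane1/sample2_R2.fastq.gz,
"""


def lanes_per_sample(text):
    """Map each sample name to the list of (fastq_1, fastq_2) pairs in the sheet."""
    lanes = defaultdict(list)
    for row in csv.DictReader(io.StringIO(text)):
        pair = (row["fastq_1"].strip(), row["fastq_2"].strip())
        lanes[row["sample"].strip()].append(pair)
    return dict(lanes)


merged = lanes_per_sample(SAMPLESHEET)
for sample, pairs in merged.items():
    print(sample, len(pairs))
```

A check like this can be handy before launching a run, to confirm that every sample has the number of lanes you expect.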


wkang0 commented Jan 4, 2024

This is a bug in the instructions rather than in the code. To describe one sample sequenced on different lanes, the sample sheet should look like this:

sample1_REP1,fq1.gz,fq2.gz
sample1_REP2,fq11.gz,fq12.gz
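
Why suffixed names would end up as one sample can be seen from the id mapping quoted in the issue description, assuming that mapping is what actually runs. The Python sketch below mirrors `split('_')[0..-2].join('_')` for ids that contain an underscore; `strip_last_token` is a hypothetical helper, and Python slicing is not guaranteed to match Groovy's range indexing on edge cases.

```python
def strip_last_token(sample_id: str) -> str:
    # Python analogue of split('_')[0..-2].join('_') for ids that contain
    # at least one underscore: drop the final underscore-separated token.
    return "_".join(sample_id.split("_")[:-1])


ids = ["sample1_REP1", "sample1_REP2", "my_sample_REP1"]
stripped = [strip_last_token(sample_id) for sample_id in ids]
print(stripped)

# Edge case in this Python version only: an id with no underscore loses
# everything. What Groovy does with such an id is not reproduced here.
print(repr(strip_last_token("sample1")))
```

Both sample1_REP1 and sample1_REP2 reduce to sample1 here, so a grouping step keyed on the stripped id would treat them as the same sample.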

sateeshperi (Contributor) commented

Fixed in 2.7.1. Please let us know if you run into any issues. Thank you!
