samplesheet input is not detecting entire values in sample column #378
Comments
I suspect that this is the offending code: methylseq/workflows/methylseq.nf Lines 101 to 103 in 54f823e
I'm not 100% sure whether this is a bug or a feature. If it's a feature, it should have better docs.
I think that this issue is essentially the inverse of #351 (here it's happening by accident, there it was the desired behaviour).
I'm so happy I found this issue; I was pulling my hair out the whole day thinking my code was wrong. Quick fix: changed the underscore to a dot in my
I've encountered the same issue: the sample names were read in incompletely, causing errors later on. My sequencing was paired-end with 4 lanes per sample, so I may also have the problem mentioned in #381. Could anyone provide an updated, working samplesheet.csv example? Really confused now.
@CathyXD If you add an arbitrary number after the last underscore of each sample name (i.e., a suffix for each sample name: _x, _x, _x, _x), the samples will not be concatenated. A similar thing works for the nf-core/chipseq pipeline, where everything before the last underscore is used to infer group names. The pipeline decides to pool the samples in this bit of code:
    .map {
        meta, fastq ->
            def meta_clone = meta.clone()
            parts = meta_clone.id.split('_')
            meta_clone.id = parts.length > 1 ? parts[0..-2].join('_') : meta_clone.id
            [ meta_clone, fastq ]
    }
in
Here are some examples: input: output: Example 2: one underscore will pool the samples based on everything before the last underscore. input: output:
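To make the trimming rule in the quoted snippet concrete, here is a minimal Python sketch of the same logic: split the ID on underscores and drop the last field when there is more than one. It is only an illustration, not the pipeline's code; the function name `trimmed_id` and all sample names are hypothetical, and edge cases such as a trailing underscore may behave differently in Groovy.

```python
def trimmed_id(sample_id: str) -> str:
    """Drop the last underscore-delimited field, if there is one.

    Python re-expression of the quoted Groovy closure; names with a
    trailing underscore may be handled differently by Groovy's split().
    """
    parts = sample_id.split("_")
    return "_".join(parts[:-1]) if len(parts) > 1 else sample_id


# Hypothetical sample names, for illustration only.
names = [
    "BATCH_DATE_SAMPLE",    # loses its last field
    "BATCH_DATE_SAMPLE_1",  # extra suffix: the full original name survives
    "sample_1",             # one underscore: pooled under "sample"
    "sample.1",             # dot instead of underscore: left untouched
    "sample",               # no underscore: left untouched
]
for name in names:
    print(f"{name} -> {trimmed_id(name)}")
```

Under this rule, both workarounds mentioned in the thread (appending an extra `_x` suffix, or replacing the underscore with a dot) keep the full sample name intact after trimming.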
Thank you!! I've spent a week trying to figure out why the pipeline was strangely concatenating my input fastqs. Could you please add this info regarding naming conventions to the README, or fix the code?
fixed in
Description of the bug
When (a certain number of?) underscores are used in the sample column, sometimes only a substring of the value is read in, rather than the whole value.
E.g. BATCH_DATE_SAMPLE is read in as BATCH_DATE in the following example. The impact is that when the partial value that is read in is not unique, the pipeline will erroneously treat multiple (unique) rows as replicates.
See the screenshots of an example samplesheet and the running pipeline for an example.
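As a rough way to check a samplesheet for this problem before launching a run, the sketch below groups the values of the sample column by what remains after the last underscore-delimited field is dropped, and reports any group that contains more than one distinct name. It assumes a CSV with a column called `sample` (as in the issue title); the other column, the file names, the sample names, and the helper names are all hypothetical, and the trimming rule is a Python approximation of the behaviour described in this issue.

```python
import csv
import io
from collections import defaultdict


def trimmed_id(sample_id):
    """Approximate the reported behaviour: drop the last '_' field, if any."""
    parts = sample_id.split("_")
    return "_".join(parts[:-1]) if len(parts) > 1 else sample_id


def find_collisions(rows, column="sample"):
    """Map each trimmed ID to the distinct sample names that collapse onto it."""
    groups = defaultdict(set)
    for row in rows:
        groups[trimmed_id(row[column])].add(row[column])
    # Only groups with more than one distinct name are accidental merges;
    # repeated rows for the same sample (e.g. several lanes) are not flagged.
    return {tid: sorted(names) for tid, names in groups.items() if len(names) > 1}


# Hypothetical samplesheet contents, inlined so the sketch is self-contained.
SHEET = """sample,fastq_1
RUN1_20240101_A,a_L001.fastq.gz
RUN1_20240101_A,a_L002.fastq.gz
RUN1_20240101_B,b_L001.fastq.gz
RUN2_20240102_C,c_L001.fastq.gz
"""

collisions = find_collisions(csv.DictReader(io.StringIO(SHEET)))
for tid, names in collisions.items():
    print(f"{tid}: would merge {', '.join(names)}")
```

For a real file, the `io.StringIO(SHEET)` argument could be replaced with an open file handle for the samplesheet.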
Command used and terminal output
No response
Relevant files
No response
System information
version methylseq 2.6.0
No response