HsMetrics PCT_USABLE_BASES_ON_BAIT definition/calculation error #1996
Comments
Thanks for reporting this. It looks like a bug to me. We'll look into it and try to verify that. @yfarjoun Any thoughts on this? Looks fishy to me.
The code seems to be confused altogether: the numerators and denominators do not agree on whether to use deduped or non-deduped counts. I agree that there's a problem, but I think it goes beyond the documentation.
closes: broadinstitute#1996 The documentation in HsMetrics class was inaccurate regarding the filtering of reads that go into the PCT_USABLE_BASES_ON_BAIT. It now correctly reflects the fact that the reads/bases that go into this calculation are _not_ unique, i.e. duplicate reads are counted.
Having spoken with @tfenne offline, I now understand that the confusion I mentioned earlier is by design. BAIT-related metrics are meant to describe the performance of the selection, so duplicate reads are not filtered out of them. PCT_USABLE_BASES_ON_BAIT=20% means that 20% of the bases you sequenced were found on the baits. This lets the lab tune the selection process without having to consider the PCR process, the insert sizes, and so on. PCT_USABLE_BASES_ON_TARGET=20% means that 20% of the bases you sequenced could be used for calling variants in the target region. This combines the effects of PCR, selection, and insert size (among other things), and may serve as an overall efficiency metric.
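To make the distinction concrete, here is a minimal sketch of the two ratios as described in this thread. It is not Picard's actual code: the class name, method names, and sample counts are hypothetical, and it assumes both metrics share the PF bases as denominator, with only the numerator differing in whether duplicates are included.

```java
// Hypothetical illustration only; names and numbers are not taken from Picard.
final class UsableBasesSketch {
    private UsableBasesSketch() {}

    /** Aligned on-bait bases, duplicates INCLUDED, over PF bases (assumed definition). */
    static double pctUsableBasesOnBait(long onBaitBasesWithDuplicates, long pfBases) {
        return pfBases == 0 ? 0.0 : (double) onBaitBasesWithDuplicates / pfBases;
    }

    /** Aligned on-target bases, duplicates EXCLUDED, over PF bases (assumed definition). */
    static double pctUsableBasesOnTarget(long dedupedOnTargetBases, long pfBases) {
        return pfBases == 0 ? 0.0 : (double) dedupedOnTargetBases / pfBases;
    }

    public static void main(String[] args) {
        long pfBases = 1_000_000L;          // made-up count of PF bases
        long onBaitBases = 200_000L;        // made-up count, duplicates counted
        long dedupedOnTarget = 150_000L;    // made-up count, duplicates removed

        System.out.printf("PCT_USABLE_BASES_ON_BAIT   = %.3f%n",
                pctUsableBasesOnBait(onBaitBases, pfBases));
        System.out.printf("PCT_USABLE_BASES_ON_TARGET = %.3f%n",
                pctUsableBasesOnTarget(dedupedOnTarget, pfBases));
    }
}
```

Because the two numerators are filtered differently, the two values are not expected to match even when the bait and target intervals coincide.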
Hi!
On the webpage https://broadinstitute.github.io/picard/picard-metric-definitions.html#HsMetrics, the HsMetrics output named "PCT_USABLE_BASES_ON_BAIT" is defined as "The number of aligned, de-duped, on-bait bases out of the PF bases available." However, if you check line 91 in https://github.com/broadinstitute/picard/blob/master/src/main/java/picard/analysis/directed/HsMetricCollector.java, as well as lines 531-550 in https://github.com/broadinstitute/picard/blob/master/src/main/java/picard/analysis/directed/TargetMetricsCollector.java, you can see that this metric uses aligned on-bait bases without removing duplicates. This results in discrepancies between PCT_USABLE_BASES_ON_BAIT and PCT_USABLE_BASES_ON_TARGET, because the latter is calculated using de-duped counts. I just wanted to raise the issue so that the definition can be corrected!
Best regards