BGE gCNV #206
I will test for identical results, and probably add a test as well.
Results confirmed to be identical to the GATK WDL.
Looks good, assuming the WDL has been validated elsewhere. Just had a question below out of curiosity.
"CNVCallingAndMergeForFabric.filtered_cnv_genotyped_segments_vcf_md5sum": null,
"CNVCallingAndMergeForFabric.merged_vcf": {
  "file": "gs://palantir-workflows-test-data/CNVCallingAndMergeForFabric/0437227296_subset.merged.vcf.gz",
  "line_skip_regex": "^##bcftools|^##GATKCommandLine=|^##FORMAT=<ID=GT|^##source="
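To make the effect of line_skip_regex concrete, here is a minimal Python sketch of a line filter. It assumes the test harness drops every line matching the pattern before comparing outputs; the actual comparison code isn't part of this PR. The helper names kept_lines and kept_vcf_lines and the sample header lines are hypothetical, chosen only for illustration.

```python
import gzip
import re
from typing import Iterable, Iterator, List

# Skip pattern copied verbatim from the test input JSON above.
LINE_SKIP_REGEX = r"^##bcftools|^##GATKCommandLine=|^##FORMAT=<ID=GT|^##source="


def kept_lines(lines: Iterable[str], pattern: str = LINE_SKIP_REGEX) -> Iterator[str]:
    """Yield lines (trailing newline stripped) that do NOT match the skip pattern."""
    skip = re.compile(pattern)
    for line in lines:
        line = line.rstrip("\n")
        # Each alternative is anchored with ^, so only line prefixes are matched.
        if not skip.search(line):
            yield line


def kept_vcf_lines(path: str, pattern: str = LINE_SKIP_REGEX) -> List[str]:
    """Read a gzip/bgzip-compressed VCF as text and drop lines matching the pattern."""
    with gzip.open(path, "rt") as handle:
        return list(kept_lines(handle, pattern))


# Illustrative header lines (made up for this sketch, not taken from the test data).
SAMPLE_HEADER = [
    "##fileformat=VCFv4.2",
    "##source=example-caller",
    "##bcftools_viewVersion=example",
    '##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">',
    '##FORMAT=<ID=CN,Number=1,Type=Integer,Description="Copy number">',
    "##GATKCommandLine=<ID=ExampleTool>",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsample",
]

# Keeps only the fileformat line, the CN FORMAT line, and the #CHROM line.
kept = list(kept_lines(SAMPLE_HEADER))
```

Note that only the GT FORMAT line is skipped; other FORMAT definitions such as CN still take part in the comparison.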
Why skip the FORMAT=<ID=GT line in the header? Is it not standardized, so the comparison fails because of a different description or something?
Yes, essentially. It turns out that Picard's MergeVcfs produces a non-deterministic header: which input VCF it takes the FORMAT=<ID=GT line from can vary between runs. In our case, that header line has a different description depending on whether it comes from the gCNV VCF or the DRAGEN VCF.
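The sketch below shows why skipping that one header line makes the check stable. It builds two merged headers that differ only in the GT FORMAT description and compares MD5 digests with and without the skip pattern. The digest-based comparison, the helper names md5_of_lines and merged_header, and the two descriptions are all hypothetical stand-ins; this is not how the palantir-workflows test harness is known to work.

```python
import hashlib
import re
from typing import Iterable, List, Optional

# Same skip pattern as in the test input JSON.
SKIP = re.compile(r"^##bcftools|^##GATKCommandLine=|^##FORMAT=<ID=GT|^##source=")


def md5_of_lines(lines: Iterable[str], skip: Optional[re.Pattern] = None) -> str:
    """MD5 of the newline-terminated lines, omitting any line matched by `skip`."""
    digest = hashlib.md5()
    for line in lines:
        if skip is not None and skip.search(line):
            continue
        digest.update(line.encode("utf-8") + b"\n")
    return digest.hexdigest()


def merged_header(gt_description: str) -> List[str]:
    """Build an illustrative merged header whose GT FORMAT line has the given description."""
    return [
        "##fileformat=VCFv4.2",
        f'##FORMAT=<ID=GT,Number=1,Type=String,Description="{gt_description}">',
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsample",
    ]


# Hypothetical descriptions standing in for "taken from the gCNV VCF"
# vs "taken from the DRAGEN VCF".
run_a = merged_header("Genotype (description A)")
run_b = merged_header("Genotype (description B)")

raw_match = md5_of_lines(run_a) == md5_of_lines(run_b)                   # False
filtered_match = md5_of_lines(run_a, SKIP) == md5_of_lines(run_b, SKIP)  # True
```

Skipping the whole GT line is coarse, since any real change to that line would also go unnoticed. It is a reasonable trade-off when the only source of variation is which input the merge happened to pick.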
Pull BGE production pipeline from gatk branch into this repo.