Optionally extract all fingerprint snps #1989
…h other resources.
…(not only representatives)
@yfarjoun I have some minor comments. Good to merge when they're resolved.
@@ -82,6 +82,11 @@ public class ExtractFingerprint extends CommandLineProgram {
@Argument(doc = "When true code will check for readability on input files (this can be slow on cloud access)")
public boolean TEST_INPUT_READABILITY = true;

@Hidden
- default=true boolean arguments are very strange to me. Why not invert to "EXTRACT_ALL_VARIANTS" or something like that?
- Why is it hidden? Would Advanced make more sense?
sure.
@@ -138,18 +137,31 @@ public static VariantContextSet createVCSetFromFingerprint(final Fingerprint fin

// convert all the haplotypes to variant contexts and add them to the set.
fingerPrint.values().stream()
.map(hp -> getVariantContext(reference, sample, hp))
.flatMap(hp -> getVariantContext(reference, sample, hp, representativeOnly))
flatMap!
final Snp snp = haplotypeProbabilities.getRepresentativeSnp();
final byte refAllele = StringUtil.toUpperCase(reference.getSubsequenceAt(
final HaplotypeProbabilities haplotypeProbabilities,
final boolean representative_only) {
representativeOnly is more idiomatic
indeed.
final byte refAllele = StringUtil.toUpperCase(reference.getSubsequenceAt(
final HaplotypeProbabilities haplotypeProbabilities,
final boolean representative_only) {
if (representative_only) {
I like flatmaps, though it's a bit awkward that you have to repeat the same if/else logic inside each flatmap. You might pull the HP -> Stream<> logic out into its own method that you flatmap over, and then map the resulting stream to get the VariantContext or name as appropriate. It looks like you probably didn't do that because getVariantContext relies on the outer HP, so it can't be decomposed as nicely without nesting logic within the flatmap. Do what you like with it.
yeah...seems like over-complication...
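For readers following this thread, here is a minimal sketch of the decomposition the reviewer describes, not the code in this PR. The Block class, snpsToEmit, describe, and extract are all hypothetical stand-ins (for HaplotypeProbabilities, the HP -> Stream<> helper, and getVariantContext respectively); SNPs are plain strings so the example runs without Picard. It shows why the mapping step ends up nested inside the flatMap: the per-SNP conversion still needs the outer block.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class FlatMapSketch {
    // Hypothetical stand-in for a haplotype block: a representative SNP plus all SNPs in the block.
    static final class Block {
        final String representative;
        final List<String> snps;

        Block(final String representative, final List<String> snps) {
            this.representative = representative;
            this.snps = snps;
        }
    }

    // The single place that branches on representativeOnly (the "HP -> Stream<>" helper).
    static Stream<String> snpsToEmit(final Block block, final boolean representativeOnly) {
        return representativeOnly ? Stream.of(block.representative) : block.snps.stream();
    }

    // Hypothetical per-SNP conversion; it needs the outer block, as getVariantContext does.
    static String describe(final Block block, final String snp) {
        return snp + "@" + block.representative;
    }

    // The map is nested inside the flatMap because describe() depends on the enclosing block.
    static List<String> extract(final List<Block> blocks, final boolean representativeOnly) {
        return blocks.stream()
                .flatMap(b -> snpsToEmit(b, representativeOnly).map(snp -> describe(b, snp)))
                .collect(Collectors.toList());
    }

    public static void main(final String[] args) {
        final List<Block> blocks = Arrays.asList(
                new Block("rs1", Arrays.asList("rs1", "rs2")),
                new Block("rs3", Arrays.asList("rs3")));
        System.out.println(extract(blocks, true));
        System.out.println(extract(blocks, false));
    }
}
```

Whether this reads better than a single if/else inside getVariantContext is a judgment call; the nesting is the cost the reviewer anticipated.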
@@ -375,7 +375,7 @@ void testCanHandleDeepData(final HaplotypeProbabilitiesFromSequence hp, final in
final File fasta = new File(TEST_DATA_DIR, "reference.fasta");
Do you want a test to show outputting multiple values?
I'd love to, but it would involve building a bam...and I need to find time for that...
when I find a project that actually uses this, I'll make time to write a test... OK?
@yfarjoun poke
Description
Currently ExtractFingerprint writes out only the representative SNP for each haplotype block. While this is useful for the intended use case (creating a fingerprint and then using it in crosscheck or similar), there are other use cases where it is useful to know the genotype/PL at all the fingerprinting SNPs in each block.
For example, I've been developing a way to increase the power of VBID2 on low-coverage bams using the haplotype blocks.
Checklist (never delete this)
Never delete this, it is our record that procedure was followed. If you find that for whatever reason one of the checklist points doesn't apply to your PR, you can leave it unchecked but please add an explanation below.
Content
Review
For more detailed guidelines, see https://github.com/broadinstitute/picard/wiki/Guidelines-for-pull-requests