Optionally extract all fingerprint snps #1989

Open
wants to merge 4 commits into
base: master

Conversation

yfarjoun
Contributor

Description

Currently ExtractFingerprint only writes out the representative SNP for each haplotype block. While this is useful for the intended use case (creating a fingerprint and then using it subsequently in crosscheck or somesuch), there are other use cases where it is useful to know the genotype/PL at all the fingerprinting SNPs in each block.

I've been developing a way to increase the power of VBID2 on low-coverage BAMs using the haplotype blocks.

Checklist (never delete this)

Never delete this, it is our record that procedure was followed. If you find that for whatever reason one of the checklist points doesn't apply to your PR, you can leave it unchecked but please add an explanation below.

Content

  • Added or modified tests to cover changes and any new functionality
  • Edited the README / documentation (if applicable)
  • All tests passing on github actions

Review

  • Final thumbs-up from reviewer
  • Rebase, squash and reword as applicable

For more detailed guidelines, see https://github.com/broadinstitute/picard/wiki/Guidelines-for-pull-requests

Member

@lbergelson lbergelson left a comment

@yfarjoun I have some minor comments. Good to merge when they're resolved.

@@ -82,6 +82,11 @@ public class ExtractFingerprint extends CommandLineProgram {
@Argument(doc = "When true code will check for readability on input files (this can be slow on cloud access)")
public boolean TEST_INPUT_READABILITY = true;


@Hidden
Member

  1. default=true boolean arguments are very strange to me. Why not invert to "EXTRACT_ALL_VARIANTS" or something like that?

  2. Why is it hidden? Would Advanced make more sense?

Contributor Author

sure.

@@ -138,18 +137,31 @@ public static VariantContextSet createVCSetFromFingerprint(final Fingerprint fin

// convert all the haplotypes to variant contexts and add them to the set.
fingerPrint.values().stream()
- .map(hp -> getVariantContext(reference, sample, hp))
+ .flatMap(hp -> getVariantContext(reference, sample, hp, representativeOnly))
Member

flatMap!

final Snp snp = haplotypeProbabilities.getRepresentativeSnp();
final byte refAllele = StringUtil.toUpperCase(reference.getSubsequenceAt(
final HaplotypeProbabilities haplotypeProbabilities,
final boolean representative_only) {
Member

representativeOnly is more idiomatic

Contributor Author

indeed.

final byte refAllele = StringUtil.toUpperCase(reference.getSubsequenceAt(
final HaplotypeProbabilities haplotypeProbabilities,
final boolean representative_only) {
if (representative_only) {
Member

I like flatMaps; it's a bit awkward how you have to repeat the same if/else logic inside each flatMap, though. You might pull the HP -> Stream<> logic out into its own method that you flatMap over, and then map the resulting stream to get the VariantContext or name as appropriate. It looks like you probably didn't do that because getVariantContext relies on the outer HP, so it can't be decomposed as nicely without nesting logic within the flatMap. Do what you like with it.
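
For illustration only, here is a minimal sketch of the refactoring suggested above: one helper decides which SNPs a block contributes, and callers flatMap over it and then map to the output they need. `Block`, `snpsToEmit`, and `describe` are hypothetical names using plain strings; they are not Picard's `HaplotypeProbabilities`, `Snp`, or `getVariantContext`. The sketch also sidesteps the complication noted above: here the final `map` needs only the SNP, whereas the real conversion also needs the enclosing haplotype.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class HaplotypeStreamSketch {

    /** Hypothetical stand-in for a haplotype block; not a Picard class. */
    static final class Block {
        final String representativeSnp;
        final List<String> allSnps;

        Block(final String representativeSnp, final List<String> allSnps) {
            this.representativeSnp = representativeSnp;
            this.allSnps = allSnps;
        }
    }

    /** The single place where the if/else on representativeOnly lives. */
    static Stream<String> snpsToEmit(final Block block, final boolean representativeOnly) {
        return representativeOnly
                ? Stream.of(block.representativeSnp)
                : block.allSnps.stream();
    }

    /** Callers flatMap over snpsToEmit, then map each SNP to the output they want. */
    static List<String> describe(final List<Block> blocks, final boolean representativeOnly) {
        return blocks.stream()
                .flatMap(block -> snpsToEmit(block, representativeOnly))
                .map(snp -> "variant@" + snp) // placeholder for building a VariantContext or a name
                .collect(Collectors.toList());
    }

    public static void main(final String[] args) {
        final List<Block> blocks = Arrays.asList(
                new Block("rs1", Arrays.asList("rs1", "rs2")),
                new Block("rs3", Arrays.asList("rs3")));
        System.out.println(describe(blocks, true));  // [variant@rs1, variant@rs3]
        System.out.println(describe(blocks, false)); // [variant@rs1, variant@rs2, variant@rs3]
    }
}
```

If the mapping step needs the block as well as the SNP, the helper would have to return pairs (or the mapping would have to be nested inside the flatMap), which is the extra complexity the comment above alludes to.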

Contributor Author

yeah...seems like over-complication...

@@ -375,7 +375,7 @@ void testCanHandleDeepData(final HaplotypeProbabilitiesFromSequence hp, final in
final File fasta = new File(TEST_DATA_DIR, "reference.fasta");
Member

Do you want a test to show outputting multiple values?

Contributor Author

I'd love to, but it would involve building a bam...and I need to find time for that...

Contributor Author

when I find a project that actually uses this, I'll make time to write a test... OK?

@lbergelson
Member

@yfarjoun poke

2 participants