Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clean up ExonGenomicCoordsMapper #224

Open
korikuzma opened this issue Dec 8, 2023 · 12 comments
Open

Clean up ExonGenomicCoordsMapper #224

korikuzma opened this issue Dec 8, 2023 · 12 comments
Assignees
Labels
epic A collection of issues priority:medium Medium priority technical debt A feature/requirement implemented in a sub-optimal way & must be re-written. Contrast to "cleanup"

Comments

@korikuzma
Copy link
Member

A lot of this was written years ago. It's hard to follow what's happening. We should refactor this class

@korikuzma korikuzma added priority:medium Medium priority technical debt A feature/requirement implemented in a sub-optimal way & must be re-written. Contrast to "cleanup" labels Dec 8, 2023
@korikuzma korikuzma self-assigned this Dec 27, 2023
@korikuzma
Copy link
Member Author

Apparently it’s hard to follow because the previous model was not the best. There is now a better way to represent using VRS 2.0. @ahwagner your time to shine ✨ I'd also like to throw out that example input/output would be nice 😉

@korikuzma
Copy link
Member Author

korikuzma commented Dec 28, 2023

I think the alignment part makes sense, but have struggled to find a good way to represent what is needed. So going to stop progress until we hear how to represent using VRS 2.0. Initial progress is in this branch.

@ahwagner
Copy link
Member

Linking the Cool Seq Tool documentation of this method for my own reference.

The idea of this method is to help us go from the ends of fusion transcript segments in exon representation, to where those ends exist on a genomic sequence. On occasion, fusion transcripts contain sequence that is intronic (exists on a genomic sequence but not the transcript), or have a junction that omits some sequence at the end of an exon. In the fusions model, we use offsets to describe this change. This concept of offset representation is also used in HGVS coding sequence representations.

I think we should postpone any refactor of this method until the VRS 2.0 beta1 release for the Adjacency class, as there is a fundamental shift from a segment-based model to a junction (adjacency) model. Linking this thread for progress on that release. Once that is completed, we should use the beta model to revise how this is represented in FUSOR.

@korikuzma
Copy link
Member Author

In slack, @ahwagner asked: I think this is okay to resume now that we have the Adjacency class, right?

@ahwagner You had said in comment about VRS 2.0 beta1, but I don't think we're at beta yet. Did you want us to proceed still?

I think we should postpone any refactor of this method until the VRS 2.0 beta1 release for the Adjacency class, as there is a fundamental shift from a segment-based model to a junction (adjacency) model.

@korikuzma
Copy link
Member Author

Tagging @jsstevenson and @katiestahl so they can follow. Slack doesn't allow thread in a thread

@ahwagner
Copy link
Member

We did away with Alpha/Beta/RC for maturity levels which may have created some confusion here. I am comfortable with the draft model for Adjacency from the latest VRS pre-release and we already have it in VRS-Python, so I think we can move forward on implementing it for Cool-Seq-Tool.

@korikuzma
Copy link
Member Author

@ahwagner ah okay. That's my bad for not remembering the maturity model changes.

@ahwagner
Copy link
Member

No fault here. With those changes this issue needed to be clarified. I also don't think anything has been lost by clarifying this now instead of earlier. However, with the recent cancervariants/fusion-curation#277 issue it is a good time to revisit.

@korikuzma
Copy link
Member Author

korikuzma commented Jul 31, 2024

Going to add Alex's requested changes here:

  • Rename methods to genomic_to_tx_segment and tx_segment_to_genomic
  • Strand should not be provided for genomic_to_tx_segment
  • In genomic_to_tx_segment, rename start and end to genomic_start and genomic_end
  • Assume that coordinates are inter-residue

New structure (Aligned Segment) will look like follows:

{
      "gene": "WEE1",
      "alt_ac": "NC_000011.10",
      "seg_start": {
            "exon_ord": 1,
            "offset": 0,
            "genomic_location": {
                  "type": "SequenceLocation",
                  "sequenceReference": {
                        "type": "SequenceReference",
                        "refgetAccession": "SQ.2NkFm8HK88MqeNkCgj78KidCAXgnsfV1"
                  },
                  "start": 9575887
            }
      },
      "seg_end": {
            "exon_ord": 10,
            "offset": 0,
            "genomic_location": {
                  "type": "SequenceLocation",
                  "sequenceReference": {
                        "type": "SequenceReference",
                        "refgetAccession": "SQ.2NkFm8HK88MqeNkCgj78KidCAXgnsfV1"
                  },
                  "end": 9589767
            }
      },
      "tx_ac": "NM_003390.3"
}

@jarbesfeld am I missing anything?

korikuzma added a commit that referenced this issue Jul 31, 2024
Addresses part of #224
* `transcript_to_genomic_coordinates` renamed to `tx_segment_to_genomic`
* `genomic_to_transcript_exon_coordinates` renamed to `genomic_to_tx_segment`
korikuzma added a commit that referenced this issue Jul 31, 2024
Addresses #224

* Move `get_tx_exons_genomic_coords` from `UtaDatabase` to `ExonGenomicCoordsMapper` as a private method (`_get_tx_exons_genomic_coords`)
korikuzma added a commit that referenced this issue Aug 1, 2024
korikuzma added a commit that referenced this issue Aug 1, 2024
Addresses #224

* `genomic_to_tx_segment` will now require inter-residue coordinates to be passed
korikuzma added a commit that referenced this issue Aug 1, 2024
close #335

Note: There are some places of +/- 1 for positions that I will revisit in #224 .
korikuzma added a commit that referenced this issue Aug 1, 2024
Addresses part of #224
* `transcript_to_genomic_coordinates` renamed to `tx_segment_to_genomic`
* `genomic_to_transcript_exon_coordinates` renamed to `genomic_to_tx_segment`
korikuzma added a commit that referenced this issue Aug 6, 2024
korikuzma added a commit that referenced this issue Aug 7, 2024
…#340)

Addresses #224 

* Strand is not required since it can be pulled from UTA
* This also prevents users from providing the wrong strand and getting an invalid response
@korikuzma
Copy link
Member Author

Fixing some labels
Cool-Seq-Tool Mapper(1)

korikuzma added a commit that referenced this issue Aug 7, 2024
korikuzma added a commit that referenced this issue Aug 7, 2024
…#344)

Addresses #224

* `genomic_to_tx_segment` will now require inter-residue coordinates to be passed
korikuzma added a commit that referenced this issue Aug 21, 2024
close #335

Note: There are some places of +/- 1 for positions that I will revisit in #224 .
korikuzma added a commit that referenced this issue Aug 21, 2024
Addresses part of #224
* `transcript_to_genomic_coordinates` renamed to `tx_segment_to_genomic`
* `genomic_to_transcript_exon_coordinates` renamed to `genomic_to_tx_segment`
korikuzma added a commit that referenced this issue Aug 21, 2024
addresses #224

* initial work for cleaning up exon coord data retrieval
korikuzma added a commit that referenced this issue Aug 21, 2024
addresses #224

* Use `ExonGenomicCoordsMapper._get_all_exon_coords` instead
korikuzma added a commit that referenced this issue Aug 21, 2024
…#340)

Addresses #224 

* Strand is not required since it can be pulled from UTA
* This also prevents users from providing the wrong strand and getting an invalid response
korikuzma added a commit that referenced this issue Aug 21, 2024
korikuzma added a commit that referenced this issue Aug 21, 2024
…#344)

Addresses #224

* `genomic_to_tx_segment` will now require inter-residue coordinates to be passed
korikuzma added a commit that referenced this issue Aug 21, 2024
close #335

Note: There are some places of +/- 1 for positions that I will revisit in #224 .
korikuzma added a commit that referenced this issue Aug 21, 2024
Addresses part of #224
* `transcript_to_genomic_coordinates` renamed to `tx_segment_to_genomic`
* `genomic_to_transcript_exon_coordinates` renamed to `genomic_to_tx_segment`
korikuzma added a commit that referenced this issue Aug 21, 2024
addresses #224

* initial work for cleaning up exon coord data retrieval
korikuzma added a commit that referenced this issue Aug 21, 2024
addresses #224

* Use `ExonGenomicCoordsMapper._get_all_exon_coords` instead
korikuzma added a commit that referenced this issue Aug 21, 2024
…#340)

Addresses #224 

* Strand is not required since it can be pulled from UTA
* This also prevents users from providing the wrong strand and getting an invalid response
korikuzma added a commit that referenced this issue Aug 21, 2024
korikuzma added a commit that referenced this issue Aug 21, 2024
…#344)

Addresses #224

* `genomic_to_tx_segment` will now require inter-residue coordinates to be passed
korikuzma added a commit that referenced this issue Aug 21, 2024
addresses #224

* No code was changed in the classes or methods
korikuzma added a commit that referenced this issue Aug 23, 2024
…#358)

Addresses #224

* No code was changed in the classes or methods.
@korikuzma
Copy link
Member Author

I think all that's left to do in this epic is DRY + smaller methods

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
epic A collection of issues priority:medium Medium priority technical debt A feature/requirement implemented in a sub-optimal way & must be re-written. Contrast to "cleanup"
Projects
None yet
Development

No branches or pull requests

2 participants