Wrong location in ExonGenomicCoordsMapper.transcript_to_genomic_coordinates #345

Closed
jsstevenson opened this issue Aug 2, 2024 · 3 comments · Fixed by #352
Labels: bug (Something isn't working)

@jsstevenson

Describe the bug

The genomic location for an exon in the following lookup appears to be incorrect.

Steps to reproduce

Run this:

from cool_seq_tool.app import CoolSeqTool
import asyncio

async def do_thing():
    egc = CoolSeqTool().ex_g_coords_mapper
    result = await egc.transcript_to_genomic_coordinates(
        "NM_002529.4",  # latest NTRK1 transcript ac
        "NTRK1",
        exon_start=1
    )
    print(result.genomic_data.start)  # 156861145
    print(result.genomic_data.strand)  # 1 (ie positive)

asyncio.run(do_thing())

Expected behavior

I think this is the ending coordinate of exon 1, not the starting coordinate.

  1. See this screenshot from the NCBI Genome Data Viewer:
     (Screenshot 2024-08-02 at 4 11 00 PM)
  2. Take a look at the result of this UTA query:
SELECT * FROM tx_exon_aln_v WHERE tx_ac = 'NM_002529.4' AND alt_aln_method = 'splign' AND alt_ac = 'NC_000001.11' AND ord = 0;

Also, I don't think this matters, but this transcript aligns to the positive strand of chr1 (am I saying that right?), so there shouldn't be any weird "end is start and start is end" issues here.
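To make the strand point concrete, here is a minimal sketch of how an exon's boundaries would be read off an alignment row in transcript order. It assumes a row shaped like UTA's `tx_exon_aln_v` (`alt_start_i`, `alt_end_i`, `alt_strand`); the `ExonAlignment` class, the function name, and the coordinates are hypothetical and are not part of cool_seq_tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExonAlignment:
    """Hypothetical stand-in for one exon alignment row."""

    alt_start_i: int  # lower genomic coordinate of the aligned exon
    alt_end_i: int  # higher genomic coordinate of the aligned exon
    alt_strand: int  # 1 for the positive strand, -1 for the negative strand


def exon_boundaries_in_tx_order(aln: ExonAlignment) -> tuple[int, int]:
    """Return (start, end) of the exon as read along the transcript."""
    if aln.alt_strand == 1:
        # Positive strand: transcript order matches genomic order.
        return aln.alt_start_i, aln.alt_end_i
    if aln.alt_strand == -1:
        # Negative strand: the transcript runs toward lower coordinates,
        # so the exon "starts" at the higher genomic coordinate.
        return aln.alt_end_i, aln.alt_start_i
    raise ValueError(f"unexpected strand value: {aln.alt_strand}")


# Made-up coordinates, for illustration only.
print(exon_boundaries_in_tx_order(ExonAlignment(100, 250, 1)))  # (100, 250)
print(exon_boundaries_in_tx_order(ExonAlignment(100, 250, -1)))  # (250, 100)
```

On the positive strand the start of an exon should therefore be the smaller of the two coordinates, which is why a swap would not be expected for this transcript.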

Current behavior

As noted above, the returned starting location is 156861145.

Possible reason(s)

No response

Suggested fix

No response

Branch, commit, and/or version

The 0.5.1 release (it's what we have pinned for fusions), but I don't think it would be different on 0.6.0.

Screenshots

No response

Environment details

mac

Additional details

No response

Contribution

Yes, I can create a PR for this fix.

@jsstevenson jsstevenson added the bug Something isn't working label Aug 2, 2024
@korikuzma

@jsstevenson I am resolving this in #224. This was from when we didn't know when to use start or end for the exon. @jarbesfeld has helped clear things up in the past few weeks!

@korikuzma korikuzma self-assigned this Aug 7, 2024
@korikuzma

@jarbesfeld says this output from #352 is correct:

{
  "gene": "NTRK1",
  "genomic_ac": "NC_000001.11",
  "tx_ac": "NM_002529.4",
  "seg_start": {
    "exon_ord": 0,
    "offset": 0,
    "genomic_location": {
      "type": "SequenceLocation",
      "sequenceReference": {
        "type": "SequenceReference",
        "refgetAccession": "SQ.Ya6Rs7DHhDeg7YaOSg1EoNi3U_nQ9SvO"
      },
      "start": 156860864
    }
  }
}
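As a quick sanity check, the sketch below pulls the start coordinate out of the output above and compares it with the value reported in the issue. The JSON is trimmed to the fields used here, and the variable names are only illustrative. Reading the difference as the span of exon 1 assumes that 156861145 really is the end of that exon, as suggested in the report.

```python
import json

# Output from #352 as posted above, trimmed to the fields used here.
PR_352_OUTPUT = """
{
  "gene": "NTRK1",
  "genomic_ac": "NC_000001.11",
  "tx_ac": "NM_002529.4",
  "seg_start": {
    "exon_ord": 0,
    "offset": 0,
    "genomic_location": {"start": 156860864}
  }
}
"""

# Start coordinate returned by the 0.5.1 release, per the issue report.
REPORTED_START = 156861145

result = json.loads(PR_352_OUTPUT)
corrected_start = result["seg_start"]["genomic_location"]["start"]
difference = REPORTED_START - corrected_start

print(corrected_start)  # 156860864
print(difference)  # 281
```

The corrected start is lower than the originally reported value, which is consistent with the old value being the far end of a positive-strand exon rather than its start.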

@korikuzma korikuzma linked a pull request Aug 20, 2024 that will close this issue
korikuzma added a commit that referenced this issue Aug 21, 2024
Close #345 and #332

* Update and fix bugs in `ExonGenomicCoordsMapper`
  * Change output for public methods in `ExonGenomicCoordsMapper` (leverage VRS Sequence Location and improve structure for transcript segment data). Renamed `warnings` to `errors`.
  * Resolve offset / genomic location bugs (#345 and #332)
  * Remove `mane_transcript` instance variable and use `mane_transcript_mappings` instead.
  * Refactor code that was unnecessary or extra.
  * Rename arguments in `genomic_to_tx_segment`: `alt_ac` -> `genomic_ac`, `genomic_start` -> `seg_start_genomic`, `genomic_end` -> `seg_end_genomic`
* pin `ga4gh.vrs` to `2.0.0a10`
* Remove `get_genes_and_alt_acs` from `UtaDatabase`. Moved this to `ExonGenomicCoordsMapper` and renamed to `_get_genomic_ac_gene`. Will return single gene since genomic accessions are not needed anymore.

---------

Co-authored-by: Jeremy Arbesfeld <[email protected]>

Closed by #352.
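
For callers affected by the argument renames listed in the commit message above, a small helper along these lines could translate old keyword names to the new ones before calling `genomic_to_tx_segment`. This is only a sketch: `migrate_kwargs` and `RENAMED_ARGS` are hypothetical names, not part of cool_seq_tool, and the mapping only covers the three renames the commit message mentions.

```python
# Old keyword name -> new keyword name, as listed in the commit message.
RENAMED_ARGS = {
    "alt_ac": "genomic_ac",
    "genomic_start": "seg_start_genomic",
    "genomic_end": "seg_end_genomic",
}


def migrate_kwargs(old_kwargs: dict) -> dict:
    """Return a copy of old_kwargs with renamed arguments translated."""
    migrated = {}
    for name, value in old_kwargs.items():
        new_name = RENAMED_ARGS.get(name, name)
        if new_name in migrated:
            # Both the old and the new spelling were passed; refuse to guess.
            raise ValueError(f"argument given twice: {new_name}")
        migrated[new_name] = value
    return migrated


# Placeholder values, for illustration only.
print(migrate_kwargs({"alt_ac": "NC_000001.11", "genomic_start": 100}))
# {'genomic_ac': 'NC_000001.11', 'seg_start_genomic': 100}
```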
