Representing native tokens in prices models #6577

Open · 0xRobin opened this issue Aug 19, 2024 · 3 comments
Labels: in review (Assignee is currently reviewing the PR)

0xRobin (Collaborator) commented Aug 19, 2024

Representing native tokens in prices models

Native tokens are often annoying to deal with when pulling in price info.
I'll describe the current state with its problems and solutions, and what I think should be the desired state.

1. Current state

Currently the native tokens are defined here:

('eos-eos', null, 'EOS', null, null),
('etc-ethereum-classic', null, 'ETC', null, null),
('eth-ethereum', null, 'ETH', null, null),
('ftm-fantom', null, 'FTM', null, null),

They have blockchain, contract_address and decimals as null values.

This is also how they show up in prices.usd (screenshot of a dune query omitted).
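For reference, a minimal query sketch that surfaces those rows (the column list is based on the prices.usd schema used throughout this issue):

-- inspect how native ETH currently shows up in prices.usd
select
   blockchain
  ,contract_address
  ,decimals
  ,symbol
  ,minute
  ,price
from prices.usd
where blockchain is null
  and symbol = 'ETH'
order by minute desc
limit 10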

Problems and current solutions

When you have a model that includes trades in both erc20 and native tokens, you have to adapt your query to deal with this.
Most solutions follow one of these setups:

1. Replace the native rows with a wrapped alternative

select *
from (
    select
       blockchain
      ,tx_hash
      ,case when currency_contract = 0x0000000000000000000000000000000000000000
         then 0xC02aaA39b223FE8D0A0e5C4F27eAD9083C756Cc2 -- WETH
         else currency_contract
       end as currency_contract
      ,amount
    from trades
) t
left join prices.usd p
    on p.blockchain = t.blockchain
    and p.contract_address = t.currency_contract

Here we end up using the wrong price feed (WETH instead of ETH): this logic breaks whenever there's a depeg event, and we also end up with the wrong symbol in our end table.

2. Add extra logic in the join condition

select
   t.blockchain
  ,t.tx_hash
  ,t.currency_contract
  ,t.amount
  ,p.price
from trades t
left join prices.usd p
on (p.blockchain = t.blockchain
    and t.currency_contract = p.contract_address)
or (t.currency_contract = 0x0000000000000000000000000000000000000000
    and p.blockchain is null
    and p.symbol = 'ETH')

This works fine, but requiring blockchain to be null is confusing, and we're relying solely on the symbol column to determine the price feed, which is an unsafe column that can hold arbitrary data.
This is the current default and the setup I've seen most often.
A note from the prices beta announcement reiterates the danger of relying on the symbol column:

> Prices are calculated at the contract_address and blockchain level. Token symbol is not a unique identifier as different tokens may have the same symbol.

3. Patch the prices model so it integrates better

-- TODO: We should remove this CTE and include ETH into the general prices table once everything is migrated
WITH prices_patch as (
    SELECT
        contract_address
        ,blockchain
        ,decimals
        ,minute
        ,price
        ,symbol
    FROM {{ source('prices','usd_forward_fill') }}
    {% if is_incremental() %}
    WHERE {{ incremental_predicate('minute') }}
    {% endif %}

    UNION ALL

    SELECT
        {{ var("ETH_ERC20_ADDRESS") }} as contract_address
        ,'ethereum' as blockchain
        ,18 as decimals
        ,minute
        ,price
        ,'ETH' as symbol
    FROM {{ source('prices','usd_forward_fill') }}
    WHERE blockchain is null AND symbol = 'ETH'
    {% if is_incremental() %}
    AND {{ incremental_predicate('minute') }}
    {% endif %}
)

This is mostly something I've been doing; I haven't seen it adopted elsewhere.

4. Join in a prices table for the native token separately
    , COALESCE(pu_eth.price*SUM(CAST(et.value as DOUBLE))/POWER(10, 18), pu_erc20s.price*SUM(CAST(erc20s.value as DOUBLE))/POWER(10, pu_erc20s.decimals))*(nft_mints.amount/nft_count.nfts_minted_in_tx) AS amount_usd

    LEFT JOIN {{ source('prices','usd') }} pu_eth
    ON pu_eth.blockchain IS NULL
    AND pu_eth.minute=date_trunc('minute', et.block_time)
    AND pu_eth.symbol = 'ETH'
    {% if is_incremental() %}
    AND {{incremental_predicate('pu_eth.minute')}}
    {% endif %}

    LEFT JOIN {{ source('prices','usd') }} pu_erc20s ON pu_erc20s.blockchain='{{blockchain}}'
    AND pu_erc20s.minute=date_trunc('minute', erc20s.evt_block_time)
    AND erc20s.contract_address=pu_erc20s.contract_address
    {% if is_incremental() %}
    AND {{incremental_predicate('pu_erc20s.minute')}}
    {% endif %}

Conclusion: joining prices with native tokens is tricky and very prone to producing join duplicates when small errors are made. This has been the subject of many data error investigations in spellbook.

2. Desired state

Prices should be uniquely identified by:

  • blockchain
  • contract_address
  • timestamp

and native tokens should follow this rule.
When this is true, any join with prices would simply look like this:

left join prices.usd p
    on p.blockchain = t.blockchain
    and p.contract_address = t.currency_address
    and p.minute = date_trunc('minute', t.block_time)

No extra join logic, no case-when, no coalesce() in your select statement.

My proposal is to add all native tokens to the prices tables with:

  • 0x0000000000000000000000000000000000000000 as contract_address
  • blockchain filled in correctly
  • decimals filled in correctly
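For illustration, the ETH row in the seed snippet from section 1 would then become something like this (assuming the same column order as above, i.e. id, blockchain, symbol, contract_address, decimals):

('eth-ethereum', 'ethereum', 'ETH', 0x0000000000000000000000000000000000000000, 18),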

If there are different standards for representing the native token as an erc20 address (or precompile addresses on some L2s), we can either specify the contract address for each chain individually, or add all representations that make sense.
E.g. we could have the ethereum price feed both at 0x00000... and at 0xeeee..., so that users don't have to clean up their data if the source uses a different representation (see the sketch below).
OR we impose one native address per chain to force standardization.
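A sketch of the dual-representation option for ethereum; the second row uses the common 0xeeee… native-asset sentinel, written out in full here as an assumption:

('eth-ethereum', 'ethereum', 'ETH', 0x0000000000000000000000000000000000000000, 18),
('eth-ethereum', 'ethereum', 'ETH', 0xEeeeeEeeeEeEeeEeEeEeeEEEeeeeEeeeeeeeEEeE, 18),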

This is a small change in spellbook and could be implemented quickly.
The problem is that this change would break any query that currently follows solution 2 described above (which is the one I've seen most).

@jeff-dude jeff-dude self-assigned this Aug 20, 2024
@jeff-dude jeff-dude added the in review Assignee is currently reviewing the PR label Aug 20, 2024
jeff-dude (Member) commented Aug 20, 2024

thinking out loud here on prices pipeline and summarizing above:

  • trusted tokens sql file: continue the current approach and scale it per chain added; it reads from coinpaprika to get the price
  • native tokens sql file: modify the existing file to include contract_address, symbol, decimals per chain
    • note: some chains could have multiple rows (one per unique address), as noted above
  • new pipeline which generates prices for all other tokens (WIP pipeline)

three different inputs, all writing to separate tables, then a final union view per level of granularity (minute, hour, day).

the table is then clean in terms of level of granularity, with consistent data written to all columns.
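a minimal sketch of what the minute-level union view could look like (the three input table names here are assumptions, not actual pipeline names):

-- hypothetical final view at minute granularity
create or replace view prices.usd_minute as
select blockchain, contract_address, symbol, decimals, minute, price
from prices.trusted_tokens_minute
union all
select blockchain, contract_address, symbol, decimals, minute, price
from prices.native_tokens_minute
union all
select blockchain, contract_address, symbol, decimals, minute, price
from prices.other_tokens_minute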

downstream:

  • all joins to prices are now the same for any scenario
  • while the change to native tokens is simple in the spell itself, we need to measure the impact downstream (both spells joining prices in various ways and queries on the dune app)
    • how to handle if large impact on queries?

expectation:

  • blockchain team will gather info from chains as they onboard onto dune to ensure we apply native token data correctly

0xBoxer (Collaborator) commented Aug 21, 2024

> The problem is that this change would break any query that currently follows solution 2 described above (which is the one I've seen most).

We could implement your desired state and carry the existing implementation forward.
This would improve the UX for users today and keep more bad queries from being written.

Deprecating the existing rows is going to be very difficult, if not impossible, without breaking queries.

jeff-dude (Member) commented Aug 21, 2024

> We could implement your desired state and carry the existing implementation forward. This would improve the UX for users today and keep more bad queries from being written.
>
> Deprecating the existing rows is going to be very difficult, if not impossible, without breaking queries.

not a bad idea if we want to avoid breaking changes. something like:

  • keep the native tokens file as-is, but add another row per token populating all fields such as blockchain, address, decimals, etc (sketched below)
  • technically we are duplicating the data, which would grow the table size further (it's about to explode in size anyway)
  • while we duplicate the data, the unique columns would remain unique, so joins wouldn't break (both old and new)
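for illustration, the coexisting ETH rows could look like this (same assumed seed format as in the issue above; the legacy row keeps old joins working, the new row enables the clean join):

('eth-ethereum', null, 'ETH', null, null),
('eth-ethereum', 'ethereum', 'ETH', 0x0000000000000000000000000000000000000000, 18),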

maybe there is something i'm not thinking of off the top of my head, but we could explore this

edit:
after speaking a bit further with rob on it, this may still break one of the options above. my suggestion is that we put a task in triage to prioritize, which would run a test to see whether or not the approach above resolves all the scenarios described.
