re-running failed operations with TzKT backend is so slow, it hangs #674

Open
nicolasochem opened this issue Jun 22, 2023 · 2 comments

@nicolasochem

This was described on Slack, and I've seen it as well:

When TRD fails, the next run attempts the failed payments again, which results in a lot of calls to the TzKT API. Recently, this endpoint seems to have been rate-limited, causing failures like the example below (taken from TRD on ghostnet).

A workaround is to delete the failed payment folder (assuming it's a solid failure and not a partial payment). But it would be good to look into why it's querying every

2023-06-22 21:39:11,754 - producer  - INFO - Summary 4 paid, 0 done, 0 injected, 9 fail, 1 avoided
Exception in thread producer:
Traceback (most recent call last):
  File "/app/.local/lib/python3.10/site-packages/urllib3/connection.py", line 174, in _new_conn
    conn = connection.create_connection(
  File "/app/.local/lib/python3.10/site-packages/urllib3/util/connection.py", line 72, in create_connection
    for res in socket.getaddrinfo(host, port, family, socket.SOCK_STREAM):
  File "/usr/local/lib/python3.10/socket.py", line 955, in getaddrinfo
    for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno -3] Try again

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/app/.local/lib/python3.10/site-packages/urllib3/connectionpool.py", line 703, in urlopen
    httplib_response = self._make_request(
  File "/app/.local/lib/python3.10/site-packages/urllib3/connectionpool.py", line 386, in _make_request
    self._validate_conn(conn)
  File "/app/.local/lib/python3.10/site-packages/urllib3/connectionpool.py", line 1042, in _validate_conn
    conn.connect()
  File "/app/.local/lib/python3.10/site-packages/urllib3/connection.py", line 363, in connect
    self.sock = conn = self._new_conn()
  File "/app/.local/lib/python3.10/site-packages/urllib3/connection.py", line 186, in _new_conn
    raise NewConnectionError(
urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPSConnection object at 0x7ff7a2064460>: Failed to establish a new connection: [Errno -3] Try again

During handling of the above exception, another exception occurred:

Link to baker slack discussion: https://tezos-baking.slack.com/archives/CQ35AM8KE/p1685978794781209

@nicolasochem nicolasochem added the bug Something isn't working label Jun 22, 2023
@nicolasochem

An update on this.

When running in retry mode, for every delegator with a failed payout, we update the balance:

https://github.com/tezos-reward-distributor-organization/tezos-reward-distributor/blob/master/src/pay/retry_producer.py#L106-L107

But the only other occurrence of the update_current_balances() function in the producer logic is in CalculatePhase4, which handles founders/owners.

During the initial payment, we do not query individual delegator balances through the TzKT API: we bulk-query the indexer for the list of delegator balances, self.reward_provider_model.delegator_balance_dict.

Only during retries do we query balances one delegator at a time, which takes a very long time and often fails.

A solution would be to modify the retry logic to query all balances again using the same API as the initial payment. I'm not going to do this now; I'll just record my observations here.
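
To make the idea concrete, here is a rough sketch of what a batched lookup in the retry path could look like. It is not TRD code: fetch_balances_in_bulk, fetch_batch, and the batch size are hypothetical names chosen for illustration, and the real fix would presumably reuse whatever the reward provider already does to build delegator_balance_dict. The fetcher below is a stand-in that makes no network calls.

```python
from typing import Callable, Dict, Iterable, List

# Hypothetical signature: takes a batch of addresses and returns
# {address: balance in mutez}. In TRD this would wrap the bulk indexer query.
BulkFetcher = Callable[[List[str]], Dict[str, int]]


def chunked(items: List[str], size: int) -> Iterable[List[str]]:
    """Yield successive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def fetch_balances_in_bulk(
    addresses: List[str],
    fetch_batch: BulkFetcher,
    batch_size: int = 100,
) -> Dict[str, int]:
    """Look up balances with one request per batch instead of one per address."""
    unique = list(dict.fromkeys(addresses))  # drop duplicates, keep order
    balances: Dict[str, int] = {}
    for batch in chunked(unique, batch_size):
        balances.update(fetch_batch(batch))
    return balances


def fake_fetch_batch(batch: List[str]) -> Dict[str, int]:
    """Stand-in for the indexer: counts calls and returns a dummy balance."""
    fake_fetch_batch.calls += 1
    return {address: 1_000_000 for address in batch}


fake_fetch_batch.calls = 0

# 250 failed payments need 3 batched requests rather than 250 single ones.
failed = [f"tz1example{i}" for i in range(250)]
balances = fetch_balances_in_bulk(failed, fake_fetch_batch, batch_size=100)
print(len(balances), fake_fetch_batch.calls)
```

With a real fetcher plugged in, the retry producer could read each delegator's balance from the returned dict instead of issuing a separate account request per failed payment.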

@nicolasochem

This is what's in the verbose log when this happens:

2024-02-15 17:46:15,839 - producer  - DEBUG - Requesting https://api.tzkt.io/v1/accounts/tz2xxxxx
2024-02-15 17:46:15,950 - producer  - DEBUG - Response from TzKT is:
{'activeRefutationGamesCount': 0,
 'activeTicketsCount': 0,
 'activeTokensCount': 60,
