Refactor zookeeper error handling #595

Open
wants to merge 1 commit into
base: master

eliska-n
Collaborator

@eliska-n eliska-n commented Aug 7, 2024

Hi, @mithunbharadwaj , @ateska

I would like to discuss error handling in zookeeper code.

Catching NoNodeError in the library provider is useless, because it is already caught in the wrapper.

We experience uncaught kazoo errors in every microservice when zookeeper disconnects (see sentry for more detail). I think they originate from this point in the zookeeper library provider, where we gave up on handling the errors. However, those errors should either be propagated further as a specific Library Error (so they can be caught in the microservice), or all errors in READ mode (i.e. in the library) should be silenced and return None.

Right now, we mix both approaches: NoNodeErrors are ignored (returning None) and other errors are not caught at all. That means that in my microservice, I always have to check for None and also wrap the call in try/except for the remaining kazoo errors (which we typically don't do).
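The two consistent alternatives for read operations described above can be sketched as follows. This is a minimal illustration, not the asab implementation: `LibraryReadError`, `read_strict`, `read_silent` and the `get_data` callable are hypothetical names, and the stand-in exception classes are only used when kazoo is not installed. Treating a missing node as None in the strict variant is also an assumption.

```python
import asyncio

try:
    from kazoo.exceptions import KazooException, NoNodeError, SessionExpiredError
except ImportError:
    # Stand-ins so the sketch runs without kazoo installed (assumption).
    class KazooException(Exception):
        pass

    class NoNodeError(KazooException):
        pass

    class SessionExpiredError(KazooException):
        pass


class LibraryReadError(Exception):
    """Hypothetical library-level error that a microservice could catch."""


async def read_strict(get_data, path):
    """Alternative 1: a missing node yields None, any other kazoo error
    is re-raised as one library-specific exception."""
    try:
        return await get_data(path)
    except NoNodeError:
        return None
    except KazooException as e:
        # Keep the original kazoo error as the cause for debugging.
        raise LibraryReadError("Cannot read '{}'".format(path)) from e


async def read_silent(get_data, path):
    """Alternative 2: every kazoo error during a read yields None."""
    try:
        return await get_data(path)
    except KazooException:
        return None
```

With either variant the caller deals with one failure signal only: `except LibraryReadError` in the first case, a check for None in the second.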

On the other hand, I suggest propagating all kazoo errors from WRITE operations in the wrapper to the microservice, so I can catch them and decide specifically what to do with them. (Write operations are not in the library provider.) Ignoring NoNodeError in write operations does not seem very convenient to me, especially when I need to handle other errors anyway. https://github.com/TeskaLabs/asab/blob/master/asab/zookeeper/wrapper.py#L78
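A sketch of what the write-side proposal could look like from the microservice's point of view. `FakeZookeeper` is a hypothetical in-memory stand-in for the wrapper (its method names are assumptions, not the real asab API), and `save_state` is a hypothetical caller. The point is only that the wrapper swallows nothing, so the caller can choose a reaction per error type.

```python
import asyncio

try:
    from kazoo.exceptions import KazooException, NoNodeError, SessionExpiredError
except ImportError:
    # Stand-ins so the sketch runs without kazoo installed (assumption).
    class KazooException(Exception):
        pass

    class NoNodeError(KazooException):
        pass

    class SessionExpiredError(KazooException):
        pass


class FakeZookeeper:
    """Hypothetical stand-in for the wrapper; write errors are not caught here."""

    def __init__(self):
        self.nodes = {}
        self.connected = True

    async def set_data(self, path, data):
        if not self.connected:
            raise SessionExpiredError()
        if path not in self.nodes:
            raise NoNodeError()
        self.nodes[path] = data

    async def create(self, path, data):
        if not self.connected:
            raise SessionExpiredError()
        self.nodes[path] = data


async def save_state(zk, path, data):
    """The microservice decides what each write error means for it."""
    try:
        await zk.set_data(path, data)
        return "updated"
    except NoNodeError:
        # Here a missing node is meaningful: create it instead.
        await zk.create(path, data)
        return "created"
    except KazooException:
        # Any other kazoo error: give up for now and retry later.
        return "deferred"
```

If the wrapper silently ignored NoNodeError, the "created" branch could not be written at all, which is the inconvenience described above.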

I'm afraid I may have missed some use cases that won't fit my suggestions. However, I'd like to agree on an "ideal" approach to reading from ZK, so that we reduce the kazoo errors in sentry.

@eliska-n
Collaborator Author

12-Aug-2024 09:28:45.389086 ERROR asab.pubsub [sd task="asab.PubSub.Application.tick/60!"] Error during pubsub delivery
Traceback (most recent call last):
  File "/usr/lib/python3.11/site-packages/asab/pubsub.py", line 278, in _deliver_async_exited
    task.result()
  File "/usr/lib/python3.11/site-packages/asab/library/service.py", line 108, in _on_tick60
    await self._read_disabled()
  File "/usr/lib/python3.11/site-packages/asab/library/service.py", line 352, in _read_disabled
    disabled = await self.Libraries[0].read('/.disabled.yaml')
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/site-packages/asab/library/providers/zookeeper.py", line 238, in read
    node_data = await self.Zookeeper.get_data(node_path)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/site-packages/asab/zookeeper/wrapper.py", line 64, in get_data
    data, stat = await self.ProactorService.execute(
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/site-packages/asab/pubsub.py", line 278, in _deliver_async_exited
    task.result()
  File "/usr/lib/python3.11/site-packages/asab/library/service.py", line 108, in _on_tick60
    await self._read_disabled()
  File "/usr/lib/python3.11/site-packages/asab/library/service.py", line 352, in _read_disabled
    disabled = await self.Libraries[0].read('/.disabled.yaml')
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/site-packages/asab/library/providers/zookeeper.py", line 238, in read
    node_data = await self.Zookeeper.get_data(node_path)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/site-packages/asab/zookeeper/wrapper.py", line 64, in get_data
    data, stat = await self.ProactorService.execute(
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/site-packages/kazoo/client.py", line 1237, in get
    return self.get_async(path, watch=watch).get()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/site-packages/kazoo/handlers/utils.py", line 78, in get
    raise self._exception
  File "/usr/lib/python3.11/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/site-packages/kazoo/client.py", line 1237, in get
    return self.get_async(path, watch=watch).get()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/site-packages/kazoo/handlers/utils.py", line 78, in get
    raise self._exception
  File "/usr/lib/python3.11/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/site-packages/kazoo/client.py", line 1237, in get
    return self.get_async(path, watch=watch).get()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/site-packages/kazoo/handlers/utils.py", line 78, in get
    raise self._exception
  File "/usr/lib/python3.11/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/site-packages/kazoo/client.py", line 1237, in get
    return self.get_async(path, watch=watch).get()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/site-packages/kazoo/handlers/utils.py", line 78, in get
    raise self._exception
  File "/usr/lib/python3.11/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/site-packages/kazoo/client.py", line 1237, in get
    return self.get_async(path, watch=watch).get()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/site-packages/kazoo/handlers/utils.py", line 78, in get
    raise self._exception
  File "/usr/lib/python3.11/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/site-packages/kazoo/client.py", line 1237, in get
    return self.get_async(path, watch=watch).get()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/site-packages/kazoo/handlers/utils.py", line 78, in get
    raise self._exception
  File "/usr/lib/python3.11/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/site-packages/kazoo/client.py", line 1237, in get
    return self.get_async(path, watch=watch).get()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/site-packages/kazoo/handlers/utils.py", line 78, in get
    raise self._exception
  File "/usr/lib/python3.11/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/site-packages/kazoo/client.py", line 1237, in get
    return self.get_async(path, watch=watch).get()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/site-packages/kazoo/handlers/utils.py", line 78, in get
    raise self._exception
  File "/usr/lib/python3.11/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/site-packages/kazoo/client.py", line 1237, in get
    return self.get_async(path, watch=watch).get()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/site-packages/kazoo/handlers/utils.py", line 78, in get
    raise self._exception
  File "/usr/lib/python3.11/site-packages/kazoo/retry.py", line 132, in __call__
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/site-packages/kazoo/client.py", line 1237, in get
    return self.get_async(path, watch=watch).get()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/site-packages/kazoo/handlers/utils.py", line 78, in get
    raise self._exception
  File "/usr/lib/python3.11/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/site-packages/kazoo/client.py", line 1237, in get
    return self.get_async(path, watch=watch).get()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/site-packages/kazoo/handlers/utils.py", line 78, in get
    raise self._exception
  File "/usr/lib/python3.11/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/site-packages/kazoo/client.py", line 1237, in get
    return self.get_async(path, watch=watch).get()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/site-packages/kazoo/handlers/utils.py", line 78, in get
    raise self._exception
  File "/usr/lib/python3.11/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/site-packages/kazoo/client.py", line 1237, in get
    return self.get_async(path, watch=watch).get()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/site-packages/kazoo/handlers/utils.py", line 78, in get
    raise self._exception
  File "/usr/lib/python3.11/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/site-packages/kazoo/client.py", line 1237, in get
    return self.get_async(path, watch=watch).get()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/site-packages/kazoo/handlers/utils.py", line 78, in get
    raise self._exception
  File "/usr/lib/python3.11/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/site-packages/kazoo/client.py", line 1237, in get
    return self.get_async(path, watch=watch).get()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/site-packages/kazoo/handlers/utils.py", line 78, in get
    raise self._exception
  File "/usr/lib/python3.11/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/site-packages/kazoo/client.py", line 1237, in get
    return self.get_async(path, watch=watch).get()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/site-packages/kazoo/handlers/utils.py", line 78, in get
    raise self._exception
  File "/usr/lib/python3.11/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/site-packages/kazoo/client.py", line 1237, in get
    return self.get_async(path, watch=watch).get()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/site-packages/kazoo/handlers/utils.py", line 78, in get
    raise self._exception
  File "/usr/lib/python3.11/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/site-packages/kazoo/client.py", line 1237, in get
    return self.get_async(path, watch=watch).get()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/site-packages/kazoo/handlers/utils.py", line 78, in get
    raise self._exception
kazoo.exceptions.SessionExpiredError

This one should be handled as well.
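The many repeated `raise self._exception` frames in the traceback above may have a simple cause: in Python, raising the same exception instance again adds frames to its existing `__traceback__` rather than starting a new one. The sketch below demonstrates only that language behaviour. `StoredResult` and `frames_after` are hypothetical names, and whether kazoo reuses one instance in this way is an assumption that is not confirmed here.

```python
import traceback


class StoredResult:
    """Hypothetical holder that re-raises one stored exception instance."""

    def __init__(self, exception):
        self._exception = exception

    def get(self):
        # The same object is raised on every call.
        raise self._exception


def frames_after(result, attempts):
    """Call get() several times and count the frames recorded on the exception."""
    for _ in range(attempts):
        try:
            result.get()
        except RuntimeError:
            pass
    return len(traceback.extract_tb(result._exception.__traceback__))


result = StoredResult(RuntimeError("session expired"))
first = frames_after(result, 1)
later = frames_after(result, 3)
# The traceback keeps growing with every additional raise.
print(first, later)
```

If this is what happens here, the length of the traceback reflects how often the error was re-raised, not how deep the call stack was.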

@ateska
Contributor

ateska commented Aug 13, 2024

This is a weird reaction of Kazoo to the "SessionExpiration" situation.

We can indeed isolate the most "noisy" calls here and make them more resistant to these exceptions; in this case it is:

File "/usr/lib/python3.11/site-packages/asab/library/service.py", line 352, in _read_disabled
    disabled = await self.Libraries[0].read('/.disabled.yaml')

@mithunbharadwaj please create a task for this.

In general, we need to be careful not to hide too many exceptions from Zookeeper; otherwise we will start observing "magically strange" behaviour of the library.
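One way to make the noisy `/.disabled.yaml` read more resistant without hiding too much could look like the sketch below. It is an illustration under assumptions, not the actual `_read_disabled` code: `DisabledReader` and the injected `read` callable are hypothetical, and the choice of which kazoo errors count as transient is an assumption. Only those errors are tolerated (with a warning, keeping the last known content); anything else still propagates.

```python
import asyncio
import logging

try:
    from kazoo.exceptions import ConnectionLoss, SessionExpiredError
except ImportError:
    # Stand-ins so the sketch runs without kazoo installed (assumption).
    class ConnectionLoss(Exception):
        pass

    class SessionExpiredError(Exception):
        pass

L = logging.getLogger(__name__)

# Assumption: these errors are transient and worth tolerating on a periodic read.
TRANSIENT_ERRORS = (SessionExpiredError, ConnectionLoss)


class DisabledReader:
    """Hypothetical helper that keeps the last successfully read content."""

    def __init__(self, read):
        self._read = read
        self.disabled = None

    async def refresh(self):
        try:
            self.disabled = await self._read('/.disabled.yaml')
        except TRANSIENT_ERRORS as e:
            # Tolerated, but visible in the log; the previous content stays in use.
            L.warning("Keeping previous '.disabled.yaml' content: %s", type(e).__name__)
        return self.disabled
```

Because the tuple of tolerated errors is explicit, widening or narrowing it is a deliberate decision rather than a side effect of a broad `except`.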
