Kafka Consumer Hangs on poll(), Possible Deadlock in Library? #2459

Closed
kietheros opened this issue Feb 6, 2025 · 3 comments

kietheros commented Feb 6, 2025

I am using a Kafka consumer with manual commit. My application has been running for almost a year, but recently, it started hanging at the consumer.poll() call.

When I dump the stack traces of the threads, I see the two threads below, each blocked while acquiring locks. It seems like there might be a bug in the library causing a deadlock. The Kafka broker appears to be working normally and reports no errors.
I have also read [this issue](#1764), but it concerns a problem on the Kafka broker side, whereas my Kafka setup is functioning correctly.

Has anyone encountered a similar issue or found a solution for this?

<Thread(Thread-36 (_run_auto_restart), started 124923968227008)>
....................................................................
    results = self.consumer.poll(
  File "/usr/local/lib/python3.10/site-packages/kafka/consumer/group.py", line 655, in poll
    records = self._poll_once(remaining, max_records, update_offsets=update_offsets)
  File "/usr/local/lib/python3.10/site-packages/kafka/consumer/group.py", line 675, in _poll_once
    self._coordinator.poll()
  File "/usr/local/lib/python3.10/site-packages/kafka/coordinator/consumer.py", line 270, in poll
    self.ensure_coordinator_ready()
  File "/usr/local/lib/python3.10/site-packages/kafka/coordinator/base.py", line 245, in ensure_coordinator_ready
    with self._client._lock, self._lock:
  File "/usr/local/lib/python3.10/threading.py", line 265, in __enter__
    return self._lock.__enter__()  <<<========= LOCK


<HeartbeatThread(post-processor-heartbeat, started daemon 124923819329216)>
  File "/usr/local/lib/python3.10/threading.py", line 973, in _bootstrap
    self._bootstrap_inner()
  File "/usr/local/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "/usr/local/lib/python3.10/site-packages/kafka/coordinator/base.py", line 935, in run
    self._run_once()
  File "/usr/local/lib/python3.10/site-packages/kafka/coordinator/base.py", line 993, in _run_once
    self.coordinator.maybe_leave_group()
  File "/usr/local/lib/python3.10/site-packages/kafka/coordinator/base.py", line 766, in maybe_leave_group
    with self._client._lock, self._lock:   <<<======= LOCK
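
For readers who want to capture a similar per-thread dump from a hung process, here is a minimal sketch using only the Python standard library. The helper name `format_thread_stacks` is hypothetical, and this is not necessarily how the traces above were produced.

```python
import sys
import threading
import traceback


def format_thread_stacks():
    """Return a string with the current stack of every live thread."""
    # sys._current_frames() maps each thread id to that thread's topmost frame.
    frames = sys._current_frames()
    chunks = []
    for thread in threading.enumerate():
        frame = frames.get(thread.ident)
        if frame is None:
            continue  # thread ended between the two calls above
        chunks.append(repr(thread))
        chunks.append("".join(traceback.format_stack(frame)))
    return "\n".join(chunks)


if __name__ == "__main__":
    print(format_thread_stacks())
```

On Unix, `faulthandler.register(signal.SIGUSR1, all_threads=True)` from the standard library gives a comparable dump on demand when the process receives the signal.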

dpkp commented Feb 6, 2025

Thanks for the traces. I think #2460 should fix this.
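
As background on this class of hang, the sketch below shows a generic lock-order inversion between two threads. It is an illustration only: the thread does not state the root cause or what #2460 changes, and the lock and thread names here are hypothetical stand-ins, not kafka-python internals. A timeout is used so the demo finishes instead of hanging.

```python
import threading


def run_inverted(timeout=0.2):
    """Two threads take the same two locks in opposite order.

    Returns {thread_name: bool}: whether each thread got its second lock.
    """
    client_lock = threading.RLock()       # hypothetical stand-in
    coordinator_lock = threading.RLock()  # hypothetical stand-in
    barrier = threading.Barrier(2)
    results = {}

    def worker(name, first, second):
        with first:
            barrier.wait()  # both threads now hold their first lock
            # Without a timeout this acquire would block forever.
            got = second.acquire(timeout=timeout)
            results[name] = got
            if got:
                second.release()
            barrier.wait()  # keep `first` held until both attempts finish

    threads = [
        threading.Thread(target=worker,
                         args=("poller", client_lock, coordinator_lock)),
        threading.Thread(target=worker,
                         args=("heartbeat", coordinator_lock, client_lock)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


def run_consistent():
    """Same two threads, but both take the locks in the same order."""
    client_lock = threading.RLock()
    coordinator_lock = threading.RLock()
    done = []

    def worker(name):
        with client_lock, coordinator_lock:
            done.append(name)

    threads = [threading.Thread(target=worker, args=(n,))
               for n in ("poller", "heartbeat")]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(done)


if __name__ == "__main__":
    print(run_inverted())
    print(run_consistent())
```

In the inverted case neither thread obtains its second lock, which is the situation that would hang indefinitely without the timeout. With a single agreed acquisition order, both threads complete.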

kietheros commented

@dpkp Thank you! I am waiting for your new release.

kietheros commented

@dpkp I have applied the fix, and the issue no longer occurs.

dpkp closed this as completed Feb 12, 2025