Posted to dev@kafka.apache.org by Manu Jacob <Ma...@sas.com> on 2019/03/21 01:39:17 UTC
Consumer poll stuck on
Hi,
We have a Kafka cluster (version 1.1.1) where one node unexpectedly failed. Since then, a couple of consumers have been stuck in the poll() API call. Looking at the thread dump, the consumer appears to be stuck in the org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureCoordinatorReady() call. The heartbeat thread is also blocked, waiting on the ConsumerCoordinator. Any idea what the cause is and how to resolve this issue? Thanks.
"BusinessEventRecordsDispatcherThread" #43 prio=5 os_prio=0 tid=0x00007f71764fb800 nid=0x241e sleeping[0x00007f70d24e9000]
java.lang.Thread.State: TIMED_WAITING (sleeping)
at java.lang.Thread.sleep(Native Method)
at org.apache.kafka.common.utils.SystemTime.sleep(SystemTime.java:45)
at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureCoordinatorReady(AbstractCoordinator.java:235)
- locked <0x00000004fc30e628> (a org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureCoordinatorReady(AbstractCoordinator.java:205)
- locked <0x00000004fc30e628> (a org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.joinGroupIfNeeded(AbstractCoordinator.java:351)
at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:316)
at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(ConsumerCoordinator.java:290)
at org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:1149)
at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1115)
...
at java.lang.Thread.run(Thread.java:748)
"kafka-coordinator-heartbeat-thread | prod-mkt-datahub-loader" #45 daemon prio=5 os_prio=0 tid=0x00007f711a286800 nid=0x2422 in Object.wait() [0x00007f7120125000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x00000004fc30e628> (a org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
at java.lang.Object.wait(Object.java:502)
at org.apache.kafka.clients.consumer.internals.AbstractCoordinator$HeartbeatThread.run(AbstractCoordinator.java:937)
- locked <0x00000004fc30e628> (a org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
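A note for readers of this dump: ensureCoordinatorReady() keeps retrying until the consumer can find its group coordinator, and the coordinator is the broker that leads one particular partition of the internal __consumer_offsets topic. The heartbeat thread name above carries the group id (prod-mkt-datahub-loader). The sketch below estimates which __consumer_offsets partition that group maps to, assuming the broker uses the usual rule of abs(groupId.hashCode()) modulo the partition count and the default of 50 partitions (offsets.topic.num.partitions). The function names are made up for illustration; treat the result as a hint about where to look, not as a confirmed diagnosis.

```python
def java_string_hash(s: str) -> int:
    """Reproduce Java's String.hashCode() as a signed 32-bit int."""
    data = s.encode("utf-16-be")  # Java hashes UTF-16 code units
    h = 0
    for i in range(0, len(data), 2):
        unit = (data[i] << 8) | data[i + 1]
        h = (31 * h + unit) & 0xFFFFFFFF  # emulate 32-bit overflow
    return h - 2**32 if h >= 2**31 else h


def kafka_abs(n: int) -> int:
    """Absolute value that maps Integer.MIN_VALUE to 0, as Kafka's Utils.abs does."""
    return 0 if n == -2**31 else abs(n)


def offsets_partition_for(group_id: str, num_partitions: int = 50) -> int:
    """Assumed mapping from a group id to its __consumer_offsets partition.

    num_partitions defaults to 50, the default offsets.topic.num.partitions;
    pass the real PartitionCount if the cluster was configured differently.
    """
    return kafka_abs(java_string_hash(group_id)) % num_partitions


if __name__ == "__main__":
    group = "prod-mkt-datahub-loader"  # taken from the heartbeat thread name
    print(group, "->", "__consumer_offsets partition", offsets_partition_for(group))
```

If the leader of that partition was the failed node and no in-sync replica could take over, the consumer would have no coordinator to find, which would be consistent with the retry loop in the dump.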
RE: Consumer poll stuck on
Posted by Manu Jacob <Ma...@sas.com>.
Thanks for the reply. There are some replicas out of sync. Will try that and see. Thanks!
-----Original Message-----
From: 1095193290@qq.com <10...@qq.com>
Sent: Wednesday, March 20, 2019 10:56 PM
To: users <us...@kafka.apache.org>
Subject: Re: Consumer poll stuck on
Hi,
As you said, the consumer got stuck after one node failed, so you should check whether all replicas of each partition are in the ISR (in-sync replicas) by using the kafka-topics command. The two topics to pay attention to are __consumer_offsets and your business topic. For example,
./kafka-topics.sh --zookeeper 56.32.15.98:24002/kafka --describe --topic __consumer_offsets
Topic:__consumer_offsets PartitionCount:50 ReplicationFactor:2 Configs:segment.bytes=104857600,cleanup.policy=compact,compression.type=producer
Topic: __consumer_offsets Partition: 0 Leader: 1 Replicas: 1,2 Isr: 1,2
Topic: __consumer_offsets Partition: 1 Leader: 2 Replicas: 2,1 Isr: 1,2
Topic: __consumer_offsets Partition: 2 Leader: 1 Replicas: 1,2 Isr: 1,2
Topic: __consumer_offsets Partition: 3 Leader: 2 Replicas: 2,1 Isr: 1,2
Topic: __consumer_offsets Partition: 4 Leader: 1 Replicas: 1,2 Isr: 1,2
Topic: __consumer_offsets Partition: 5 Leader: 2 Replicas: 2,1 Isr: 1,2
Topic: __consumer_offsets Partition: 6 Leader: 1 Replicas: 1,2 Isr: 1,2
Topic: __consumer_offsets Partition: 7 Leader: 2 Replicas: 2,1 Isr: 1,2
Topic: __consumer_offsets Partition: 8 Leader: 1 Replicas: 1,2 Isr: 1,2
Topic: __consumer_offsets Partition: 9 Leader: 2 Replicas: 2,1 Isr: 1,2
Topic: __consumer_offsets Partition: 10 Leader: 1 Replicas: 1,2 Isr: 1,2
If any partition has replicas missing from the ISR, you can fix that.
1095193290@qq.com
Re: Consumer poll stuck on
Posted by "1095193290@qq.com" <10...@qq.com>.
Hi,
As you said, the consumer got stuck after one node failed, so you should check whether all replicas of each partition are in the ISR (in-sync replicas) by using the kafka-topics command. The two topics to pay attention to are __consumer_offsets and your business topic. For example,
./kafka-topics.sh --zookeeper 56.32.15.98:24002/kafka --describe --topic __consumer_offsets
Topic:__consumer_offsets PartitionCount:50 ReplicationFactor:2 Configs:segment.bytes=104857600,cleanup.policy=compact,compression.type=producer
Topic: __consumer_offsets Partition: 0 Leader: 1 Replicas: 1,2 Isr: 1,2
Topic: __consumer_offsets Partition: 1 Leader: 2 Replicas: 2,1 Isr: 1,2
Topic: __consumer_offsets Partition: 2 Leader: 1 Replicas: 1,2 Isr: 1,2
Topic: __consumer_offsets Partition: 3 Leader: 2 Replicas: 2,1 Isr: 1,2
Topic: __consumer_offsets Partition: 4 Leader: 1 Replicas: 1,2 Isr: 1,2
Topic: __consumer_offsets Partition: 5 Leader: 2 Replicas: 2,1 Isr: 1,2
Topic: __consumer_offsets Partition: 6 Leader: 1 Replicas: 1,2 Isr: 1,2
Topic: __consumer_offsets Partition: 7 Leader: 2 Replicas: 2,1 Isr: 1,2
Topic: __consumer_offsets Partition: 8 Leader: 1 Replicas: 1,2 Isr: 1,2
Topic: __consumer_offsets Partition: 9 Leader: 2 Replicas: 2,1 Isr: 1,2
Topic: __consumer_offsets Partition: 10 Leader: 1 Replicas: 1,2 Isr: 1,2
If any partition has replicas missing from the ISR, you can fix that.
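With 50 partitions per topic, scanning the describe output by eye is error-prone. As a rough sketch of the check suggested here, the script below parses text in the format shown above and lists partitions whose Isr is missing one or more of the assigned Replicas. The function name and the sample text are made up for illustration, and the regular expression assumes the ZooKeeper-era output layout in this message; other Kafka versions may print the fields differently.

```python
import re

# Matches per-partition lines only; the "Topic:... PartitionCount:..." summary
# line does not match because it has no "Partition:" field.
PARTITION_LINE = re.compile(
    r"Topic:\s*(?P<topic>\S+)\s+Partition:\s*(?P<partition>\d+)\s+"
    r"Leader:\s*(?P<leader>\S+)\s+Replicas:\s*(?P<replicas>[\d,]+)\s+"
    r"Isr:\s*(?P<isr>[\d,]*)"
)


def out_of_sync_partitions(describe_output: str):
    """Return (topic, partition, missing_broker_ids) for every partition
    whose ISR does not contain all assigned replicas."""
    problems = []
    for line in describe_output.splitlines():
        m = PARTITION_LINE.search(line)
        if m is None:
            continue
        replicas = {r for r in m.group("replicas").split(",") if r}
        isr = {r for r in m.group("isr").split(",") if r}
        missing = sorted(replicas - isr, key=int)
        if missing:
            problems.append(
                (m.group("topic"), int(m.group("partition")), [int(b) for b in missing])
            )
    return problems


if __name__ == "__main__":
    # Hypothetical sample: partition 1 has lost broker 2 from its ISR.
    sample = """\
Topic:__consumer_offsets PartitionCount:50 ReplicationFactor:2 Configs:cleanup.policy=compact
Topic: __consumer_offsets Partition: 0 Leader: 1 Replicas: 1,2 Isr: 1,2
Topic: __consumer_offsets Partition: 1 Leader: 1 Replicas: 2,1 Isr: 1
"""
    for topic, partition, missing in out_of_sync_partitions(sample):
        print(f"{topic}-{partition}: brokers missing from ISR: {missing}")
```

In practice you would pipe the real kafka-topics output into the script instead of using the inline sample.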
1095193290@qq.com