Posted to dev@kafka.apache.org by "mjuarez (JIRA)" <ji...@apache.org> on 2016/06/08 18:22:21 UTC

[jira] [Comment Edited] (KAFKA-2904) Consumer Fails to Reconnect after 30s post restarts

    [ https://issues.apache.org/jira/browse/KAFKA-2904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15321094#comment-15321094 ] 

mjuarez edited comment on KAFKA-2904 at 6/8/16 6:22 PM:
--------------------------------------------------------

I'm seeing this issue repeatedly whenever a Kafka consumer is left running for more than a few hours on a low-volume topic (~2400 messages/second).  The brokers are on Kafka 0.9.0.1, and the consumer uses the Java 0.9.0.1 client jars.

{quote}
2016-06-08 09:52:02,633 org.apache.kafka.clients.consumer.internals.ConsumerCoordinator [pool-2-thread-1] ERROR  Error ILLEGAL_GENERATION occurred while committing offsets for group TEST1_haymaker_to_hdfs
org.apache.kafka.clients.consumer.CommitFailedException: Commit cannot be completed due to group rebalance
{quote}

After that, the app logs start getting flooded with this unhelpful error message:

{quote}
2016-06-08 09:52:11,321 org.apache.kafka.clients.consumer.internals.ConsumerCoordinator [pool-2-thread-1] ERROR  Offset commit failed.
org.apache.kafka.clients.consumer.internals.SendFailedException
{quote}

I have confirmed that the application is still consuming and committing offsets successfully, but the ConsumerCoordinator appears to be stuck retrying an offset commit that fails every time.



> Consumer Fails to Reconnect after 30s post restarts
> ---------------------------------------------------
>
>                 Key: KAFKA-2904
>                 URL: https://issues.apache.org/jira/browse/KAFKA-2904
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Ben Stopford
>            Assignee: Ben Stopford
>         Attachments: 2015-11-27--001 (1).tar.gz
>
>
> This problem occurs in around 1 in 20 executions of the security rolling upgrade test. 
> The test scenario is a rolling upgrade in which each of the three servers is restarted in turn while the producer and consumers run. A ten second sleep between start and stop of each node has been added to ensure there is time for failover to occur (re KAFKA-2827).
> Failure results in no consumed messages after the failure point.
> Periodically the consumer fails to reconnect within its 30s timeout. The consumer’s log at this point is at the bottom of this jira.
> ISRs appear normal at the time of the failure.
> The producer is able to produce throughout this period. 
> *TIMELINE:*
> {quote}
> 20:39:23 - Test starts Consumer and Producer
> 20:39:27 - Consumer starts consuming produced messages
> 20:39:30 - Node 1 shutdown complete
> 20:39:45 - Node 1 restarts
> 20:39:59 - Node 2 shutdown complete
> 20:40:14 - Node 2 restarts 
> 20:40:27 - Consumer stops consuming
> 20:40:28 - Node 2 becomes controller
> 20:40:28 - Node 3 shutdown complete
> 20:40:34 - GroupCoordinator 2: Preparing to restabilize group unique-test-group...
> 20:40:42 - Node 3 restarts
> *20:41:03 - Consumer times out*
> 20:41:03 - GroupCoordinator 2: Stabilized group unique-test-group...
> 20:41:03 - GroupCoordinator 2: Assignment received from leader for group unique-test-group...
> 20:41:03 - GroupCoordinator 2: Preparing to restabilize group unique-test-group...
> 20:41:03 - GroupCoordinator 2: Group unique-test-group... is dead and removed 
> 20:41:53 - Producer shuts down
> {quote}
> Consumer log at time of failure:
> {quote}
> [2015-11-27 20:40:27,268] INFO Current consumption count is 10100 (kafka.tools.ConsoleConsumer$)
> [2015-11-27 20:40:27,321] ERROR Error ILLEGAL_GENERATION occurred while committing offsets for group unique-test-group-0.952644842527 (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
> [2015-11-27 20:40:27,321] WARN Auto offset commit failed: Commit cannot be completed due to group rebalance (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
> [2015-11-27 20:40:27,322] ERROR Error ILLEGAL_GENERATION occurred while committing offsets for group unique-test-group-0.952644842527 (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
> [2015-11-27 20:40:27,322] WARN Auto offset commit failed:  (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
> [2015-11-27 20:40:27,329] INFO Attempt to join group unique-test-group-0.952644842527 failed due to unknown member id, resetting and retrying. (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
> [2015-11-27 20:40:27,347] INFO SyncGroup for group unique-test-group-0.952644842527 failed due to UNKNOWN_MEMBER_ID, rejoining the group (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
> [2015-11-27 20:40:27,357] INFO SyncGroup for group unique-test-group-0.952644842527 failed due to NOT_COORDINATOR_FOR_GROUP, will find new coordinator and rejoin (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
> [2015-11-27 20:40:27,357] INFO Marking the coordinator 2147483644 dead. (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
> [2015-11-27 20:40:28,097] INFO Attempt to join group unique-test-group-0.952644842527 failed due to unknown member id, resetting and retrying. (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
> [2015-11-27 20:40:33,627] INFO Marking the coordinator 2147483646 dead. (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
> [2015-11-27 20:40:33,627] INFO Attempt to join group unique-test-group-0.952644842527 failed due to obsolete coordinator information, retrying. (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
> [2015-11-27 20:41:03,704] ERROR Error processing message, terminating consumer process:  (kafka.tools.ConsoleConsumer$)
> kafka.consumer.ConsumerTimeoutException
> 	at kafka.consumer.NewShinyConsumer.receive(BaseConsumer.scala:59)
> 	at kafka.tools.ConsoleConsumer$.process(ConsoleConsumer.scala:112)
> 	at kafka.tools.ConsoleConsumer$.run(ConsoleConsumer.scala:69)
> 	at kafka.tools.ConsoleConsumer$.main(ConsoleConsumer.scala:47)
> 	at kafka.tools.ConsoleConsumer.main(ConsoleConsumer.scala)
> [2015-11-27 20:41:03,737] WARN TGT renewal thread has been interrupted and will exit. (org.apache.kafka.common.security.kerberos.Login)
> {quote}
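
As a rough cross-check of the 30s timeout mentioned in the description, the gaps between the consumer log timestamps above can be computed directly. The following is only a minimal sketch: the class name ConsumerLogGaps and the helper millisBetween are hypothetical names chosen for illustration, and the timestamps are copied by hand from the quoted consumer log rather than parsed from a log file.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper, not part of Kafka: measures gaps between log4j-style timestamps.
public class ConsumerLogGaps {

    // Matches the timestamp layout in the consumer log, e.g. "2015-11-27 20:40:27,321".
    private static final DateTimeFormatter LOG_TS =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss,SSS");

    // Returns the elapsed time in milliseconds between two log timestamps.
    public static long millisBetween(String from, String to) {
        LocalDateTime start = LocalDateTime.parse(from, LOG_TS);
        LocalDateTime end = LocalDateTime.parse(to, LOG_TS);
        return Duration.between(start, end).toMillis();
    }

    public static void main(String[] args) {
        // Timestamps copied from the consumer log quoted above.
        String firstIllegalGeneration = "2015-11-27 20:40:27,321";
        String lastJoinAttempt = "2015-11-27 20:40:33,627";
        String consumerTimeout = "2015-11-27 20:41:03,704";

        // Time from the last logged join attempt to the ConsumerTimeoutException.
        System.out.println("last join attempt -> timeout: "
                + millisBetween(lastJoinAttempt, consumerTimeout) + " ms");

        // Time from the first ILLEGAL_GENERATION error to the ConsumerTimeoutException.
        System.out.println("first commit error -> timeout: "
                + millisBetween(firstIllegalGeneration, consumerTimeout) + " ms");
    }
}
```

With these inputs the first gap comes out at 30077 ms, which is consistent with the consumer sitting silent for its full 30s timeout after the last logged join attempt. The second gap is 36383 ms, measured from the first ILLEGAL_GENERATION error.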


