Posted to jira@kafka.apache.org by "huxihx (JIRA)" <ji...@apache.org> on 2017/12/04 09:43:00 UTC

[jira] [Commented] (KAFKA-6306) Auto-commit of offsets fail, and not recover forever...

    [ https://issues.apache.org/jira/browse/KAFKA-6306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16276532#comment-16276532 ] 

huxihx commented on KAFKA-6306:
-------------------------------

This looks like a configuration mistake rather than a bug. In my test environment, the group coordinator keeps triggering a rebalance whenever the auto-commit of offsets fails with this error, and the user code rejoins the group on the next `poll` call. There is no need to add `resetGeneration` here. What we should really do in this case is increase max.poll.interval.ms or decrease max.poll.records, as the exception's description suggests.
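
For illustration, a minimal sketch of that tuning, using only java.util.Properties so it runs without the Kafka client on the classpath. The class name, the helper `tunedConsumerProps`, the broker address, and the values 600000 and 100 are assumptions made for this example, not recommendations; only the config keys and the group name come from the discussion here.

```java
import java.util.Properties;

// Hypothetical helper class, not part of Kafka: collects the consumer settings
// discussed in this comment into a Properties object.
final class ConsumerTuning {

    static Properties tunedConsumerProps(String bootstrapServers, String groupId,
                                         int maxPollIntervalMs, int maxPollRecords) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers);
        props.setProperty("group.id", groupId);
        props.setProperty("enable.auto.commit", "true");
        // Allow more time between two poll() calls before the member is
        // considered failed and its partitions are reassigned.
        props.setProperty("max.poll.interval.ms", Integer.toString(maxPollIntervalMs));
        // Return fewer records per poll() so each batch is processed sooner.
        props.setProperty("max.poll.records", Integer.toString(maxPollRecords));
        return props;
    }

    public static void main(String[] args) {
        // Illustrative values only; pick them from the measured processing time.
        Properties props = tunedConsumerProps(
                "localhost:9092", "gift_rich_audience_write", 600_000, 100);
        props.list(System.out);
    }
}
```

To use this with a real consumer, the same Properties would also need key and value deserializer settings before being passed to the KafkaConsumer constructor. Either knob on its own may be enough: the goal is that processing one batch takes less time than max.poll.interval.ms.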

> Auto-commit of offsets fail, and not recover forever...
> -------------------------------------------------------
>
>                 Key: KAFKA-6306
>                 URL: https://issues.apache.org/jira/browse/KAFKA-6306
>             Project: Kafka
>          Issue Type: Bug
>          Components: clients, consumer
>    Affects Versions: 0.10.2.1, 1.0.0
>            Reporter: HongLiang
>              Labels: patch
>         Attachments: _883ddf50-beb7-4e87-9630-168acaa9b046.png, auto-commit-fail-bugs.patch, e6cf53be-e128-47dc-a45a-79439a9e55ff.png, pool_46ba3275-7b56-4c64-a4f4-7280eb7f1728.png
>
>
> Auto-commit of offsets fails and never recovers. In sendOffsetCommitRequest, when the generation is null, the ConsumerCoordinator request always fails. This may be a bug. Error log below:
> The WARN log keeps growing with more and more entries like this one:
> "2017-12-01 22:08:39.112 WARN pool-390-thread-1#1 (ConsumerCoordinator.java:626) - Auto-commit of offsets {drawing_gift_sent-1=OffsetAndMetadata{offset=32150359, metadata=''}} failed for group gift_rich_audience_write: Commit cannot be completed since the group has already rebalanced and assigned the partitions to another member. This means that the time between subsequent calls to poll() was longer than the configured max.poll.interval.ms, which typically implies that the poll loop is spending too much time message processing. You can address this either by increasing the session timeout or by reducing the maximum size of batches returned in poll() with max.poll.records."
> !e6cf53be-e128-47dc-a45a-79439a9e55ff.png|thumbnail!
> !_883ddf50-beb7-4e87-9630-168acaa9b046.png|thumbnail!
> !pool_46ba3275-7b56-4c64-a4f4-7280eb7f1728.png|thumbnail!


