You are viewing a plain text version of this content. The canonical link for it is here.
Posted to jira@kafka.apache.org by "Raman Gupta (Jira)" <ji...@apache.org> on 2020/05/26 20:57:00 UTC

[jira] [Commented] (KAFKA-10007) Kafka consumer offset reset despite recent group activity

    [ https://issues.apache.org/jira/browse/KAFKA-10007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17117039#comment-17117039 ] 

Raman Gupta commented on KAFKA-10007:
-------------------------------------

Just happened to me again on a completely different 2.4.1 broker. The cluster was recently downscaled from 4 brokers to 1 and today when a client restarted, it had lost its offsets from 3 partitions out of 100.  This is a really serious issue and needs some attention.

> Kafka consumer offset reset despite recent group activity
> ---------------------------------------------------------
>
>                 Key: KAFKA-10007
>                 URL: https://issues.apache.org/jira/browse/KAFKA-10007
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Raman Gupta
>            Priority: Major
>
> I was running a Kafka 2.3.0 broker with the default values for `offset.retention.minutes` (which should be 7 days as of 2.0.0). I deployed a 2.4.1 broker, along with a change in setting `offset.retention.minutes` to 14 days, as I have several low-traffic topics in which exactly-once processing is desired.
> As I understand it, with https://issues.apache.org/jira/browse/KAFKA-4682 and KIP-211, offsets should no longer be expired based on the last commit timestamp, but instead on the last time the group transitioned into an Empty state.
> However, the behavior I saw from Kafka upon broker shutdown was that the offsets were expired for a group when as far as I can tell, they should not have been. See these logs from during the cluster recycle -- during this time the consumer, configured with the static group membership protocol, is always running:
> {code}
> <<Running Kafka 2.3.0, 4 brokers, all on 2.3, protocol version 2.3, offsets.retention.minutes using default value>>
> [2020-05-10 05:37:01,070] <<Shutting down kafka-0>>
> << Starting broker-0 on 2.4.1 with protocol version 2.3, offsets.retention.minutes = 10080 >>
> kafka-0	[2020-05-10 05:37:39,682] INFO starting (kafka.server.KafkaServer)
> kafka-0	[2020-05-10 05:39:42,680] INFO [GroupCoordinator 0]: Loading group metadata for produs-cis-CisFileEventConsumer with generation 27 (kafka.coordinator.group.GroupCoordinator)
> << Recycling broker-1 on 2.4.1, protocol version 2.3, offsets.retention.minutes = 10080, looks like the consumer fails because of the broker going down, and kafka-0 reports: >>
> kafka-0	[2020-05-10 05:45:14,121] INFO [GroupCoordinator 0]: Member cis-9c5d994c5-7hpqt-efced5ca-0b81-4720-992d-bdd8612519b3 in group produs-cis-CisFileEventConsumer has failed, removing it from the group (kafka.coordinator.group.GroupCoordinator)
> kafka-0	[2020-05-10 05:45:14,124] INFO [GroupCoordinator 0]: Preparing to rebalance group produs-cis-CisFileEventConsumer in state PreparingRebalance with old generation 27 (__consumer_offsets-17) (reason: removing member cis-9c5d994c5-7hpqt-efced5ca-0b81-4720-992d-bdd8612519b3 on heartbeat expiration) (kafka.coordinator.group.GroupCoordinator)
> kafka-0	[2020-05-10 05:45:19,479] INFO [GroupCoordinator 0]: Member cis-9c5d994c5-sknlk-2b9ed8bf-348c-4a10-97d3-5f2caccce7df in group produs-cis-CisFileEventConsumer has failed, removing it from the group (kafka.coordinator.group.GroupCoordinator)
> kafka-0	[2020-05-10 05:45:19,482] INFO [GroupCoordinator 0]: Group produs-cis-CisFileEventConsumer with generation 28 is now empty (__consumer_offsets-17) (kafka.coordinator.group.GroupCoordinator)
> << and now kafka-1 starts up again, the offsets are expired >>
> kafka-1	[2020-05-10 05:46:11,229] INFO starting (kafka.server.KafkaServer)
> ...
> kafka-0	[2020-05-10 05:47:42,303] INFO [GroupCoordinator 0]: Preparing to rebalance group produs-cis-CisFileEventConsumer in state PreparingRebalance with old generation 28 (__consumer_offsets-17) (reason: Adding new member cis-9c5d994c5-sknlk-1194b4b6-81ae-4a78-89a7-c610cf8c65be with group instanceid Some(cis-9c5d994c5-sknlk)) (kafka.coordinator.group.GroupCoordinator)
> kafka-0	[2020-05-10 05:47:47,611] INFO [GroupMetadataManager brokerId=0] Removed 43 expired offsets in 13 milliseconds. (kafka.coordinator.group.GroupMetadataManager)
> kafka-0	[2020-05-10 05:48:12,308] INFO [GroupCoordinator 0]: Stabilized group produs-cis-CisFileEventConsumer generation 29 (__consumer_offsets-17) (kafka.coordinator.group.GroupCoordinator)
> kafka-0	[2020-05-10 05:48:12,311] INFO [GroupCoordinator 0]: Assignment received from leader for group produs-cis-CisFileEventConsumer for generation 29 (kafka.coordinator.group.GroupCoordinator)
> {code}
> The group becomes empty at 2020-05-10 05:45:19,482, and then the offsets are expired about two minutes later at 05:47:47,611. I can't see any reason based on my understanding of how things work for this to have happened, other than it being a bug of some type?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)