Posted to dev@kafka.apache.org by Maneesh Bhunwal <ma...@gmail.com> on 2022/10/06 09:09:59 UTC

Consumer group state is in inconsistent state

Hi Team,

We have a 6-node Kafka cluster (version 2.0.0). When I try to get the state
of a consumer group by specifying only one broker IP, I get different
results depending on the broker (4 of the brokers respond with one result
and 2 of the brokers with another).

bin/kafka-consumer-groups.sh --bootstrap-server 10.32.218.112:9092 --describe --state --group consumer-group
COORDINATOR (ID)          ASSIGNMENT-STRATEGY       STATE                #MEMBERS
10.32.218.112:9092 (1)    range                     Stable               1


bin/kafka-consumer-groups.sh --bootstrap-server 10.32.67.102:9092 --describe --state --group consumer-group
COORDINATOR (ID)          ASSIGNMENT-STRATEGY       STATE                #MEMBERS
10.32.218.112:9092 (1)    range                     Stable               1


bin/kafka-consumer-groups.sh --bootstrap-server 10.33.150.9:9092 --describe --state --group consumer-group
Consumer group 'consumer-group' has no active members.
COORDINATOR (ID)          ASSIGNMENT-STRATEGY       STATE                #MEMBERS
10.35.168.252:9092 (4)                              Empty                0


bin/kafka-consumer-groups.sh --bootstrap-server 10.35.168.252:9092 --describe --state --group consumer-group
Consumer group 'consumer-group' has no active members.
COORDINATOR (ID)          ASSIGNMENT-STRATEGY       STATE                #MEMBERS
10.35.168.252:9092 (4)                              Empty                0

bin/kafka-consumer-groups.sh --bootstrap-server 10.33.21.48:9092 --describe --state --group consumer-group
Consumer group 'consumer-group' has no active members.
COORDINATOR (ID)          ASSIGNMENT-STRATEGY       STATE                #MEMBERS
10.35.168.252:9092 (4)                              Empty                0
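
To make the split easier to see, the data rows printed by the five commands in this message can be grouped by the view each bootstrap broker reports. This is only a sketch: the regular expression assumes the column layout shown in this message, and the names ROW, parse_state_row, group_by_view and OBSERVED are illustrative, not part of any Kafka tool.

```python
import re
from collections import defaultdict

# One data row of `kafka-consumer-groups.sh --describe --state` output.
# ASSIGNMENT-STRATEGY is blank for an Empty group, so that column is optional.
# Assumption: the layout matches the output pasted in this thread.
ROW = re.compile(
    r"^(?P<host>\S+) \((?P<id>\d+)\)\s+"
    r"(?:(?P<strategy>\S+)\s+)?"
    r"(?P<state>\S+)\s+(?P<members>\d+)\s*$"
)


def parse_state_row(row):
    """Parse a single data row into a dict; raise ValueError if it does not match."""
    match = ROW.match(row.strip())
    if match is None:
        raise ValueError(f"unrecognised row: {row!r}")
    return {
        "coordinator": match["host"],
        "coordinator_id": int(match["id"]),
        "strategy": match["strategy"] or "",
        "state": match["state"],
        "members": int(match["members"]),
    }


def group_by_view(rows_by_bootstrap):
    """Map (coordinator id, state, member count) to the bootstrap brokers reporting it."""
    views = defaultdict(list)
    for bootstrap, row in rows_by_bootstrap.items():
        parsed = parse_state_row(row)
        key = (parsed["coordinator_id"], parsed["state"], parsed["members"])
        views[key].append(bootstrap)
    return dict(views)


# Rows copied from the five commands in this message, keyed by bootstrap broker.
OBSERVED = {
    "10.32.218.112:9092": "10.32.218.112:9092 (1)    range                     Stable               1",
    "10.32.67.102:9092": "10.32.218.112:9092 (1)    range                     Stable               1",
    "10.33.150.9:9092": "10.35.168.252:9092 (4)                              Empty                0",
    "10.35.168.252:9092": "10.35.168.252:9092 (4)                              Empty                0",
    "10.33.21.48:9092": "10.35.168.252:9092 (4)                              Empty                0",
}

if __name__ == "__main__":
    for view, brokers in group_by_view(OBSERVED).items():
        print(view, brokers)
```

A healthy cluster should yield a single key from group_by_view; more than one key means the brokers disagree about who the coordinator is.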


I see the same behaviour with other consumer groups as well. A few
consumer groups are active in both "mini clusters" (I am not sure what
the appropriate name is in this case).

The validations I have done:

1. All the brokers are active and able to talk to each other.

2. Every broker lists all the other brokers when we run
bin/kafka-broker-api-versions.sh --bootstrap-server localhost:9092 |
awk '/^[a-z]/ {print $1}'

3. Checked the controller IP from ZooKeeper and validated that there are
no anomalies in the controller logs on any of the boxes.

4. I am able to reproduce the same issue right now with these steps for
a new consumer group:

     a. Start a Kafka consumer with group cg1 using broker IP 10.32.218.112:9092
     b. Validate that the status from broker IP 10.32.218.112:9092 shows
the consumer as live
     c. Validate that the status from broker IP 10.35.168.252:9092 shows
the consumer as not live
     d. Start a Kafka consumer with group cg1 using broker IP 10.35.168.252:9092
     e. Validate that the status from broker IP 10.32.218.112:9092 shows
the consumer as live
     f. Validate that the status from broker IP 10.35.168.252:9092 shows
the consumer as live

   However, the consumer IDs reported by the two brokers are different.
Also, when we stop both consumers, the last committed offsets reported
by the two brokers are different, confirming that the two consumers are
treated separately.

5. The only suspicious log that I found on one of the brokers is

     WARN [2021-07-29 12:51:52,811] [kafka-request-handler-15][]
state.change.logger - [Broker id=5] Ignoring LeaderAndIsr request from
controller 1 with correlation id 2 epoch 5 for
     partition __consumer_offsets-15 since its associated leader epoch
101 is not higher than the current leader epoch 101

     There are quite a few of these logs for different partitions, and also
similar failure logs in the controller's logs.
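
To see which __consumer_offsets partitions are affected by warnings like the one in item 5, the broker log can be scanned with a small script. This is a rough sketch that assumes the message wording is exactly as pasted above; IGNORED, ignored_requests and affected_partitions are hypothetical names chosen for illustration.

```python
import re

# Matches the "Ignoring LeaderAndIsr request" warning quoted in item 5.
# Assumption: the wording is exactly as in that log line.
IGNORED = re.compile(
    r"\[Broker id=(?P<broker>\d+)\] Ignoring LeaderAndIsr request from "
    r"controller (?P<controller>\d+) with correlation id (?P<correlation>\d+) "
    r"epoch (?P<epoch>\d+) for partition "
    r"(?P<topic>\S+)-(?P<partition>\d+) since its associated leader epoch "
    r"(?P<request_leader_epoch>\d+) is not higher than the current leader epoch "
    r"(?P<current_leader_epoch>\d+)"
)


def ignored_requests(log_text):
    """Return one dict per matching warning; numeric fields are converted to int."""
    # Collapse line wraps and repeated spaces so wrapped messages still match.
    flat = " ".join(log_text.split())
    return [
        {k: (v if k == "topic" else int(v)) for k, v in m.groupdict().items()}
        for m in IGNORED.finditer(flat)
    ]


def affected_partitions(log_text, topic="__consumer_offsets"):
    """Sorted partition numbers of `topic` that appear in such warnings."""
    return sorted({r["partition"] for r in ignored_requests(log_text) if r["topic"] == topic})
```

Comparing the resulting partition list with the partition that a given group maps to would show whether the affected groups sit on the partitions named in these warnings.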


I have tried searching Stack Overflow and the Kafka JIRA but could not find
a relevant issue, hence reaching out to you. Can you please help with this?

Regards
Maneesh Bhunwal

Re: Consumer group state is in inconsistent state

Posted by Maneesh Bhunwal <ma...@gmail.com>.
If it helps:

I can see that the group coordinator of the consumer group is changing
from one partition to another (I validated this by checking the messages
in the __consumer_offsets topic).
From what I have read, the coordinator is derived from the hash of the
consumer group name modulo the number of partitions in __consumer_offsets.
Was this the case in 2.0.0 as well?
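
For reference, the mapping described above can be reproduced outside the broker. The sketch below assumes the broker computes abs(groupId.hashCode) modulo the partition count, with Java's String.hashCode and Integer.MIN_VALUE mapped to 0; that is my understanding of the broker code and should be checked against the 2.0.0 source. The default of 50 partitions is the default value of offsets.topic.num.partitions, and the cluster's actual value may differ. The function names are illustrative.

```python
import struct


def java_string_hash(text):
    """Java's String.hashCode: 31-based polynomial over UTF-16 code units, as signed 32-bit."""
    data = text.encode("utf-16-be")
    units = struct.unpack(f">{len(data) // 2}H", data)
    h = 0
    for unit in units:
        h = (31 * h + unit) & 0xFFFFFFFF  # wrap like Java int arithmetic
    return h - (1 << 32) if h >= (1 << 31) else h


def kafka_abs(n):
    """Assumed behaviour of the broker's abs helper: Integer.MIN_VALUE maps to 0."""
    return 0 if n == -(1 << 31) else abs(n)


def offsets_partition_for(group_id, num_partitions=50):
    """Partition of __consumer_offsets that a group maps to, under the assumptions above."""
    return kafka_abs(java_string_hash(group_id)) % num_partitions


if __name__ == "__main__":
    for group in ("consumer-group", "cg1"):
        print(group, offsets_partition_for(group))
```

The leader of the resulting partition is expected to act as the group's coordinator, so two brokers naming different coordinators for one group would suggest they disagree about that partition's leader.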

Regards
Maneesh Bhunwal



