Posted to users@kafka.apache.org by Miguel Angel Corral <mi...@mandiant.com.INVALID> on 2022/02/15 16:12:00 UTC

Leader: none in __consumer_offsets topic

Hi,

Recently, in a 3.8.1 Kafka cluster with 3 brokers, the topic __consumer_offsets became leaderless:

$ /kafka-topics.sh  --zookeeper <zookeeper_addresses>  --describe --under-replicated-partitions
                Topic: __consumer_offsets          Partition: 0          Leader: none      Replicas: 103,101,102    Isr:
                Topic: __consumer_offsets          Partition: 1          Leader: none      Replicas: 101,102,103    Isr:
                Topic: __consumer_offsets          Partition: 2          Leader: none      Replicas: 102,103,101    Isr:
                Topic: __consumer_offsets          Partition: 3          Leader: none      Replicas: 103,102,101    Isr:
                Topic: __consumer_offsets          Partition: 4          Leader: none      Replicas: 101,103,102    Isr:
                Topic: __consumer_offsets          Partition: 5          Leader: none      Replicas: 102,101,103    Isr:
                Topic: __consumer_offsets          Partition: 6          Leader: none      Replicas: 103,101,102    Isr:
                …
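A listing like the one above can be checked mechanically rather than by eye. Below is a minimal Python sketch that flags partitions with no leader or an empty ISR. It assumes the "Topic: ... Partition: ... Leader: ... Replicas: ... Isr:" layout shown in this output. The regex and the function name find_unhealthy_partitions are hypothetical and are not part of any Kafka tool.

```python
import re
from typing import Iterable, List, Tuple

# Matches one partition line in the layout shown above. This layout is an
# assumption based on that sample; other tool versions may add extra columns
# after Isr, which the pattern simply ignores.
PARTITION_LINE = re.compile(
    r"Topic:\s*(?P<topic>\S+)\s+"
    r"Partition:\s*(?P<partition>\d+)\s+"
    r"Leader:\s*(?P<leader>\S+)\s+"
    r"Replicas:\s*(?P<replicas>[\d,]*)\s+"
    r"Isr:[ \t]*(?P<isr>[\d,]*)"
)


def find_unhealthy_partitions(lines: Iterable[str]) -> List[Tuple[str, int, str]]:
    """Return (topic, partition, reason) for leaderless or empty-ISR partitions."""
    unhealthy = []
    for line in lines:
        match = PARTITION_LINE.search(line)
        if match is None:
            continue  # skip header lines, blank lines and the trailing "…"
        topic = match.group("topic")
        partition = int(match.group("partition"))
        # "none" is what the sample shows; "-1" is accepted too as a
        # defensive assumption about other tool versions.
        if match.group("leader") in ("none", "-1"):
            unhealthy.append((topic, partition, "no leader"))
        elif not match.group("isr"):
            unhealthy.append((topic, partition, "empty ISR"))
    return unhealthy


if __name__ == "__main__":
    sample = [
        "    Topic: __consumer_offsets    Partition: 0    Leader: none    Replicas: 103,101,102    Isr:",
        "    Topic: orders    Partition: 1    Leader: 101    Replicas: 101,102,103    Isr: 101,102,103",
    ]
    print(find_unhealthy_partitions(sample))
```

Feeding it the saved output of the describe command (one string per line) would list every leaderless __consumer_offsets partition. The topic name "orders" in the usage example is made up.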

When this happened, consumers were unable to consume and logged the following errors:

o.a.k.c.c.internals.AbstractCoordinator  : [Consumer clientId=consumer-2, groupId=foo] Sending FindCoordinator request to broker <IP:port> (id: 102 rack: <region>)
o.a.k.c.c.internals.AbstractCoordinator  : [Consumer clientId=consumer-2, groupId=foo] Received FindCoordinator response ClientResponse(receivedTimeMs=1639436595264, latencyMs=98, disconnected=false, requestHeader=RequestHeader(apiKey=bar, apiVersion=2, clientId=consumer-2, correlationId=117), responseBody=FindCoordinatorResponseData(throttleTimeMs=0, errorCode=15, errorMessage='The coordinator is not available.', nodeId=-1, host='', port=-1))
o.a.k.c.c.internals.AbstractCoordinator  : [Consumer clientId=consumer-2, groupId=foo] Group coordinator lookup failed: The coordinator is not available.
o.a.k.c.c.internals.AbstractCoordinator  : [Consumer clientId=consumer-2, groupId=foo] Coordinator discovery failed, refreshing metadata
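These errors follow from the leaderless topic. In the Kafka protocol, errorCode=15 is COORDINATOR_NOT_AVAILABLE. A group's coordinator is the broker leading the __consumer_offsets partition abs(groupId.hashCode()) % offsets.topic.num.partitions, so if that partition has no leader, FindCoordinator cannot return one. Below is a minimal Python sketch of that mapping. It assumes Java String.hashCode semantics and the default of 50 offsets partitions. The helper names are hypothetical.

```python
INT_MIN = -(1 << 31)


def java_string_hashcode(s: str) -> int:
    """Reproduce Java's String.hashCode(): h = 31*h + c over UTF-16 code
    units, with 32-bit signed overflow."""
    h = 0
    data = s.encode("utf-16-be")
    for i in range(0, len(data), 2):
        code_unit = (data[i] << 8) | data[i + 1]
        h = (31 * h + code_unit) & 0xFFFFFFFF  # keep only 32 bits
    # Reinterpret the 32-bit pattern as a signed int, as Java would.
    return h - (1 << 32) if h >= (1 << 31) else h


def kafka_abs(n: int) -> int:
    """Absolute value that maps Integer.MIN_VALUE to 0 instead of overflowing."""
    return 0 if n == INT_MIN else abs(n)


def offsets_partition_for(group_id: str, num_partitions: int = 50) -> int:
    """__consumer_offsets partition whose leader acts as the group coordinator.

    num_partitions must match the cluster's offsets.topic.num.partitions;
    50 is the broker default and only an assumption here.
    """
    return kafka_abs(java_string_hashcode(group_id)) % num_partitions


if __name__ == "__main__":
    print(offsets_partition_for("foo"))
```

For the group id "foo" from the logs, this gives partition 24 under the assumed 50 partitions. With every __consumer_offsets partition leaderless, any group id would hit the same error.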

This issue was solved by restarting all brokers without much investigation, since it was causing an outage. Unfortunately, there are no broker logs. During the incident, the JMX metrics kafka.controller:type=KafkaController,name=OfflinePartitionsCount and kafka.server:type=ReplicaManager,name=UnderReplicatedPartitions both reported 0.

I’m trying to figure out:
1. What could have caused this issue?
2. What JMX metrics could we use to get notified of this issue in the future?

Thanks in advance,
Miguel
This email and any attachments thereto may contain private, confidential, and/or privileged material for the sole use of the intended recipient. Any review, copying, or distribution of this email (or any attachments thereto) by others is strictly prohibited. If you are not the intended recipient, please contact the sender immediately and permanently delete the original and any copies of this email and any attachments thereto.

Re: Leader: none in __consumer_offsets topic

Posted by Miguel Angel Corral <mi...@mandiant.com.INVALID>.
Hi!

Yeah, sorry, that’s a typo. I meant 2.8.1.

From: Luke Chen <sh...@gmail.com>
Date: Thursday, 17 February 2022 at 03:28
To: Kafka Users <us...@kafka.apache.org>
Subject: [EXTERNAL] Re: Leader: none in __consumer_offsets topic
CAUTION: This email originated from outside of Mandiant. Do not click links or open attachments unless you recognize the sender and know the content is safe.

Hi Miguel,

Could you let us know which version of Kafka you're using?
There's no v3.8.1 Kafka currently.

Thanks.
Luke

Re: Leader: none in __consumer_offsets topic

Posted by Luke Chen <sh...@gmail.com>.
Hi Miguel,

Could you let us know which version of Kafka you're using?
There's no v3.8.1 Kafka currently.

Thanks.
Luke