Posted to jira@kafka.apache.org by "Dai Ma (Jira)" <ji...@apache.org> on 2023/05/28 14:22:00 UTC

[jira] [Created] (KAFKA-15032) kafka clients continuously connecting to the broken broker and can not switch to the good one

Dai Ma created KAFKA-15032:
------------------------------

             Summary: kafka clients continuously connecting to the broken broker and can not switch to the good one
                 Key: KAFKA-15032
                 URL: https://issues.apache.org/jira/browse/KAFKA-15032
             Project: Kafka
          Issue Type: Improvement
          Components: clients
    Affects Versions: 2.8.1
            Reporter: Dai Ma


* Kafka Cluster: 192.168.1.2:9092 (broker 1001), 192.168.1.3:9092 (broker 1002)
 * Kafka Topic Config: 2 partitions, 2 replicas
 * Kafka Client Config: bootstrap.servers: 192.168.1.2:9092,192.168.1.3:9092
 * Operation: stop 1001, then start 1001 again; then stop 1002 and do not restart it
 ** while 1001 is stopped, the client works normally
 ** once 1001 is started and 1002 is stopped, the Kafka client keeps trying to connect to 1002

{code:java}
[2023-05-16 23:27:02,450] WARN [Consumer clientId=consumer-123-1, groupId=123] Connection to node 1002 (192.168.1.3:9092) could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)
[2023-05-16 23:27:03,004] DEBUG [Consumer clientId=consumer-123-1, groupId=123] Give up sending metadata request since no node is available (org.apache.kafka.clients.NetworkClient)
[2023-05-16 23:27:03,054] DEBUG [Consumer clientId=consumer-123-1, groupId=123] Give up sending metadata request since no node is available (org.apache.kafka.clients.NetworkClient)
[2023-05-16 23:27:03,557] DEBUG [Consumer clientId=consumer-123-1, groupId=123] Initialize connection to node 192.168.1.3:9092 (id: 1002 rack: null) for sending metadata request (org.apache.kafka.clients.NetworkClient)
[2023-05-16 23:27:03,557] DEBUG [Consumer clientId=consumer-123-1, groupId=123] Initiating connection to node 192.168.1.3:9092 (id: 1002 rack: null) using address /192.168.1.3 (org.apache.kafka.clients.NetworkClient)
[2023-05-16 23:27:03,559] DEBUG [Consumer clientId=consumer-123-1, groupId=123] Set SASL client state to SEND_APIVERSIONS_REQUEST (org.apache.kafka.common.security.authenticator.SaslClientAuthenticator)
[2023-05-16 23:27:03,559] DEBUG [Consumer clientId=consumer-123-1, groupId=123] Creating SaslClient: client=null;service=kafka;serviceHostname=192.168.1.3;mechs=[PLAIN] (org.apache.kafka.common.security.authenticator.SaslClientAuthentica
[2023-05-16 23:27:03,560] DEBUG [Consumer clientId=consumer-123-1, groupId=123] Connection with 192.168.1.3 disconnected (org.apache.kafka.common.network.Selector) {code}
I have made a preliminary diagnosis of the issue:
 # The Kafka client uses the bootstrap servers (1001, 1002) only for the initial connection.
 # When 1001 stops, all topic partition leaders move to 1002. The client refreshes its metadata and removes 1001 from the list of available nodes, so it will only connect to 1002.
 # When 1001 starts again, all topic partition leaders are still on 1002.
 # When 1002 stops, the partition leaders switch to 1001, but the client does not learn about this and keeps trying to connect to 1002.

I tried changing metadata.max.age.ms to 1000 ms, which avoids the problem to some extent, but I'm worried that if all clients are configured this way, it will put extra load on the Kafka brokers.
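
To make the trade-off concrete, here is a minimal sketch of the client setting and a back-of-envelope estimate of the periodic metadata request rate it implies. The class and method names, the client count of 1000, and the estimate formula are illustrative assumptions. The formula counts only the periodic refresh and ignores refreshes forced by errors.

```java
import java.util.Properties;

// Illustrative sketch only; class and method names are hypothetical.
public class MetadataAgeSketch {

    // Builds client properties with a custom metadata refresh interval.
    public static Properties clientProps(long metadataMaxAgeMs) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "192.168.1.2:9092,192.168.1.3:9092");
        props.setProperty("group.id", "123");
        // The Kafka default is 300000 ms (5 minutes).
        props.setProperty("metadata.max.age.ms", Long.toString(metadataMaxAgeMs));
        return props;
    }

    // Rough cluster-wide rate of periodic metadata requests per second,
    // assuming every client refreshes exactly once per interval.
    public static double metadataRequestsPerSecond(int clients, long metadataMaxAgeMs) {
        return clients * 1000.0 / metadataMaxAgeMs;
    }

    public static void main(String[] args) {
        Properties props = clientProps(1000L);
        System.out.println(props.getProperty("metadata.max.age.ms")); // prints 1000
        // 1000 clients (assumed) refreshing every 1000 ms.
        System.out.println(metadataRequestsPerSecond(1000, 1000L)); // prints 1000.0
        // The same clients at the default interval, for comparison.
        System.out.println(metadataRequestsPerSecond(1000, 300000L));
    }
}
```

Under these assumptions, lowering the interval from the default to 1000 ms multiplies the periodic metadata traffic by 300.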

I think that when the client has been unable to connect to node 1002 for some time, it should automatically try to reconnect using bootstrap.servers. Once it connects to node 1001, the client will work normally. Since one broker is still alive, a two-node cluster should be able to tolerate the shutdown of one broker.
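
The proposed behavior can be sketched as a small standalone model. This is not Kafka's NetworkClient code: the class RebootstrapSketch, its method names, and the rebootstrapAfterMs timeout are hypothetical, and reachability is simulated with a predicate instead of real sockets.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical model of the proposed fallback; not Kafka's implementation.
public class RebootstrapSketch {
    private final List<String> bootstrapServers;
    private final long rebootstrapAfterMs;
    private List<String> knownNodes;
    // Time of the first failed attempt in the current outage, or -1 if none.
    private long unreachableSinceMs = -1L;

    public RebootstrapSketch(List<String> bootstrapServers, List<String> knownNodes,
                             long rebootstrapAfterMs) {
        this.bootstrapServers = new ArrayList<>(bootstrapServers);
        this.knownNodes = new ArrayList<>(knownNodes);
        this.rebootstrapAfterMs = rebootstrapAfterMs;
    }

    // Returns a reachable node, or null if none can be contacted right now.
    public String pickNode(long nowMs, Predicate<String> reachable) {
        String node = firstReachable(knownNodes, reachable);
        if (node != null) {
            unreachableSinceMs = -1L;
            return node;
        }
        if (unreachableSinceMs < 0) {
            unreachableSinceMs = nowMs;
        }
        if (nowMs - unreachableSinceMs >= rebootstrapAfterMs) {
            // Cached nodes have been down too long: fall back to bootstrap.servers.
            knownNodes = new ArrayList<>(bootstrapServers);
            unreachableSinceMs = -1L;
            return firstReachable(knownNodes, reachable);
        }
        return null;
    }

    private static String firstReachable(List<String> nodes, Predicate<String> reachable) {
        for (String n : nodes) {
            if (reachable.test(n)) {
                return n;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // State from the diagnosis: cached metadata lists only 1002, which is down.
        RebootstrapSketch client = new RebootstrapSketch(
                Arrays.asList("192.168.1.2:9092", "192.168.1.3:9092"),
                Arrays.asList("192.168.1.3:9092"),
                5_000L);
        Predicate<String> only1001Up = n -> n.equals("192.168.1.2:9092");
        System.out.println(client.pickNode(0L, only1001Up));     // prints null
        System.out.println(client.pickNode(5_000L, only1001Up)); // prints 192.168.1.2:9092
    }
}
```

In this model the client stays stuck on 1002 only until the timeout expires, after which the bootstrap list makes 1001 visible again.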

 

I started a discussion on Stack Overflow: [Consecutively restarting two Kafka Brokers., if the second node fails to start, the client continuously connecting to the second node - Stack Overflow|https://stackoverflow.com/questions/76345056/consecutively-restarting-two-kafka-brokers-if-the-second-node-fails-to-start]
 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)