Posted to jira@kafka.apache.org by "Omnia Ibrahim (Jira)" <ji...@apache.org> on 2021/05/27 14:27:00 UTC

[jira] [Comment Edited] (KAFKA-12465) Decide whether inconsistent cluster id error are fatal

    [ https://issues.apache.org/jira/browse/KAFKA-12465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17352518#comment-17352518 ] 

Omnia Ibrahim edited comment on KAFKA-12465 at 5/27/21, 2:26 PM:
-----------------------------------------------------------------

I have been testing KRaft with the following scenario: I set up a cluster with 3 combined nodes (broker, controller) and 3 broker-only nodes, then at some later point added 2 extra nodes to the KRaft cluster with a different cluster id. In a real production deployment, I would expect these 2 nodes with the wrong cluster id to crash immediately, so that we can tell something went wrong during the deployment.

The scenario I was testing is the following:
 * Set up a cluster with 3 combined raft nodes (broker, controller mode) + 3 broker nodes with cluster id {{CLUSTER_ID_1}}; they elected {{raft-node-1}} as the leader.
 * Later added 2 extra nodes to the raft quorum with a different cluster id, {{WRONG_CLUSTER_ID}}.
 * The extra nodes don't crash; instead they stay running and keep throwing this error:
{code:java}
 {"level":"ERROR","message":"[RaftManager nodeId=8] Unexpected error INCONSISTENT_CLUSTER_ID in FETCH response: InboundResponse(correlationId=16699, data=FetchResponseData(throttleTimeMs=0, errorCode=104, sessionId=0, responses=[]), sourceId=2)","logger":"org.apache.kafka.raft.KafkaRaftClient"}{code}

 * {{raft-node-1}} doesn't throw errors, only warnings about connection issues:

{code:java}
{"level":"WARN","message":"[RaftManager nodeId=1] Error connecting to node raft-node-4:9093 (id: 8 rack: null)","logger":"org.apache.kafka.clients.NetworkClient","throwable":{"class":"java.net.UnknownHostException","msg":"raft-node-4","stack":["java.net.InetAddress$CachedAddresses.get(InetAddress.java:797)","java.net.InetAddress.getAllByName0(InetAddress.java:1505)","java.net.InetAddress.getAllByName(InetAddress.java:1364)","java.net.InetAddress.getAllByName(InetAddress.java:1298)","org.apache.kafka.clients.DefaultHostResolver.resolve(DefaultHostResolver.java:27)","org.apache.kafka.clients.ClientUtils.resolve(ClientUtils.java:111)","org.apache.kafka.clients.ClusterConnectionStates$NodeConnectionState.currentAddress(ClusterConnectionStates.java:512)","org.apache.kafka.clients.ClusterConnectionStates$NodeConnectionState.access$200(ClusterConnectionStates.java:466)","org.apache.kafka.clients.ClusterConnectionStates.currentAddress(ClusterConnectionStates.java:172)","org.apache.kafka.clients.NetworkClient.initiateConnect(NetworkClient.java:985)","org.apache.kafka.clients.NetworkClient.ready(NetworkClient.java:311)","kafka.common.InterBrokerSendThread.$anonfun$sendRequests$1(InterBrokerSendThread.scala:103)","kafka.common.InterBrokerSendThread.$anonfun$sendRequests$1$adapted(InterBrokerSendThread.scala:99)","scala.collection.IterableOnceOps.foreach(IterableOnce.scala:553)","scala.collection.IterableOnceOps.foreach$(IterableOnce.scala:551)","scala.collection.AbstractIterable.foreach(Iterable.scala:920)","kafka.common.InterBrokerSendThread.sendRequests(InterBrokerSendThread.scala:99)","kafka.common.InterBrokerSendThread.pollOnce(InterBrokerSendThread.scala:73)","kafka.common.InterBrokerSendThread.doWork(InterBrokerSendThread.scala:94)","kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:96)"]}}{code}
In a real deployment, the error `INCONSISTENT_CLUSTER_ID` should be fatal at all times; otherwise, how can we tell that these nodes are failing to join the active raft quorum?
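
To make the request concrete, here is a minimal sketch of the fail-fast behaviour described above. It is not the actual {{KafkaRaftClient}} code: the class names {{InconsistentClusterIdException}} and {{FailFastFetchErrorHandler}} and the {{handle}} method are hypothetical, and the only value taken from the logs is the error code 104 reported alongside {{INCONSISTENT_CLUSTER_ID}}.

```java
// Hypothetical sketch; none of these class or method names come from the Kafka code base.
final class InconsistentClusterIdException extends RuntimeException {
    InconsistentClusterIdException(String message) {
        super(message);
    }
}

final class FailFastFetchErrorHandler {
    // Error code logged with INCONSISTENT_CLUSTER_ID in the FETCH response above.
    static final short INCONSISTENT_CLUSTER_ID = 104;

    /**
     * Returns normally for any other error code. For INCONSISTENT_CLUSTER_ID it
     * throws, so the caller can shut the node down instead of retrying forever.
     */
    static void handle(int nodeId, short errorCode) {
        if (errorCode == INCONSISTENT_CLUSTER_ID) {
            throw new InconsistentClusterIdException(
                "Node " + nodeId + " received INCONSISTENT_CLUSTER_ID; refusing to keep running");
        }
    }
}
```

With something along these lines, a node started with {{WRONG_CLUSTER_ID}} would stop on its first rejected FETCH rather than logging the same error indefinitely.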


> Decide whether inconsistent cluster id error are fatal
> ------------------------------------------------------
>
>                 Key: KAFKA-12465
>                 URL: https://issues.apache.org/jira/browse/KAFKA-12465
>             Project: Kafka
>          Issue Type: Sub-task
>            Reporter: dengziming
>            Priority: Major
>
> Currently, we just log an error when an inconsistent cluster id occurs. We should set a window during startup in which these errors are fatal; after that window, we no longer treat them as fatal. See https://github.com/apache/kafka/pull/10289#discussion_r592853088
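
As an illustration only, the startup-window idea in the issue description could be expressed roughly as follows. The class {{StartupFatalWindow}}, its constructor arguments, and the choice to pass timestamps in explicitly are all assumptions made for this sketch; the issue does not specify how long the window should be or how it would be configured.

```java
import java.time.Duration;

// Hypothetical sketch of a "fatal only during startup" policy; not Kafka code.
final class StartupFatalWindow {
    // Error code logged with INCONSISTENT_CLUSTER_ID in the comment above.
    static final short INCONSISTENT_CLUSTER_ID = 104;

    private final long startMs;   // when the node started, in milliseconds
    private final long windowMs;  // length of the startup window, in milliseconds

    StartupFatalWindow(long startMs, Duration window) {
        this.startMs = startMs;
        this.windowMs = window.toMillis();
    }

    /**
     * True only for INCONSISTENT_CLUSTER_ID seen strictly inside the startup window.
     * After the window has elapsed the error would merely be logged.
     */
    boolean isFatal(short errorCode, long nowMs) {
        return errorCode == INCONSISTENT_CLUSTER_ID && nowMs - startMs < windowMs;
    }
}
```

Passing the current time as a parameter keeps the policy easy to test; a real implementation would presumably read it from whatever clock the raft client already uses.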



--
This message was sent by Atlassian Jira
(v8.3.4#803005)