You are viewing a plain text version of this content. The canonical link for it is here.
Posted to jira@kafka.apache.org by "Michael Kemmerer (JIRA)" <ji...@apache.org> on 2019/03/04 16:38:00 UTC

[jira] [Commented] (KAFKA-8008) Clients unable to connect and replicas are not able to connect to each other

    [ https://issues.apache.org/jira/browse/KAFKA-8008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16783530#comment-16783530 ] 

Michael Kemmerer commented on KAFKA-8008:
-----------------------------------------

Same here. I have 3 clusters on the same V-Net in Azure. All of them were deployed at the same time using similar configurations, only one is experiencing this issue. I've even torn down the brokers of the affected cluster and rebuilt them without using SASL and they're still experiencing the same error state.

> Clients unable to connect and replicas are not able to connect to each other
> ----------------------------------------------------------------------------
>
>                 Key: KAFKA-8008
>                 URL: https://issues.apache.org/jira/browse/KAFKA-8008
>             Project: Kafka
>          Issue Type: Bug
>          Components: controller, core
>    Affects Versions: 2.1.0, 2.1.1
>         Environment: Java 11
>            Reporter: Abhi
>            Priority: Critical
>
> Hi,
> I upgrade to Kafka v2.1.1 recently and seeing the below exceptions in all the servers. The kafka-network-thread-1-ListenerName are all consuming full cpu cycles. Lots of TCP connections are in CLOSE_WAIT state.
> My broker setup is using kerberos authentication with -Dsun.security.jgss.native=true.
> I am not sure how to handle this? Will increasing the kafka-network thread count help if it is possible?
> Does this seem like a bug? I am happy to help in anyway I can as this issue blocking our production usage and would like to get it resolved as early as possible.
> *server.log snippet from one of the servers:*
> [2019-02-27 00:00:02,948] DEBUG [ReplicaFetcher replicaId=1, leaderId=2, fetcherId=3] Built full fetch (sessionId=1488865423, epoch=INITIAL) for node 2 with 3 partition(s). (org.apache.kafka.clients.FetchSessionHandler)
> [2019-02-27 00:00:02,949] DEBUG [ReplicaFetcher replicaId=1, leaderId=2, fetcherId=3] Initiating connection to node mwkafka-prod-02.nyc.foo.com:9092 (id: 2 rack: null) using address mwkafka-prod-02.nyc.foo.com/10.219.247.26 (org.apache.kafka.clients.NetworkClient)
> [2019-02-27 00:00:02,949] DEBUG Set SASL client state to SEND_APIVERSIONS_REQUEST (org.apache.kafka.common.security.authenticator.SaslClientAuthenticator)
> [2019-02-27 00:00:02,949] DEBUG Creating SaslClient: client=kafka/mwkafka-prod-01.nyc.foo.com@UNIX.FOO.COM;service=kafka;serviceHostname=mwkafka-prod-02.nyc.foo.com;mechs=[GSSAPI] (org.apache.kafka.common.security.authenticator.SaslClientAuthenticator)
> [2019-02-27 00:00:02,949] DEBUG [ReplicaFetcher replicaId=1, leaderId=2, fetcherId=3] Created socket with SO_RCVBUF = 65536, SO_SNDBUF = 166400, SO_TIMEOUT = 0 to node 2 (org.apache.kafka.common.network.Selector)
> [2019-02-27 00:00:02,949] DEBUG Set SASL client state to RECEIVE_APIVERSIONS_RESPONSE (org.apache.kafka.common.security.authenticator.SaslClientAuthenticator)
> [2019-02-27 00:00:02,949] DEBUG [ReplicaFetcher replicaId=1, leaderId=2, fetcherId=3] Completed connection to node 2. Ready. (org.apache.kafka.clients.NetworkClient)
> [2019-02-27 00:00:03,007] DEBUG [ReplicaFetcher replicaId=1, leaderId=5, fetcherId=0] Built full fetch (sessionId=2039987243, epoch=INITIAL) for node 5 with 0 partition(s). (org.apache.kafka.clients.FetchSessionHandler)
> [2019-02-27 00:00:03,317] INFO [ReplicaFetcher replicaId=1, leaderId=5, fetcherId=1] Error sending fetch request (sessionId=397037945, epoch=INITIAL) to node 5: java.net.SocketTimeoutException: Failed to connect within 30000 ms. (org.apache.kafka.clients.FetchSessionHandler)
> [2019-02-27 00:00:03,317] WARN [ReplicaFetcher replicaId=1, leaderId=5, fetcherId=1] Error in response for fetch request (type=FetchRequest, replicaId=1, maxWait=10000, minBytes=1, maxBytes=10485760, fetchData={reddyvel-159-0=(fetchOffset=3173198, logStartOffset=3173198, maxBytes=1048576, currentLeaderEpoch=Optional[23]), reddyvel-331-0=(fetchOffset=3173197, logStartOffset=3173197, maxBytes=1048576, currentLeaderEpoch=Optional[23]), reddyvel-newtp-5-64-0=(fetchOffset=8936, logStartOffset=8936, maxBytes=1048576, currentLeaderEpoch=Optional[18]), reddyvel-tp9-78-0=(fetchOffset=247943, logStartOffset=247943, maxBytes=1048576, currentLeaderEpoch=Optional[19]), reddyvel-tp3-58-0=(fetchOffset=264495, logStartOffset=264495, maxBytes=1048576, currentLeaderEpoch=Optional[19]), fps.trsy.fe_prvt-0=(fetchOffset=24, logStartOffset=8, maxBytes=1048576, currentLeaderEpoch=Optional[3]), reddyvel-7-0=(fetchOffset=3173199, logStartOffset=3173199, maxBytes=1048576, currentLeaderEpoch=Optional[23]), reddyvel-298-0=(fetchOffset=3173197, logStartOffset=3173197, maxBytes=1048576, currentLeaderEpoch=Optional[23]), fps.guas.peeq.fe_marb_us-0=(fetchOffset=2, logStartOffset=2, maxBytes=1048576, currentLeaderEpoch=Optional[6]), reddyvel-108-0=(fetchOffset=3173198, logStartOffset=3173198, maxBytes=1048576, currentLeaderEpoch=Optional[23]), reddyvel-988-0=(fetchOffset=3173185, logStartOffset=3173185, maxBytes=1048576, currentLeaderEpoch=Optional[23]), reddyvel-111-0=(fetchOffset=3173198, logStartOffset=3173198, maxBytes=1048576, currentLeaderEpoch=Optional[23]), reddyvel-409-0=(fetchOffset=3173194, logStartOffset=3173194, maxBytes=1048576, currentLeaderEpoch=Optional[23]), reddyvel-104-0=(fetchOffset=3173198, logStartOffset=3173198, maxBytes=1048576, currentLeaderEpoch=Optional[23]), fps.priveq.reins-0=(fetchOffset=12, logStartOffset=6, maxBytes=1048576, currentLeaderEpoch=Optional[5]), reddyvel-353-0=(fetchOffset=3173197, logStartOffset=3173197, maxBytes=1048576, currentLeaderEpoch=Optional[23]), reddyvel-tp10-63-0=(fetchOffset=220652, logStartOffset=220652, maxBytes=1048576, currentLeaderEpoch=Optional[19]), reddyvel-newtp-5-86-0=(fetchOffset=8935, logStartOffset=8935, maxBytes=1048576, currentLeaderEpoch=Optional[18]), reddyvel-878-0=(fetchOffset=3173187, logStartOffset=3173187, maxBytes=1048576, currentLeaderEpoch=Optional[23]), reddyvel-621-0=(fetchOffset=3173190, logStartOffset=3173190, maxBytes=1048576, currentLeaderEpoch=Optional[23]), fps.agg.uopt.opt-0=(fetchOffset=28297, logStartOffset=28297, maxBytes=1048576, currentLeaderEpoch=Optional[8]), reddyvel-661-0=(fetchOffset=3173190, logStartOffset=3173190, maxBytes=1048576, currentLeaderEpoch=Optional[23]), fps.guas.jpeq.fe_marb-0=(fetchOffset=532, logStartOffset=10, maxBytes=1048576, currentLeaderEpoch=Optional[3]), reddyvel-607-0=(fetchOffset=3173191, logStartOffset=3173191, maxBytes=1048576, currentLeaderEpoch=Optional[23]), fps.seed.ornt.desim_ornt-0=(fetchOffset=4060, logStartOffset=2433, maxBytes=1048576, currentLeaderEpoch=Optional[5]), reddyvel-962-0=(fetchOffset=3173185, logStartOffset=3173185, maxBytes=1048576, currentLeaderEpoch=Optional[23]), fps.agg.ornt.desim_ornt-0=(fetchOffset=1177, logStartOffset=1177, maxBytes=1048576, currentLeaderEpoch=Optional[8]), reddyvel-tp6-71-0=(fetchOffset=256309, logStartOffset=256309, maxBytes=1048576, currentLeaderEpoch=Optional[19]), fps.trsy.macro-0=(fetchOffset=324, logStartOffset=106, maxBytes=1048576, currentLeaderEpoch=Optional[3]), fps.agg.dist.treas-0=(fetchOffset=0, logStartOffset=0, maxBytes=1048576, currentLeaderEpoch=Optional[8]), reddyvel-newtp-8-111-0=(fetchOffset=1, logStartOffset=1, maxBytes=1048576, currentLeaderEpoch=Optional[18]), reddyvel-94-0=(fetchOffset=3173198, logStartOffset=3173198, maxBytes=1048576, currentLeaderEpoch=Optional[23]), reddyvel-544-0=(fetchOffset=3173192, logStartOffset=3173192, maxBytes=1048576, currentLeaderEpoch=Optional[23]), reddyvel-newtp-2-72-0=(fetchOffset=8679, logStartOffset=8679, maxBytes=1048576, currentLeaderEpoch=Optional[18]), fps.guas.useq.eq_stat_arb_useq-0=(fetchOffset=6071629, logStartOffset=4470145, maxBytes=1048576, currentLeaderEpoch=Optional[4]), reddyvel-newtp-10-75-0=(fetchOffset=9768, logStartOffset=9768, maxBytes=1048576, currentLeaderEpoch=Optional[18]), reddyvel-tp8-51-0=(fetchOffset=249873, logStartOffset=249873, maxBytes=1048576, currentLeaderEpoch=Optional[19]), reddyvel-newtp-8-122-0=(fetchOffset=1, logStartOffset=1, maxBytes=1048576, currentLeaderEpoch=Optional[18]), fps.seed.trsy.pe_china-0=(fetchOffset=24, logStartOffset=8, maxBytes=1048576, currentLeaderEpoch=Optional[3]), fps.seed.trsy.jcas-0=(fetchOffset=93, logStartOffset=31, maxBytes=1048576, currentLeaderEpoch=Optional[3]), reddyvel-tp8-99-0=(fetchOffset=249871, logStartOffset=249871, maxBytes=1048576, currentLeaderEpoch=Optional[19]), reddyvel-643-0=(fetchOffset=3173190, logStartOffset=3173190, maxBytes=1048576, currentLeaderEpoch=Optional[23]), reddyvel-577-0=(fetchOffset=3173191, logStartOffset=3173191, maxBytes=1048576, currentLeaderEpoch=Optional[23]), reddyvel-tp6-97-0=(fetchOffset=256307, logStartOffset=256307, maxBytes=1048576, currentLeaderEpoch=Optional[19]), reddyvel-newtp-6-72-0=(fetchOffset=7652, logStartOffset=7652, maxBytes=1048576, currentLeaderEpoch=Optional[18]), reddyvel-959-0=(fetchOffset=3173185, logStartOffset=3173185, maxBytes=1048576, currentLeaderEpoch=Optional[23]), reddyvel-newtp-4-70-0=(fetchOffset=8828, logStartOffset=8828, maxBytes=1048576, currentLeaderEpoch=Optional[18]), fps.seed.trsy.opt-0=(fetchOffset=114, logStartOffset=38, maxBytes=1048576, currentLeaderEpoch=Optional[4])}, isolationLevel=READ_UNCOMMITTED, toForget=, metadata=(sessionId=397037945, epoch=INITIAL)) (kafka.server.ReplicaFetcherThread)
> java.net.SocketTimeoutException: Failed to connect within 30000 ms
>         at kafka.server.ReplicaFetcherBlockingSend.sendRequest(ReplicaFetcherBlockingSend.scala:95)
>         at kafka.server.ReplicaFetcherThread.fetchFromLeader(ReplicaFetcherThread.scala:192)
>         at kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:274)
>         at kafka.server.AbstractFetcherThread.$anonfun$maybeFetch$3(AbstractFetcherThread.scala:132)
>         at kafka.server.AbstractFetcherThread.$anonfun$maybeFetch$3$adapted(AbstractFetcherThread.scala:131)
>         at scala.Option.foreach(Option.scala:257)
>         at kafka.server.AbstractFetcherThread.maybeFetch(AbstractFetcherThread.scala:131)
>         at kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:113)
>         at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:82)
> [2019-02-27 00:00:04,007] DEBUG [ReplicaFetcher replicaId=1, leaderId=5, fetcherId=0] Built full fetch (sessionId=2039987243, epoch=INITIAL) for node 5 with 45 partition(s). (org.apache.kafka.clients.FetchSessionHandler)
> [2019-02-27 00:00:04,007] DEBUG [ReplicaFetcher replicaId=1, leaderId=5, fetcherId=0] Initiating connection to node mwkafka-prod-01.tbd.foo.com:9092 (id: 5 rack: null) using address mwkafka-prod-01.tbd.foo.com/10.236.30.30 (org.apache.kafka.clients.NetworkClient)
> [2019-02-27 00:00:04,008] DEBUG Set SASL client state to SEND_APIVERSIONS_REQUEST (org.apache.kafka.common.security.authenticator.SaslClientAuthenticator)
> [2019-02-27 00:00:04,008] DEBUG Creating SaslClient: client=kafka/mwkafka-prod-01.nyc.foo.com@UNIX.FOO.COM;service=kafka;serviceHostname=mwkafka-prod-01.tbd.foo.com;mechs=[GSSAPI] (org.apache.kafka.common.security.authenticator.SaslClientAuthenticator)
> [2019-02-27 00:00:04,008] DEBUG [ReplicaFetcher replicaId=1, leaderId=5, fetcherId=0] Created socket with SO_RCVBUF = 65536, SO_SNDBUF = 166400, SO_TIMEOUT = 0 to node 5 (org.apache.kafka.common.network.Selector)
> [2019-02-27 00:00:04,008] DEBUG Set SASL client state to RECEIVE_APIVERSIONS_RESPONSE (org.apache.kafka.common.security.authenticator.SaslClientAuthenticator)
> [2019-02-27 00:00:04,008] DEBUG [ReplicaFetcher replicaId=1, leaderId=5, fetcherId=0] Completed connection to node 5. Ready. (org.apache.kafka.clients.NetworkClient)
> [2019-02-27 00:00:04,318] DEBUG [ReplicaFetcher replicaId=1, leaderId=5, fetcherId=1] Built full fetch (sessionId=397037945, epoch=INITIAL) for node 5 with 0 partition(s). (org.apache.kafka.clients.FetchSessionHandler)
> [2019-02-27 00:00:05,318] DEBUG [ReplicaFetcher replicaId=1, leaderId=5, fetcherId=1] Built full fetch (sessionId=397037945, epoch=INITIAL) for node 5 with 48 partition(s). (org.apache.kafka.clients.FetchSessionHandler)
> [2019-02-27 00:00:05,318] DEBUG [ReplicaFetcher replicaId=1, leaderId=5, fetcherId=1] Initiating connection to node mwkafka-prod-01.tbd.foo.com:9092 (id: 5 rack: null) using address mwkafka-prod-01.tbd.foo.com/10.236.30.30 (org.apache.kafka.clients.NetworkClient)
> [2019-02-27 00:00:05,318] DEBUG Set SASL client state to SEND_APIVERSIONS_REQUEST (org.apache.kafka.common.security.authenticator.SaslClientAuthenticator)
> [2019-02-27 00:00:05,318] DEBUG Creating SaslClient: client=kafka/mwkafka-prod-01.nyc.foo.com@UNIX.FOO.COM;service=kafka;serviceHostname=mwkafka-prod-01.tbd.foo.com;mechs=[GSSAPI] (org.apache.kafka.common.security.authenticator.SaslClientAuthenticator)
> [2019-02-27 00:00:05,319] DEBUG [ReplicaFetcher replicaId=1, leaderId=5, fetcherId=1] Created socket with SO_RCVBUF = 65536, SO_SNDBUF = 166400, SO_TIMEOUT = 0 to node 5 (org.apache.kafka.common.network.Selector)
> [2019-02-27 00:00:05,319] DEBUG Set SASL client state to RECEIVE_APIVERSIONS_RESPONSE (org.apache.kafka.common.security.authenticator.SaslClientAuthenticator)
> [2019-02-27 00:00:05,319] DEBUG [ReplicaFetcher replicaId=1, leaderId=5, fetcherId=1] Completed connection to node 5. Ready. (org.apache.kafka.clients.NetworkClient)
> [2019-02-27 00:00:05,477] DEBUG Set SASL server state to AUTHENTICATE (org.apache.kafka.common.security.authenticator.SaslServerAuthenticator)
> [2019-02-27 00:00:05,477] DEBUG [SocketServer brokerId=1] Connection with /10.236.30.31 disconnected (org.apache.kafka.common.network.Selector)
> java.io.EOFException
>         at org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:96)
>         at org.apache.kafka.common.security.authenticator.SaslServerAuthenticator.authenticate(SaslServerAuthenticator.java:237)
>         at org.apache.kafka.common.network.KafkaChannel.prepare(KafkaChannel.java:132)
>         at org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:532)
>         at org.apache.kafka.common.network.Selector.poll(Selector.java:467)
>         at kafka.network.Processor.poll(SocketServer.scala:689)
>         at kafka.network.Processor.run(SocketServer.scala:594)
>         at java.base/java.lang.Thread.run(Thread.java:834)
> [2019-02-27 00:00:05,477] DEBUG Handling Kafka request SASL_HANDSHAKE (org.apache.kafka.common.security.authenticator.SaslServerAuthenticator)
> [2019-02-27 00:00:05,477] DEBUG Using SASL mechanism 'GSSAPI' provided by client (org.apache.kafka.common.security.authenticator.SaslServerAuthenticator)
> [2019-02-27 00:00:05,477] DEBUG Creating SaslServer for kafka/mwkafka-prod-01.nyc.foo.com@UNIX.FOO.COM with mechanism GSSAPI (org.apache.kafka.common.security.authenticator.SaslServerAuthenticator)
> [2019-02-27 00:00:06,056] INFO [ReplicaFetcher replicaId=1, leaderId=4, fetcherId=2] Error sending fetch request (sessionId=373847113, epoch=INITIAL) to node 4: java.net.SocketTimeoutException: Failed to connect within 30000 ms. (org.apache.kafka.clients.FetchSessionHandler)
> *Lsof output:*
> kafkagod@mwkafka-prod-01.nyc[toa]:/local/kafka/logs> lsof -P -p 103485 | grep TCP | grep CLOSE
> java    103485 kafkagod  635u     IPv4           86522305       0t0       TCP mwkafka-prod-01.nyc.foo.com:9092->mwkafka-prod-02.nyc.foo.com:46014 (CLOSE_WAIT)
> java    103485 kafkagod  639u     IPv4           86519040       0t0       TCP mwkafka-prod-01.nyc.foo.com:9092->mwkafka-prod-02.nyc.foo.com:45926 (CLOSE_WAIT)
> java    103485 kafkagod  642u     IPv4           86519057       0t0       TCP mwkafka-prod-01.nyc.foo.com:9092->mwkafka-prod-01.dr.foo.com:47424 (CLOSE_WAIT)
> java    103485 kafkagod  643u     IPv4           86519058       0t0       TCP mwkafka-prod-01.nyc.foo.com:9092->mwkafka-prod-01.dr.foo.com:47428 (CLOSE_WAIT)
> java    103485 kafkagod  683u     IPv4           86509505       0t0       TCP mwkafka-prod-01.nyc.foo.com:9092->mwkafka-prod-01.tbd.foo.com:57856 (CLOSE_WAIT)
> java    103485 kafkagod  684u     IPv4           86522910       0t0       TCP mwkafka-prod-01.nyc.foo.com:9092->mwkafka-prod-01.tbd.foo.com:57894 (CLOSE_WAIT)
> java    103485 kafkagod  688u     IPv4           86522176       0t0       TCP mwkafka-prod-01.nyc.foo.com:9092->mwkafka-prod-02.nyc.foo.com:45966 (CLOSE_WAIT)
> java    103485 kafkagod  690u     IPv4           86522306       0t0       TCP mwkafka-prod-01.nyc.foo.com:9092->mwkafka-prod-02.tbd.foo.com:35326 (CLOSE_WAIT)
> java    103485 kafkagod  695u     IPv4           86522192       0t0       TCP mwkafka-prod-01.nyc.foo.com:9092->mwkafka-prod-02.nyc.foo.com:45968 (CLOSE_WAIT)
> java    103485 kafkagod  696u     IPv4           86509516       0t0       TCP mwkafka-prod-01.nyc.foo.com:9092->mwkafka-prod-02.dr.foo.com:39676 (CLOSE_WAIT)
> java    103485 kafkagod  697u     IPv4           86522307       0t0       TCP mwkafka-prod-01.nyc.foo.com:9092->mwkafka-prod-02.tbd.foo.com:35328 (CLOSE_WAIT)
> java    103485 kafkagod  705u     IPv4           86519026       0t0       TCP mwkafka-prod-01.nyc.foo.com:9092->mwkafka-prod-01.tbd.foo.com:57838 (CLOSE_WAIT)
> java    103485 kafkagod  726u     IPv4           86509517       0t0       TCP mwkafka-prod-01.nyc.foo.com:9092->mwkafka-prod-02.tbd.foo.com:35258 (CLOSE_WAIT)
> *contoller logs below show that no broker is able connect to each other:*
> [2019-02-27 03:39:16,135] WARN [RequestSendThread controllerId=1] Controller 1's connection to broker mwkafka-prod-02.dr.foo.com:9092 (id: 4 rack: dr.foo.com) was unsuccessful (kafka.controller.RequestSendThread)
> java.net.SocketTimeoutException: Failed to connect within 30000 ms
>         at kafka.controller.RequestSendThread.brokerReady(ControllerChannelManager.scala:280)
>         at kafka.controller.RequestSendThread.doWork(ControllerChannelManager.scala:233)
>         at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:82)
> [2019-02-27 03:39:20,915] WARN [RequestSendThread controllerId=1] Controller 1's connection to broker mwkafka-prod-02.tbd.foo.com:9092 (id: 6 rack: tbd.foo.com) was unsuccessful (kafka.controller.RequestSendThread)
> java.net.SocketTimeoutException: Failed to connect within 30000 ms
>         at kafka.controller.RequestSendThread.brokerReady(ControllerChannelManager.scala:280)
>         at kafka.controller.RequestSendThread.doWork(ControllerChannelManager.scala:233)
>         at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:82)
> [2019-02-27 03:39:25,839] WARN [RequestSendThread controllerId=1] Controller 1's connection to broker mwkafka-prod-01.dr.foo.com:9092 (id: 3 rack: dr.foo.com) was unsuccessful (kafka.controller.RequestSendThread)
> java.net.SocketTimeoutException: Failed to connect within 30000 ms
>         at kafka.controller.RequestSendThread.brokerReady(ControllerChannelManager.scala:280)
>         at kafka.controller.RequestSendThread.doWork(ControllerChannelManager.scala:233)
>         at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:82)
> [2019-02-27 03:39:31,371] WARN [RequestSendThread controllerId=1] Controller 1's connection to broker mwkafka-prod-01.tbd.foo.com:9092 (id: 5 rack: tbd.foo.com) was unsuccessful (kafka.controller.RequestSendThread)
> java.net.SocketTimeoutException: Failed to connect within 30000 ms
>         at kafka.controller.RequestSendThread.brokerReady(ControllerChannelManager.scala:280)
>         at kafka.controller.RequestSendThread.doWork(ControllerChannelManager.scala:233)
>         at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:82)
> [2019-02-27 03:39:40,440] WARN [RequestSendThread controllerId=1] Controller 1's connection to broker mwkafka-prod-01.nyc.foo.com:9092 (id: 1 rack: nyc.foo.com) was unsuccessful (kafka.controller.RequestSendThread)
> java.net.SocketTimeoutException: Failed to connect within 30000 ms
>         at kafka.controller.RequestSendThread.brokerReady(ControllerChannelManager.scala:280)
>         at kafka.controller.RequestSendThread.doWork(ControllerChannelManager.scala:233)
>         at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:82)
> [2019-02-27 03:39:42,149] WARN [RequestSendThread controllerId=1] Controller 1's connection to broker mwkafka-prod-02.nyc.foo.com:9092 (id: 2 rack: nyc.foo.com) was unsuccessful (kafka.controller.RequestSendThread)
> java.net.SocketTimeoutException: Failed to connect within 30000 ms
>         at kafka.controller.RequestSendThread.brokerReady(ControllerChannelManager.scala:280)
>         at kafka.controller.RequestSendThread.doWork(ControllerChannelManager.scala:233)
>         at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:82)
> *Client consumer logs: consumer is not able to connect to the brokers*
> [2019-02-27 03:37:37,587] DEBUG [Consumer clientId=test_con, groupId=chow] Created socket with SO_RCVBUF = 65536, SO_SNDBUF = 131072, SO_TIMEOUT = 0 to node -6 (org.apache.kafka.common.network.Selector)
> [2019-02-27 03:37:37,587] DEBUG Set SASL client state to RECEIVE_APIVERSIONS_RESPONSE (org.apache.kafka.common.security.authenticator.SaslClientAuthenticator)
> [2019-02-27 03:37:37,587] DEBUG [Consumer clientId=test_con, groupId=chow] Completed connection to node -6. Fetching API versions. (org.apache.kafka.clients.NetworkClient)
> [2019-02-27 03:38:02,488] DEBUG [Consumer clientId=test_con, groupId=chow] Coordinator discovery failed, refreshing metadata (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
> [2019-02-27 03:38:02,488] DEBUG [Consumer clientId=test_con, groupId=chow] Sending FindCoordinator request to broker mwkafka-prod-02.tbd.deshaw.com:9092 (id: -6 rack: null) (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
> [2019-02-27 03:38:08,618] DEBUG Set SASL client state to INITIAL (org.apache.kafka.common.security.authenticator.SaslClientAuthenticator)
> [2019-02-27 03:38:08,619] DEBUG Set SASL client state to INTERMEDIATE (org.apache.kafka.common.security.authenticator.SaslClientAuthenticator)
> [2019-02-27 03:38:29,288] DEBUG Set SASL client state to SEND_HANDSHAKE_REQUEST (org.apache.kafka.common.security.authenticator.SaslClientAuthenticator)
> [2019-02-27 03:38:29,288] DEBUG Set SASL client state to RECEIVE_HANDSHAKE_RESPONSE (org.apache.kafka.common.security.authenticator.SaslClientAuthenticator)
> [2019-02-27 03:38:32,493] DEBUG [Consumer clientId=test_con, groupId=chow] Coordinator discovery failed, refreshing metadata (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
> [2019-02-27 03:38:32,493] DEBUG [Consumer clientId=test_con, groupId=chow] Sending FindCoordinator request to broker mwkafka-prod-02.dr.deshaw.com:9092 (id: -4 rack: null) (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
> [2019-02-27 03:39:02,498] DEBUG [Consumer clientId=test_con, groupId=chow] Coordinator discovery failed, refreshing metadata (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
> [2019-02-27 03:39:02,498] DEBUG [Consumer clientId=test_con, groupId=chow] Sending FindCoordinator request to broker mwkafka-prod-02.dr.deshaw.com:9092 (id: -3 rack: null) (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
> [2019-02-27 03:39:32,501] DEBUG [Consumer clientId=test_con, groupId=chow] Coordinator discovery failed, refreshing metadata (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
> [2019-02-27 03:39:32,501] DEBUG [Consumer clientId=test_con, groupId=chow] Sending FindCoordinator request to broker mwkafka-prod-01.nyc.deshaw.com:9092 (id: -1 rack: null) (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
> [2019-02-27 03:39:33,050] DEBUG Set SASL client state to SEND_HANDSHAKE_REQUEST (org.apache.kafka.common.security.authenticator.SaslClientAuthenticator)
> [2019-02-27 03:39:33,051] DEBUG Set SASL client state to RECEIVE_HANDSHAKE_RESPONSE (org.apache.kafka.common.security.authenticator.SaslClientAuthenticator)
> [2019-02-27 03:39:34,683] DEBUG Set SASL client state to CLIENT_COMPLETE (org.apache.kafka.common.security.authenticator.SaslClientAuthenticator)
> [2019-02-27 03:39:41,887] DEBUG Set SASL client state to CLIENT_COMPLETE (org.apache.kafka.common.security.authenticator.SaslClientAuthenticator)
> [2019-02-27 03:40:02,503] DEBUG [Consumer clientId=test_con, groupId=chow] Coordinator discovery failed, refreshing metadata (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
> [2019-02-27 03:40:02,503] DEBUG [Consumer clientId=test_con, groupId=chow] Sending FindCoordinator request to broker mwkafka-prod-02.tbd.deshaw.com:9092 (id: -6 rack: null) (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
> [2019-02-27 03:40:31,269] DEBUG Set SASL client state to COMPLETE (org.apache.kafka.common.security.authenticator.SaslClientAuthenticator)
> [2019-02-27 03:40:31,269] DEBUG [Consumer clientId=test_con, groupId=chow] Initiating API versions fetch from node -1. (org.apache.kafka.clients.NetworkClient)
> [2019-02-27 03:40:32,507] DEBUG [Consumer clientId=test_con, groupId=chow] Coordinator discovery failed, refreshing metadata (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
> [2019-02-27 03:40:32,507] DEBUG [Consumer clientId=test_con, groupId=chow] Sending FindCoordinator request to broker mwkafka-prod-02.dr.deshaw.com:9092 (id: -4 rack: null) (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
> [2019-02-27 03:40:42,438] DEBUG Set SASL client state to COMPLETE (org.apache.kafka.common.security.authenticator.SaslClientAuthenticator)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)