Posted to jira@kafka.apache.org by "Henry Cai (JIRA)" <ji...@apache.org> on 2019/03/11 06:35:00 UTC

[jira] [Commented] (KAFKA-8089) High level consumer from MirrorMaker is slow to deal with SSL certification expiration

    [ https://issues.apache.org/jira/browse/KAFKA-8089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16789201#comment-16789201 ] 

Henry Cai commented on KAFKA-8089:
----------------------------------

I looked at the latest Kafka 2.2.0 code and don't see any change to NetworkClient.java in the related area, so I believe the problem also exists there.

> High level consumer from MirrorMaker is slow to deal with SSL certification expiration
> --------------------------------------------------------------------------------------
>
>                 Key: KAFKA-8089
>                 URL: https://issues.apache.org/jira/browse/KAFKA-8089
>             Project: Kafka
>          Issue Type: Bug
>          Components: clients, consumer
>    Affects Versions: 2.0.0
>            Reporter: Henry Cai
>            Priority: Critical
>
> We have been using Kafka 2.0's MirrorMaker (which uses the high-level consumer) for replication. The topic is SSL-enabled, and the certificate expires at a random time within 12 hours. When the certificate expires, we see many SSL-related exceptions in the log:
>  
> [2019-03-07 18:02:54,128] ERROR [Consumer clientId=kafkamirror-euw1-use1-m10nkafka03-1, groupId=kafkamirror-euw1-use1-m10nkafka03] Connection to node 3005 failed authentication due to: SSL handshake failed (org.apache.kafka.clients.NetworkClient)
> This error repeats for several hours.
> However, even with the SSL error, the preexisting socket connections still work, so the main fetching activity is not affected. The metadata operations from the client and the heartbeats from the heartbeat thread are affected, though, since they may need to open new socket connections. I think these errors most likely originate from those side activities.
> The situation lasts several hours until the main fetcher thread tries to open a new connection (usually because of a consumer rebalance); the SSL authentication exception then aborts the operation and MirrorMaker exits.
> During those several hours, the client cannot get the latest metadata, and heartbeats also falter (we see rebalances triggered because of this).
> When NetworkClient.processDisconnection() prints the ERROR message above, could it instead throw the AuthenticationException up? That would abort KafkaConsumer.poll() and speed up the certificate recycle (in our case, we restart MirrorMaker with the new certificate).
>  
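The difference between today's "log and continue" handling and the proposed "fail fast" handling can be illustrated with a small, self-contained model. This is a hypothetical sketch, not Kafka code: SimulatedClient, Policy and runUntilFailure are invented for illustration, and the local AuthenticationException is a stand-in for org.apache.kafka.common.errors.AuthenticationException so the example compiles without the kafka-clients jar. The model assumes, as described above, that fetches reuse an existing connection while metadata and heartbeat requests may need a new SSL handshake.

```java
import java.util.ArrayList;
import java.util.List;

public class FailFastAuthSketch {

    // Hypothetical stand-in for org.apache.kafka.common.errors.AuthenticationException.
    static class AuthenticationException extends RuntimeException {
        AuthenticationException(String msg) { super(msg); }
    }

    // LOG_AND_CONTINUE models the reported behavior; FAIL_FAST models the proposal.
    enum Policy { LOG_AND_CONTINUE, FAIL_FAST }

    // Hypothetical client model: the fetch path reuses an already-open connection,
    // while metadata/heartbeat requests may need a brand-new connection.
    static class SimulatedClient {
        private final Policy policy;
        private boolean certExpired = false;
        final List<String> errorLog = new ArrayList<>();

        SimulatedClient(Policy policy) { this.policy = policy; }

        void expireCertificate() { certExpired = true; }

        // A new connection needs a fresh SSL handshake, which fails once the cert has expired.
        private void openNewConnection(int nodeId) {
            if (certExpired) {
                throw new AuthenticationException("SSL handshake failed for node " + nodeId);
            }
        }

        // Side activity (metadata refresh / heartbeat): either log the failure or rethrow it.
        private void sideRequest(int nodeId) {
            try {
                openNewConnection(nodeId);
            } catch (AuthenticationException e) {
                if (policy == Policy.FAIL_FAST) {
                    throw e; // propagates out of poll() to the caller
                }
                errorLog.add("ERROR Connection to node " + nodeId + " failed authentication");
            }
        }

        // Returns the number of records fetched over the preexisting connection.
        int poll() {
            sideRequest(3005); // needs a new connection, so it is hit by the expired cert
            return 1;          // the fetch over the existing connection still succeeds
        }
    }

    // Calls poll() up to `rounds` times; returns how many calls completed before a failure.
    static int runUntilFailure(SimulatedClient client, int rounds) {
        int completed = 0;
        for (int i = 0; i < rounds; i++) {
            try {
                client.poll();
                completed++;
            } catch (AuthenticationException e) {
                // A MirrorMaker-like wrapper would exit here and be restarted
                // with the renewed certificate.
                break;
            }
        }
        return completed;
    }

    public static void main(String[] args) {
        for (Policy policy : Policy.values()) {
            SimulatedClient client = new SimulatedClient(policy);
            client.expireCertificate();
            int completed = runUntilFailure(client, 5);
            System.out.println(policy + ": completed " + completed + " polls, logged "
                    + client.errorLog.size() + " errors");
        }
    }
}
```

Under LOG_AND_CONTINUE every poll "succeeds" while errors pile up in the log, mirroring the hours of repeated ERROR lines; under FAIL_FAST the first poll after expiry throws, so the process can be restarted with the new certificate right away. The trade-off is that a transient handshake failure would also abort poll(), which a real fix in NetworkClient would need to weigh.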



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)