Posted to jira@kafka.apache.org by "shylaja kokoori (Jira)" <ji...@apache.org> on 2021/12/23 00:31:00 UTC

[jira] [Commented] (KAFKA-13418) Brokers disconnect intermittently with TLS1.3

    [ https://issues.apache.org/jira/browse/KAFKA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17464190#comment-17464190 ] 

shylaja kokoori commented on KAFKA-13418:
-----------------------------------------

After enabling SSL debug logging (javax.net.debug=ssl,handshake),
I see that the unwrap call in SslTransportLayer.read returns handshakeStatus=NEED_WRAP when a TLS key_update takes place (log snippet below).

According to RFC 8446 [https://datatracker.ietf.org/doc/html/rfc8446],
a KeyUpdate message is normally exchanged during regular reads and writes, after the handshake has completed; if one arrives during the handshake, the connection has to be closed.
Given that the key_updates here happen after the handshake is done, would something like the attached patch work? I am new to Kafka, and any feedback would be helpful.
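
To make the idea concrete, here is a minimal sketch of the check I have in mind. It is not the attached patch and not Kafka's actual code: the class and method names are hypothetical, and it assumes the negotiated protocol string is "TLSv1.3", as reported by SSLSession.getProtocol() on the JDK. The point is only that a handshake status seen after the handshake is treated as renegotiation for older protocols, while for TLS 1.3 a NEED_WRAP is taken to mean a KeyUpdate reply is waiting to be written.

```java
import javax.net.ssl.SSLEngineResult.HandshakeStatus;

// Hypothetical helper for illustration only; not part of SslTransportLayer.
final class PostHandshakeStatusCheck {
    // Assumption: protocol name as returned by SSLSession.getProtocol().
    private static final String TLS13 = "TLSv1.3";

    // True when the engine reports any handshake activity.
    static boolean isHandshaking(HandshakeStatus status) {
        return status != HandshakeStatus.NOT_HANDSHAKING
            && status != HandshakeStatus.FINISHED;
    }

    // Handshake activity after the initial handshake counts as
    // renegotiation only for protocols older than TLS 1.3.
    static boolean isRenegotiation(String protocol, HandshakeStatus status) {
        return isHandshaking(status) && !TLS13.equals(protocol);
    }

    // For TLS 1.3, NEED_WRAP after unwrap is taken to mean that a
    // KeyUpdate reply is pending and should be wrapped and flushed.
    static boolean needsKeyUpdateFlush(String protocol, HandshakeStatus status) {
        return TLS13.equals(protocol) && status == HandshakeStatus.NEED_WRAP;
    }
}
```

With a check like this, the read path would throw the renegotiation exception only when isRenegotiation(...) is true, and would otherwise wrap and flush the pending KeyUpdate response before continuing.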

Kafka log:
{code:java}
javax.net.ssl|DEBUG|8D|ReplicaFetcherThread-0-2|2021-12-21 06:14:09.574 UTC|KeyUpdate.java:192|Consuming KeyUpdate post-handshake message (
"KeyUpdate": {
  "request_update": update_requested
}
)
javax.net.ssl|DEBUG|8D|ReplicaFetcherThread-0-2|2021-12-21 06:14:09.575 UTC|SSLCipher.java:1866|KeyLimit read side: algorithm = AES/GCM/NOPADDING:KEYUPDATE
countdown value = 137438953472
javax.net.ssl|DEBUG|8D|ReplicaFetcherThread-0-2|2021-12-21 06:14:09.575 UTC|KeyUpdate.java:236|KeyUpdate: read key updated
javax.net.ssl|DEBUG|8D|ReplicaFetcherThread-0-2|2021-12-21 06:14:09.575 UTC|KeyUpdate.java:271|Produced KeyUpdate post-handshake message (
"KeyUpdate": {
  "request_update": update_not_requested
}
)
javax.net.ssl|DEBUG|8D|ReplicaFetcherThread-0-2|2021-12-21 06:14:09.575 UTC|SSLCipher.java:2020|KeyLimit write side: algorithm = AES/GCM/NOPADDING:KEYUPDATE
countdown value = 137438953472
javax.net.ssl|DEBUG|8D|ReplicaFetcherThread-0-2|2021-12-21 06:14:09.575 UTC|KeyUpdate.java:323|KeyUpdate: write key updated
[2021-12-21 06:14:09,575] ERROR [SslTransportLayer channelId=2 key=channel=java.nio.channels.SocketChannel[connection-pending remote=/192.168.24.11:9093], selector=sun.nio.ch.EPollSelectorImpl@2eb1a872, interestOps=8, readyOps=0] Renegotiation requested, but it is not supported, channelId 2, appReadBuffer pos 0, netReadBuffer pos 0, netWriteBuffer pos 147 handshakeStatus NEED_WRAP State READY (org.apache.kafka.common.network.SslTransportLayer)
javax.net.ssl|DEBUG|8D|ReplicaFetcherThread-0-2|2021-12-21 06:14:09.578 UTC|Alert.java:238|Received alert message (
"Alert": {
  "level"      : "warning",
  "description": "close_notify"
}
)
javax.net.ssl|ALL|8D|ReplicaFetcherThread-0-2|2021-12-21 06:14:09.580 UTC|SSLEngineImpl.java:752|Closing outbound of SSLEngine{code}
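
A note on the countdown value in the log: 137438953472 is 2^37 bytes, or 128 GiB. As far as I can tell this matches the JDK default for the jdk.tls.keyLimits security property, which would explain why the key update, and with it the disconnect, only shows up intermittently on connections that carry a lot of traffic. The arithmetic as a quick check (the class name is hypothetical):

```java
// Hypothetical helper; only verifies the arithmetic on the logged value.
final class KeyLimitMath {
    // Countdown value copied from the KeyLimit lines in the log above.
    static final long COUNTDOWN_BYTES = 137_438_953_472L;

    // True if the logged countdown equals 2^37.
    static boolean isTwoToThe37() {
        return COUNTDOWN_BYTES == (1L << 37);
    }

    // The same value expressed in GiB (2^30 bytes).
    static long inGiB() {
        return COUNTDOWN_BYTES >> 30;
    }
}
```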

> Brokers disconnect intermittently with TLS1.3
> ---------------------------------------------
>
>                 Key: KAFKA-13418
>                 URL: https://issues.apache.org/jira/browse/KAFKA-13418
>             Project: Kafka
>          Issue Type: Bug
>          Components: clients
>    Affects Versions: 2.8.0
>            Reporter: shylaja kokoori
>            Assignee: shylaja kokoori
>            Priority: Minor
>         Attachments: tls1_3.patch
>
>
> Using TLS1.3 (with JDK11) is causing a regression and an increase in inter-broker p99 latency, as mentioned by Yiming in [Kafka-9320|https://issues.apache.org/jira/browse/KAFKA-9320?focusedCommentId=17401818&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17401818]. We tested this with Kafka 2.8.
> The issue seems to be caused by a renegotiation exception thrown by 
> {code:java}
> read(ByteBuffer dst)
> {code}
>  and 
> {code:java}
> write(ByteBuffer src)
> {code}
>  in 
> _clients/src/main/java/org/apache/kafka/common/network/SslTransportLayer.java_
> This exception causes the connection between the brokers to close before the read/write has completed. In our internal experiments, the p99 latency stabilized when we removed this exception.
> Given that TLS1.3 does not support renegotiation, I would like to make this exception apply only to TLS1.2.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)