You are viewing a plain text version of this content. The canonical link for it is here.
Posted to jira@kafka.apache.org by GitBox <gi...@apache.org> on 2022/11/15 17:40:37 UTC

[GitHub] [kafka] rondagostino opened a new pull request, #12856: KAFKA-14392: KRaft broker should set controller.socket.timeout.ms

rondagostino opened a new pull request, #12856:
URL: https://github.com/apache/kafka/pull/12856

   KRaft brokers maintain their liveness in the cluster by sending BROKER_HEARTBEAT requests to the active controller; the active controller fences a broker if it doesn't receive a heartbeat request from that broker within the period defined by `broker.session.timeout.ms`. The broker should use a request timeout for its BROKER_HEARTBEAT requests that is not larger than the session timeout being used by the controller; doing so creates the possibility that upon controller failover the broker might fail to cancel an existing heartbeat request in time and then subsequently heartbeat to the new controller to maintain an uninterrupted session in the cluster. In other words, a failure of the active controller could result in under-replicated (or under-min ISR) partitions simply due to a delay in brokers heartbeating to the new controller.
   
   This patch adds documentation to that effect and sets the `controller.socket.timeout.ms` config accordingly in the quickstart files.  It also makes a change in `BrokerToControllerChannelManager.scala` to set the default request timeout to be equal to the value of `controller.socket.timeout.ms` rather than the generic `request.timeout.ms` -- but this default timeout value is not used by the BrokerToControllerChannelManager functionality, so this change is simply cosmetic at this time.
   
   ### Committer Checklist (excluded from commit message)
   - [ ] Verify design and implementation 
   - [ ] Verify test coverage and CI build status
   - [ ] Verify documentation (including upgrade notes)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscribe@kafka.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [kafka] rondagostino commented on a diff in pull request #12856: KAFKA-14392: KRaft broker heartbeat timeout should not exceed broker.session.timeout.ms

Posted by GitBox <gi...@apache.org>.
rondagostino commented on code in PR #12856:
URL: https://github.com/apache/kafka/pull/12856#discussion_r1029683189


##########
core/src/main/scala/kafka/server/AlterPartitionManager.scala:
##########
@@ -91,7 +91,8 @@ object AlterPartitionManager {
       config = config,
       channelName = "alterPartition",
       threadNamePrefix = threadNamePrefix,
-      retryTimeoutMs = Long.MaxValue
+      networkClientRetryTimeoutMs = if (config.processRoles.isEmpty) config.controllerSocketTimeoutMs else config.brokerSessionTimeoutMs / 2,

Review Comment:
   I believe https://issues.apache.org/jira/browse/KAFKA-14394 can now be closed.  We do want infinite retries on the broker side for AlterPartition requests as per https://github.com/apache/kafka/pull/9564#discussion_r527326231.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscribe@kafka.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [kafka] jsancio commented on a diff in pull request #12856: KAFKA-14392: KRaft broker should set controller.socket.timeout.ms

Posted by GitBox <gi...@apache.org>.
jsancio commented on code in PR #12856:
URL: https://github.com/apache/kafka/pull/12856#discussion_r1023111417


##########
core/src/main/scala/kafka/server/BrokerToControllerChannelManager.scala:
##########
@@ -211,7 +211,7 @@ class BrokerToControllerChannelManagerImpl(
         50,
         Selectable.USE_DEFAULT_BUFFER_SIZE,
         Selectable.USE_DEFAULT_BUFFER_SIZE,
-        config.requestTimeoutMs,
+        config.controllerSocketTimeoutMs,

Review Comment:
   I see. The broker is sending the correct request timeout to the controller but it is not being honor on the client side.



##########
config/kraft/broker.properties:
##########
@@ -53,6 +53,9 @@ controller.listener.names=CONTROLLER
 # Maps listener names to security protocols, the default is for them to be the same. See the config documentation for more details
 listener.security.protocol.map=CONTROLLER:PLAINTEXT,PLAINTEXT:PLAINTEXT,SSL:SSL,SASL_PLAINTEXT:SASL_PLAINTEXT,SASL_SSL:SASL_SSL
 
+# The controller socket timeout should not exceed the broker session timeout being used by the controller
+controller.socket.timeout.ms=9000

Review Comment:
   This is the actual fix. Shouldn't Kafka change the default? This is a problem with every cluster if they don't change the controller socket timeout or the broker session timeout. Right now the default broker session timeout is 9 seconds while the socket timeout is 30 seconds.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscribe@kafka.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [kafka] cmccabe commented on a diff in pull request #12856: KAFKA-14392: KRaft broker heartbeat timeout should not exceed broker.session.timeout.ms

Posted by GitBox <gi...@apache.org>.
cmccabe commented on code in PR #12856:
URL: https://github.com/apache/kafka/pull/12856#discussion_r1028622710


##########
core/src/main/scala/kafka/server/BrokerToControllerChannelManager.scala:
##########
@@ -121,16 +121,18 @@ object BrokerToControllerChannelManager {
     config: KafkaConfig,
     channelName: String,
     threadNamePrefix: Option[String],
-    retryTimeoutMs: Long
-  ): BrokerToControllerChannelManager = {
+    retryTimeoutMs: Long,
+    networkClientRetryTimeoutMs: Option[Int] = None

Review Comment:
   looking at this again, I think we should just rename this to `requestTimeoutMs` and use it at the `NetworkManager` layer, as well as here.



##########
core/src/main/scala/kafka/server/BrokerToControllerChannelManager.scala:
##########
@@ -282,8 +286,9 @@ class BrokerToControllerRequestThread(
   config: KafkaConfig,
   time: Time,
   threadName: String,
-  retryTimeoutMs: Long
-) extends InterBrokerSendThread(threadName, networkClient, config.controllerSocketTimeoutMs, time, isInterruptible = false) {
+  retryTimeoutMs: Long,
+  networkClientRetryTimeoutMs: Option[Int] = None

Review Comment:
   can you please call this `requestTimeoutMs`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscribe@kafka.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [kafka] cmccabe commented on a diff in pull request #12856: KAFKA-14392: KRaft broker heartbeat timeout should not exceed broker.session.timeout.ms

Posted by GitBox <gi...@apache.org>.
cmccabe commented on code in PR #12856:
URL: https://github.com/apache/kafka/pull/12856#discussion_r1044687611


##########
core/src/main/scala/kafka/server/BrokerToControllerChannelManager.scala:
##########
@@ -283,7 +283,7 @@ class BrokerToControllerRequestThread(
   time: Time,
   threadName: String,
   retryTimeoutMs: Long
-) extends InterBrokerSendThread(threadName, networkClient, config.controllerSocketTimeoutMs, time, isInterruptible = false) {
+) extends InterBrokerSendThread(threadName, networkClient, Math.min(Int.MaxValue, Math.min(config.controllerSocketTimeoutMs, retryTimeoutMs)).toInt, time, isInterruptible = false) {

Review Comment:
   let's just use `retryTimeoutMs`, there is no need to involve `controllerSocketTimeoutMs`. That is not a KRaft config anyway...



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscribe@kafka.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [kafka] rondagostino commented on a diff in pull request #12856: KAFKA-14392: KRaft broker heartbeat timeout should not exceed broker.session.timeout.ms

Posted by GitBox <gi...@apache.org>.
rondagostino commented on code in PR #12856:
URL: https://github.com/apache/kafka/pull/12856#discussion_r1023303714


##########
core/src/main/scala/kafka/server/BrokerToControllerChannelManager.scala:
##########
@@ -319,7 +345,7 @@ class BrokerToControllerRequestThread(
     val requestIter = requestQueue.iterator()
     while (requestIter.hasNext) {
       val request = requestIter.next
-      if (currentTimeMs - request.createdTimeMs >= retryTimeoutMs) {
+      if (currentTimeMs - request.createdTimeMs >= requestThreadRetryTimeoutMs) {

Review Comment:
   This is the only place this particular retry timeout is used.  It is not clear to me how the semantics of this timeout and the network client timeout differ and why the values are used differently (specifically, AlterParticionManager currently sets Long.MAX_VALUE for this timeout but uses `controller.socket.timeout.ms` for the network client timeout).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscribe@kafka.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [kafka] cmccabe commented on a diff in pull request #12856: KAFKA-14392: KRaft broker heartbeat timeout should not exceed broker.session.timeout.ms

Posted by GitBox <gi...@apache.org>.
cmccabe commented on code in PR #12856:
URL: https://github.com/apache/kafka/pull/12856#discussion_r1044686357


##########
core/src/main/scala/kafka/server/BrokerServer.scala:
##########
@@ -221,7 +221,7 @@ class BrokerServer(
         config,
         channelName = "forwarding",
         threadNamePrefix,
-        retryTimeoutMs = 60000
+        retryTimeoutMs = config.brokerSessionTimeoutMs / 2

Review Comment:
   let's leave this alone because it doesn't send heartbeats



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscribe@kafka.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [kafka] cmccabe commented on pull request #12856: KAFKA-14392: KRaft broker should set controller.socket.timeout.ms

Posted by GitBox <gi...@apache.org>.
cmccabe commented on PR #12856:
URL: https://github.com/apache/kafka/pull/12856#issuecomment-1315706091

   `controller.socket.timeout.ms` is not really the right thing here: that is used by the old ZK based controller to control timeouts for its outbound requests.
   
   In the short term, I'd suggest setting the request timeout for these send-to-broker threads to half of `broker.session.timeout.ms`. That will at least be a reasonable default here.
   
   Long term we might need a KIP to define this better. It probably makes sense as a tunable but we'd have to discuss that.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscribe@kafka.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [kafka] cmccabe commented on a diff in pull request #12856: KAFKA-14392: KRaft broker heartbeat timeout should not exceed broker.session.timeout.ms

Posted by GitBox <gi...@apache.org>.
cmccabe commented on code in PR #12856:
URL: https://github.com/apache/kafka/pull/12856#discussion_r1028619193


##########
core/src/main/scala/kafka/server/BrokerServer.scala:
##########
@@ -221,7 +221,8 @@ class BrokerServer(
         config,
         channelName = "forwarding",
         threadNamePrefix,
-        retryTimeoutMs = 60000
+        retryTimeoutMs = config.brokerSessionTimeoutMs / 2,
+        networkClientRetryTimeoutMs = Some(config.brokerSessionTimeoutMs / 2)

Review Comment:
   can you please call this `requestTimeoutMs`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscribe@kafka.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [kafka] cmccabe commented on a diff in pull request #12856: KAFKA-14392: KRaft broker heartbeat timeout should not exceed broker.session.timeout.ms

Posted by GitBox <gi...@apache.org>.
cmccabe commented on code in PR #12856:
URL: https://github.com/apache/kafka/pull/12856#discussion_r1044929305


##########
core/src/main/scala/kafka/server/BrokerToControllerChannelManager.scala:
##########
@@ -211,7 +211,7 @@ class BrokerToControllerChannelManagerImpl(
         50,
         Selectable.USE_DEFAULT_BUFFER_SIZE,
         Selectable.USE_DEFAULT_BUFFER_SIZE,
-        config.requestTimeoutMs,
+        Math.min(Int.MaxValue, Math.min(config.controllerSocketTimeoutMs, retryTimeoutMs)).toInt, // request timeout should not exceed the provided retry timeout

Review Comment:
   ok, that's fair.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscribe@kafka.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [kafka] cmccabe commented on a diff in pull request #12856: KAFKA-14392: KRaft broker heartbeat timeout should not exceed broker.session.timeout.ms

Posted by GitBox <gi...@apache.org>.
cmccabe commented on code in PR #12856:
URL: https://github.com/apache/kafka/pull/12856#discussion_r1028619818


##########
core/src/main/scala/kafka/server/BrokerToControllerChannelManager.scala:
##########
@@ -282,8 +286,9 @@ class BrokerToControllerRequestThread(
   config: KafkaConfig,
   time: Time,
   threadName: String,
-  retryTimeoutMs: Long
-) extends InterBrokerSendThread(threadName, networkClient, config.controllerSocketTimeoutMs, time, isInterruptible = false) {
+  retryTimeoutMs: Long,
+  networkClientRetryTimeoutMs: Option[Int] = None

Review Comment:
   can you please call this `requestTimeoutMs`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscribe@kafka.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [kafka] cmccabe commented on pull request #12856: KAFKA-14392: KRaft broker heartbeat timeout should not exceed broker.session.timeout.ms

Posted by GitBox <gi...@apache.org>.
cmccabe commented on PR #12856:
URL: https://github.com/apache/kafka/pull/12856#issuecomment-1315975187

   Thanks again for tackling this, @rondagostino . I think this change is getting a bit too big, though. I'd really suggest just overriding `request.timeout.ms` in the specific case where we're creating a `BrokerToControllerChannelManager` for the `BrokerLifecycleManager`. Let's leave the other stuff alone for now -- does that make sense?
   
   Also it should be possible to avoid creating an `apply` method for `BrokerToControllerChannelManager`. Just have an extra parameter and make it default to `conf.requestTimeoutMs`, right?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscribe@kafka.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [kafka] rondagostino commented on a diff in pull request #12856: KAFKA-14392: KRaft broker heartbeat timeout should not exceed broker.session.timeout.ms

Posted by GitBox <gi...@apache.org>.
rondagostino commented on code in PR #12856:
URL: https://github.com/apache/kafka/pull/12856#discussion_r1023302354


##########
core/src/main/scala/kafka/server/BrokerToControllerChannelManager.scala:
##########
@@ -211,7 +235,7 @@ class BrokerToControllerChannelManagerImpl(
         50,
         Selectable.USE_DEFAULT_BUFFER_SIZE,
         Selectable.USE_DEFAULT_BUFFER_SIZE,
-        config.requestTimeoutMs,
+        networkClientRetryTimeoutMs,

Review Comment:
   This change is cosmetic only -- the value is never used by this code as currently implemented.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscribe@kafka.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [kafka] cmccabe commented on a diff in pull request #12856: KAFKA-14392: KRaft broker heartbeat timeout should not exceed broker.session.timeout.ms

Posted by GitBox <gi...@apache.org>.
cmccabe commented on code in PR #12856:
URL: https://github.com/apache/kafka/pull/12856#discussion_r1028619470


##########
core/src/main/scala/kafka/server/BrokerServer.scala:
##########
@@ -344,7 +345,8 @@ class BrokerServer(
         config,
         "heartbeat",
         threadNamePrefix,
-        config.brokerSessionTimeoutMs.toLong
+        config.brokerSessionTimeoutMs / 2,

Review Comment:
   this can stay at `brokerSessionTimeout`, not `brokerSessionTimeout / 2`



##########
core/src/main/scala/kafka/server/BrokerServer.scala:
##########
@@ -221,7 +221,8 @@ class BrokerServer(
         config,
         channelName = "forwarding",
         threadNamePrefix,
-        retryTimeoutMs = 60000
+        retryTimeoutMs = config.brokerSessionTimeoutMs / 2,
+        networkClientRetryTimeoutMs = Some(config.brokerSessionTimeoutMs / 2)

Review Comment:
   can you please call this `requestTimeoutMs`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscribe@kafka.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [kafka] cmccabe commented on a diff in pull request #12856: KAFKA-14392: KRaft broker heartbeat timeout should not exceed broker.session.timeout.ms

Posted by GitBox <gi...@apache.org>.
cmccabe commented on code in PR #12856:
URL: https://github.com/apache/kafka/pull/12856#discussion_r1044681192


##########
core/src/main/scala/kafka/server/BrokerServer.scala:
##########
@@ -221,7 +221,7 @@ class BrokerServer(
         config,
         channelName = "forwarding",
         threadNamePrefix,
-        retryTimeoutMs = 60000
+        retryTimeoutMs = config.brokerSessionTimeoutMs / 2

Review Comment:
   why are we changing this one, given that this BrokerToControllerChannelManager does not send heartbeats?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscribe@kafka.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [kafka] rondagostino commented on a diff in pull request #12856: KAFKA-14392: KRaft broker heartbeat timeout should not exceed broker.session.timeout.ms

Posted by GitBox <gi...@apache.org>.
rondagostino commented on code in PR #12856:
URL: https://github.com/apache/kafka/pull/12856#discussion_r1044711404


##########
core/src/main/scala/kafka/server/BrokerToControllerChannelManager.scala:
##########
@@ -211,7 +211,7 @@ class BrokerToControllerChannelManagerImpl(
         50,
         Selectable.USE_DEFAULT_BUFFER_SIZE,
         Selectable.USE_DEFAULT_BUFFER_SIZE,
-        config.requestTimeoutMs,
+        Math.min(Int.MaxValue, Math.min(config.controllerSocketTimeoutMs, retryTimeoutMs)).toInt, // request timeout should not exceed the provided retry timeout

Review Comment:
   I don't think we can use the retry timeout directly.  AlterPartitionManager sets [retryTimeoutMs = Long.MaxValue](https://github.com/apache/kafka/blob/42cfd57a2498b1a997e77cc24ec38b3722cc1f6a/core/src/main/scala/kafka/server/AlterPartitionManager.scala#L94).  I believe we have to set a reasonable request timeout -- something that doesn't exceed the retry timeout, but also something that isn't effectively forever.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscribe@kafka.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [kafka] cmccabe merged pull request #12856: KAFKA-14392: KRaft broker heartbeat timeout should not exceed broker.session.timeout.ms

Posted by GitBox <gi...@apache.org>.
cmccabe merged PR #12856:
URL: https://github.com/apache/kafka/pull/12856


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscribe@kafka.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [kafka] cmccabe commented on a diff in pull request #12856: KAFKA-14392: KRaft broker heartbeat timeout should not exceed broker.session.timeout.ms

Posted by GitBox <gi...@apache.org>.
cmccabe commented on code in PR #12856:
URL: https://github.com/apache/kafka/pull/12856#discussion_r1044681192


##########
core/src/main/scala/kafka/server/BrokerServer.scala:
##########
@@ -221,7 +221,7 @@ class BrokerServer(
         config,
         channelName = "forwarding",
         threadNamePrefix,
-        retryTimeoutMs = 60000
+        retryTimeoutMs = config.brokerSessionTimeoutMs / 2

Review Comment:
   why are we changing this one, given that this BrokerToControllerChannelManager does not send heartbeats?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscribe@kafka.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [kafka] cmccabe commented on a diff in pull request #12856: KAFKA-14392: KRaft broker heartbeat timeout should not exceed broker.session.timeout.ms

Posted by GitBox <gi...@apache.org>.
cmccabe commented on code in PR #12856:
URL: https://github.com/apache/kafka/pull/12856#discussion_r1028619578


##########
core/src/main/scala/kafka/server/BrokerServer.scala:
##########
@@ -344,7 +345,8 @@ class BrokerServer(
         config,
         "heartbeat",
         threadNamePrefix,
-        config.brokerSessionTimeoutMs.toLong
+        config.brokerSessionTimeoutMs / 2,
+        Some(config.brokerSessionTimeoutMs / 2)

Review Comment:
   Can you add a comment referencing this JIRA number



##########
core/src/main/scala/kafka/server/BrokerToControllerChannelManager.scala:
##########
@@ -159,7 +161,8 @@ class BrokerToControllerChannelManagerImpl(
   config: KafkaConfig,
   channelName: String,
   threadNamePrefix: Option[String],
-  retryTimeoutMs: Long
+  retryTimeoutMs: Long,
+  networkClientRetryTimeoutMs: Option[Int] = None

Review Comment:
   can you please call this `requestTimeoutMs`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscribe@kafka.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [kafka] rondagostino commented on a diff in pull request #12856: KAFKA-14392: KRaft broker heartbeat timeout should not exceed broker.session.timeout.ms

Posted by GitBox <gi...@apache.org>.
rondagostino commented on code in PR #12856:
URL: https://github.com/apache/kafka/pull/12856#discussion_r1024149966


##########
core/src/main/scala/kafka/server/AlterPartitionManager.scala:
##########
@@ -91,7 +91,8 @@ object AlterPartitionManager {
       config = config,
       channelName = "alterPartition",
       threadNamePrefix = threadNamePrefix,
-      retryTimeoutMs = Long.MaxValue
+      networkClientRetryTimeoutMs = if (config.processRoles.isEmpty) config.controllerSocketTimeoutMs else config.brokerSessionTimeoutMs / 2,

Review Comment:
   > let's leave this one alone for now and file a follow-up JIRA for it
   
   I've opened https://issues.apache.org/jira/browse/KAFKA-14394 to track this.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscribe@kafka.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [kafka] rondagostino commented on a diff in pull request #12856: KAFKA-14392: KRaft broker should set controller.socket.timeout.ms

Posted by GitBox <gi...@apache.org>.
rondagostino commented on code in PR #12856:
URL: https://github.com/apache/kafka/pull/12856#discussion_r1023103733


##########
core/src/main/scala/kafka/server/BrokerToControllerChannelManager.scala:
##########
@@ -211,7 +211,7 @@ class BrokerToControllerChannelManagerImpl(
         50,
         Selectable.USE_DEFAULT_BUFFER_SIZE,
         Selectable.USE_DEFAULT_BUFFER_SIZE,
-        config.requestTimeoutMs,
+        config.controllerSocketTimeoutMs,

Review Comment:
   This change is cosmetic right now.  `BrokerToControllerChannelManager` does not leverage this default value.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscribe@kafka.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [kafka] cmccabe commented on a diff in pull request #12856: KAFKA-14392: KRaft broker heartbeat timeout should not exceed broker.session.timeout.ms

Posted by GitBox <gi...@apache.org>.
cmccabe commented on code in PR #12856:
URL: https://github.com/apache/kafka/pull/12856#discussion_r1044686915


##########
core/src/main/scala/kafka/server/BrokerToControllerChannelManager.scala:
##########
@@ -211,7 +211,7 @@ class BrokerToControllerChannelManagerImpl(
         50,
         Selectable.USE_DEFAULT_BUFFER_SIZE,
         Selectable.USE_DEFAULT_BUFFER_SIZE,
-        config.requestTimeoutMs,
+        Math.min(Int.MaxValue, Math.min(config.controllerSocketTimeoutMs, retryTimeoutMs)).toInt, // request timeout should not exceed the provided retry timeout

Review Comment:
   let's just use `retryTimeoutMs`, there is no need to involve `controllerSocketTimeoutMs`. That is not a KRaft config anyway...



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscribe@kafka.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [kafka] jsancio commented on pull request #12856: KAFKA-14392: KRaft broker should set controller.socket.timeout.ms

Posted by GitBox <gi...@apache.org>.
jsancio commented on PR #12856:
URL: https://github.com/apache/kafka/pull/12856#issuecomment-1315702080

   @rondagostino should we include this fix in 3.3.2?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscribe@kafka.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [kafka] cmccabe commented on a diff in pull request #12856: KAFKA-14392: KRaft broker heartbeat timeout should not exceed broker.session.timeout.ms

Posted by GitBox <gi...@apache.org>.
cmccabe commented on code in PR #12856:
URL: https://github.com/apache/kafka/pull/12856#discussion_r1023331930


##########
core/src/main/scala/kafka/server/AlterPartitionManager.scala:
##########
@@ -91,7 +91,8 @@ object AlterPartitionManager {
       config = config,
       channelName = "alterPartition",
       threadNamePrefix = threadNamePrefix,
-      retryTimeoutMs = Long.MaxValue
+      networkClientRetryTimeoutMs = if (config.processRoles.isEmpty) config.controllerSocketTimeoutMs else config.brokerSessionTimeoutMs / 2,

Review Comment:
   let's leave this one alone for now and file a follow-up JIRA for it



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscribe@kafka.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [kafka] rondagostino commented on a diff in pull request #12856: KAFKA-14392: KRaft broker heartbeat timeout should not exceed broker.session.timeout.ms

Posted by GitBox <gi...@apache.org>.
rondagostino commented on code in PR #12856:
URL: https://github.com/apache/kafka/pull/12856#discussion_r1024154049


##########
core/src/main/scala/kafka/server/BrokerToControllerChannelManager.scala:
##########
@@ -121,16 +121,18 @@ object BrokerToControllerChannelManager {
     config: KafkaConfig,
     channelName: String,
     threadNamePrefix: Option[String],
-    retryTimeoutMs: Long
-  ): BrokerToControllerChannelManager = {
+    retryTimeoutMs: Long,
+    networkClientRetryTimeoutMs: Option[Int] = None

Review Comment:
   The fact that there are actually two separate timeouts here is a signal that something may be wrong.  To be clear, there were 2 separate timeouts prior to this PR -- this patch just makes their existence explicit so that we can set both of them in the KRaft case.   I've opened https://issues.apache.org/jira/browse/KAFKA-14394 to track this.



##########
core/src/main/scala/kafka/server/BrokerToControllerChannelManager.scala:
##########
@@ -211,7 +214,7 @@ class BrokerToControllerChannelManagerImpl(
         50,
         Selectable.USE_DEFAULT_BUFFER_SIZE,
         Selectable.USE_DEFAULT_BUFFER_SIZE,
-        config.requestTimeoutMs,
+        networkClientRetryTimeoutMs.getOrElse(config.controllerSocketTimeoutMs),

Review Comment:
   Again, this is a cosmetic change.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscribe@kafka.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [kafka] rondagostino commented on pull request #12856: KAFKA-14392: KRaft broker heartbeat timeout should not exceed broker.session.timeout.ms

Posted by GitBox <gi...@apache.org>.
rondagostino commented on PR #12856:
URL: https://github.com/apache/kafka/pull/12856#issuecomment-1317191981

   > change is getting a bit too big,
   
   Thanks, @cmccabe.  I've trimmed it down and opened the separate ticket as you mentioned.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscribe@kafka.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [kafka] rondagostino commented on a diff in pull request #12856: KAFKA-14392: KRaft broker heartbeat timeout should not exceed broker.session.timeout.ms

Posted by GitBox <gi...@apache.org>.
rondagostino commented on code in PR #12856:
URL: https://github.com/apache/kafka/pull/12856#discussion_r1023315754


##########
core/src/main/scala/kafka/server/BrokerToControllerChannelManager.scala:
##########
@@ -282,8 +307,9 @@ class BrokerToControllerRequestThread(
   config: KafkaConfig,
   time: Time,
   threadName: String,
-  retryTimeoutMs: Long
-) extends InterBrokerSendThread(threadName, networkClient, config.controllerSocketTimeoutMs, time, isInterruptible = false) {
+  networkClientRetryTimeoutMs: Int,
+  requestThreadRetryTimeoutMs: Long
+) extends InterBrokerSendThread(threadName, networkClient, networkClientRetryTimeoutMs, time, isInterruptible = false) {

Review Comment:
   Pass in `config.controllerSocketTimeoutMs` for the networkClientRetryTimeoutMs for there to be no change.  That is what we do for all ZK-based broker usage.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscribe@kafka.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [kafka] rondagostino commented on a diff in pull request #12856: KAFKA-14392: KRaft broker heartbeat timeout should not exceed broker.session.timeout.ms

Posted by GitBox <gi...@apache.org>.
rondagostino commented on code in PR #12856:
URL: https://github.com/apache/kafka/pull/12856#discussion_r1023316404


##########
core/src/main/scala/kafka/server/KafkaServer.scala:
##########
@@ -283,7 +283,8 @@ class KafkaServer(
           config = config,
           channelName = "forwarding",
           threadNamePrefix = threadNamePrefix,
-          retryTimeoutMs = config.requestTimeoutMs.longValue
+          networkClientRetryTimeoutMs = config.controllerSocketTimeoutMs,

Review Comment:
   Pass `networkClientRetryTimeoutMs = config.controllerSocketTimeoutMs` for there to be no behavioral change.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscribe@kafka.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [kafka] cmccabe commented on a diff in pull request #12856: KAFKA-14392: KRaft broker heartbeat timeout should not exceed broker.session.timeout.ms

Posted by GitBox <gi...@apache.org>.
cmccabe commented on code in PR #12856:
URL: https://github.com/apache/kafka/pull/12856#discussion_r1023331930


##########
core/src/main/scala/kafka/server/AlterPartitionManager.scala:
##########
@@ -91,7 +91,8 @@ object AlterPartitionManager {
       config = config,
       channelName = "alterPartition",
       threadNamePrefix = threadNamePrefix,
-      retryTimeoutMs = Long.MaxValue
+      networkClientRetryTimeoutMs = if (config.processRoles.isEmpty) config.controllerSocketTimeoutMs else config.brokerSessionTimeoutMs / 2,

Review Comment:
   we don't want to change this, right?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscribe@kafka.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [kafka] cmccabe commented on a diff in pull request #12856: KAFKA-14392: KRaft broker heartbeat timeout should not exceed broker.session.timeout.ms

Posted by GitBox <gi...@apache.org>.
cmccabe commented on code in PR #12856:
URL: https://github.com/apache/kafka/pull/12856#discussion_r1028619648


##########
core/src/main/scala/kafka/server/BrokerToControllerChannelManager.scala:
##########
@@ -159,7 +161,8 @@ class BrokerToControllerChannelManagerImpl(
   config: KafkaConfig,
   channelName: String,
   threadNamePrefix: Option[String],
-  retryTimeoutMs: Long
+  retryTimeoutMs: Long,
+  networkClientRetryTimeoutMs: Option[Int] = None

Review Comment:
   can you please call this `requestTimeoutMs`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscribe@kafka.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [kafka] cmccabe commented on a diff in pull request #12856: KAFKA-14392: KRaft broker heartbeat timeout should not exceed broker.session.timeout.ms

Posted by GitBox <gi...@apache.org>.
cmccabe commented on code in PR #12856:
URL: https://github.com/apache/kafka/pull/12856#discussion_r1028619470


##########
core/src/main/scala/kafka/server/BrokerServer.scala:
##########
@@ -344,7 +345,8 @@ class BrokerServer(
         config,
         "heartbeat",
         threadNamePrefix,
-        config.brokerSessionTimeoutMs.toLong
+        config.brokerSessionTimeoutMs / 2,

Review Comment:
   this can stay at `brokerSessionTimeout`, not `brokerSessionTimeout / 2`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscribe@kafka.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org