You are viewing a plain text version of this content. The canonical link for it is here.
Posted to jira@kafka.apache.org by GitBox <gi...@apache.org> on 2021/03/02 03:13:16 UTC

[GitHub] [kafka] abbccdda opened a new pull request #10240: KAFKA-12381: only return leader not available for internal topic creation

abbccdda opened a new pull request #10240:
URL: https://github.com/apache/kafka/pull/10240


   We introduced a regression in https://github.com/apache/kafka/pull/9579 where originally we only return `INVALID_REPLICATION_FACTOR` for internal topic creation when there are not enough brokers. The mentioned PR expanded the scope and return this error to non-internal topic creation cases as well, which should be `LEADER_NOT_AVAILABLE` instead.
   
   ### Committer Checklist (excluded from commit message)
   - [ ] Verify design and implementation 
   - [ ] Verify test coverage and CI build status
   - [ ] Verify documentation (including upgrade notes)
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [kafka] hachikuji commented on a change in pull request #10240: KAFKA-12381: remove live broker checks for forwarding topic creation

Posted by GitBox <gi...@apache.org>.
hachikuji commented on a change in pull request #10240:
URL: https://github.com/apache/kafka/pull/10240#discussion_r588513624



##########
File path: core/src/test/scala/unit/kafka/server/AutoTopicCreationManagerTest.scala
##########
@@ -146,29 +146,118 @@ class AutoTopicCreationManagerTest {
   }
 
   @Test
-  def testNotEnoughLiveBrokers(): Unit = {
-    val props = TestUtils.createBrokerConfig(1, "localhost")
-    props.setProperty(KafkaConfig.DefaultReplicationFactorProp, 3.toString)
-    config = KafkaConfig.fromProps(props)
+  def testInvalidReplicationFactorForNonInternalTopics(): Unit = {
+    testErrorWithCreationInZk(Errors.INVALID_REPLICATION_FACTOR, "topic", isInternal = false)
+  }
+
+  @Test
+  def testInvalidReplicationFactorForConsumerOffsetsTopic(): Unit = {
+    Mockito.when(groupCoordinator.offsetsTopicConfigs).thenReturn(new Properties)
+    testErrorWithCreationInZk(Errors.INVALID_REPLICATION_FACTOR, Topic.GROUP_METADATA_TOPIC_NAME, isInternal = true)
+  }
+
+  @Test
+  def testInvalidReplicationFactorForTxnOffsetTopic(): Unit = {
+    Mockito.when(transactionCoordinator.transactionTopicConfigs).thenReturn(new Properties)
+    testErrorWithCreationInZk(Errors.INVALID_REPLICATION_FACTOR, Topic.TRANSACTION_STATE_TOPIC_NAME, isInternal = true)
+  }
+
+  @Test
+  def testTopicExistsErrorSwapForNonInternalTopics(): Unit = {
+    testErrorWithCreationInZk(Errors.TOPIC_ALREADY_EXISTS, "topic", isInternal = false,
+      expectedError = Errors.LEADER_NOT_AVAILABLE)
+  }
+
+  @Test
+  def testTopicExistsErrorSwapForConsumerOffsetsTopic(): Unit = {
+    Mockito.when(groupCoordinator.offsetsTopicConfigs).thenReturn(new Properties)
+    testErrorWithCreationInZk(Errors.TOPIC_ALREADY_EXISTS, Topic.GROUP_METADATA_TOPIC_NAME, isInternal = true,
+      expectedError = Errors.LEADER_NOT_AVAILABLE)
+  }
+
+  @Test
+  def testTopicExistsErrorSwapForTxnOffsetTopic(): Unit = {
+    Mockito.when(transactionCoordinator.transactionTopicConfigs).thenReturn(new Properties)
+    testErrorWithCreationInZk(Errors.TOPIC_ALREADY_EXISTS, Topic.TRANSACTION_STATE_TOPIC_NAME, isInternal = true,
+      expectedError = Errors.LEADER_NOT_AVAILABLE)
+  }
+
+  @Test
+  def testRequestTimeoutErrorSwapForNonInternalTopics(): Unit = {
+    testErrorWithCreationInZk(Errors.REQUEST_TIMED_OUT, "topic", isInternal = false,
+      expectedError = Errors.LEADER_NOT_AVAILABLE)
+  }
+
+  @Test
+  def testRequestTimeoutErrorSwapForConsumerOffsetTopic(): Unit = {
+    Mockito.when(groupCoordinator.offsetsTopicConfigs).thenReturn(new Properties)
+    testErrorWithCreationInZk(Errors.REQUEST_TIMED_OUT, Topic.GROUP_METADATA_TOPIC_NAME, isInternal = true,
+      expectedError = Errors.LEADER_NOT_AVAILABLE)
+  }
+
+  @Test
+  def testRequestTimeoutErrorSwapForTxnOffsetTopic(): Unit = {
+    Mockito.when(transactionCoordinator.transactionTopicConfigs).thenReturn(new Properties)
+    testErrorWithCreationInZk(Errors.REQUEST_TIMED_OUT, Topic.TRANSACTION_STATE_TOPIC_NAME, isInternal = true,
+      expectedError = Errors.LEADER_NOT_AVAILABLE)
+  }
+
+  @Test
+  def testUnknownTopicPartitionForNonIntervalTopic(): Unit = {
+    testErrorWithCreationInZk(Errors.UNKNOWN_TOPIC_OR_PARTITION, "topic", isInternal = false)
+  }
 
+  @Test
+  def testUnknownTopicPartitionForConsumerOffsetTopic(): Unit = {
+    Mockito.when(groupCoordinator.offsetsTopicConfigs).thenReturn(new Properties)
+    testErrorWithCreationInZk(Errors.UNKNOWN_TOPIC_OR_PARTITION, Topic.GROUP_METADATA_TOPIC_NAME, isInternal = true)
+  }
+
+  @Test
+  def testUnknownTopicPartitionForTxnOffsetTopic(): Unit = {
+    Mockito.when(transactionCoordinator.transactionTopicConfigs).thenReturn(new Properties)
+    testErrorWithCreationInZk(Errors.UNKNOWN_TOPIC_OR_PARTITION, Topic.TRANSACTION_STATE_TOPIC_NAME, isInternal = true)
+  }
+
+  private def testErrorWithCreationInZk(error: Errors,
+                                        topicName: String,
+                                        isInternal: Boolean,
+                                        expectedError: Errors = null): Unit = {

Review comment:
       nit: a little more idiomatic to use `Option[Errors]` for a case like this




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [kafka] hachikuji commented on a change in pull request #10240: KAFKA-12381: only return leader not available for internal topic creation

Posted by GitBox <gi...@apache.org>.
hachikuji commented on a change in pull request #10240:
URL: https://github.com/apache/kafka/pull/10240#discussion_r585651983



##########
File path: core/src/main/scala/kafka/server/AutoTopicCreationManager.scala
##########
@@ -275,7 +275,11 @@ class DefaultAutoTopicCreationManager(
       val validationError: Option[Errors] = if (!isValidTopicName(topic)) {
         Some(Errors.INVALID_TOPIC_EXCEPTION)
       } else if (!hasEnoughLiveBrokers(topic, aliveBrokers)) {
-        Some(Errors.INVALID_REPLICATION_FACTOR)
+        if (Topic.isInternal(topic)) {
+          Some(Errors.INVALID_REPLICATION_FACTOR)

Review comment:
       I cannot come up with a good reason why we should preserve this inconsistent handling. I traced back the origin of it to here: https://github.com/apache/kafka/commit/063d534c5160316cdf22e476d128e872a1412783. It looks to me like the intent is for the client to keep retrying until the topic can be created, which makes sense since we return `COORDINATOR_NOT_AVAILABLE` in response to `FindCoordinator` requests for the same case. However, `INVALID_REPLICATION_FACTOR` is a non-retriable error, so it's likely this was poorly understood at the time. I suggest that we return `LEADER_NOT_AVAILABLE` consistently.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [kafka] abbccdda commented on pull request #10240: KAFKA-12381: only return leader not available for internal topic creation

Posted by GitBox <gi...@apache.org>.
abbccdda commented on pull request #10240:
URL: https://github.com/apache/kafka/pull/10240#issuecomment-789380164


   After offline sync with @hachikuji , we decided that the invalid replication factor check would be redundant to be performed on the forwarding broker. Will remove that logic to ensure we don't accidentally return any wrong error code to the client, due to the staleness of metadata cache.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [kafka] abbccdda merged pull request #10240: KAFKA-12381: remove live broker checks for forwarding topic creation

Posted by GitBox <gi...@apache.org>.
abbccdda merged pull request #10240:
URL: https://github.com/apache/kafka/pull/10240


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [kafka] abbccdda edited a comment on pull request #10240: KAFKA-12381: remove live broker checks for forwarding topic creation

Posted by GitBox <gi...@apache.org>.
abbccdda edited a comment on pull request #10240:
URL: https://github.com/apache/kafka/pull/10240#issuecomment-789965233


   New system test run: https://jenkins.confluent.io/job/system-test-kafka-branch-builder/4415/


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [kafka] abbccdda commented on a change in pull request #10240: KAFKA-12381: remove live broker checks for forwarding topic creation

Posted by GitBox <gi...@apache.org>.
abbccdda commented on a change in pull request #10240:
URL: https://github.com/apache/kafka/pull/10240#discussion_r588010801



##########
File path: tests/kafkatest/tests/core/security_test.py
##########
@@ -65,7 +65,7 @@ def test_client_ssl_endpoint_validation_failure(self, security_protocol, interbr
         Test that invalid hostname in certificate results in connection failures.
         When security_protocol=SSL, client SSL handshakes are expected to fail due to hostname verification failure.
         When security_protocol=PLAINTEXT and interbroker_security_protocol=SSL, controller connections fail
-        with hostname verification failure. Hence clients are expected to fail with LEADER_NOT_AVAILABLE.
+        with hostname verification failure. Hence clients are expected to fail with INVALID_REPLICATION_FACTOR.

Review comment:
       Sounds good, will add this part!




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [kafka] hachikuji commented on pull request #10240: KAFKA-12381: only return leader not available for internal topic creation

Posted by GitBox <gi...@apache.org>.
hachikuji commented on pull request #10240:
URL: https://github.com/apache/kafka/pull/10240#issuecomment-788992859


   @abbccdda Hmm.. Some of the failing tests suggest that `INVALID_REPLICATION_FACTOR` was intentional. For example, `MetadataRequestTest.testAutoCreateTopicWithInvalidReplicationFactor`. There must be some other difference in the new logic...


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [kafka] abbccdda commented on pull request #10240: KAFKA-12381: only return leader not available for internal topic creation

Posted by GitBox <gi...@apache.org>.
abbccdda commented on pull request #10240:
URL: https://github.com/apache/kafka/pull/10240#issuecomment-788548109


   Triggered system tests:
   trunk: https://jenkins.confluent.io/job/system-test-kafka-branch-builder/4408/
   fix: https://jenkins.confluent.io/job/system-test-kafka-branch-builder/4409/
   to see if we have the verifiable producer test fixed.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [kafka] hachikuji commented on a change in pull request #10240: KAFKA-12381: remove live broker checks for forwarding topic creation

Posted by GitBox <gi...@apache.org>.
hachikuji commented on a change in pull request #10240:
URL: https://github.com/apache/kafka/pull/10240#discussion_r587955277



##########
File path: tests/kafkatest/tests/core/security_test.py
##########
@@ -65,7 +65,7 @@ def test_client_ssl_endpoint_validation_failure(self, security_protocol, interbr
         Test that invalid hostname in certificate results in connection failures.
         When security_protocol=SSL, client SSL handshakes are expected to fail due to hostname verification failure.
         When security_protocol=PLAINTEXT and interbroker_security_protocol=SSL, controller connections fail
-        with hostname verification failure. Hence clients are expected to fail with LEADER_NOT_AVAILABLE.
+        with hostname verification failure. Hence clients are expected to fail with INVALID_REPLICATION_FACTOR.

Review comment:
       I still think this error is not a very intuitive way to handle the absence of metadata. Maybe we can rephrase the explanation a little bit.
   > Since metadata cannot be propagated in the cluster without a valid certificate, the broker's metadata caches will be empty. Hence we expect `Metadata` requests to fail with an `INVALID_REPLICATION_FACTOR` error since the broker will attempt to create the topic automatically as it does not exist in the metadata cache, and there will be no online brokers.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [kafka] abbccdda commented on pull request #10240: KAFKA-12381: remove live broker checks for forwarding topic creation

Posted by GitBox <gi...@apache.org>.
abbccdda commented on pull request #10240:
URL: https://github.com/apache/kafka/pull/10240#issuecomment-789965233


   New system test run: https://jenkins.confluent.io/job/system-test-kafka-branch-builder/4414/


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [kafka] abbccdda commented on pull request #10240: KAFKA-12381: remove live broker checks for forwarding topic creation

Posted by GitBox <gi...@apache.org>.
abbccdda commented on pull request #10240:
URL: https://github.com/apache/kafka/pull/10240#issuecomment-791798044


   System test pass: https://jenkins.confluent.io/job/system-test-kafka-branch-builder/4419/ and only flaky tests are failing, merging


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [kafka] hachikuji commented on pull request #10240: KAFKA-12381: only return leader not available for internal topic creation

Posted by GitBox <gi...@apache.org>.
hachikuji commented on pull request #10240:
URL: https://github.com/apache/kafka/pull/10240#issuecomment-789032940


   Ok, I think I see what is going on now. The failing system test is verifying what happens when inter-broker communication no longer works. This results in different behavior because `AutoTopicCreationManager` relies on the `MetadataCache` in order to determine the number of live brokers while the old logic checked zk directly. That makes the `INVALID_REPLICATION_FACTOR` more dangerous since it is not retriable and the cache may be stale. In particular, when inter-broker communication is down, the cache will be empty and the broker will end up trying to auto-create all topics.
   
   I can think of a few options to address the problem:
   
   1. Bring back the old logic to check Zookeeper for the live brokers. This might be fine for 2.8, but it does not address the problem for KIP-500.
   2. Return a retriable error instead. Really `UNKNOWN_TOPIC_OR_PARTITION` would be a better error in this case.
   3. Make `INVALID_REPLICATION_FACTOR` a retriable error. I guess we have to understand how clients 
   
   My inclination is probably option 2. The downside is that the user would no longer get a clear error when a topic cannot be auto-created. But I feel overall it's the safest and most consistent way to handle this case. There might be other options though.
   
   It's interesting to note that this relates back to some of the discussion in the auto-create PR itself. We had discussed skipping the replication factor check on the broker and sending the request to the controller. But either way, we have to rely on the metadata cache locally at least to determine whether the topic already exists or not, so it might not have really helped.
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [kafka] abbccdda commented on pull request #10240: KAFKA-12381: remove live broker checks for forwarding topic creation

Posted by GitBox <gi...@apache.org>.
abbccdda commented on pull request #10240:
URL: https://github.com/apache/kafka/pull/10240#issuecomment-791799082


   Cherry-picked to 2.8


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org