Posted to jira@kafka.apache.org by GitBox <gi...@apache.org> on 2021/06/04 21:26:51 UTC

[GitHub] [kafka] rondagostino opened a new pull request #10823: KAFKA-12897: KRaft multi-partition placement on single broker

rondagostino opened a new pull request #10823:
URL: https://github.com/apache/kafka/pull/10823


   https://github.com/apache/kafka/pull/10494 introduced a bug in the KRaft controller where the controller will loop forever in `StripedReplicaPlacer` trying to identify the racks on which to place partition replicas if there is a single unfenced broker in the cluster and the number of requested partitions in a CREATE_TOPICS request is greater than 1.
   
   This patch refactors the argument sanity checks into shared helper methods and invokes them in both `RackList` and `StripedReplicaPlacer`, and it adds tests for these checks as well as for the single-broker placement issue.
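
   Below is a rough sketch of what the extracted helpers look like conceptually. The method names match the patch, but the bodies shown here are reconstructions built from the pre-existing error messages, not necessarily the merged code:

   ```java
   // Inside StripedReplicaPlacer (sketch); InvalidReplicationFactorException is
   // org.apache.kafka.common.errors.InvalidReplicationFactorException.
   private static void throwInvalidReplicationFactorIfNonPositive(int replicationFactor) {
       if (replicationFactor <= 0) {
           throw new InvalidReplicationFactorException("Invalid replication factor " +
               replicationFactor + ": the replication factor must be positive.");
       }
   }

   private static void throwInvalidReplicationFactorIfZero(int numUnfencedBrokers) {
       if (numUnfencedBrokers == 0) {
           throw new InvalidReplicationFactorException("All brokers are currently fenced.");
       }
   }

   private static void throwInvalidReplicationFactorIfTooFewBrokers(int replicationFactor,
                                                                    int numTotalBrokers) {
       if (replicationFactor > numTotalBrokers) {
           throw new InvalidReplicationFactorException("The target replication factor " +
               "of " + replicationFactor + " cannot be reached because only " +
               numTotalBrokers + " broker(s) are registered.");
       }
   }
   ```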
   
   ### Committer Checklist (excluded from commit message)
   - [ ] Verify design and implementation 
   - [ ] Verify test coverage and CI build status
   - [ ] Verify documentation (including upgrade notes)
   





[GitHub] [kafka] rondagostino commented on a change in pull request #10823: KAFKA-12897: KRaft multi-partition placement on single broker

Posted by GitBox <gi...@apache.org>.
rondagostino commented on a change in pull request #10823:
URL: https://github.com/apache/kafka/pull/10823#discussion_r646148894



##########
File path: metadata/src/main/java/org/apache/kafka/controller/StripedReplicaPlacer.java
##########
@@ -340,14 +340,14 @@ int numUnfencedBrokers() {
         }
 
         List<Integer> place(int replicationFactor) {
-            if (replicationFactor <= 0) {
-                throw new InvalidReplicationFactorException("Invalid replication factor " +
-                        replicationFactor + ": the replication factor must be positive.");
-            }
+            throwInvalidReplicationFactorIfNonPositive(replicationFactor);
+            throwInvalidReplicationFactorIfTooFewBrokers(replicationFactor, numTotalBrokers());
+            throwInvalidReplicationFactorIfZero(numUnfencedBrokers());
             // If we have returned as many assignments as there are unfenced brokers in
             // the cluster, shuffle the rack list and broker lists to try to avoid
             // repeating the same assignments again.
-            if (epoch == numUnfencedBrokers) {
+            // But don't reset the iteration epoch for a single unfenced broker -- otherwise we would loop forever
+            if (epoch == numUnfencedBrokers && numUnfencedBrokers > 1) {

Review comment:
       There was no test covering this case, but I added one: `testMultiPartitionTopicPlacementOnSingleUnfencedBroker()` never finishes without this fix.  Without this change, the `while (true)` loop in `RackList.place()` never exits when placing multiple partitions on a cluster with just a single unfenced broker.  The issue is that the iteration epoch starts at 0 for the first partition but (without the change) is reset back to 0 for the second partition; the `Rack` instance associated with the broker sees the same iteration epoch for the second partition and therefore reports that it has no more unfenced brokers available.  The loop moves to the next rack, but there is no next rack -- there's only the one -- so around we go again asking the same question, ad infinitum.
   
   One might wonder about the validity of resetting the iteration epoch back to zero at all -- whether a rack with a single broker could see some iteration epoch and then be asked to place another partition just as the epoch loops back to that same value.  I think this is not possible, because the racks are shuffled once every broker has received an assignment (and hence every rack has received at least one assignment); no rack will see the same iteration epoch again without seeing a different iteration epoch in between.
   
   The degenerate case of just 1 broker is the one we are fixing here: we can't reset the epoch because shuffling has no effect.
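
   To make the failure mode concrete, here is a tiny self-contained model of the interaction (illustrative class and method names only -- this is not the real placer code):

   ```java
   import java.util.ArrayList;
   import java.util.List;

   /**
    * Toy model of the epoch interaction described above.  A single-broker "rack"
    * hands out its broker at most once per iteration epoch, so replaying an epoch
    * it has already served makes it refuse forever.
    */
   public class EpochResetModel {

       static class SingleBrokerRack {
           private int lastEpochUsed = -1;

           /** Returns the broker id, or -1 if the broker was already handed out at this epoch. */
           int claimBroker(int epoch) {
               if (epoch == lastEpochUsed) {
                   return -1;
               }
               lastEpochUsed = epoch;
               return 0;
           }
       }

       public static void main(String[] args) {
           final int numUnfencedBrokers = 1;
           SingleBrokerRack onlyRack = new SingleBrokerRack();
           int epoch = 0;
           List<Integer> placements = new ArrayList<>();

           for (int partition = 0; partition < 3; partition++) {
               // The fixed condition: only reset the epoch (and reshuffle) when more
               // than one unfenced broker exists.  Without "numUnfencedBrokers > 1",
               // epoch is reset to 0 before the second partition, claimBroker(0)
               // returns -1 forever, and the loop below never terminates.
               if (epoch == numUnfencedBrokers && numUnfencedBrokers > 1) {
                   epoch = 0; // the real code also reshuffles the rack and broker lists here
               }
               int broker;
               int attempts = 0;
               // In the real code this walks the rack list; with a single rack it keeps
               // asking the same rack.  Capped here only so the demonstration terminates.
               while ((broker = onlyRack.claimBroker(epoch)) == -1 && attempts++ < 1000) {
                   // spin: there is no other rack to move on to
               }
               if (broker == -1) {
                   System.out.println("stuck at epoch " + epoch + " for partition " + partition);
                   return;
               }
               placements.add(broker);
               epoch++; // one full pass over the unfenced brokers completed
           }
           System.out.println(placements); // with the guard in place: [0, 0, 0]
       }
   }
   ```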







[GitHub] [kafka] junrao merged pull request #10823: KAFKA-12897: KRaft multi-partition placement on single broker

Posted by GitBox <gi...@apache.org>.
junrao merged pull request #10823:
URL: https://github.com/apache/kafka/pull/10823


   





[GitHub] [kafka] rondagostino commented on pull request #10823: KAFKA-12897: KRaft multi-partition placement on single broker

Posted by GitBox <gi...@apache.org>.
rondagostino commented on pull request #10823:
URL: https://github.com/apache/kafka/pull/10823#issuecomment-856248774


   For example, here's one of the 20 failures:
   
   ```
   [2021-06-05T02:12:18.820Z] AuthorizerIntegrationTest > testDeleteGroupApiWithDeleteGroupAcl() STARTED
   [2021-06-05T02:12:18.820Z] kafka.api.SslAdminIntegrationTest.testAclUpdatesUsingSynchronousAuthorizer() failed, log available in /home/jenkins/jenkins-agent/workspace/Kafka_kafka-pr_PR-10823/core/build/reports/testOutput/kafka.api.SslAdminIntegrationTest.testAclUpdatesUsingSynchronousAuthorizer().test.stdout
   [2021-06-05T02:12:18.820Z] 
   [2021-06-05T02:12:18.820Z] SslAdminIntegrationTest > testAclUpdatesUsingSynchronousAuthorizer() FAILED
   [2021-06-05T02:12:18.820Z]     org.apache.zookeeper.KeeperException$InvalidACLException: KeeperErrorCode = InvalidACL for /kafka-acl
   ```
   






[GitHub] [kafka] rondagostino edited a comment on pull request #10823: KAFKA-12897: KRaft multi-partition placement on single broker

Posted by GitBox <gi...@apache.org>.
rondagostino edited a comment on pull request #10823:
URL: https://github.com/apache/kafka/pull/10823#issuecomment-856248033


   > JDK 11 test seems to have failed with 20 failures. Are they related to this PR?
   
   Thanks for the review, @junrao.  Those failures are unrelated -- they are due to https://issues.apache.org/jira/browse/KAFKA-12892.





[GitHub] [kafka] junrao commented on a change in pull request #10823: KAFKA-12897: KRaft multi-partition placement on single broker

Posted by GitBox <gi...@apache.org>.
junrao commented on a change in pull request #10823:
URL: https://github.com/apache/kafka/pull/10823#discussion_r645905158



##########
File path: metadata/src/main/java/org/apache/kafka/controller/StripedReplicaPlacer.java
##########
@@ -412,14 +433,9 @@ public StripedReplicaPlacer(Random random) {
                                      short replicationFactor,
                                      Iterator<UsableBroker> iterator) {
         RackList rackList = new RackList(random, iterator);
-        if (rackList.numUnfencedBrokers() == 0) {
-            throw new InvalidReplicationFactorException("All brokers are currently fenced.");
-        }
-        if (replicationFactor > rackList.numTotalBrokers()) {
-            throw new InvalidReplicationFactorException("The target replication factor " +
-                "of " + replicationFactor + " cannot be reached because only " +
-                rackList.numTotalBrokers() + " broker(s) are registered.");
-        }
+        throwInvalidReplicationFactorIfNonPositive(replicationFactor);
+        throwInvalidReplicationFactorIfZero(rackList.numUnfencedBrokers());
+        throwInvalidReplicationFactorIfTooFewBrokers(replicationFactor, rackList.numTotalBrokers());

Review comment:
       It seems that we are doing the same tests in rackList.place(). Should we just do the tests in one place?

##########
File path: metadata/src/main/java/org/apache/kafka/controller/StripedReplicaPlacer.java
##########
@@ -340,14 +340,14 @@ int numUnfencedBrokers() {
         }
 
         List<Integer> place(int replicationFactor) {
-            if (replicationFactor <= 0) {
-                throw new InvalidReplicationFactorException("Invalid replication factor " +
-                        replicationFactor + ": the replication factor must be positive.");
-            }
+            throwInvalidReplicationFactorIfNonPositive(replicationFactor);
+            throwInvalidReplicationFactorIfTooFewBrokers(replicationFactor, numTotalBrokers());
+            throwInvalidReplicationFactorIfZero(numUnfencedBrokers());
             // If we have returned as many assignments as there are unfenced brokers in
             // the cluster, shuffle the rack list and broker lists to try to avoid
             // repeating the same assignments again.
-            if (epoch == numUnfencedBrokers) {
+            // But don't reset the iteration epoch for a single unfenced broker -- otherwise we would loop forever
+            if (epoch == numUnfencedBrokers && numUnfencedBrokers > 1) {

Review comment:
       Which loop loops forever because of this? Also, is there an existing test covering this? 










[GitHub] [kafka] rondagostino commented on a change in pull request #10823: KAFKA-12897: KRaft multi-partition placement on single broker

Posted by GitBox <gi...@apache.org>.
rondagostino commented on a change in pull request #10823:
URL: https://github.com/apache/kafka/pull/10823#discussion_r646142461



##########
File path: metadata/src/main/java/org/apache/kafka/controller/StripedReplicaPlacer.java
##########
@@ -412,14 +433,9 @@ public StripedReplicaPlacer(Random random) {
                                      short replicationFactor,
                                      Iterator<UsableBroker> iterator) {
         RackList rackList = new RackList(random, iterator);
-        if (rackList.numUnfencedBrokers() == 0) {
-            throw new InvalidReplicationFactorException("All brokers are currently fenced.");
-        }
-        if (replicationFactor > rackList.numTotalBrokers()) {
-            throw new InvalidReplicationFactorException("The target replication factor " +
-                "of " + replicationFactor + " cannot be reached because only " +
-                rackList.numTotalBrokers() + " broker(s) are registered.");
-        }
+        throwInvalidReplicationFactorIfNonPositive(replicationFactor);
+        throwInvalidReplicationFactorIfZero(rackList.numUnfencedBrokers());
+        throwInvalidReplicationFactorIfTooFewBrokers(replicationFactor, rackList.numTotalBrokers());

Review comment:
       We have three separate argument checks to perform, each with its own error message that the tests check for:
   
   a) the replication factor can't be non-positive
   b) the unfenced broker count can't be zero
   c) the total broker count must not be less than the replication factor
   
   `StripedReplicaPlacer.place()` was performing checks (b) and (c).
   `RackList.place()` was performing check (a).
   
   Given that `RackList` is publicly accessible within the package, I felt it was important to perform all the sanity checks there. But `StripedReplicaPlacer.place()` also has a loop and allocates an array, so rather than letting the first iteration of the loop raise the exception, I felt it was clearer to perform the checks there as well.
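
   As a sketch of the kind of check the tests exercise directly against `RackList` (assuming JUnit 5 and a test class living in `org.apache.kafka.controller` next to `StripedReplicaPlacer`; the real tests may be structured differently):

   ```java
   import java.util.Collections;
   import java.util.Random;
   import org.apache.kafka.common.errors.InvalidReplicationFactorException;
   import org.junit.jupiter.api.Test;
   import static org.junit.jupiter.api.Assertions.assertThrows;
   import static org.junit.jupiter.api.Assertions.assertTrue;

   public class RackListSanityCheckSketchTest {
       @Test
       public void testNonPositiveReplicationFactorRejectedByRackList() {
           // An empty broker iterator suffices here: the non-positive replication
           // factor check fires before any broker is consulted.
           StripedReplicaPlacer.RackList rackList =
               new StripedReplicaPlacer.RackList(new Random(), Collections.emptyIterator());
           InvalidReplicationFactorException e = assertThrows(
               InvalidReplicationFactorException.class, () -> rackList.place(0));
           assertTrue(e.getMessage().contains("must be positive"));
       }
   }
   ```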
   



