You are viewing a plain text version of this content. The canonical link for it is here.
Posted to jira@kafka.apache.org by GitBox <gi...@apache.org> on 2022/01/11 15:10:51 UTC

[GitHub] [kafka] dajac opened a new pull request #11665: KAFKA-13585; Fix flaky test `ReplicaManagerTest.testReplicaAlterLogDirsWithAndWithoutIds`

dajac opened a new pull request #11665:
URL: https://github.com/apache/kafka/pull/11665


   `ReplicaManagerTest.testReplicaAlterLogDirsWithAndWithoutIds` fails regularly because `assertFetcherHasTopicId` can't validate the topic id in the fetch state. The issue is quite subtile. Under the hood, `assertFetcherHasTopicId` acquires the `partitionMapLock` in the fetcher thread. `partitionMapLock` is also acquired by the the `processFetchRequest` method. If `processFetchRequest` acquires it before `assertFetcherHasTopicId` can check the fetch state, `assertFetcherHasTopicId` has not chance to verify the state anymore because `processFetchRequest` will remove the fetch state before releasing the lock.
   
   ### Committer Checklist (excluded from commit message)
   - [ ] Verify design and implementation 
   - [ ] Verify test coverage and CI build status
   - [ ] Verify documentation (including upgrade notes)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscribe@kafka.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [kafka] dajac commented on a change in pull request #11665: KAFKA-13585; Fix flaky test `ReplicaManagerTest.testReplicaAlterLogDirsWithAndWithoutIds`

Posted by GitBox <gi...@apache.org>.
dajac commented on a change in pull request #11665:
URL: https://github.com/apache/kafka/pull/11665#discussion_r788510764



##########
File path: core/src/test/scala/unit/kafka/server/ReplicaManagerTest.scala
##########
@@ -3569,12 +3576,13 @@ class ReplicaManagerTest {
       assertEquals(0, replicaManager.replicaAlterLogDirsManager.fetcherThreadMap.size)
       replicaManager.alterReplicaLogDirs(Map(topicPartition -> newReplicaFolder.getAbsolutePath))
 
-      assertFetcherHasTopicId(replicaManager.replicaAlterLogDirsManager, partition.topicPartition, topicIdOpt)
-
       // Make sure the future log is created.
       replicaManager.futureLocalLogOrException(topicPartition)
       assertEquals(1, replicaManager.replicaAlterLogDirsManager.fetcherThreadMap.size)
 
+      // Verify that the fetcher has the correct topic id.
+      assertFetcherHasTopicId(replicaManager.replicaAlterLogDirsManager, partition.topicPartition, topicIdOpt)

Review comment:
       That's right. The goal is to verify that the topicid is correctly propagated to the fetcher state. Now that I think about it a bit more, we might be able to achieve the same by mocking the `ReplicaAlterLogDirsManager`. Let me give it a try.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscribe@kafka.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [kafka] dajac commented on pull request #11665: KAFKA-13585; Fix flaky test `ReplicaManagerTest.testReplicaAlterLogDirsWithAndWithoutIds`

Posted by GitBox <gi...@apache.org>.
dajac commented on pull request #11665:
URL: https://github.com/apache/kafka/pull/11665#issuecomment-1010214440


   > Is this test useful if we can't verify the topic ID? I think we have a similar one that doesn't check topic ID?
   
   Do we? I haven't found it...


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscribe@kafka.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [kafka] jolshan commented on pull request #11665: KAFKA-13585; Fix flaky test `ReplicaManagerTest.testReplicaAlterLogDirsWithAndWithoutIds`

Posted by GitBox <gi...@apache.org>.
jolshan commented on pull request #11665:
URL: https://github.com/apache/kafka/pull/11665#issuecomment-1010212921


   Is this test useful if we can't verify the topic ID? I think we have a similar one that doesn't check topic ID?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscribe@kafka.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [kafka] jolshan commented on pull request #11665: KAFKA-13585; Fix flaky test `ReplicaManagerTest.testReplicaAlterLogDirsWithAndWithoutIds`

Posted by GitBox <gi...@apache.org>.
jolshan commented on pull request #11665:
URL: https://github.com/apache/kafka/pull/11665#issuecomment-1010224534


   Strange -- I might be misremembering because I can't seem to find it either.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscribe@kafka.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [kafka] dajac merged pull request #11665: KAFKA-13585; Fix flaky test `ReplicaManagerTest.testReplicaAlterLogDirsWithAndWithoutIds`

Posted by GitBox <gi...@apache.org>.
dajac merged pull request #11665:
URL: https://github.com/apache/kafka/pull/11665


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscribe@kafka.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [kafka] dajac commented on a change in pull request #11665: KAFKA-13585; Fix flaky test `ReplicaManagerTest.testReplicaAlterLogDirsWithAndWithoutIds`

Posted by GitBox <gi...@apache.org>.
dajac commented on a change in pull request #11665:
URL: https://github.com/apache/kafka/pull/11665#discussion_r788499923



##########
File path: core/src/test/scala/unit/kafka/server/ReplicaManagerTest.scala
##########
@@ -3541,7 +3541,14 @@ class ReplicaManagerTest {
     val version = if (usesTopicIds) LeaderAndIsrRequestData.HIGHEST_SUPPORTED_VERSION else 4.toShort
     val topicId = if (usesTopicIds) this.topicId else Uuid.ZERO_UUID
     val topicIdOpt = if (usesTopicIds) Some(topicId) else None
-    val replicaManager = setupReplicaManagerWithMockedPurgatories(new MockTimer(time))
+
+    // We use a low `replica.fetch.max.bytes` in order to have multiple fetch
+    // requests issued by the ReplicaAlterLogDirsThread.
+    val replicaManager = setupReplicaManagerWithMockedPurgatories(
+      new MockTimer(time),
+      propsModifier = props => props.put(KafkaConfig.ReplicaFetchMaxBytesProp, "1")

Review comment:
       The issue is quite subtile. The intent here is to ensures that the fetcher uses multiple fetch requests to replicate the data to the new log. The issue is that both `assertFetcherHasTopicId` and `processFetchRequest` compete for the `partitionMapLock` lock. If `assertFetcherHasTopicId` gets it first, the test pass. If `processFetchRequest` gets it first, the test fail because the fetch state is removed while holding the lock. By using multiple fetch requests, `assertFetcherHasTopicId` has a chance to verify the state before the copy is completed.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscribe@kafka.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [kafka] dajac commented on pull request #11665: KAFKA-13585; Fix flaky test `ReplicaManagerTest.testReplicaAlterLogDirsWithAndWithoutIds`

Posted by GitBox <gi...@apache.org>.
dajac commented on pull request #11665:
URL: https://github.com/apache/kafka/pull/11665#issuecomment-1014389443


   @jolshan I have found another way to fix the issue and that one keeps asserting the topic ID. Let me know what you think.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscribe@kafka.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [kafka] hachikuji commented on a change in pull request #11665: KAFKA-13585; Fix flaky test `ReplicaManagerTest.testReplicaAlterLogDirsWithAndWithoutIds`

Posted by GitBox <gi...@apache.org>.
hachikuji commented on a change in pull request #11665:
URL: https://github.com/apache/kafka/pull/11665#discussion_r788276315



##########
File path: core/src/test/scala/unit/kafka/server/ReplicaManagerTest.scala
##########
@@ -3541,7 +3541,14 @@ class ReplicaManagerTest {
     val version = if (usesTopicIds) LeaderAndIsrRequestData.HIGHEST_SUPPORTED_VERSION else 4.toShort
     val topicId = if (usesTopicIds) this.topicId else Uuid.ZERO_UUID
     val topicIdOpt = if (usesTopicIds) Some(topicId) else None
-    val replicaManager = setupReplicaManagerWithMockedPurgatories(new MockTimer(time))
+
+    // We use a low `replica.fetch.max.bytes` in order to have multiple fetch
+    // requests issued by the ReplicaAlterLogDirsThread.
+    val replicaManager = setupReplicaManagerWithMockedPurgatories(
+      new MockTimer(time),
+      propsModifier = props => props.put(KafkaConfig.ReplicaFetchMaxBytesProp, "1")

Review comment:
       I am not sure I understand the issue. Was the problem that the copy was completing before the failing assertion? Is this intended to slow down the fetcher?

##########
File path: core/src/test/scala/unit/kafka/server/ReplicaManagerTest.scala
##########
@@ -3569,12 +3576,13 @@ class ReplicaManagerTest {
       assertEquals(0, replicaManager.replicaAlterLogDirsManager.fetcherThreadMap.size)
       replicaManager.alterReplicaLogDirs(Map(topicPartition -> newReplicaFolder.getAbsolutePath))
 
-      assertFetcherHasTopicId(replicaManager.replicaAlterLogDirsManager, partition.topicPartition, topicIdOpt)
-
       // Make sure the future log is created.
       replicaManager.futureLocalLogOrException(topicPartition)
       assertEquals(1, replicaManager.replicaAlterLogDirsManager.fetcherThreadMap.size)
 
+      // Verify that the fetcher has the correct topic id.
+      assertFetcherHasTopicId(replicaManager.replicaAlterLogDirsManager, partition.topicPartition, topicIdOpt)

Review comment:
       It's a slightly strange validation at this level. I guess we're verifying the propagation of the topicId to the log dir fetcher? It seems like the main thing we'd want to verify is that the new log gets created with the right topicId. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscribe@kafka.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org