You are viewing a plain text version of this content. The canonical link for it is here.
Posted to jira@kafka.apache.org by "Chia-Ping Tsai (Jira)" <ji...@apache.org> on 2020/03/25 10:11:00 UTC

[jira] [Comment Edited] (KAFKA-9750) Flaky test kafka.server.ReplicaManagerTest.testFencedErrorCausedByBecomeLeader

    [ https://issues.apache.org/jira/browse/KAFKA-9750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17066574#comment-17066574 ] 

Chia-Ping Tsai edited comment on KAFKA-9750 at 3/25/20, 10:10 AM:
------------------------------------------------------------------

{code:scala}
      // change the epoch from 0 to 1 in order to make fenced error
      replicaManager.becomeLeaderOrFollower(0, leaderAndIsrRequest(1), (_, _) => ())
      TestUtils.waitUntilTrue(() => replicaManager.replicaAlterLogDirsManager.fetcherThreadMap.values.forall(_.partitionCount() == 0),
        s"the partition=$topicPartition should be removed from pending state")
{code}

The root cause is race condition. The partition is add to the end instead of being removed if the epoch in ReplicaAlterLogDirsThread is increased. This PR includes following changes.
1. controls the lock of ReplicaAlterLogDirsThread to make the fenced error happen almost.
2. wait for the completion of thread


was (Author: chia7712):
{code:java}
      // change the epoch from 0 to 1 in order to make fenced error
      replicaManager.becomeLeaderOrFollower(0, leaderAndIsrRequest(1), (_, _) => ())
      TestUtils.waitUntilTrue(() => replicaManager.replicaAlterLogDirsManager.fetcherThreadMap.values.forall(_.partitionCount() == 0),
        s"the partition=$topicPartition should be removed from pending state")
{code}

The root cause is race condition. The partition is add to the end instead of being removed if the epoch in ReplicaAlterLogDirsThread is increased. This PR includes following changes.
1. controls the lock of ReplicaAlterLogDirsThread to make the fenced error happen almost.
2. wait for the completion of thread

> Flaky test kafka.server.ReplicaManagerTest.testFencedErrorCausedByBecomeLeader
> ------------------------------------------------------------------------------
>
>                 Key: KAFKA-9750
>                 URL: https://issues.apache.org/jira/browse/KAFKA-9750
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>            Reporter: Bob Barrett
>            Assignee: Chia-Ping Tsai
>            Priority: Major
>              Labels: flaky-test
>
> When running tests locally, I've seen that 1-2% of the time, testFencedErrorCausedByBecomeLeader fails with
> {code:java}
> org.scalatest.exceptions.TestFailedException: the partition=test-topic-0 should be removed from pending stateorg.scalatest.exceptions.TestFailedException: the partition=test-topic-0 should be removed from pending state
>  at org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:530) at org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:529) at org.scalatest.Assertions$.newAssertionFailedException(Assertions.scala:1389) at org.scalatest.Assertions.fail(Assertions.scala:1091) at org.scalatest.Assertions.fail$(Assertions.scala:1087) at org.scalatest.Assertions$.fail(Assertions.scala:1389) at kafka.server.ReplicaManagerTest.testFencedErrorCausedByBecomeLeader(ReplicaManagerTest.scala:248) at jdk.internal.reflect.GeneratedMethodAccessor25.invoke(Unknown Source) at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.base/java.lang.reflect.Method.invoke(Method.java:566) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) at org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100) at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63) at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331) at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79) at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329) at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66) at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293) at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) at org.junit.runners.ParentRunner.run(ParentRunner.java:413) at org.junit.runner.JUnitCore.run(JUnitCore.java:137) at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68) at com.intellij.rt.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:40) at com.intellij.rt.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:230) at com.intellij.rt.junit.JUnitStarter.main(JUnitStarter.java:58) {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)