Posted to commits@samza.apache.org by "Ahmed Abdul Hamid (JIRA)" <ji...@apache.org> on 2018/04/26 16:25:00 UTC

[jira] [Commented] (SAMZA-1476) Flaky test: TestStatefulTask testShouldStartAndRestore

    [ https://issues.apache.org/jira/browse/SAMZA-1476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16454480#comment-16454480 ] 

Ahmed Abdul Hamid commented on SAMZA-1476:
------------------------------------------

I tried to reproduce the problem locally without success. The test still passed even when run inside a CPU- and memory-constrained Docker container, with the JVM heap limited via -Xms256m -Xmx512m.

 

Based solely on code analysis, the only suspicious thing I could observe was a CountDownLatch variable, gotMessage, that is awaited-then-reinitialized by the test's main thread in TestTask.awaitMessage – where the failing assert is located – and countDown'ed by a different thread in TestTask.process.

 

Intuitively, gotMessage should be marked volatile so that both threads see a consistent, up-to-date value of the reference. However, another test, TestShutdownStatefulTask, shares the code in TestTask.process with TestStatefulTask via their common parent TestTask and exercises almost identical behavior, yet it has not been reported as flaky. It could be that TestShutdownStatefulTask failed before but went unreported, or the root cause could be a different issue. Given the similarities and shared code between the two tests, I believe TestShutdownStatefulTask must also be flaky.
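
To make the suspicion concrete, here is a minimal Java sketch of the await-then-reinitialize pattern with the latch reference marked volatile. It is not the actual Samza code: the real TestTask is Scala (StreamTaskTestUtil.scala), and the class name LatchSketch, the timeout parameter, and the method bodies are assumptions for illustration. Only the names gotMessage, awaitMessage and process come from the test.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for the shared TestTask state (not the Samza source).
final class LatchSketch {
    // volatile: the processing thread always reads the latch reference most
    // recently written by the test thread, never a stale one.
    static volatile CountDownLatch gotMessage = new CountDownLatch(1);

    // Runs on the task's thread for each incoming message.
    static void process() {
        gotMessage.countDown();
    }

    // Runs on the test's main thread: wait for one message, then re-arm the
    // latch for the next one. Returns false on timeout or interruption.
    static boolean awaitMessage(long timeoutMs) {
        boolean received;
        try {
            received = gotMessage.await(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            received = false;
        }
        gotMessage = new CountDownLatch(1);
        return received;
    }

    public static void main(String[] args) {
        new Thread(LatchSketch::process).start();
        System.out.println("first await received: " + awaitMessage(5000));
        // Nothing counts down the re-armed latch, so this call waits out its timeout.
        System.out.println("second await received: " + awaitMessage(50));
    }
}
```

Note that volatile only addresses visibility of the reference. If a message were processed between await returning and the reassignment, its countDown would hit the old latch and the next await would time out, so this change alone may not explain or remove the flakiness.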

 

> Flaky test: TestStatefulTask testShouldStartAndRestore   
> ---------------------------------------------------------
>
>                 Key: SAMZA-1476
>                 URL: https://issues.apache.org/jira/browse/SAMZA-1476
>             Project: Samza
>          Issue Type: Bug
>            Reporter: Jagadish
>            Assignee: Ahmed Abdul Hamid
>            Priority: Major
>             Fix For: 0.15.0
>
>
> {code}  
> java.lang.AssertionError: Timed out of waiting for message rather than received one.
>         at org.junit.Assert.fail(Assert.java:91)
>         at org.junit.Assert.assertTrue(Assert.java:43)
>         at org.apache.samza.test.integration.TestTask.awaitMessage(StreamTaskTestUtil.scala:331)
>         at org.apache.samza.test.integration.StreamTaskTestUtil.send(StreamTaskTestUtil.scala:235)
>         at org.apache.samza.test.integration.TestStatefulTask$$anonfun$testShouldStartTaskForFirstTime$1.apply(TestStatefulTask.scala:97)
>         at org.apache.samza.test.integration.TestStatefulTask$$anonfun$testShouldStartTaskForFirstTime$1.apply(TestStatefulTask.scala:97)
>         at scala.collection.immutable.List.foreach(List.scala:381)
>         at org.apache.samza.test.integration.TestStatefulTask.testShouldStartTaskForFirstTime(TestStatefulTask.scala:97)
>         at org.apache.samza.test.integration.TestStatefulTask.testShouldStartAndRestore
> {code}


