Posted to issues@kudu.apache.org by "Adar Dembo (JIRA)" <ji...@apache.org> on 2019/05/29 16:36:00 UTC

[jira] [Created] (KUDU-2831) DistributedDataGeneratorTest.testGenerateRandomData is flaky

Adar Dembo created KUDU-2831:
--------------------------------

             Summary: DistributedDataGeneratorTest.testGenerateRandomData is flaky
                 Key: KUDU-2831
                 URL: https://issues.apache.org/jira/browse/KUDU-2831
             Project: Kudu
          Issue Type: Bug
          Components: spark, test
    Affects Versions: 1.10.0
            Reporter: Adar Dembo


Saw this once last month and again today, so not super flaky but still worth fixing:
{noformat}
1) testGenerateRandomData(org.apache.kudu.spark.tools.DistributedDataGeneratorTest)
java.lang.AssertionError: expected:<100> but was:<99>
	at org.junit.Assert.fail(Assert.java:88)
	at org.junit.Assert.failNotEquals(Assert.java:834)
	at org.junit.Assert.assertEquals(Assert.java:645)
	at org.junit.Assert.assertEquals(Assert.java:631)
	at org.apache.kudu.spark.tools.DistributedDataGeneratorTest.testGenerateRandomData(DistributedDataGeneratorTest.scala:58)
{noformat}

I talked about this with [~granthenke] when it last happened. The issue appears to be in the LongAccumulator used to track collisions in the data generator. Before the failure, the test logged this:

{noformat}
02:22:39.533 [INFO - main] (DistributedDataGenerator.scala:134) Rows written: 99
02:22:39.533 [INFO - main] (DistributedDataGenerator.scala:135) Collisions: 1
{noformat}

The assert code looks like this:
{noformat}
    val collisions = ss.sparkContext.longAccumulator("row_collisions").value
    // Collisions could cause the number of row to be less than the number set.
    assertEquals(numRows - collisions, rdd.collect.length)
{noformat}

So the test read this LongAccumulator as zero even though the generator had logged one collision: the assertion expected 100 - 0 = 100 rows but found 99. Our thinking was that accumulators like these are updated asynchronously, so if we don't wait for the entire job to finish, we may not see their up-to-date values at assertion time.

We publish other LongAccumulators in kudu-spark, but AFAICT this is the only one that is asserted on. Nevertheless, it would be great if we could solve this in some generic way so that if someone wrote a test that used a different LongAccumulator, the race could be avoided.
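
To make the read pattern concrete, here is a minimal sketch. It is not the kudu-spark code: it assumes a local SparkSession, and the object name AccumulatorReadSketch, the helper run, and the "even numbers count as collisions" rule are all made up for illustration. It keeps the LongAccumulator reference returned at creation time and reads it only after a blocking action has returned. As far as I know, SparkContext.longAccumulator(name) creates and registers a new accumulator on every call rather than looking one up by name, so the sketch also reads a freshly created one for comparison. Whether that has any bearing on the failure above is an assumption that would need checking.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.util.LongAccumulator

object AccumulatorReadSketch {
  // Hypothetical helper, not part of kudu-spark. Runs a small job that treats
  // every even number as a "collision" and returns a pair:
  //   (value read through the held reference,
  //    value of a newly created accumulator with the same name).
  def run(spark: SparkSession, numRows: Int): (Long, Long) = {
    val sc = spark.sparkContext

    // Keep the reference returned at creation; tasks add to this instance.
    val collisions: LongAccumulator = sc.longAccumulator("row_collisions")

    // foreach is an action, so this call blocks until the job has finished.
    sc.parallelize(1 to numRows).foreach { i =>
      if (i % 2 == 0) collisions.add(1L)
    }

    // Read the held reference only after the action has returned.
    val held: Long = collisions.value

    // A second call with the same name registers a separate accumulator
    // that starts at 0; it does not return the one created above.
    val fresh: Long = sc.longAccumulator("row_collisions").value

    (held, fresh)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("accumulator-read-sketch")
      .getOrCreate()
    try {
      val (held, fresh) = run(spark, 100)
      println(s"held=$held fresh=$fresh")
    } finally {
      spark.stop()
    }
  }
}
```

If something like this were wrapped in a shared test helper, any test asserting on a LongAccumulator could read it through the same held reference after the job completes, which is roughly the kind of generic approach asked for above.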


