Posted to issues@kudu.apache.org by "Adar Dembo (JIRA)" <ji...@apache.org> on 2019/05/29 16:36:00 UTC
[jira] [Created] (KUDU-2831) DistributedDataGeneratorTest.testGenerateRandomData is flaky
Adar Dembo created KUDU-2831:
--------------------------------
Summary: DistributedDataGeneratorTest.testGenerateRandomData is flaky
Key: KUDU-2831
URL: https://issues.apache.org/jira/browse/KUDU-2831
Project: Kudu
Issue Type: Bug
Components: spark, test
Affects Versions: 1.10.0
Reporter: Adar Dembo
Saw this once last month and again today, so not super flaky but still worth fixing:
{noformat}
1) testGenerateRandomData(org.apache.kudu.spark.tools.DistributedDataGeneratorTest)
java.lang.AssertionError: expected:<100> but was:<99>
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.failNotEquals(Assert.java:834)
at org.junit.Assert.assertEquals(Assert.java:645)
at org.junit.Assert.assertEquals(Assert.java:631)
at org.apache.kudu.spark.tools.DistributedDataGeneratorTest.testGenerateRandomData(DistributedDataGeneratorTest.scala:58)
{noformat}
I talked about this with [~granthenke] when it last happened. The issue appears to be in the LongAccumulator used to track collisions in the data generator. Before the failure, the test logged this:
{noformat}
02:22:39.533 [INFO - main] (DistributedDataGenerator.scala:134) Rows written: 99
02:22:39.533 [INFO - main] (DistributedDataGenerator.scala:135) Collisions: 1
{noformat}
The assert code looks like this:
{noformat}
val collisions = ss.sparkContext.longAccumulator("row_collisions").value
// Collisions could cause the number of rows to be less than the number set.
assertEquals(numRows - collisions, rdd.collect.length)
{noformat}
So the test read this LongAccumulator as zero even though there was one collision: the assertion expected 100 rows (numRows minus zero collisions) but only 99 were written. Our thinking was that accumulators like these are updated asynchronously, so if we don't wait for the entire job to finish, we may not get their up-to-date values at assertion time.
We publish other LongAccumulators in kudu-spark, but AFAICT this is the only one that is asserted on. Nevertheless, it would be great if we could solve this in some generic way so that if someone wrote a test that used a different LongAccumulator, the race could be avoided.
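One possible shape for such a generic approach, as a rough sketch rather than a proposed patch: keep a reference to the accumulator that the job actually updates, and read it only after a blocking action has returned. As I understand it, updates from completed tasks are merged on the driver before the action returns. Also, per the Spark API docs, SparkContext.longAccumulator(name) creates and registers a new accumulator on every call rather than looking one up by name, so the sketch holds on to the original reference. The object AccumulatorReadSketch, the helper valueAfter, the local-mode session, and the "last row collides" rule are all hypothetical stand-ins and not kudu-spark code.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.util.LongAccumulator

object AccumulatorReadSketch {

  // Hypothetical helper: evaluate a blocking Spark action first, and only
  // then read the accumulator, so the read cannot happen before the job ends.
  def valueAfter[T](acc: LongAccumulator)(action: => T): (T, Long) = {
    val result = action
    val value: Long = acc.value // unboxes the java.lang.Long returned by Spark
    (result, value)
  }

  def main(args: Array[String]): Unit = {
    val ss = SparkSession.builder()
      .master("local[2]")
      .appName("accumulator-read-sketch")
      .getOrCreate()
    try {
      val sc = ss.sparkContext
      val numRows = 100

      // Register the accumulator once and keep the reference; calling
      // longAccumulator("row_collisions") again would register a fresh one.
      val collisions = sc.longAccumulator("row_collisions")

      // Stand-in for the data generator: pretend the last row's key collides
      // and is therefore not written.
      val written = sc.parallelize(0 until numRows, 4).filter { i =>
        val collides = i == numRows - 1
        if (collides) collisions.add(1)
        !collides
      }

      // count() is the only action run on this RDD. Updates made inside a
      // transformation can be applied again if the RDD is recomputed, so a
      // second action here would inflate the accumulator.
      val (rowCount, collisionCount) = valueAfter(collisions)(written.count())

      println(s"Rows written: $rowCount")
      println(s"Collisions: $collisionCount")
      assert(rowCount == numRows - collisionCount)
    } finally {
      ss.stop()
    }
  }
}
```

A test could route every accumulator read through a helper like this. It would not help where the accumulator is created inside the tool and never exposed to the test, in which case the tool would need to return it or its final value.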
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)