Posted to issues@lucene.apache.org by "Chris M. Hostetter (Jira)" <ji...@apache.org> on 2019/11/12 20:38:00 UTC

[jira] [Updated] (SOLR-13864) MathExpressionTest non-reproducible failures due to assertions of non-absolutes and randomization beyond test seed

     [ https://issues.apache.org/jira/browse/SOLR-13864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris M. Hostetter updated SOLR-13864:
--------------------------------------
    Attachment: apache_Lucene-Solr-BadApples-Tests-master_531.log.txt
        Status: Open  (was: Open)

[~jbernste] - it looks like you only fixed testGammaDistribution?

What about testZipFDistribution, testGeometricDistribution, and testFuzzyKmeans? As mentioned above, they also seem to fail sporadically because they make explicit assumptions about the underlying random distributions.

Attaching a recent Jenkins failure from testZipFDistribution:

{noformat}
   [junit4]   2> 288619 INFO  (TEST-MathExpressionTest.testZipFDistribution-seed#[4B489A4C6D218B8D]) [     ] o.a.s.SolrTestCaseJ4 ###Ending testZipFDistribution
   [junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=MathExpressionTest -Dtests.method=testZipFDistribution -Dtests.seed=4B489A4C6D218B8D -Dtests.multiplier=2 -Dtests.slow=true -Dtests.badapples=true -Dtests.locale=ja-JP -Dtests.timezone=America/St_Lucia -Dtests.asserts=true -Dtests.file.encoding=UTF-8
   [junit4] ERROR   0.07s J1 | MathExpressionTest.testZipFDistribution <<<
   [junit4]    > Throwable #1: java.lang.Exception: Zipf distribution not descending!!!
   [junit4]    >        at __randomizedtesting.SeedInfo.seed([4B489A4C6D218B8D:6FFDF7687A8983A5]:0)
   [junit4]    >        at org.apache.solr.client.solrj.io.stream.MathExpressionTest.testZipFDistribution(MathExpressionTest.java:3766)
{noformat}
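
For illustration, a minimal Java sketch of why a "descending" check on sampled data can fail now and then. It assumes, based only on the exception message above, that the test checks sampled frequencies for descending order. ZipfSampleSketch and all of its methods are hypothetical names, and the inverse-CDF sampler is a stand-in, not the code Solr actually uses. The exact probabilities are strictly descending, but the counts from a finite sample need not be.

```java
import java.util.Random;

/** Hypothetical sketch; not the Solr or MathExpressionTest implementation. */
final class ZipfSampleSketch {

  /** Exact Zipf probabilities p(k) proportional to 1 / k^exponent, for k = 1..size. */
  static double[] probabilities(int size, double exponent) {
    double[] p = new double[size];
    double norm = 0.0;
    for (int k = 1; k <= size; k++) {
      p[k - 1] = 1.0 / Math.pow(k, exponent);
      norm += p[k - 1];
    }
    for (int i = 0; i < size; i++) {
      p[i] /= norm;
    }
    return p;
  }

  /** Draws n samples by inverse-CDF lookup and returns how often each rank was drawn. */
  static int[] sampleCounts(double[] p, int n, Random rnd) {
    int[] counts = new int[p.length];
    for (int i = 0; i < n; i++) {
      double u = rnd.nextDouble();
      double cumulative = 0.0;
      int rank = p.length - 1; // fallback guards against floating-point round-off
      for (int k = 0; k < p.length; k++) {
        cumulative += p[k];
        if (u < cumulative) {
          rank = k;
          break;
        }
      }
      counts[rank]++;
    }
    return counts;
  }

  /** True when no count is larger than the count of the rank before it. */
  static boolean isNonIncreasing(int[] counts) {
    for (int i = 1; i < counts.length; i++) {
      if (counts[i] > counts[i - 1]) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    double[] p = probabilities(10, 1.0);
    int violations = 0;
    // Try many seeds and count how often the sampled counts break the ordering.
    for (long seed = 0; seed < 1000; seed++) {
      if (!isNonIncreasing(sampleCounts(p, 100, new Random(seed)))) {
        violations++;
      }
    }
    System.out.println("seeds with non-descending sample counts: " + violations + " of 1000");
  }
}
```

An assertion on the ordering of the exact probabilities holds for every seed; an assertion on the ordering of sampled counts only holds with some probability, which is the kind of check that would produce sporadic failures.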

> MathExpressionTest non-reproducible failures due to assertions of non-absolutes and randomization beyond test seed
> ------------------------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-13864
>                 URL: https://issues.apache.org/jira/browse/SOLR-13864
>             Project: Solr
>          Issue Type: Test
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Chris M. Hostetter
>            Assignee: Joel Bernstein
>            Priority: Major
>         Attachments: apache_Lucene-Solr-BadApples-Tests-master_531.log.txt
>
>
> We're seeing a fairly steady trickle of MathExpressionTest failures from various Jenkins boxes going back quite a while, mostly from testGammaDistribution, but other tests pop up now and then.
> The crux of the problem with this test seems to break down into 2 categories:
>  # tests that make assumptions, which aren't guaranteed to be true, about the relative values that will come out of taking samples from different random distributions
>  ** i.e., comparing 2 random samples from 2 differently shaped gamma distributions and expecting one to always be strictly greater than the other. I'm not a stats guy, but my naive understanding is that on the low end some of these shapes may cross over, so every possible random sample from one shape is not guaranteed to be less than every possible random sample from a different shape
>  # the code being tested does its own randomization outside of the control of the test framework (or test client)
>  ** this causes the seeds to not reproduce
> ----
> Tests should not be making assertions about random data that aren't 100% guaranteed to be true in all cases (i.e., {{random().nextInt(5) < (5.0D + (double) random().nextInt(5))}} is one thing, {{random().nextInt(5) < (4.99999D + (double) random().nextInt(5))}} is a different story).
> Randomized behavior in Solr (non-test) code should ideally have some way of being controlled by the client/tests: either via a request param used to initialize any new Random instances, or, for example, via the "tests.seed" property, used in various places in the code to try to provide some reproducibility even when the external Solr client isn't aware that randomization is a factor in the behavior of the code.
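
The seeding idea in the description could be sketched roughly as follows. This is only an illustration under stated assumptions: SeededRandomSketch, newRandom, and the nullable requestSeed argument are hypothetical names, not existing Solr APIs, and hashing the "tests.seed" string is just one possible way to turn it into a long.

```java
import java.util.Random;

/** Hypothetical helper; not an existing Solr class. */
final class SeededRandomSketch {

  /**
   * Picks a seed in priority order:
   * 1. an explicit seed supplied by the client (e.g. parsed from a request param),
   * 2. the "tests.seed" system property set by the test framework,
   * 3. no seed at all, which gives non-reproducible behavior.
   */
  static Random newRandom(Long requestSeed) {
    if (requestSeed != null) {
      return new Random(requestSeed);
    }
    String testsSeed = System.getProperty("tests.seed");
    if (testsSeed != null) {
      // Assumption: any stable mapping from the seed string to a long will do here.
      return new Random(testsSeed.hashCode());
    }
    return new Random();
  }

  public static void main(String[] args) {
    // Two instances built from the same explicit seed produce the same sequence.
    Random first = newRandom(42L);
    Random second = newRandom(42L);
    System.out.println(first.nextLong() == second.nextLong());
  }
}
```

With a helper along these lines, code that currently creates an unseeded Random would produce the same sequence whenever the test run is repeated with the same seed, so a reported failure could be replayed.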



