Posted to issues@lucene.apache.org by "Chris M. Hostetter (Jira)" <ji...@apache.org> on 2019/11/19 18:29:00 UTC

[jira] [Commented] (LUCENE-9054) reproduceJenkinsFailures.py usage in the Lucene-Solr-repro jenkins job under reports number of failures

    [ https://issues.apache.org/jira/browse/LUCENE-9054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16977691#comment-16977691 ] 

Chris M. Hostetter commented on LUCENE-9054:
--------------------------------------------

Background...

Yesterday, something caught my eye that made me question the jenkins reports i've been generating.

When skimming jenkins build failure emails, i happened to remember seeing a "Lucene-Solr-repro" email that mentioned some failures particularly in SpellCheckCollatorTest...

[https://builds.apache.org/job/Lucene-Solr-repro/3760/]
{noformat}
[repro] Failures:
[repro]   0/5 failed: org.apache.solr.cloud.CollectionsAPISolrJTest
[repro]   5/5 failed: org.apache.solr.cloud.MoveReplicaHDFSTest
[repro]   5/5 failed: org.apache.solr.spelling.SpellCheckCollatorTest
{noformat}
...this caught my eye, because while i was expecting the MoveReplicaHDFSTest failures (and had already AwaitsFixed that test in another jira) I didn't remember seeing any SpellCheckCollatorTest failures in my own aggregated jenkins reports recently: [http://fucit.org/solr-jenkins-reports/failure-report.html]

I thought maybe the seed being reproduced was from more than a week ago (our builds, particularly the repro builds, can get fairly behind) and the results of _this_ (repro) build may not have been picked up by my aggregation crons yet.

But today, my reports still didn't list these failures. After investigating I realized the problem isn't in how my reports are fetching & aggregating the data from our jenkins jobs, but in how the {{reproduceJenkinsFailures.py}} script works in conjunction with the (default) way jenkins jobs collect the test-report XML files for each test....
----
{{reproduceJenkinsFailures.py}} calls the {{runTests}} function multiple times: (1) at the exact SHA with the exact seed, as originally run by the build being reproduced; (2) at the tip of the current branch with the original seed; (3) at the tip of the branch w/o the original seed.

The problem is that each time the {{runTests}} function is called, junit outputs the results to the same {{./build/__MODULE__/test/TEST-__FQN_TEST_NAME__-__DUPS.xml}} files (where "DUPS" corresponds to the {{tests.dups=N}} test param). For example...
{noformat}
./build/solr-core/test/TEST-org.apache.solr.spelling.SpellCheckCollatorTest.xml
./build/solr-core/test/TEST-org.apache.solr.spelling.SpellCheckCollatorTest-4.xml
./build/solr-core/test/TEST-org.apache.solr.spelling.SpellCheckCollatorTest-5.xml
./build/solr-core/test/TEST-org.apache.solr.spelling.SpellCheckCollatorTest-3.xml
./build/solr-core/test/TEST-org.apache.solr.spelling.SpellCheckCollatorTest-2.xml
{noformat}
These 5 files will be (over)written a total of 3 times.

So once the {{reproduceJenkinsFailures.py}} script is completely done, the only test results included in the jenkins results, and the only contributor to the "success/failure" of the jenkins job, is how the tests behaved on the tip of the branch, w/o the problematic seed.

The results from trying to reproduce the exact seed at the exact SHA, and trying to reproduce the exact seed on the tip of the branch are overwritten.
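
As a toy illustration of the effect (not code from {{reproduceJenkinsFailures.py}}; the function name and pass labels below are made up for this sketch), three passes that write to the same report path leave only the last pass's content on disk:

```python
import os
import tempfile

# Hypothetical labels for the three reproduction passes described above.
PASSES = [
    'exact seed at original SHA',
    'exact seed at branch tip',
    'branch tip without seed',
]

def simulate_passes(report_path, passes=PASSES):
    """Write one 'report' per pass to the same path, then return what survives."""
    for name in passes:
        # Mode 'w' truncates the file, like a test run rewriting its XML report.
        with open(report_path, 'w') as f:
            f.write(name)
    with open(report_path) as f:
        return f.read()

if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, 'TEST-org.example.FooTest.xml')
        print(simulate_passes(path))
```

Whatever the first two passes recorded is gone by the time anything reads the file afterwards.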
----
I think we should modify either the {{runTests}} or {{printReport}} functions in {{reproduceJenkinsFailures.py}} to _move_ all of the {{TEST-*.xml}} files produced by each run into a subdir (perhaps named after the style of reproduction tested: {{repro_raw}} , {{repro_branch_tip}} , {{repro_branch_tip_no_seed}} ) before continuing on to retry the test, and then ensure that the jenkins job's test reporter plugin is correctly configured to search for those junit output files in all subdirs (pretty sure it already is, just because of how we use a build dir per module)
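
A minimal sketch of what that "move" step might look like, assuming the reports live at {{<build_root>/<module>/test/TEST-*.xml}} as in the listing above. The helper name {{archive_test_reports}} and its parameters are hypothetical and are not part of the existing script:

```python
import glob
import os
import shutil

def archive_test_reports(build_root, label):
    """Move each <build_root>/<module>/test/TEST-*.xml into a <label>/ subdir
    next to it, so the next test run cannot overwrite it.

    'label' is an assumed name for the reproduction style, e.g. 'repro_raw'.
    Returns the list of destination paths.
    """
    moved = []
    # Non-recursive pattern: reports already archived in subdirs are not matched.
    pattern = os.path.join(build_root, '*', 'test', 'TEST-*.xml')
    for report in sorted(glob.glob(pattern)):
        dest_dir = os.path.join(os.path.dirname(report), label)
        os.makedirs(dest_dir, exist_ok=True)
        dest = os.path.join(dest_dir, os.path.basename(report))
        shutil.move(report, dest)
        moved.append(dest)
    return moved
```

Under these assumptions it would be called once after each pass, e.g. {{archive_test_reports('build', 'repro_raw')}}, then again with {{repro_branch_tip}} and {{repro_branch_tip_no_seed}}. Whether the jenkins test reporter picks up the nested files still depends on its configured file pattern.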

> reproduceJenkinsFailures.py usage in the Lucene-Solr-repro jenkins job under reports number of failures
> -------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-9054
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9054
>             Project: Lucene - Core
>          Issue Type: Test
>            Reporter: Chris M. Hostetter
>            Priority: Major
>
> Our {{reproduceJenkinsFailures.py}} script as used by the [https://builds.apache.org/job/Lucene-Solr-repro/] runs the tests multiple times, overwriting the same junit {{TEST-*.xml}} test result files each time, causing the jenkins job to under report how many times the various test(s) fail.


