Posted to dev@lucene.apache.org by "Hoss Man (JIRA)" <ji...@apache.org> on 2017/08/18 19:22:00 UTC

[jira] [Created] (SOLR-11258) ChaosMonkeySafeLeaderWithPullReplicasTest fails a lot & reproducibly: The Monkey ran for over 45 seconds and no jetties were stopped - this is worth investigating!

Hoss Man created SOLR-11258:
-------------------------------

             Summary: ChaosMonkeySafeLeaderWithPullReplicasTest fails a lot & reproducibly: The Monkey ran for over 45 seconds and no jetties were stopped - this is worth investigating!
                 Key: SOLR-11258
                 URL: https://issues.apache.org/jira/browse/SOLR-11258
             Project: Solr
          Issue Type: Bug
      Security Level: Public (Default Security Level. Issues are Public)
            Reporter: Hoss Man


Between June 21 & Aug 18, there have been 18 failures like this...

{noformat}
   [junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=ChaosMonkeySafeLeaderWithPullReplicasTest -Dtests.method=test -Dtests.seed=7669B63E9E4D1685 -Dtests.nightly=true -Dtests.slow=true -Dtests.locale=pa-Guru -Dtests.timezone=Europe/Podgorica -Dtests.asserts=true -Dtests.file.encoding=UTF-8
   [junit4] FAILURE 82.4s | ChaosMonkeySafeLeaderWithPullReplicasTest.test <<<
   [junit4]    > Throwable #1: java.lang.AssertionError: The Monkey ran for over 45 seconds and no jetties were stopped - this is worth investigating!
   [junit4]    >        at __randomizedtesting.SeedInfo.seed([7669B63E9E4D1685:FE3D89E430B17B7D]:0)
   [junit4]    >        at org.apache.solr.cloud.ChaosMonkey.stopTheMonkey(ChaosMonkey.java:587)
   [junit4]    >        at org.apache.solr.cloud.ChaosMonkeySafeLeaderWithPullReplicasTest.test(ChaosMonkeySafeLeaderWithPullReplicasTest.java:174)
   [junit4]    >        at org.apache.solr.BaseDistributedSearchTestCase$ShardsRepeatRule$ShardsFixedStatement.callStatement(BaseDistributedSearchTestCase.java:993)
   [junit4]    >        at org.apache.solr.BaseDistributedSearchTestCase$ShardsRepeatRule$ShardsStatement.evaluate(BaseDistributedSearchTestCase.java:968)
   [junit4]    >        at java.lang.Thread.run(Thread.java:748)
{noformat}

In my own testing, when these failures happen, the seeds reproduce, suggesting the problem is a logic flaw in the test that can happen by chance.
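Seed-reproducible failures fit a simple model where the monkey's "stop a jetty now?" decisions come from a pseudo-random generator seeded by the test seed. The sketch below is a hypothetical toy model, not ChaosMonkey's actual code: the class `SeededDecisions`, the per-tick stop probability, the tick count, and feeding the logged master seed directly into `java.util.Random` are all illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical toy model: one independent stop decision per tick, driven by a seeded PRNG.
public class SeededDecisions {

    /** Returns the ticks at which a stop would be triggered for the given seed. */
    static List<Integer> stopTicks(long seed, int ticks, double stopProbability) {
        Random random = new Random(seed);
        List<Integer> result = new ArrayList<>();
        for (int tick = 0; tick < ticks; tick++) {
            if (random.nextDouble() < stopProbability) {
                result.add(tick);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Assumption: the seed from the failure log, reused here purely as a number.
        long seed = 0x7669B63E9E4D1685L;
        List<Integer> first = stopTicks(seed, 45, 0.05);
        List<Integer> second = stopTicks(seed, 45, 0.05);
        // Same seed gives the same decisions, so an "unlucky" seed stays unlucky on every rerun.
        System.out.println(first.equals(second));
    }
}
```

Under this toy model the chance that no stop happens in n independent ticks is (1 - p)^n; for p = 0.05 and n = 45 that is roughly 10%, so a fixed fraction of seeds would never stop a node and would fail every time they are rerun.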

Perhaps the ChaosMonkey needs to be changed to get more aggressive about stopping nodes based on how long it's been since the last time it stopped a node?
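One way to read that suggestion is a policy whose stop probability ramps up with the time since the last stop and reaches certainty at a deadline. This is a minimal sketch under that assumption; `EscalatingStopPolicy`, its parameters, and the linear ramp are hypothetical and not part of Solr's ChaosMonkey API.

```java
import java.util.Random;

/** Hypothetical policy: the chance of stopping a node grows with time since the last stop. */
public class EscalatingStopPolicy {
    private final double baseProbability; // chance to stop immediately after a previous stop
    private final long maxQuietMillis;    // after this long without a stop, always stop
    private final Random random;
    private long lastStopMillis;

    public EscalatingStopPolicy(double baseProbability, long maxQuietMillis, long seed, long startMillis) {
        this.baseProbability = baseProbability;
        this.maxQuietMillis = maxQuietMillis;
        this.random = new Random(seed);
        this.lastStopMillis = startMillis;
    }

    /** Stop probability at nowMillis: rises linearly from baseProbability to 1.0. */
    double stopProbability(long nowMillis) {
        long quiet = nowMillis - lastStopMillis;
        if (quiet >= maxQuietMillis) {
            return 1.0;
        }
        double fraction = (double) quiet / maxQuietMillis;
        return baseProbability + (1.0 - baseProbability) * fraction;
    }

    /** Decides whether to stop a node now and records the stop time if so. */
    public boolean shouldStop(long nowMillis) {
        // nextDouble() is in [0, 1), so a probability of 1.0 always triggers a stop.
        boolean stop = random.nextDouble() < stopProbability(nowMillis);
        if (stop) {
            lastStopMillis = nowMillis;
        }
        return stop;
    }

    public static void main(String[] args) {
        EscalatingStopPolicy policy = new EscalatingStopPolicy(0.05, 30_000, 42L, 0L);
        int stops = 0;
        // Simulate 45 seconds of one-second ticks.
        for (long t = 1_000; t <= 45_000; t += 1_000) {
            if (policy.shouldStop(t)) {
                stops++;
            }
        }
        System.out.println("stops=" + stops);
    }
}
```

As long as maxQuietMillis is shorter than the 45-second window the failing assertion checks, at least one stop is guaranteed for every seed. The trade-off is that the schedule becomes less random, so it would explore fewer "long quiet period" scenarios.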