Posted to dev@lucene.apache.org by "Hoss Man (JIRA)" <ji...@apache.org> on 2019/02/08 22:55:00 UTC

[jira] [Commented] (SOLR-13236) numerous problems with LIROnShardRestartTest

    [ https://issues.apache.org/jira/browse/SOLR-13236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16763937#comment-16763937 ] 

Hoss Man commented on SOLR-13236:
---------------------------------

Examples of some of the types of failures i've observed in jenkins logs...

----


This error occurs inside a catch block while trying to log some info about the state of the election when the Error/Exception happened.  The original exception is completely lost in the logs because of this IllegalArgumentException, which arises from calling zkClient().getChildren() on the hardcoded string {{"/collections/allReplicasInLIR/leader_elect/shard1/election/"}} -- which, as the error indicates, is completely illegal, and suggests that this code path was never sanity checked when the test was written.

{noformat}
   [junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=LIROnShardRestartTest -Dtests.method=testAllReplicasInLIR -Dtests.seed=10B31070AB4A4496 -Dtests.multiplier=2 -Dtests.nightly=true -Dtests.slow=true -Dtests.badapples=true -Dtests.linedocsfile=/home/jenkins/jenkins-slave/workspace/Lucene-Solr-BadApples-NightlyTests-7.x/test-data/enwiki.random.lines.txt -Dtests.locale=sv-SE -Dtests.timezone=Africa/Lusaka -Dtests.asserts=true -Dtests.file.encoding=UTF-8
   [junit4] ERROR    144s J2 | LIROnShardRestartTest.testAllReplicasInLIR <<<
   [junit4]    > Throwable #1: java.lang.IllegalArgumentException: Path must not end with / character
   [junit4]    >        at __randomizedtesting.SeedInfo.seed([10B31070AB4A4496:4A2B2AB6D5CA2371]:0)
   [junit4]    >        at org.apache.zookeeper.common.PathUtils.validatePath(PathUtils.java:58)
   [junit4]    >        at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1523)
   [junit4]    >        at org.apache.solr.common.cloud.SolrZkClient.lambda$getChildren$4(SolrZkClient.java:346)
   [junit4]    >        at org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:71)
   [junit4]    >        at org.apache.solr.common.cloud.SolrZkClient.getChildren(SolrZkClient.java:346)
   [junit4]    >        at org.apache.solr.cloud.LIROnShardRestartTest.testAllReplicasInLIR(LIROnShardRestartTest.java:168)
   [junit4]    >        at java.lang.Thread.run(Thread.java:748)
{noformat}
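
To illustrate the two problems in that trace (the trailing slash, and a diagnostic call that masks the real failure), here is a minimal standalone sketch. It does not use the ZooKeeper or Solr classes at all; the class name {{ElectionPathSketch}} and both helper methods are hypothetical and are not taken from the test, they only mirror the "must not end with /" rule quoted in the error message.

```java
// Hypothetical helpers for illustration only; not code from LIROnShardRestartTest.
class ElectionPathSketch {

    // Drops a single trailing '/' so the path no longer violates the
    // "Path must not end with / character" rule. The root path "/" is left alone.
    static String stripTrailingSlash(String path) {
        if (path.length() > 1 && path.endsWith("/")) {
            return path.substring(0, path.length() - 1);
        }
        return path;
    }

    // Runs best-effort diagnostics for an already-caught failure. If the
    // diagnostics themselves throw, that exception is attached as suppressed
    // so the original failure is still the one that gets reported.
    static <T extends Throwable> T withDiagnostics(T original, Runnable diagnostics) {
        try {
            diagnostics.run();
        } catch (RuntimeException e) {
            original.addSuppressed(e);
        }
        return original;
    }

    public static void main(String[] args) {
        String raw = "/collections/allReplicasInLIR/leader_elect/shard1/election/";
        System.out.println(stripTrailingSlash(raw));

        AssertionError original = new AssertionError("original test failure");
        AssertionError kept = withDiagnostics(original, () -> {
            // Stand-in for the diagnostic getChildren() call that blew up.
            throw new IllegalArgumentException("Path must not end with / character");
        });
        System.out.println(kept.getMessage() + ", suppressed: " + kept.getSuppressed().length);
    }
}
```

With something along these lines a catch block could rethrow the result of {{withDiagnostics(e, ...)}}, so a broken logging call would show up as a suppressed exception instead of replacing the failure being investigated.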

This is a failure in the last line of the test, after all assertions have passed, when it tries to delete the collection -- i believe because the check for "waiting for replicas rejoin election" doesn't first wait to see all the nodes disconnected from jetty and marked "down" -- so the election may not have even happened yet by the time the test finishes; it may just be getting to the point where all the solr nodes are marked "down" when it tries to clean up...

{noformat}
   [junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=LIROnShardRestartTest -Dtests.method=testAllReplicasInLIR -Dtests.seed=10B31070AB4A4496 -Dtests.multiplier=2 -Dtests.nightly=true -Dtests.slow=true -Dtests.badapples=true -Dtests.linedocsfile=/home/jenkins/jenkins-slave/workspace/Lucene-Solr-BadApples-NightlyTests-7.x/test-data/enwiki.random.lines.txt -Dtests.locale=sv-SE -Dtests.timezone=Africa/Lusaka -Dtests.asserts=true -Dtests.file.encoding=UTF-8
   [junit4] ERROR   94.6s J1 | LIROnShardRestartTest.testAllReplicasInLIR <<<
   [junit4]    > Throwable #1: org.apache.solr.client.solrj.SolrServerException: No live SolrServers available to handle this request
   [junit4]    >        at __randomizedtesting.SeedInfo.seed([10B31070AB4A4496:4A2B2AB6D5CA2371]:0)
   [junit4]    >        at org.apache.solr.client.solrj.impl.LBHttpSolrClient.request(LBHttpSolrClient.java:461)
   [junit4]    >        at org.apache.solr.client.solrj.impl.CloudSolrClient.sendRequest(CloudSolrClient.java:1110)
   [junit4]    >        at org.apache.solr.client.solrj.impl.CloudSolrClient.requestWithRetryOnStaleState(CloudSolrClient.java:884)
   [junit4]    >        at org.apache.solr.client.solrj.impl.CloudSolrClient.request(CloudSolrClient.java:817)
   [junit4]    >        at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:194)
   [junit4]    >        at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:211)
   [junit4]    >        at org.apache.solr.cloud.LIROnShardRestartTest.testAllReplicasInLIR(LIROnShardRestartTest.java:175)
   [junit4]    >        at java.lang.Thread.run(Thread.java:748)
{noformat}
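
As a rough sketch of the kind of guard that is missing here: poll for an expected cluster condition, with a timeout, before moving on to cleanup. The example is plain Java with no Solr dependency; {{WaitForSketch}}, {{waitFor}} and the simulated live-node counter are all hypothetical, and a real test would plug in whatever live-node or replica-state lookup it actually has available.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BooleanSupplier;

// Hypothetical polling helper for illustration only; not part of Solr's test framework.
class WaitForSketch {

    // Polls the condition every 50 ms until it holds or the timeout elapses.
    // Returns true if the condition was met, false on timeout or interruption.
    static boolean waitFor(long timeoutMillis, BooleanSupplier condition) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (!condition.getAsBoolean()) {
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for callers
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Simulated cluster: one more node shows up as "live" on every check.
        AtomicInteger liveNodes = new AtomicInteger(0);
        int expectedNodes = 3;

        boolean ready = waitFor(5_000, () -> liveNodes.incrementAndGet() >= expectedNodes);
        if (!ready) {
            throw new IllegalStateException("nodes did not come back before cleanup");
        }
        System.out.println("all nodes live, safe to delete the collection");
    }
}
```

The same helper could be called twice in a test like this one: once to wait for every node to be seen as "down", and again to wait for them all to be live, so the final delete doesn't race the restart.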


This is a (similar) failure in the first line of another test method, where it tries to create the collection it wants to use. It can happen if the previous test method fails (or passes) and the next test method is started before all the nodes have had a chance to re-connect to zk...

{noformat}
  [junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=LIROnShardRestartTest -Dtests.method=testSeveralReplicasInLIR -Dtests.seed=10B31070AB4A4496 -Dtests.multiplier=2 -Dtests.nightly=true -Dtests.slow=true -Dtests.badapples=true -Dtests.linedocsfile=/home/jenkins/jenkins-slave/workspace/Lucene-Solr-BadApples-NightlyTests-7.x/test-data/enwiki.random.lines.txt -Dtests.locale=sv-SE -Dtests.timezone=Africa/Lusaka -Dtests.asserts=true -Dtests.file.encoding=UTF-8
   [junit4] ERROR   0.60s J1 | LIROnShardRestartTest.testSeveralReplicasInLIR <<<
   [junit4]    > Throwable #1: org.apache.solr.client.solrj.SolrServerException: No live SolrServers available to handle this request
   [junit4]    >        at __randomizedtesting.SeedInfo.seed([10B31070AB4A4496:96E987448B1009CF]:0)
   [junit4]    >        at org.apache.solr.client.solrj.impl.LBHttpSolrClient.request(LBHttpSolrClient.java:461)
   [junit4]    >        at org.apache.solr.client.solrj.impl.CloudSolrClient.sendRequest(CloudSolrClient.java:1110)
   [junit4]    >        at org.apache.solr.client.solrj.impl.CloudSolrClient.requestWithRetryOnStaleState(CloudSolrClient.java:884)
   [junit4]    >        at org.apache.solr.client.solrj.impl.CloudSolrClient.request(CloudSolrClient.java:817)
   [junit4]    >        at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:194)
   [junit4]    >        at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:211)
   [junit4]    >        at org.apache.solr.cloud.LIROnShardRestartTest.testSeveralReplicasInLIR(LIROnShardRestartTest.java:190)
   [junit4]    >        at java.lang.Thread.run(Thread.java:748)
{noformat}

Here is another error showing how the effects of one test method may not be adequately cleaned up by the time the next test method starts...

{noformat}
   [junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=LIROnShardRestartTest -Dtests.method=testSeveralReplicasInLIR -Dtests.seed=10B31070AB4A4496 -Dtests.multiplier=2 -Dtests.nightly=true -Dtests.slow=true -Dtests.badapples=true -Dtests.linedocsfile=/home/jenkins/jenkins-slave/workspace/Lucene-Solr-BadApples-NightlyTests-7.x/test-data/enwiki.random.lines.txt -Dtests.locale=sv-SE -Dtests.timezone=Africa/Lusaka -Dtests.asserts=true -Dtests.file.encoding=UTF-8
   [junit4] ERROR   0.54s J2 | LIROnShardRestartTest.testSeveralReplicasInLIR <<<
   [junit4]    > Throwable #1: org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at http://127.0.0.1:44441/solr: Cannot create collection severalReplicasInLIR. Value of maxShardsPerNode is 1, and the number of nodes currently live or live and part of your createNodeSet is 2. This allows a maximum of 2 to be created. Value of numShards is 1, value of nrtReplicas is 3, value of tlogReplicas is 0 and value of pullReplicas is 0. This requires 3 shards to be created (higher than the allowed number)
   [junit4]    >        at __randomizedtesting.SeedInfo.seed([10B31070AB4A4496:96E987448B1009CF]:0)
   [junit4]    >        at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:643)
   [junit4]    >        at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:255)
   [junit4]    >        at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:244)
   [junit4]    >        at org.apache.solr.client.solrj.impl.LBHttpSolrClient.doRequest(LBHttpSolrClient.java:484)
   [junit4]    >        at org.apache.solr.client.solrj.impl.LBHttpSolrClient.request(LBHttpSolrClient.java:414)
   [junit4]    >        at org.apache.solr.client.solrj.impl.CloudSolrClient.sendRequest(CloudSolrClient.java:1110)
   [junit4]    >        at org.apache.solr.client.solrj.impl.CloudSolrClient.requestWithRetryOnStaleState(CloudSolrClient.java:884)
   [junit4]    >        at org.apache.solr.client.solrj.impl.CloudSolrClient.request(CloudSolrClient.java:817)
   [junit4]    >        at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:194)
   [junit4]    >        at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:211)
   [junit4]    >        at org.apache.solr.cloud.LIROnShardRestartTest.testSeveralReplicasInLIR(LIROnShardRestartTest.java:190)
   [junit4]    >        at java.lang.Thread.run(Thread.java:748)
{noformat}
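
The arithmetic behind that error message can be restated as a tiny check. This is only a simplified sketch of the numbers quoted in the message (live nodes times maxShardsPerNode versus numShards times total replicas per shard), not Solr's actual placement code; the class and method names are hypothetical.

```java
// Hypothetical restatement of the capacity check described in the error message.
class PlacementCheckSketch {

    // Maximum number of cores the live nodes can host.
    static int maxAllowed(int liveNodes, int maxShardsPerNode) {
        return liveNodes * maxShardsPerNode;
    }

    // Number of cores the collection needs: every shard gets every replica type.
    static int required(int numShards, int nrtReplicas, int tlogReplicas, int pullReplicas) {
        return numShards * (nrtReplicas + tlogReplicas + pullReplicas);
    }

    static boolean canCreate(int liveNodes, int maxShardsPerNode,
                             int numShards, int nrtReplicas, int tlogReplicas, int pullReplicas) {
        return required(numShards, nrtReplicas, tlogReplicas, pullReplicas)
                <= maxAllowed(liveNodes, maxShardsPerNode);
    }

    public static void main(String[] args) {
        // Values from the failure above: only 2 of the nodes were live.
        System.out.println("2 live nodes: " + canCreate(2, 1, 1, 3, 0, 0));
        // Same request once a third node has re-connected.
        System.out.println("3 live nodes: " + canCreate(3, 1, 1, 3, 0, 0));
    }
}
```

With the values from the log, 2 live nodes allow at most 2 cores while the collection needs 3, so the create fails until a third node is live again.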


> numerous problems with LIROnShardRestartTest
> --------------------------------------------
>
>                 Key: SOLR-13236
>                 URL: https://issues.apache.org/jira/browse/SOLR-13236
>             Project: Solr
>          Issue Type: Sub-task
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Hoss Man
>            Priority: Major
>
> LIROnShardRestartTest is a frequent cause of jenkins failures -- but only on the 7x jenkins jobs, because it was removed from master/8x as part of SOLR-11812 since the underlying implementation being tested was deprecated and removed in 8x.
> I spent some time looking into trying to fix this test, but the amount of work it appears it would take to fix doesn't seem worth the effort given its deprecated status.  So i'm filing this issue purely for tracking purposes, with the plan to disable the test and resolve this jira as "Won't Fix" -- if anyone else is interested in working on it they can feel free to re-open.


