Posted to dev@lucene.apache.org by "Erick Erickson (JIRA)" <ji...@apache.org> on 2016/11/27 05:43:58 UTC

[jira] [Resolved] (SOLR-7936) Bogus failure when deleting collections.

     [ https://issues.apache.org/jira/browse/SOLR-7936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Erick Erickson resolved SOLR-7936.
----------------------------------
    Resolution: Cannot Reproduce

Can't get this to fail now.

> Bogus failure when deleting collections.
> ----------------------------------------
>
>                 Key: SOLR-7936
>                 URL: https://issues.apache.org/jira/browse/SOLR-7936
>             Project: Solr
>          Issue Type: Bug
>            Reporter: Erick Erickson
>            Assignee: Erick Erickson
>
> When looking at the CDCR test failures, we began to wonder whether the problem was
> 1> the cdcr code itself
> 2> the test framework
> 3> Solr
> Some of the failures seem to be "impossible" assuming collection creation/deletion work OK.
> So I wrote a little program to exercise collection creation/deletion outside the test framework by just adding and deleting the same collection over and over again. It started failing regularly in OverseerCollectionMessageHandler.deleteCollection, at about line 780, where it throws the "Could not fully remove collection" exception:
> {code}
>       TimeOut timeout = new TimeOut(30, TimeUnit.SECONDS);
>       boolean removed = false;
>       while (! timeout.hasTimedOut()) {
>         Thread.sleep(100);
>         // WORKS SO FAR IF UNCOMMENTED zkStateReader.updateClusterState();
>         removed = !zkStateReader.getClusterState().hasCollection(collection);
>         if (removed) {
>           Thread.sleep(500); // just a bit of time so it's more likely other
>                              // readers see on return
>           break;
>         }
>       }
>       if (!removed) {
>         throw new SolrException(ErrorCode.SERVER_ERROR,
>             "Could not fully remove collection: " + collection);
>       }
> {code}
> However, the collection is really gone from clusterstate. When I put the updateClusterState() in above, it doesn't seem to fail. Is it as simple as the updateClusterState() call?
> Without the update in place, it regularly failed within 20 iterations. So far, with the update in place, we're at 132 iterations and counting. Any comments?
> If this runs 1,000 times tonight, I'll check it in if there are no objections. I don't know what it means for CDCR yet though.
> I'm also suspicious of the 500ms sleep. Anyone have a clue what that's in there for?
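
The behavior described above (the collection is gone from clusterstate, yet the polling loop never sees it go) is what one would expect if the loop reads a cached copy of the state that is not refreshed in time. The following is a minimal, self-contained sketch of that idea only. It is not Solr code: Store, CachedReader and waitForRemoval are hypothetical names invented for illustration, and whether a stale cache was the actual cause in Solr was never confirmed in this issue.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;

public class StaleStateDemo {

    /** Hypothetical stand-in for the authoritative state (ZooKeeper in the issue). */
    static class Store {
        private final Set<String> collections = ConcurrentHashMap.newKeySet();
        void create(String name) { collections.add(name); }
        void delete(String name) { collections.remove(name); }
        Set<String> snapshot() { return new HashSet<>(collections); }
    }

    /** Hypothetical reader that serves a cached snapshot until refresh() is called. */
    static class CachedReader {
        private final Store store;
        private volatile Set<String> cached;
        CachedReader(Store store) {
            this.store = store;
            this.cached = store.snapshot();
        }
        void refresh() { cached = store.snapshot(); }
        boolean hasCollection(String name) { return cached.contains(name); }
    }

    /**
     * Polls until the reader no longer reports the collection or the timeout
     * expires. With forceRefresh=false the loop only ever sees the cached copy.
     */
    static boolean waitForRemoval(CachedReader reader, String name,
                                  long timeoutMs, boolean forceRefresh)
            throws InterruptedException {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        while (System.nanoTime() < deadline) {
            Thread.sleep(10);
            if (forceRefresh) {
                reader.refresh(); // plays the role of the updateClusterState() call
            }
            if (!reader.hasCollection(name)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) throws InterruptedException {
        Store store = new Store();
        store.create("test");
        CachedReader reader = new CachedReader(store); // cache now contains "test"
        store.delete("test");                          // really gone from the store

        System.out.println("without refresh: " + waitForRemoval(reader, "test", 200, false));
        System.out.println("with refresh:    " + waitForRemoval(reader, "test", 200, true));
    }
}
```

In this toy model the cache is never updated on its own, so the first call always times out. In a real system the cache would normally be refreshed by asynchronous notifications, which would make the failure intermittent rather than certain.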


