Posted to issues@lucene.apache.org by "Erick Erickson (Jira)" <ji...@apache.org> on 2019/12/21 16:46:00 UTC

[jira] [Commented] (SOLR-13709) Race condition on core reload while core is still loading?

    [ https://issues.apache.org/jira/browse/SOLR-13709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17001746#comment-17001746 ] 

Erick Erickson commented on SOLR-13709:
---------------------------------------

[~hossman] any thoughts?

I've finally gotten back to this and think I'm getting close. The short form is that I think the test calls delete collection before prior operations are complete.

During a successful run, I see:
 * a bunch of reloads
 * a bunch of unloads

- The reloads seem OK; they're being run from {code}SolrCore.getConfListener{code}.

- The unload is coming when the collection is being deleted near the end of the test.

AFAIK, the only way the CoreDescriptor could be disappearing from the various lists is through unload, and that's only being called by deleteCollection in this test.

So my next best guess is that the reloads aren't complete by the time the test gets to the delete collection call, and that's where the race condition is coming from.
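To make that guess concrete, here's a minimal, self-contained sketch of the interleaving I suspect. It is not Solr code: the {{descriptors}} map, {{reload}}, {{unload}} and the latch used to force the ordering are all hypothetical stand-ins, and the real code paths may look quite different. It only shows that a reload which looks up its descriptor after an unload has removed it ends in an NPE, and that waiting for the pending reload before deleting avoids it.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

final class ReloadUnloadRaceSketch {
    // Hypothetical stand-in for the lists that hold core descriptors; not Solr's data structure.
    static final Map<String, String> descriptors = new ConcurrentHashMap<>();

    // Simulated reload: blocks until the gate opens, then looks up the descriptor.
    // Dereferencing the result throws an NPE if the core was unloaded in the meantime.
    static int reload(String core, CountDownLatch gate) throws InterruptedException {
        gate.await();
        String descriptor = descriptors.get(core);
        return descriptor.length();
    }

    // Simulated unload, as triggered by deleting the collection.
    static void unload(String core) {
        descriptors.remove(core);
    }

    // Returns true if the pending reload failed with a NullPointerException.
    static boolean run(boolean waitForReload) throws Exception {
        descriptors.put("core1", "descriptor-for-core1");
        ExecutorService pool = Executors.newSingleThreadExecutor();
        CountDownLatch gate = new CountDownLatch(1);
        try {
            Future<Integer> pending = pool.submit(() -> reload("core1", gate));
            if (waitForReload) {
                gate.countDown();                   // let the reload finish first
                pending.get(10, TimeUnit.SECONDS);
                unload("core1");                    // only then delete
                return false;
            }
            unload("core1");                        // delete while the reload is still pending
            gate.countDown();
            try {
                pending.get(10, TimeUnit.SECONDS);
                return false;
            } catch (ExecutionException e) {
                return e.getCause() instanceof NullPointerException;
            }
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("delete without waiting, reload hit NPE: " + run(false));
        System.out.println("delete after waiting, reload hit NPE: " + run(true));
    }
}
```

The latch is only there to make the bad ordering deterministic; in the failing test the ordering would depend on timing, which would fit with the failure being intermittent.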

I'm putting in some more debugging; I want to see all thread dumps when the NPE occurs. I'm wondering if the sequencing from the Overseer is part of the problem, but that's a guess.
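For the thread dumps, something along these lines is roughly what I have in mind. This is a sketch rather than the actual debugging patch: {{dumpAllThreads}} and {{runWithDump}} are hypothetical helper names, and the only real API it leans on is the JDK's {{Thread.getAllStackTraces()}}.

```java
import java.util.Map;

final class ThreadDumpOnNpeSketch {
    // Formats the name, state and stack of every live thread in this JVM.
    static String dumpAllThreads() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            Thread t = e.getKey();
            sb.append('"').append(t.getName()).append("\" state=").append(t.getState()).append('\n');
            for (StackTraceElement frame : e.getValue()) {
                sb.append("    at ").append(frame).append('\n');
            }
        }
        return sb.toString();
    }

    // Runs the suspect code and returns a dump of all threads if it throws an NPE,
    // or an empty string otherwise. Real debugging code would log and rethrow.
    static String runWithDump(Runnable suspect) {
        try {
            suspect.run();
            return "";
        } catch (NullPointerException npe) {
            return dumpAllThreads();
        }
    }

    public static void main(String[] args) {
        String dump = runWithDump(() -> {
            throw new NullPointerException("simulated missing CoreDescriptor");
        });
        System.out.print(dump);
    }
}
```

Taking the dump at the moment the NPE is caught should show what the other threads are doing at that point, which is what's needed to check the sequencing guess.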

> Race condition on core reload while core is still loading?
> ----------------------------------------------------------
>
>                 Key: SOLR-13709
>                 URL: https://issues.apache.org/jira/browse/SOLR-13709
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Chris M. Hostetter
>            Assignee: Erick Erickson
>            Priority: Major
>         Attachments: apache_Lucene-Solr-Tests-8.x_449.log.txt
>
>
> A recent jenkins failure from {{TestSolrCLIRunExample}} seems to suggest that there may be a race condition when attempting to re-load a SolrCore while the core is currently in the process of (re)loading that can leave the SolrCore in an unusable state.
> Details to follow...



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
