Posted to dev@lucene.apache.org by "Erick Erickson (JIRA)" <ji...@apache.org> on 2019/04/15 16:54:00 UTC

[jira] [Commented] (SOLR-11035) (at least) 2 distinct failures possible when clients attempt searches during SolrCore reload

    [ https://issues.apache.org/jira/browse/SOLR-11035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16818171#comment-16818171 ] 

Erick Erickson commented on SOLR-11035:
---------------------------------------

Now here we have a colossally ugly patch. We've known for a long time that there's a problem here, but we don't have a root cause. Meanwhile, here's a bandaid patch for tests that are sensitive to this issue, with a "fixer-upper" method cleverly named Solr11035BandAid. DocValuesNotIndexedTest is the test I was working on this weekend, and it's the one that uses the new method.

The root problem here is that I can:
> commit some docs synchronously, so when it returns I should have a new searcher that sees them.
> go search for those docs added above and not find them. No matter how long I wait.

So this patch creates a utility function in SolrTestCaseJ4 that we can call for "impossible" failures with this pattern. It:
> checks whether numFound for the query passed in matches the expected count. If it does, it returns. Otherwise it
> indexes a bogus doc (with commit)
> deletes that same doc (with commit)
> checks numFound again and fails if it still doesn't match.
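
Roughly, the flow above looks like the sketch below. This is not the code in the patch: SearchClient, StaleSearcherIndex, bandAid and the bogus doc id are all hypothetical names made up for illustration, and the fake index only imitates the "commit returned but the searcher doesn't see the docs" symptom so the sketch runs without a Solr instance.

```java
import java.util.HashSet;
import java.util.Set;

public class BandAidSketch {

    /** Hypothetical stand-in for a Solr client; this is NOT a SolrJ interface. */
    public interface SearchClient {
        long numFound(String query);      // doc count the current searcher reports
        void addAndCommit(String id);     // index one doc, then commit
        void deleteAndCommit(String id);  // delete one doc, then commit
    }

    /**
     * Returns false if the count already matched, true if the add/delete
     * nudge was needed. Throws AssertionError if the count is still wrong.
     */
    public static boolean bandAid(SearchClient client, String query, long expected) {
        if (client.numFound(query) == expected) {
            return false; // nothing to fix
        }
        String bogusId = "bandaid-bogus-doc"; // assumed not to collide with real ids
        client.addAndCommit(bogusId);
        client.deleteAndCommit(bogusId);
        long found = client.numFound(query);
        if (found != expected) {
            throw new AssertionError(
                "expected " + expected + " docs for " + query + " but found " + found);
        }
        return true;
    }

    /** Fake index (assumption): the searcher view only refreshes on a later commit. */
    public static class StaleSearcherIndex implements SearchClient {
        private final Set<String> index = new HashSet<>();
        private Set<String> searcherView = new HashSet<>();

        /** Imitates the symptom: the doc is indexed but the searcher never sees it. */
        public void addWithLostSearcher(String id) {
            index.add(id);
        }

        @Override
        public long numFound(String query) {
            return searcherView.size(); // the fake treats every query as match-all
        }

        @Override
        public void addAndCommit(String id) {
            index.add(id);
            searcherView = new HashSet<>(index);
        }

        @Override
        public void deleteAndCommit(String id) {
            index.remove(id);
            searcherView = new HashSet<>(index);
        }
    }

    public static void main(String[] args) {
        StaleSearcherIndex idx = new StaleSearcherIndex();
        idx.addWithLostSearcher("1");
        idx.addWithLostSearcher("2");
        System.out.println(bandAid(idx, "*:*", 2)); // nudge needed
        System.out.println(bandAid(idx, "*:*", 2)); // count already matches
    }
}
```

Note that the caller has to pass in the expected count, and that the extra add/delete pair plus two commits changes the segment layout of the index.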

What I can guarantee:

> DocValuesNotIndexedTest would fail about 10-15% of the time with the test case in the comments of the new method in SolrTestCaseJ4
> I see the log messages regularly from the new method, but the test calling it succeeds
> This is not a good fix, but it'll reduce the noise until we figure out a proper fix
> Once the underlying cause is fixed, we can comment out the body of this method to see if the problem is really gone. If so, nuke it.

I'll commit this soon, and as other tests come up that have the same pattern we can add the call to the new method. Precommit passes, and all the DocValuesNotIndexedTest tests use it, but no others. I'm calling it SOLR-11035-bandaid.patch to keep it distinct from the _real_ fix.

Callers will have to take some care to know how many docs _should_ be found, which will be trickier when random numbers of docs are indexed. Any test that depends on merging predictably can't use it, etc.

> (at least) 2 distinct failures possible when clients attempt searches during SolrCore reload
> --------------------------------------------------------------------------------------------
>
>                 Key: SOLR-11035
>                 URL: https://issues.apache.org/jira/browse/SOLR-11035
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Hoss Man
>            Assignee: Erick Erickson
>            Priority: Major
>         Attachments: SOLR-11035-bandaid.patch, SOLR-11035.patch, log.txt, log.txt
>
>
> If a SolrCore is reloaded, there are (at least) 2 distinct types of failures that clients may observe when executing updates + queries while the reload is in progress...
> * documents may appear missing during queries
> * queries may fail with "SolrException: openNewSearcher called on closed core"
> Details to follow in comment...


