Posted to solr-user@lucene.apache.org by "alessandro.benedetti" <a....@sease.io> on 2017/07/14 15:11:57 UTC

Apache Solr 4.10.x - Collection Reload times out

I have recently been facing an issue with Collection Reload in a couple
of SolrCloud clusters:

1) re-index a collection
2) the collection works happily
3) trigger a collection reload
4) the reload times out (silently, no message in any of the Solr node logs)
5) no effect on the collection (it still serves queries)

If I restart, the collection doesn't start because it finds the write.lock
in the index.
Sometimes this even prevents the entire cluster from restarting (even though
clusterstate.json shows only a few collections down) and Solr is not
reachable.
Of course I can mitigate the problem by cleaning up the indexes and
restarting (avoiding the reload in favor of plain restarts in the future),
but this is annoying.

I index through the DIH and I use a DirectSolrSpellChecker.
Should I take a look at ZooKeeper? I tried checking the Overseer queues
and a few other things, but I am not sure where the best places to look
in there are...

Could this be related? [1] I don't think so, but I am a bit puzzled...

[1] https://issues.apache.org/jira/browse/SOLR-6246






-----
---------------
Alessandro Benedetti
Search Consultant, R&D Software Engineer, Director
Sease Ltd. - www.sease.io
--
View this message in context: http://lucene.472066.n3.nabble.com/Apache-Solr-4-10-x-Collection-Reload-times-out-tp4346075.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Apache Solr 4.10.x - Collection Reload times out

Posted by "alessandro.benedetti" <a....@sease.io>.
I finally have an explanation; I am posting it here for future reference:

The cause was a combination of:

1) the /select request handler has defaults with spellcheck ON and a few
spellcheck options (such as collation ON via spellcheck.collate, and
spellcheck.maxCollationTries set to 5)

2) the firstSearcher listener has a warm-up query with a lot of terms

When opening the searcher, I found a thread stuck in waiting, and that
thread was the one responsible for the collation query.
The searcher never finished opening, because the collation had to be
calculated over the big multi-term warm-up query.
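
One way to spot a stuck thread like this is to filter a thread dump for waiting threads whose stack mentions a given class or method. The following is only a minimal sketch, assuming dump text in the layout printed by jstack (a thread header line starting with the quoted thread name, a "java.lang.Thread.State:" line, and a blank line between threads). ThreadDumpFilter and waitingThreadsWithFrame are hypothetical names, and the post does not say which tool was actually used.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Solr.
final class ThreadDumpFilter {

    // Returns the names of threads that are WAITING or TIMED_WAITING and
    // whose dump entry mentions frameFragment (e.g. a class or method name).
    static List<String> waitingThreadsWithFrame(String dump, String frameFragment) {
        List<String> hits = new ArrayList<>();
        // Assumption: thread entries are separated by blank lines, as in jstack output.
        for (String rawBlock : dump.split("\\R\\s*\\R")) {
            String block = rawBlock.trim();
            if (!block.startsWith("\"")) {
                continue; // not a thread entry (banner, summary lines, ...)
            }
            boolean waiting = block.contains("java.lang.Thread.State: WAITING")
                    || block.contains("java.lang.Thread.State: TIMED_WAITING");
            if (waiting && block.contains(frameFragment)) {
                int closingQuote = block.indexOf('"', 1);
                hits.add(closingQuote > 0 ? block.substring(1, closingQuote) : block);
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.err.println("usage: ThreadDumpFilter <dump-file> <frame-fragment>");
            return;
        }
        String dump = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        for (String name : waitingThreadsWithFrame(dump, args[1])) {
            System.out.println(name);
        }
    }
}
```

Passing a fragment such as "waitAddPendingCoreOps" (the method named elsewhere in this thread) would list the waiting threads that have that frame on their stack.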

Lesson learned: be careful with defaults in the default request handler, as
they may be used by other components (not just by user searches).

Thanks for the support!

Regards




Re: Apache Solr 4.10.x - Collection Reload times out

Posted by "alessandro.benedetti" <a....@sease.io>.
1) No, there is no big tlog and no replay problem.

2) Solr just seems frozen: not responsive, and nothing in the log.
I have now tried simply restarting after deploying the config to ZooKeeper,
and on restart the log completely freezes and the instances don't come up...
If I clean the indexes and then start, it works.
Solr is deployed in JBoss, so I don't know whether the stop is too
aggressive and breaks something.

3) No problem at all!

I will continue with some analysis.




Re: Apache Solr 4.10.x - Collection Reload times out

Posted by Erick Erickson <er...@gmail.com>.
1> Are you replaying the tlog? If you have a large tlog for some
reason, you may be replaying it, although a reload should do a commit
first.

2> What do the Solr logs show the node in question to be doing?

3> Sorry to mislead you: async is not a 4.10 option for the RELOAD
command, so that was bogus on my part. That support was added later.

Best,
Erick


On Thu, Jul 20, 2017 at 4:38 AM, alessandro.benedetti
<a....@sease.io> wrote:
> Additional information:
> Trying a single core reload, I identified that an entire shard is not
> reloading (while the other shard is).
> Taking a look at the "not reloading" shard (2 replicas), it seems that the
> core reload gets stuck here:
>
> org.apache.solr.core.SolrCores#waitAddPendingCoreOps
>
> The problem is that the wait seems to continue indefinitely and silently.
> Apart from a restart, is there any way to clean up the pending core
> operations?
> I will continue my investigations.

Re: Apache Solr 4.10.x - Collection Reload times out

Posted by "alessandro.benedetti" <a....@sease.io>.
Additional information:
Trying a single core reload, I identified that an entire shard is not
reloading (while the other shard is).
Taking a look at the "not reloading" shard (2 replicas), it seems that the
core reload gets stuck here:

org.apache.solr.core.SolrCores#waitAddPendingCoreOps

The problem is that the wait seems to continue indefinitely and silently.
Apart from a restart, is there any way to clean up the pending core
operations?
I will continue my investigations.





Re: Apache Solr 4.10.x - Collection Reload times out

Posted by "alessandro.benedetti" <a....@sease.io>.
Taking a look at the 4.10.2 source, I may see why the async call does not
work:

    log.info("Reloading Collection : " + req.getParamString());
    String name = req.getParams().required().get("name");

    ZkNodeProps m = new ZkNodeProps(Overseer.QUEUE_OPERATION,
        OverseerCollectionProcessor.RELOADCOLLECTION, "name", name);

    handleResponse(OverseerCollectionProcessor.RELOADCOLLECTION, m, rsp);

Are we sure we are actually passing the "async" param as a ZkNodeProp?
Because handleResponse does:

    private void handleResponse(String operation, ZkNodeProps m,
          SolrQueryResponse rsp, long timeout)
    ...
    if (m.containsKey(ASYNC) && m.get(ASYNC) != null) {
        String asyncId = m.getStr(ASYNC);
    ...
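
To make the suspicion concrete, here is a simplified model of the two excerpts, not the Solr code itself. It assumes the RELOAD handler copies only the operation and the collection name into the message, as the first excerpt suggests; plain maps stand in for ZkNodeProps, and AsyncParamDemo, its method names and the "operation"/"reloadcollection" strings are placeholders that were not checked against the Solr constants.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified, hypothetical model of the handler logic quoted above.
final class AsyncParamDemo {
    static final String ASYNC = "async";

    // Mirrors the RELOAD excerpt: only the operation and "name" are copied,
    // so any "async" request parameter is dropped at this point.
    static Map<String, Object> buildReloadMessage(Map<String, String> requestParams) {
        Map<String, Object> m = new HashMap<>();
        m.put("operation", "reloadcollection"); // placeholder key and value
        m.put("name", requestParams.get("name"));
        return m;
    }

    // Mirrors the handleResponse excerpt: the async branch is taken only
    // when the message itself carries a non-null "async" entry.
    static String handleResponse(Map<String, Object> m) {
        if (m.containsKey(ASYNC) && m.get(ASYNC) != null) {
            return "async:" + m.get(ASYNC);
        }
        return "sync";
    }

    public static void main(String[] args) {
        Map<String, String> params = new HashMap<>();
        params.put("name", "news");
        params.put("async", "55");
        // The request carries async=55, but the message built from it does not.
        System.out.println(handleResponse(buildReloadMessage(params)));
    }
}
```

Under that assumption the request falls through to the synchronous path even though async=55 was supplied, which would match the behaviour I am seeing.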




Re: Apache Solr 4.10.x - Collection Reload times out

Posted by "alessandro.benedetti" <a....@sease.io>.
Thanks for the prompt response, Erick.
The reason I am issuing a Collection reload is that I modify the
solrconfig from time to time, for example with different spellcheck
settings and default request parameters.
So after the upload to ZooKeeper I reload the collection to reflect the
modification.
Aliasing is definitely a valid option, but at the moment I don't have the
infrastructure set up to operate it programmatically.

Returning to my issue, I see no effect at all if I try to run the request
async (it seems to ignore the parameter completely).

http://blabla:8983/solr/admin/collections?action=RELOAD&name=news&async=55

I checked the source code and the async param seems to be present in
version 4.10.2, so this is really weird.
I will proceed with my investigations.




Re: Apache Solr 4.10.x - Collection Reload times out

Posted by Erick Erickson <er...@gmail.com>.
I doubt SOLR-6246 is related, DirectSolrSpellChecker just looks in the
index using (on a quick scan) IndexReader which doesn't hold a lock
IIUC so it shouldn't leave anything around. Additionally, there is no
real "build" step since it's looking at the index rather than creating
a new one as AnalyzingInfixSuggester does. The write lock in that JIRA
was for the "sidecar" index that AnalyzingInfixSuggester created.

Which doesn't help your original issue. Have you tried specifying the
"async" parameter when you issue the RELOAD command then checking the
status with REQUESTSTATUS? I'm wondering if you restart your cluster
_after_ the reload is successfully completed whether you'd have the
same problem. Or whether you'd get some more helpful information if
the request actually fails somehow.
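
A minimal sketch of the two calls, assuming a Solr release in which RELOAD honors "async" (a later message in this thread notes that 4.10 does not). The base URL, collection name and request id used in the usage note come from the example request elsewhere in this thread; ReloadUrls and its method names are hypothetical, and the sketch only builds the URLs rather than sending them.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper that only builds Collections API URLs.
final class ReloadUrls {

    // URL-encodes a single query parameter value.
    static String enc(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    // RELOAD with a caller-chosen async request id.
    static String reloadUrl(String solrBase, String collection, String asyncId) {
        return solrBase + "/admin/collections?action=RELOAD&name=" + enc(collection)
                + "&async=" + enc(asyncId);
    }

    // REQUESTSTATUS looks the request up again by the same id.
    static String statusUrl(String solrBase, String asyncId) {
        return solrBase + "/admin/collections?action=REQUESTSTATUS&requestid=" + enc(asyncId);
    }

    public static void main(String[] args) {
        String base = "http://blabla:8983/solr";
        System.out.println(reloadUrl(base, "news", "55"));
        System.out.println(statusUrl(base, "55"));
    }
}
```

The idea is to issue the first URL once and then poll the second until the status is no longer reported as running.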

Also, why issue a reload? If you're re-indexing in the background and
want to atomically switch you could use collection aliasing (obviously
you'd need more disk space/resources which may make it not a viable
option). It looks like
> alias points to C1
> create C2 (or delete all data in an existing C2)
> index to C2
> check C2
> point alias to C2

Next time of course you index to C1 and switch the alias to C1 when
you're happy with it.
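
The rotation above can be sketched roughly as follows. This assumes that issuing CREATEALIAS again for an existing alias name repoints it, and it only builds the URL; indexing into and checking the idle collection are left out. AliasSwitch, its method names and the alias name "news" are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper for flipping an alias between two collections.
final class AliasSwitch {

    // URL-encodes a single query parameter value.
    static String enc(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    // Given the collection the alias currently points to, returns the other
    // one: that is the collection to re-index into next.
    static String nextTarget(String current, String first, String second) {
        return current.equals(first) ? second : first;
    }

    // CREATEALIAS call that points the alias at the freshly indexed collection.
    static String createAliasUrl(String solrBase, String alias, String collection) {
        return solrBase + "/admin/collections?action=CREATEALIAS&name=" + enc(alias)
                + "&collections=" + enc(collection);
    }

    public static void main(String[] args) {
        String target = nextTarget("C1", "C1", "C2"); // alias currently points to C1
        // ... index into `target` and check it before switching ...
        System.out.println(createAliasUrl("http://blabla:8983/solr", "news", target));
    }
}
```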

But even if you do the alias thing it'd still be good to see if we can
figure out what's going on because on the surface what you're
describing should be OK.

Best,
Erick

On Fri, Jul 14, 2017 at 8:11 AM, alessandro.benedetti
<a....@sease.io> wrote:
> I have recently been facing an issue with Collection Reload in a couple
> of SolrCloud clusters:
>
> 1) re-index a collection
> 2) the collection works happily
> 3) trigger a collection reload
> 4) the reload times out (silently, no message in any of the Solr node logs)
> 5) no effect on the collection (it still serves queries)
>
> If I restart, the collection doesn't start because it finds the write.lock
> in the index.
> Sometimes this even prevents the entire cluster from restarting (even though
> clusterstate.json shows only a few collections down) and Solr is not
> reachable.
> Of course I can mitigate the problem by cleaning up the indexes and
> restarting (avoiding the reload in favor of plain restarts in the future),
> but this is annoying.
>
> I index through the DIH and I use a DirectSolrSpellChecker.
> Should I take a look at ZooKeeper? I tried checking the Overseer queues
> and a few other things, but I am not sure where the best places to look
> in there are...
>
> Could this be related? [1] I don't think so, but I am a bit puzzled...
>
> [1] https://issues.apache.org/jira/browse/SOLR-6246