Posted to issues@lucene.apache.org by "Erick Erickson (Jira)" <ji...@apache.org> on 2019/10/28 15:05:00 UTC
[jira] [Created] (SOLR-13874) LBSolrClient check for a zombie replica becoming active can be very expensive
Erick Erickson created SOLR-13874:
-------------------------------------
Summary: LBSolrClient check for a zombie replica becoming active can be very expensive
Key: SOLR-13874
URL: https://issues.apache.org/jira/browse/SOLR-13874
Project: Solr
Issue Type: Improvement
Security Level: Public (Default Security Level. Issues are Public)
Reporter: Erick Erickson
When a replica is in the zombie list, LBSolrClient periodically issues a query:
{quote}q=\*:\*&distrib=false&sort=\_docid\_ asc&rows=0
{quote}
Functionally, this is fine, but I have seen a situation where the replica has over 1B documents and this query takes 12+ seconds. On a large collection, it could be executed quite a few times: at minimum, from every Solr instance that has a replica for this collection. I didn't dive too far into whether every replica would issue it or not.
To make matters worse, the replica may well have gone away while the system was under stress, and a slow response to a query that's issued multiple times only adds to the overall stress (although I suppose it might be cached, hmmm... irrelevant to the main point anyway).
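For reference, here is a minimal sketch of the check query quoted above, without the Jira escaping, assembled as a URL query string. It uses only the JDK; the class and method names are hypothetical and this is not the LBSolrClient code itself.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper, for illustration only.
final class ZombieCheckQuery {

    // The four parameters from the issue description, in the order quoted there.
    static Map<String, String> params() {
        Map<String, String> p = new LinkedHashMap<>();
        p.put("q", "*:*");           // match every document in the core
        p.put("distrib", "false");   // query only this replica, no fan-out
        p.put("sort", "_docid_ asc"); // sort by internal Lucene doc id
        p.put("rows", "0");          // return no documents
        return p;
    }

    // URL-encode each key and value and join the pairs with '&'.
    static String toQueryString(Map<String, String> params) {
        return params.entrySet().stream()
                .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                        + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
    }

    public static void main(String[] args) {
        System.out.println(toQueryString(params()));
    }
}
```

Even with {{rows=0}}, the replica still has to execute the match-all query, which is where the cost described above comes from.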
We should find a lower-cost way to check whether a replica is alive. [~ab] suggested using {{SegmentsInfoRequestHandler}}, since it's already there and depends on having an open searcher.
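A rough sketch of what such a lighter probe might look like, using only the JDK HTTP client. The handler path {{/admin/segments}}, the {{wt=json}} parameter, the class and method names, and the example core URL are all assumptions made for illustration; this is not the change that was committed for this issue, and the path should be verified against the Solr version in use.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

// Hypothetical helper, for illustration only.
final class LightweightAliveCheck {

    // ASSUMPTION: SegmentsInfoRequestHandler is reachable at this per-core path.
    static final String SEGMENTS_PATH = "/admin/segments";

    // Build the probe URI from a core base URL, tolerating a trailing slash.
    static URI probeUri(String coreBaseUrl) {
        String base = coreBaseUrl.endsWith("/")
                ? coreBaseUrl.substring(0, coreBaseUrl.length() - 1)
                : coreBaseUrl;
        return URI.create(base + SEGMENTS_PATH + "?wt=json");
    }

    // Treat the replica as alive only if it answers 200 within the timeout.
    static boolean isAlive(HttpClient client, URI uri, Duration timeout) {
        HttpRequest request = HttpRequest.newBuilder(uri)
                .timeout(timeout)
                .GET()
                .build();
        try {
            HttpResponse<Void> response =
                    client.send(request, HttpResponse.BodyHandlers.discarding());
            return response.statusCode() == 200;
        } catch (IOException e) {
            // Covers connection failures and request timeouts.
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        // Prints the probe URI only; no request is sent here.
        System.out.println(probeUri("http://localhost:8983/solr/collection1_shard1_replica_n1"));
    }
}
```

The short request timeout is a design choice of this sketch: a zombie check that cannot answer quickly is treated as "still dead" rather than being allowed to pile more load onto a stressed node.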
I don't think the right fix here is to make a {{rows=0}} query more efficient, although that wouldn't be a _bad_ thing. Using {{\*:\*}} just to check whether a searcher is alive is wasteful in and of itself, as well as subject to regressions on the next refactoring.
And while I agree that 1B documents on a single replica is an edge case, it's still something that we shouldn't make more difficult than necessary.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)