Posted to issues@lucene.apache.org by "Erick Erickson (Jira)" <ji...@apache.org> on 2019/10/28 15:05:00 UTC

[jira] [Created] (SOLR-13874) LBSolrClient check for a zombie replica becoming active can be very expensive

Erick Erickson created SOLR-13874:
-------------------------------------

             Summary: LBSolrClient check for a zombie replica becoming active can be very expensive
                 Key: SOLR-13874
                 URL: https://issues.apache.org/jira/browse/SOLR-13874
             Project: Solr
          Issue Type: Improvement
      Security Level: Public (Default Security Level. Issues are Public)
            Reporter: Erick Erickson


When a replica is in the zombie list, LBSolrClient periodically issues a query:
{quote}q=\*:\*&distrib=false&sort=\_docid\_ asc&rows=0
{quote}
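
For reference, the parameters quoted above can be assembled into a URL-encoded query string with nothing but the JDK. This is only an illustrative sketch: the class and method names ({{ZombieCheckQuery}}, {{params}}, {{toQueryString}}) are hypothetical and are not taken from LBSolrClient, which builds its request through SolrJ rather than by hand.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper, not part of Solr or SolrJ.
class ZombieCheckQuery {

    // The four parameters quoted in this issue, in the same order.
    static Map<String, String> params() {
        Map<String, String> p = new LinkedHashMap<>();
        p.put("q", "*:*");             // match every document
        p.put("distrib", "false");     // query this replica only
        p.put("sort", "_docid_ asc");  // sort by internal document order
        p.put("rows", "0");            // return no documents
        return p;
    }

    // Joins the parameters as an application/x-www-form-urlencoded string.
    static String toQueryString(Map<String, String> params) {
        return params.entrySet().stream()
                .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                        + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
    }

    public static void main(String[] args) {
        System.out.println(toQueryString(params()));
    }
}
```

Even with {{rows=0}}, the {{\*:\*}} query still has to match every document in the replica, which is where the cost described below comes from.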
Functionally, this is fine, but I have seen a situation where the replica has over 1B documents and this query takes 12+ seconds. On a large collection, this could be executed quite a few times: at minimum from every Solr instance that has a replica for this collection. I didn't dive too far into whether every replica would issue it or not.

To make matters worse, the replica may well have gone away when the system was under stress, and a slow query that is issued multiple times only adds to the overall stress (although I suppose it might be cached, hmmm... irrelevant to the main point anyway).

We should find a lower-cost way to check whether a replica is alive. [~ab] suggested using {{SegmentsInfoRequestHandler}}, since it's already there and depends on having an open searcher.
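
A rough sketch of what such a lighter probe might look like, using only the JDK's {{java.net.http}} client. Several things here are assumptions rather than anything decided in this issue: the class name {{SegmentsLivenessProbe}} and its methods are hypothetical, the handler is assumed to be reachable at {{/admin/segments}} under the core URL, and treating any HTTP 200 as "alive" is a simplification. It is not how LBSolrClient is implemented.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

// Hypothetical helper, not part of Solr or SolrJ.
class SegmentsLivenessProbe {

    // Builds the probe URL for a core base URL such as
    // http://host:8983/solr/collection1_shard1_replica_n1
    // ASSUMPTION: SegmentsInfoRequestHandler is registered at /admin/segments.
    static URI probeUri(String coreBaseUrl) {
        String base = coreBaseUrl.endsWith("/")
                ? coreBaseUrl.substring(0, coreBaseUrl.length() - 1)
                : coreBaseUrl;
        return URI.create(base + "/admin/segments?wt=json");
    }

    // Returns true only if the handler answers with HTTP 200 within the timeout.
    // Connection failures and timeouts are reported as "not alive".
    static boolean isAlive(HttpClient client, String coreBaseUrl, Duration timeout) {
        HttpRequest request = HttpRequest.newBuilder(probeUri(coreBaseUrl))
                .timeout(timeout)
                .GET()
                .build();
        try {
            HttpResponse<Void> response =
                    client.send(request, HttpResponse.BodyHandlers.discarding());
            return response.statusCode() == 200;
        } catch (IOException e) {
            return false; // includes connect errors and HttpTimeoutException
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            return false;
        }
    }
}
```

A caller could run something like {{SegmentsLivenessProbe.isAlive(HttpClient.newHttpClient(), coreUrl, Duration.ofSeconds(2))}} on the same schedule as the existing zombie check. The short timeout is there so that a stressed replica is simply left on the zombie list rather than being made to do more work.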

I don't think the right fix here is to make a {{rows=0}} query more efficient, although that wouldn't be a _bad_ thing. Using {{\*:\*}} just to check whether a searcher is alive is wasteful in and of itself, as well as subject to regressions on the next refactoring.

And while I agree that 1B documents on a single replica is an edge case, it's still something that we shouldn't make more difficult than necessary.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
