Posted to dev@lucene.apache.org by "Timothy Potter (JIRA)" <ji...@apache.org> on 2015/10/30 23:33:27 UTC

[jira] [Created] (SOLR-8226) Is a SocketTimeoutException really a reliable indicator of a zombie in LBHttpSolrClient?

Timothy Potter created SOLR-8226:
------------------------------------

             Summary: Is a SocketTimeoutException really a reliable indicator of a zombie in LBHttpSolrClient?
                 Key: SOLR-8226
                 URL: https://issues.apache.org/jira/browse/SOLR-8226
             Project: Solr
          Issue Type: Improvement
          Components: SolrJ
            Reporter: Timothy Potter
            Assignee: Timothy Potter


In LBHttpSolrClient, we do:

{code}
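} catch (SocketTimeoutException e) {
  if (!isUpdate) {
    // Read request: unless this server is already a zombie, add it to the
    // zombie list; either way, record the exception
    ex = (!isZombie) ? addZombie(client, e) : e;
  } else {
    // Update requests are not handled this way: the timeout is rethrown
    throw e;
  }
}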
{code}

If I have a fairly low socket timeout configured for my HttpShardHandlerFactory and we hit a slow query, a perfectly healthy replica gets put on the zombie list. This can create a herd effect on my other replicas, since there is now one fewer replica in the rotation to absorb the query load. Moreover, HttpShardHandlerFactory does not let me configure the check interval for adding zombies back into rotation, so a potentially healthy replica stays out of rotation for a full minute. At the very least, the interval should be configurable through HttpShardHandlerFactory, but we should also strive to differentiate between a slow response and a true zombie.
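One possible way to tell a slow response apart from a true zombie, sketched here purely as an assumption and not as existing SolrJ API, is to require several consecutive timeouts before marking a server as a zombie, reset the count on any successful response, and make the re-check interval a constructor parameter. The class TimeoutAwareZombiePolicy, its methods, and the threshold and interval values are hypothetical and for illustration only.

{code}
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongSupplier;

/**
 * Hypothetical sketch (not part of SolrJ): a server is treated as a zombie only
 * after several consecutive timeouts, and a zombie becomes eligible for an
 * alive check after a configurable interval.
 */
final class TimeoutAwareZombiePolicy {
  private final int maxConsecutiveTimeouts; // assumed threshold, e.g. 3
  private final long aliveCheckIntervalMs;  // assumed configurable interval
  private final LongSupplier clock;         // injectable clock (ms) for testing

  private final Map<String, Integer> consecutiveTimeouts = new HashMap<>();
  private final Map<String, Long> zombieSince = new HashMap<>();

  TimeoutAwareZombiePolicy(int maxConsecutiveTimeouts, long aliveCheckIntervalMs,
                           LongSupplier clock) {
    if (maxConsecutiveTimeouts < 1 || aliveCheckIntervalMs <= 0) {
      throw new IllegalArgumentException("threshold must be >= 1 and interval > 0");
    }
    this.maxConsecutiveTimeouts = maxConsecutiveTimeouts;
    this.aliveCheckIntervalMs = aliveCheckIntervalMs;
    this.clock = clock;
  }

  /** A request to serverUrl succeeded: clear its timeout streak and zombie state. */
  synchronized void onSuccess(String serverUrl) {
    consecutiveTimeouts.remove(serverUrl);
    zombieSince.remove(serverUrl);
  }

  /**
   * A request to serverUrl timed out. Returns true if the server is now a zombie,
   * i.e. it has reached maxConsecutiveTimeouts timeouts in a row.
   */
  synchronized boolean onTimeout(String serverUrl) {
    int streak = consecutiveTimeouts.merge(serverUrl, 1, Integer::sum);
    if (streak >= maxConsecutiveTimeouts) {
      // Keep the original eviction time if the server was already a zombie
      zombieSince.putIfAbsent(serverUrl, clock.getAsLong());
      return true;
    }
    return false;
  }

  synchronized boolean isZombie(String serverUrl) {
    return zombieSince.containsKey(serverUrl);
  }

  /** True once a zombie has been out of rotation for at least the check interval. */
  synchronized boolean isDueForAliveCheck(String serverUrl) {
    Long since = zombieSince.get(serverUrl);
    return since != null && clock.getAsLong() - since >= aliveCheckIntervalMs;
  }
}
{code}

In this sketch a single slow query no longer evicts a healthy replica, while a replica that times out repeatedly is still removed. The trade-off is that a genuinely dead server receives a few more requests before it is taken out of rotation.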



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
