Posted to dev@lucene.apache.org by "Timothy Potter (JIRA)" <ji...@apache.org> on 2015/10/30 23:33:27 UTC
[jira] [Created] (SOLR-8226) Is a SocketTimeoutException really a
reliable indicator of a zombie in LBHttpSolrClient?
Timothy Potter created SOLR-8226:
------------------------------------
Summary: Is a SocketTimeoutException really a reliable indicator of a zombie in LBHttpSolrClient?
Key: SOLR-8226
URL: https://issues.apache.org/jira/browse/SOLR-8226
Project: Solr
Issue Type: Improvement
Components: SolrJ
Reporter: Timothy Potter
Assignee: Timothy Potter
In LBHttpSolrClient, we do:
{code}
} catch (SocketTimeoutException e) {
  if (!isUpdate) {
    ex = (!isZombie) ? addZombie(client, e) : e;
  } else {
    throw e;
  }
{code}
If I have a reasonably low socket timeout configured for my HttpShardHandlerFactory and we hit a slow query, a perfectly healthy replica gets put into the zombie list. That can create a herd effect on my other replicas, since there is now one fewer replica in the rotation. Moreover, HttpShardHandlerFactory does not let me configure the check interval for adding zombies back into rotation, so a potentially healthy replica stays out of rotation for a full minute. At the very least, the interval should be configurable for the HttpShardHandlerFactory, but we should also strive to differentiate between a slow response and a true zombie.
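As a rough illustration of what "differentiate between a slow response and a true zombie" could mean, here is a minimal sketch, not a proposed patch. It assumes a policy where a refused connection marks the server as a zombie immediately, while a read timeout only does so after several consecutive occurrences. The class TimeoutTolerantZombiePolicy, its methods, and the threshold are all hypothetical and do not exist in SolrJ.

```java
import java.net.ConnectException;
import java.net.SocketTimeoutException;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper, NOT part of SolrJ: decides whether a failure
// should move a server into the zombie list.
final class TimeoutTolerantZombiePolicy {
    // Assumed tunable: how many consecutive timeouts we tolerate.
    private final int maxConsecutiveTimeouts;
    // Consecutive socket timeouts seen per server URL.
    private final ConcurrentHashMap<String, AtomicInteger> timeouts =
        new ConcurrentHashMap<>();

    TimeoutTolerantZombiePolicy(int maxConsecutiveTimeouts) {
        if (maxConsecutiveTimeouts < 1) {
            throw new IllegalArgumentException("threshold must be >= 1");
        }
        this.maxConsecutiveTimeouts = maxConsecutiveTimeouts;
    }

    // Returns true if the server should be treated as a zombie.
    boolean shouldMarkZombie(String serverUrl, Exception e) {
        if (e instanceof ConnectException) {
            // Nothing is listening: treat as a zombie right away.
            return true;
        }
        if (e instanceof SocketTimeoutException) {
            // Possibly just a slow query: only give up after repeated timeouts.
            int seen = timeouts
                .computeIfAbsent(serverUrl, k -> new AtomicInteger())
                .incrementAndGet();
            return seen >= maxConsecutiveTimeouts;
        }
        // Other failures are left to the caller in this sketch.
        return false;
    }

    // A successful response clears the timeout streak for that server.
    void recordSuccess(String serverUrl) {
        timeouts.remove(serverUrl);
    }
}
```

With a threshold of 2, a single slow query would leave the replica in rotation, and only a second timeout without an intervening success would mark it as a zombie. Whether a consecutive-timeout count is the right signal, as opposed to for example an explicit ping, is an open design question.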