Posted to dev@lucene.apache.org by "Yonik Seeley (JIRA)" <ji...@apache.org> on 2019/04/15 16:47:00 UTC
[jira] [Commented] (SOLR-13405) Support 1 or 0 replicas per shard

    [ https://issues.apache.org/jira/browse/SOLR-13405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16818162#comment-16818162 ] 

Yonik Seeley commented on SOLR-13405:
-------------------------------------

Some design considerations / thoughts:
 - The node/replica should not be marked down in ZK based on client detection. Client detection should only cause a temporary new replica to be brought up quickly for querying.
 - This has no effect on who the leader is, so it only helps the query side (which is normally much more latency sensitive).
 - The overseer should dedup requests, since multiple clients detecting the same node going down will all request new replicas.
 -- To aid this deduplication, the client should include in its request which replica it detected as down.
 - Node vs. core (replica) down detection? To lessen the impact of false down detection, and to speed completion of the current query, only request new replicas for the shards that are being queried (as opposed to all shards on the node that went down).
 - Return to normal state: at some point, we should return to the normal number of replicas. Use the autoscaling framework for this?
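
To make the dedup idea concrete, here is a rough sketch of the bookkeeping the overseer might do. It is only an illustration under assumptions: the names ReplicaRequest and TempReplicaDeduper, and the choice of (collection, shard, down replica) as the dedup key, are hypothetical and not existing Solr APIs. It is written in Python for brevity; a real implementation would live in the overseer's Java code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReplicaRequest:
    """Hypothetical client request for a temporary query replica."""
    collection: str
    shard: str
    down_replica: str  # the replica the client detected as down


class TempReplicaDeduper:
    """Hypothetical overseer-side bookkeeping (not a Solr class).

    Only the first request for a given (collection, shard, down replica)
    triggers work; later reports of the same failure are counted and dropped.
    """

    def __init__(self):
        # key -> number of duplicate requests seen while one is in flight
        self._in_flight = {}

    @staticmethod
    def _key(req):
        return (req.collection, req.shard, req.down_replica)

    def submit(self, req):
        """Return True if a new temporary replica should be brought up."""
        key = self._key(req)
        if key in self._in_flight:
            self._in_flight[key] += 1
            return False
        self._in_flight[key] = 0
        return True

    def duplicates(self, req):
        """Number of duplicate requests absorbed for an in-flight request."""
        return self._in_flight.get(self._key(req), 0)

    def complete(self, req):
        """Forget the request once the temporary replica is up (or abandoned)."""
        self._in_flight.pop(self._key(req), None)


if __name__ == "__main__":
    deduper = TempReplicaDeduper()
    first = ReplicaRequest("products", "shard1", "core_node3")
    same_failure = ReplicaRequest("products", "shard1", "core_node3")
    print(deduper.submit(first))         # first report of this failure
    print(deduper.submit(same_failure))  # second client, same down replica
```

Because the key includes the shard, a client only asks for replicas of the shards it is actually querying, and a false detection costs at most one temporary replica per shard rather than one per core on the node.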

> Support 1 or 0 replicas per shard
> ---------------------------------
>
>                 Key: SOLR-13405
>                 URL: https://issues.apache.org/jira/browse/SOLR-13405
>             Project: Solr
>          Issue Type: New Feature
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Yonik Seeley
>            Priority: Major
>
> When multiple replicas per shard are not needed for data durability (because of shared storage support on HDFS or S3, etc.), other cluster configurations suddenly make sense, such as allowing 1 or even 0 replicas per shard (primarily to lower costs).
> One big issue with a single replica per shard is that ZooKeeper (and thus the overseer) waits for a session timeout before marking the node as down.  Instead of queries having to wait this long (~30 sec), if a SolrJ query client detects that a node died, it can ask the overseer to quickly bring up another replica.


