Posted to issues@lucene.apache.org by "Ilan Ginzburg (Jira)" <ji...@apache.org> on 2021/01/23 18:26:00 UTC

[jira] [Commented] (SOLR-14927) Remove Overseer

    [ https://issues.apache.org/jira/browse/SOLR-14927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17270728#comment-17270728 ] 

Ilan Ginzburg commented on SOLR-14927:
--------------------------------------

As I'm working on the child work item for distributing the cluster state updates, I realize that some changes to the Collection API might be required earlier than I hoped.
See [comment on SOLR-14928|https://issues.apache.org/jira/browse/SOLR-14928?focusedCommentId=17270726&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17270726].

> Remove Overseer
> ---------------
>
>                 Key: SOLR-14927
>                 URL: https://issues.apache.org/jira/browse/SOLR-14927
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: SolrCloud
>            Reporter: Ilan Ginzburg
>            Assignee: Ilan Ginzburg
>            Priority: Major
>              Labels: cluster, collection-api, overseer, solrcloud, zookeeper
>
> This Jira is intended to capture sub-Jiras on the path to removing the Overseer component from SolrCloud and moving to all nodes being able to do the work currently done by Overseer.
> See detailed description in [this doc|https://docs.google.com/document/d/1u4QHsIHuIxlglIW6hekYlXGNOP0HjLGVX5N6inkj6Ok/].
> Copying (edited) from the above doc:
> The motivations for removing Overseer include:
>  * Mono-threaded state change is slow and doesn’t scale,
>  * Communication between cluster nodes and the Overseer uses ZooKeeper as a queueing mechanism, which is not a good idea,
>  * Nodes talking to Overseer (then Overseer talking to itself) via ZooKeeper is inefficient and adds latency,
>  * Collection API scalability is poor, because not only does a single node process commands for all Collections, but that processing also depends on the mono-threaded state change queue consumption,
>  * The code supporting Overseer in SolrCloud is complex (election, queue management, recovery, etc.).
> The general idea is that there’s already a central point in the SolrCloud cluster and it’s ZooKeeper. It might not be necessary to have a second central point (Overseer) because nodes can interact directly with ZooKeeper and synchronize more efficiently by optimistic locking using “conditional updates” (a.k.a. compare and swap or CAS).
>  
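
To illustrate the “conditional updates” idea from the description above, here is a minimal sketch of an optimistic read-modify-write loop. It is not Solr code and does not talk to ZooKeeper: VersionedNode is a hypothetical in-memory stand-in for a single znode (data plus a version counter), and CasSketch, setIfVersion, incrementWithRetry and runConcurrentIncrements are names invented for this example. With the real ZooKeeper Java client, the conditional write would be setData(path, data, version), which signals a version mismatch with KeeperException.BadVersionException rather than a boolean.

```java
import java.nio.charset.StandardCharsets;

final class CasSketch {

    /** Hypothetical in-memory stand-in for one znode: a payload plus a version. */
    static final class VersionedNode {
        /** Immutable view of the node as seen by one read. */
        static final class Snapshot {
            final byte[] data;
            final int version;

            Snapshot(byte[] data, int version) {
                this.data = data;
                this.version = version;
            }
        }

        private byte[] data;
        private int version = 0;

        VersionedNode(byte[] initial) {
            this.data = initial.clone();
        }

        synchronized Snapshot read() {
            return new Snapshot(data.clone(), version);
        }

        /** Conditional update: applied only if the caller saw the latest version. */
        synchronized boolean setIfVersion(byte[] newData, int expectedVersion) {
            if (expectedVersion != version) {
                return false; // someone else updated the node first
            }
            data = newData.clone();
            version++;
            return true;
        }
    }

    /** Read, compute, attempt the conditional write, retry on conflict. Returns attempts used. */
    static int incrementWithRetry(VersionedNode node) {
        int attempts = 0;
        while (true) {
            attempts++;
            VersionedNode.Snapshot seen = node.read();
            int current = Integer.parseInt(new String(seen.data, StandardCharsets.UTF_8));
            byte[] updated = Integer.toString(current + 1).getBytes(StandardCharsets.UTF_8);
            if (node.setIfVersion(updated, seen.version)) {
                return attempts;
            }
            // Version mismatch: loop to re-read the fresh state and try again.
        }
    }

    /** Several "nodes" (threads) update the same state with no central coordinator. */
    static int runConcurrentIncrements(int threads, int perThread) throws InterruptedException {
        VersionedNode node = new VersionedNode("0".getBytes(StandardCharsets.UTF_8));
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    incrementWithRetry(node);
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            worker.join();
        }
        return Integer.parseInt(new String(node.read().data, StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runConcurrentIncrements(4, 1000)); // prints 4000
    }
}
```

No update is lost even though no thread ever holds a lock across its read and its write: a writer that loses the race simply re-reads and retries. The trade-off is that under heavy contention on a single node of state, retries can pile up.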




---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org