Posted to dev@solr.apache.org by David Smiley <ds...@apache.org> on 2022/10/11 22:34:38 UTC

JIT Shard leader design/proposal

At work, I’ve attempted to troubleshoot issues relating to shard
leadership.  It’s quite possible that the root causes are related to
customizations in my fork of Solr; who knows.  The leadership
code/algorithm is so hard to debug/troubleshoot that it’s hard to say.
It’s no secret that Solr’s code here is a complicated puzzle[1].  Out of
this frustration, I began to ponder a fantasy of how I want leader election
to work, informed by my desire to scale to massive numbers of collections &
shards on a cluster.  Using Curator for elections would perhaps address
stability but not this scale.  I’d like to get input from you all on this
fantasy.  Surely I have overlooked things; please offer your insights!

Thematic concept:  Don’t change/elect leaders until it’s actually
necessary.  In most cases where I work, the leader will return before we
truly need a leader.  Even when not true, I don’t think doing it lazily
should be a noticeable issue?  If so, it’s easy to imagine augmenting this
design with an optional eager leadership election.

A. Only code paths that truly need a leader will do “leadership checks”,
resulting in a potential leader election.  This is principally on indexing
in DistributedZkUpdateProcessor but there are likely more spots.

B. Leader check: Check if the shard’s leader is (a) known, and (b)
state=ACTIVE, and (c) on a “live” node, and (d) the preferredLeader
condition is satisfied.  Otherwise, try to elect a leader in a loop until
this set of conditions is achieved or a timeout is reached.
B.A: The preferredLeader condition means that either the leader is marked
as preferredLeader, or no replica with preferredLeader is eligible for
leadership.
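
A minimal sketch of the leader check in B, for illustration only.  The
types here (ReplicaInfo, State) are hypothetical simplifications, not
Solr's Replica or Slice classes, and eligibility is reduced to "live node
and ACTIVE"; the ZkShardTerms condition from C.1 is left out.

```java
import java.util.List;
import java.util.Optional;
import java.util.Set;

class LeaderCheckSketch {
    enum State { ACTIVE, RECOVERING, DOWN }

    // Hypothetical, simplified replica view; not Solr's Replica class.
    record ReplicaInfo(String name, String node, State state, boolean preferredLeader) {}

    // Simplified eligibility: on a live node and ACTIVE (shard terms omitted here).
    static boolean eligible(ReplicaInfo r, Set<String> liveNodes) {
        return liveNodes.contains(r.node()) && r.state() == State.ACTIVE;
    }

    /** Returns true when conditions (a) through (d) of item B all hold. */
    static boolean leaderIsUsable(Optional<ReplicaInfo> leader,
                                  List<ReplicaInfo> replicas,
                                  Set<String> liveNodes) {
        if (leader.isEmpty()) return false;                  // (a) leader is known
        ReplicaInfo l = leader.get();
        if (l.state() != State.ACTIVE) return false;         // (b) state=ACTIVE
        if (!liveNodes.contains(l.node())) return false;     // (c) on a live node
        if (l.preferredLeader()) return true;                // (d) leader is preferred, or...
        // ...B.A: no preferredLeader replica is currently eligible to take over.
        return replicas.stream()
                .noneMatch(r -> r.preferredLeader() && eligible(r, liveNodes));
    }
}
```

When this returns false, the caller would move on to the "try to elect a
leader" step and re-check until the conditions hold or a timeout is hit.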

C. “Try to elect a leader”:   (The word “election” might not be the best
word for this algorithm, but whatever).
C.1.: A replica must be eligible to be a leader.  It must be live (on a
live node) and have an ACTIVE state.  And, very important, eligibility
should be governed by ZkShardTerms which knows which replicas have the most
up-to-date state.
C.1.A: Strict use of ZkShardTerms is designed to ensure that there is no
data loss.  That said, “forceLeader” remains in the toolbox of Solr admins
(which monkeys with ZkShardTerms to cheat).  We may need a new optional
mechanism to be closer to what we have today, i.e. to basically ignore
ZkShardTerms after a configured period of time?
C.1.B. I am assuming that replicas will become eligible on their own (e.g.
as nodes re-join) instead of this algorithm needing to initiate/tell any to
get into this state somehow.
C.2: If there are no leader-eligible replicas, complain with useful
information to diagnose why no leader was found.  Don’t log this if we
already logged this same message in our leadership check loop.  Sleep
perhaps 1000ms and try the loop again.  If we can wait/monitor on the state
of something convenient then do that to avoid sleeping for too long.
C.3: Of the leader-eligible replicas, pick any one as the leader
(e.g. at random).  Prefer preferredLeader=true replicas, of course.  ZK will
solve races if this algorithm runs on more than one node.
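
The steps in C could be sketched roughly as follows.  Everything here is
an assumption made for illustration: Candidate and ShardView are
hypothetical types, the shard term is modeled as a plain number where
"highest term" means most up to date, and the loop simply sleeps rather
than watching any ZK state.

```java
import java.util.List;
import java.util.Optional;
import java.util.Random;
import java.util.Set;
import java.util.function.Supplier;

class LeaderElectionSketch {
    enum State { ACTIVE, RECOVERING, DOWN }

    // Hypothetical replica snapshot; 'term' stands in for its ZkShardTerms value.
    record Candidate(String name, String node, State state, long term, boolean preferredLeader) {}

    // Hypothetical snapshot of one shard at one instant.
    record ShardView(List<Candidate> replicas, Set<String> liveNodes) {}

    /** C.1: on a live node, ACTIVE, and holding the highest term of any replica in the shard. */
    static List<Candidate> eligible(ShardView view) {
        long maxTerm = view.replicas().stream().mapToLong(Candidate::term).max().orElse(0L);
        return view.replicas().stream()
                .filter(r -> view.liveNodes().contains(r.node()))
                .filter(r -> r.state() == State.ACTIVE)
                .filter(r -> r.term() == maxTerm)
                .toList();
    }

    /** C.3: prefer preferredLeader replicas, otherwise pick any eligible replica at random. */
    static Optional<Candidate> pick(List<Candidate> eligible, Random random) {
        if (eligible.isEmpty()) return Optional.empty();
        List<Candidate> preferred = eligible.stream().filter(Candidate::preferredLeader).toList();
        List<Candidate> pool = preferred.isEmpty() ? eligible : preferred;
        return Optional.of(pool.get(random.nextInt(pool.size())));
    }

    /** C.2: retry until a candidate appears or the timeout passes; repeat a log line only if it changed. */
    static Optional<Candidate> electWithRetry(Supplier<ShardView> viewSource,
                                              long timeoutMs, long sleepMs, Random random) {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        String lastMessage = null;
        while (true) {
            ShardView view = viewSource.get();
            Optional<Candidate> choice = pick(eligible(view), random);
            if (choice.isPresent()) return choice;
            String message = "No leader-eligible replica; live=" + view.liveNodes()
                    + " replicas=" + view.replicas();
            if (!message.equals(lastMessage)) {   // avoid logging the same diagnosis twice
                System.err.println(message);
                lastMessage = message;
            }
            if (deadline - System.nanoTime() <= 0) return Optional.empty();
            try {
                Thread.sleep(sleepMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();  // preserve the interrupt and give up
                return Optional.empty();
            }
        }
    }
}
```

Note that the highest term is computed over all replicas, including ones
that are down, so a stale replica never becomes eligible just because the
up-to-date one is unavailable.  That is the "strict" behavior from C.1.A.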

D. Only track leadership in Slice (i.e. within the cluster state) which is
backed by one spot in ZK.  Don’t put it in places like CloudDescriptor or
other places in ZK.
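
To illustrate how a single versioned record lets ZK settle races between
nodes that run the algorithm concurrently, here is an in-memory stand-in.
It is not Solr or ZooKeeper code: LeaderRecordSketch is a hypothetical
class that imitates a conditional write such as ZooKeeper's
setData(path, data, version), which rejects the update when the expected
version is stale.

```java
import java.util.concurrent.atomic.AtomicStampedReference;

class LeaderRecordSketch {
    // In-memory stand-in for the one versioned place that records the shard leader.
    private final AtomicStampedReference<String> record = new AtomicStampedReference<>(null, 0);

    int version() { return record.getStamp(); }

    String leader() { return record.getReference(); }

    /** Writes newLeader only if the record is still at expectedVersion; bumps the version on success. */
    boolean trySetLeader(String newLeader, int expectedVersion) {
        int[] stamp = new int[1];
        String current = record.get(stamp);
        if (stamp[0] != expectedVersion) return false;   // someone else already wrote
        return record.compareAndSet(current, newLeader, expectedVersion, expectedVersion + 1);
    }
}
```

Two nodes that both read version 0 and both try to write a leader will see
exactly one success; the loser re-reads the record and finds a usable
leader already in place.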

Thoughts?


[1]
https://lists.apache.org/list?dev@solr.apache.org:2021-10:MILLER%20leader
“ZkCmdExecutor” thread with Mark Miller, and referencing
https://www.solrdev.io/leader-election-adventure.html which no longer
resolves


~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley

Re: JIT Shard leader design/proposal

Posted by Jan Høydahl <ja...@cominvent.com>.

Agree too,

Are you trying to optimize for nodes in a large cluster going down and coming up again within a short time period without another replica being elected leader?

I'd prefer that any Solr rewrite use proven recipes from e.g. Curator instead of rolling our own.
If we have complex election rules, then perhaps we could have several election groups, so first try to elect from nodes in preferred leaders group, then if that fails, elect between remaining eligible nodes.

Jan

> On 14 Oct 2022, at 12:21, Noble Paul <no...@gmail.com> wrote:
> 
> "just in time is probably less than ideal for most of the more common use
> cases."
> 
> Agree
> 
> On Fri, Oct 14, 2022, 9:11 PM Mark Miller <ma...@gmail.com> wrote:
> 
>> I don’t have much to say about the proposal, other than to say that if an
>> election ever ends up involving syncing up and exchanging data, doing that
>> just in time is probably less than ideal for most of the more common use
>> cases.
>> 
>> That’s just an aside though. I’d be more interested in seeing the proposal
>> connect problems with solutions. My quick read makes me think the goal is
>> some dimension of scale (I’m guessing a lazy dimension, usually not the most
>> common Solr architecture in my experience fwiw). But I don’t see what the
>> problems are for that dimension of scale or how to connect proposals to
>> solutions to the problems. Unless I’m just missing it.
>> 

