Posted to dev@lucene.apache.org by "Ramkumar R. Aiyengar" <an...@gmail.com> on 2014/07/15 19:09:26 UTC

Solr checkIfIAmLeader usage from ZK event thread

Currently, when a replica is watching the current leader's ephemeral node
and the leader disappears, it runs the leadership check, along with its
two-way peer sync, ZK update, etc., on the ZK event thread where the watch
was fired.

What this means is that for instances with lots of cores, you would be
serializing leadership elections and the last in the list could take a long
time to have a replacement elected (during which you will have no leader).
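
To illustrate the effect, here is a small simulation. It is not Solr code:
a single-threaded executor stands in for the ZK event thread, Thread.sleep
stands in for the election work, and the class and method names are made up
for the example.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Simulation only; none of these names come from Solr or ZooKeeper.
class SerializedElectionDemo {

    /**
     * Queues one fake election per core on a single thread and returns,
     * for each core, the milliseconds elapsed until its election finished.
     */
    static long[] completionTimes(int cores, long workMillis) {
        // Stand-in for the single ZK event thread.
        ExecutorService eventThread = Executors.newSingleThreadExecutor();
        long[] doneAt = new long[cores];
        long start = System.nanoTime();
        for (int i = 0; i < cores; i++) {
            final int core = i;
            eventThread.execute(() -> {
                try {
                    Thread.sleep(workMillis); // assumed cost of one election
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                doneAt[core] = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
            });
        }
        eventThread.shutdown();
        try {
            eventThread.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return doneAt;
    }

    public static void main(String[] args) {
        long[] doneAt = completionTimes(5, 40);
        for (int i = 0; i < doneAt.length; i++) {
            System.out.println("core " + i + " elected after ~" + doneAt[i] + " ms");
        }
    }
}
```

With N cores and an election cost of T each, the last core waits roughly
N * T before it has a leader again.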

I made a quick change to make the checkIfIAmLeader call async, but SolrCloud
tests being what they are (thanks Shalin for cleaning them up, btw :)), I
wanted to check whether I am doing something stupid. If not, I will raise a
JIRA.
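
As a rough sketch of what "async" could look like (this is not the actual
change, and every name below is hypothetical): the watch callback only
hands the leadership check to a separate executor and returns, so the event
thread is free to deliver the next watch. The Runnable plays the role of
checkIfIAmLeader, and Thread.sleep stands in for its work.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Sketch only; these names are illustrative and not taken from Solr.
class AsyncElectionHandoff {
    private final ExecutorService electionPool;

    AsyncElectionHandoff(ExecutorService electionPool) {
        this.electionPool = electionPool;
    }

    /** Called on the (simulated) event thread; returns without waiting. */
    void onLeaderGone(Runnable checkIfIAmLeader) {
        electionPool.execute(checkIfIAmLeader);
    }

    /**
     * Fires one watch event per core. Returns two timings in milliseconds:
     * [0] time spent inside the callbacks, [1] time until all elections ended.
     */
    static long[] simulate(int cores, long workMillis) {
        ExecutorService pool = Executors.newCachedThreadPool();
        AsyncElectionHandoff handoff = new AsyncElectionHandoff(pool);
        CountDownLatch done = new CountDownLatch(cores);
        long start = System.nanoTime();
        for (int i = 0; i < cores; i++) {
            handoff.onLeaderGone(() -> {
                try {
                    Thread.sleep(workMillis); // stand-in for peer sync, ZK update, ...
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    done.countDown();
                }
            });
        }
        long callbacksMillis = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        try {
            done.await(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        long totalMillis = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        pool.shutdown();
        return new long[] {callbacksMillis, totalMillis};
    }

    public static void main(String[] args) {
        long[] t = simulate(8, 200);
        System.out.println("callbacks: " + t[0] + " ms, all elections done: " + t[1] + " ms");
    }
}
```

A cached thread pool is used here only to keep the example short; a bounded
pool would be the more cautious choice on a node with many cores.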

One concern is whether you could end up with two elections for the same
shard, but I can't see how that might happen.
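
If overlapping elections for one shard did turn out to be possible, one
conceivable guard is a per-shard "in flight" marker. The sketch below is
purely illustrative: ElectionGuard and runIfIdle are hypothetical names,
nothing like this is claimed to exist in Solr, and whether skipping the
second attempt (rather than queueing it) is correct would depend on the
election semantics.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical guard, not taken from Solr.
class ElectionGuard {
    // Shards that currently have an election running.
    private final Set<String> inFlight = ConcurrentHashMap.newKeySet();

    /**
     * Runs the election unless one is already running for this shard.
     * Returns true if the election ran, false if it was skipped.
     */
    boolean runIfIdle(String shardId, Runnable election) {
        if (!inFlight.add(shardId)) {
            return false; // an election for this shard is already in progress
        }
        try {
            election.run();
            return true;
        } finally {
            inFlight.remove(shardId);
        }
    }

    public static void main(String[] args) {
        ElectionGuard guard = new ElectionGuard();
        boolean ran = guard.runIfIdle("shard1", () -> {
            // A second attempt for the same shard while the first is running.
            boolean nested = guard.runIfIdle("shard1", () -> { });
            System.out.println("nested attempt ran: " + nested);
        });
        System.out.println("outer attempt ran: " + ran);
    }
}
```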

Re: Solr checkIfIAmLeader usage from ZK event thread

Posted by "Ramkumar R. Aiyengar" <an...@gmail.com>.

Opened SOLR-6261.

On 19 Jul 2014 20:21, "Mark Miller" <ma...@gmail.com> wrote:

> Put up a patch and let's take a look.
>
> Most anywhere that holds up the ZK processing thread for any decent amount
> of time is probably something waiting to be fixed.
>
> --
> Mark Miller
> about.me/markrmiller
>

Re: Solr checkIfIAmLeader usage from ZK event thread

Posted by Mark Miller <ma...@gmail.com>.

Put up a patch and let's take a look.

Most anywhere that holds up the ZK processing thread for any decent amount of time is probably something waiting to be fixed.

-- 
Mark Miller
about.me/markrmiller


