Posted to solr-user@lucene.apache.org by Aman Tandon <am...@gmail.com> on 2015/03/03 12:26:07 UTC

Help needed to understand zookeeper in solrcloud

Hi,

I read in various blogs that we should use an odd number of ZooKeeper nodes
in the ensemble. Why is that?

With Regards
Aman Tandon

Re: Help needed to understand zookeeper in solrcloud

Posted by Aman Tandon <am...@gmail.com>.
Thanks, svante, for clearing up my doubt.

With Regards
Aman Tandon

On Thu, Mar 5, 2015 at 2:15 PM, svante karlsson <sa...@csi.se> wrote:

> The network will "only" split if you get errors in your network hardware
> (or someone fiddles with iptables). Let's say you placed your zookeepers in
> separate racks and someone pulls the network cable between them - that will
> leave you with 5 working servers that can't reach each other. This is the
> "split brain" scenario.
>
> >Are they guaranteed to split 4/0
> Yes. A node failure will not partition the network.
>
> > any odd number - it could be 21 even
> Since all writes are synchronous, you don't want to use too large a number
> of zookeepers, as that would slow down the cluster. Use a reasonable number
> to meet your SLA (3 or 5 are common choices).
>
> >and from a single failure you drop to an even number - then there is the
> danger of NOT getting quorum.
> No, see above.
>
> BUT, if you first lose most of your nodes due to a network partition and
> then lose another due to node failure - then you are out of quorum.
>
>
> /svante
>
>
>
> 2015-03-05 9:29 GMT+01:00 Julian Perry <ju...@limitless.co.uk>:
>
> >
> > I start out with 5 zk's.  All good.
> >
> > One zk fails - I'm left with four.  Are they guaranteed
> > to split 4/0 or 3/1 - because if they split 2/2 I'm screwed,
> > right?
> >
> > Surely, starting with 5 zk's (or in fact any odd number - it
> > could be 21 even), from a single failure you drop to an
> > even number - then there is the danger of NOT getting quorum.
> >
> > So ... I can only assume that there is a mechanism in place
> > inside zk to guarantee this cannot happen, right?
> >
> > --
> > Cheers
> > Jules.
> >
> >
> >
> > On 05/03/2015 06:47, svante karlsson wrote:
> >
> >> Yes, as long as it is three (the majority of 5) or more.
> >>
> >> This is why there is no point in having a 4-node cluster. It would also
> >> require 3 nodes for a majority, giving it the fault tolerance of a 3-node
> >> cluster while being slower and more expensive.
> >>
> >>
> >>
> >> 2015-03-05 7:41 GMT+01:00 Aman Tandon <am...@gmail.com>:
> >>
> >>  Thanks svante.
> >>>
> >>> What if, in a cluster of 5 zookeepers, only 1 zookeeper goes down - can
> >>> a zookeeper election still occur with 4 (an even number of) zookeepers
> >>> alive?
> >>>
> >>> With Regards
> >>> Aman Tandon
> >>>
> >>> On Tue, Mar 3, 2015 at 6:35 PM, svante karlsson <sa...@csi.se> wrote:
> >>>
> >>>> Synchronous update of state and a requirement that more than half of
> >>>> the zookeepers be alive (and in sync) make it impossible to have a
> >>>> "split brain" situation, i.e. when you partition a network and get,
> >>>> let's say, 3 alive on one side and 2 on the other.
> >>>>
> >>>> In this case the 2-node side stops serving requests since it's not in
> >>>> the majority.
> >>>>
> >>>> 2015-03-03 13:15 GMT+01:00 Aman Tandon <am...@gmail.com>:
> >>>>
> >>>>> But how do they handle the failure?
> >>>>>
> >>>>> With Regards
> >>>>> Aman Tandon
> >>>>>
> >>>>> On Tue, Mar 3, 2015 at 5:17 PM, O. Klein <kl...@octoweb.nl> wrote:
> >>>>>
> >>>>>> ZooKeeper requires a majority of servers to be available. For
> >>>>>> example, a five-machine ZooKeeper ensemble can handle the failure of
> >>>>>> two machines. That's why odd numbers are recommended.
>

Re: Help needed to understand zookeeper in solrcloud

Posted by svante karlsson <sa...@csi.se>.
The network will "only" split if you get errors in your network hardware
(or someone fiddles with iptables). Let's say you placed your zookeepers in
separate racks and someone pulls the network cable between them - that will
leave you with 5 working servers that can't reach each other. This is the
"split brain" scenario.

>Are they guaranteed to split 4/0
Yes. A node failure will not partition the network.

> any odd number - it could be 21 even
Since all writes are synchronous, you don't want to use too large a number
of zookeepers, as that would slow down the cluster. Use a reasonable number
to meet your SLA (3 or 5 are common choices).

>and from a single failure you drop to an even number - then there is the
danger of NOT getting quorum.
No, see above.

BUT, if you first lose most of your nodes due to a network partition and
then lose another due to node failure - then you are out of quorum.
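
To make that last point concrete, here is a minimal Python sketch of the
majority rule applied to that scenario (illustrative only - this is not
ZooKeeper code, and the names are made up):

    def has_quorum(ensemble_size, reachable):
        # A group of servers can act only if it is a strict majority.
        return reachable > ensemble_size // 2

    ENSEMBLE = 5
    # A partition splits the 5 nodes into a 3-node side and a 2-node side.
    print(has_quorum(ENSEMBLE, 3))  # True  - the 3-node side keeps serving
    print(has_quorum(ENSEMBLE, 2))  # False - the 2-node side stops
    # One node on the 3-node side then fails, leaving only 2 of 5 together.
    print(has_quorum(ENSEMBLE, 2))  # False - now nobody has quorum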


/svante



2015-03-05 9:29 GMT+01:00 Julian Perry <ju...@limitless.co.uk>:

>
> I start out with 5 zk's.  All good.
>
> One zk fails - I'm left with four.  Are they guaranteed
> to split 4/0 or 3/1 - because if they split 2/2 I'm screwed,
> right?
>
> Surely, starting with 5 zk's (or in fact any odd number - it
> could be 21 even), from a single failure you drop to an
> even number - then there is the danger of NOT getting quorum.
>
> So ... I can only assume that there is a mechanism in place
> inside zk to guarantee this cannot happen, right?
>
> --
> Cheers
> Jules.
>
>
>
> On 05/03/2015 06:47, svante karlsson wrote:
>
>> Yes, as long as it is three (the majority of 5) or more.
>>
>> This is why there is no point in having a 4-node cluster. It would also
>> require 3 nodes for a majority, giving it the fault tolerance of a 3-node
>> cluster while being slower and more expensive.
>>
>>
>>
>> 2015-03-05 7:41 GMT+01:00 Aman Tandon <am...@gmail.com>:
>>
>>  Thanks svante.
>>>
>>> What if, in a cluster of 5 zookeepers, only 1 zookeeper goes down - can a
>>> zookeeper election still occur with 4 (an even number of) zookeepers alive?
>>>
>>> With Regards
>>> Aman Tandon
>>>
>>> On Tue, Mar 3, 2015 at 6:35 PM, svante karlsson <sa...@csi.se> wrote:
>>>
>>>> Synchronous update of state and a requirement that more than half of
>>>> the zookeepers be alive (and in sync) make it impossible to have a
>>>> "split brain" situation, i.e. when you partition a network and get,
>>>> let's say, 3 alive on one side and 2 on the other.
>>>>
>>>> In this case the 2-node side stops serving requests since it's not in
>>>> the majority.
>>>>
>>>> 2015-03-03 13:15 GMT+01:00 Aman Tandon <am...@gmail.com>:
>>>>
>>>>> But how do they handle the failure?
>>>>>
>>>>> With Regards
>>>>> Aman Tandon
>>>>>
>>>>> On Tue, Mar 3, 2015 at 5:17 PM, O. Klein <kl...@octoweb.nl> wrote:
>>>>>
>>>>>> ZooKeeper requires a majority of servers to be available. For example,
>>>>>> a five-machine ZooKeeper ensemble can handle the failure of two
>>>>>> machines. That's why odd numbers are recommended.
>>>>>

Re: Help needed to understand zookeeper in solrcloud

Posted by Julian Perry <ju...@limitless.co.uk>.
I start out with 5 zk's.  All good.

One zk fails - I'm left with four.  Are they guaranteed
to split 4/0 or 3/1 - because if they split 2/2 I'm screwed,
right?

Surely, starting with 5 zk's (or in fact any odd number - it
could be 21 even), from a single failure you drop to an
even number - then there is the danger of NOT getting quorum.

So ... I can only assume that there is a mechanism in place
inside zk to guarantee this cannot happen, right?

--
Cheers
Jules.


On 05/03/2015 06:47, svante karlsson wrote:
> Yes, as long as it is three (the majority of 5) or more.
>
> This is why there is no point in having a 4-node cluster. It would also
> require 3 nodes for a majority, giving it the fault tolerance of a 3-node
> cluster while being slower and more expensive.
>
>
>
> 2015-03-05 7:41 GMT+01:00 Aman Tandon <am...@gmail.com>:
>
>> Thanks svante.
>>
>> What if, in a cluster of 5 zookeepers, only 1 zookeeper goes down - can a
>> zookeeper election still occur with 4 (an even number of) zookeepers alive?
>>
>> With Regards
>> Aman Tandon
>>
>> On Tue, Mar 3, 2015 at 6:35 PM, svante karlsson <sa...@csi.se> wrote:
>>
>>> Synchronous update of state and a requirement that more than half of the
>>> zookeepers be alive (and in sync) make it impossible to have a "split
>>> brain" situation, i.e. when you partition a network and get, let's say, 3
>>> alive on one side and 2 on the other.
>>>
>>> In this case the 2-node side stops serving requests since it's not in the
>>> majority.
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> 2015-03-03 13:15 GMT+01:00 Aman Tandon <am...@gmail.com>:
>>>
>>>> But how do they handle the failure?
>>>>
>>>> With Regards
>>>> Aman Tandon
>>>>
>>>> On Tue, Mar 3, 2015 at 5:17 PM, O. Klein <kl...@octoweb.nl> wrote:
>>>>
>>>>> ZooKeeper requires a majority of servers to be available. For example,
>>>>> a five-machine ZooKeeper ensemble can handle the failure of two
>>>>> machines. That's why odd numbers are recommended.

Re: Help needed to understand zookeeper in solrcloud

Posted by svante karlsson <sa...@csi.se>.
Yes, as long as it is three (the majority of 5) or more.

This is why there is no point in having a 4-node cluster. It would also
require 3 nodes for a majority, giving it the fault tolerance of a 3-node
cluster while being slower and more expensive.
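
As a rough sketch of that trade-off in Python (illustrative only, not part
of ZooKeeper itself; the helper names are made up):

    def majority(n):
        # Smallest number of servers that forms a strict majority of n.
        return n // 2 + 1

    def tolerated_failures(n):
        # How many servers can fail while a majority is still reachable.
        return n - majority(n)

    for n in (3, 4, 5):
        print(n, "nodes: majority =", majority(n),
              "tolerates", tolerated_failures(n), "failure(s)")
    # 3 nodes: majority = 2 tolerates 1 failure(s)
    # 4 nodes: majority = 3 tolerates 1 failure(s)  <- no gain over 3 nodes
    # 5 nodes: majority = 3 tolerates 2 failure(s)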



2015-03-05 7:41 GMT+01:00 Aman Tandon <am...@gmail.com>:

> Thanks svante.
>
> What if, in a cluster of 5 zookeepers, only 1 zookeeper goes down - can a
> zookeeper election still occur with 4 (an even number of) zookeepers alive?
>
> With Regards
> Aman Tandon
>
> On Tue, Mar 3, 2015 at 6:35 PM, svante karlsson <sa...@csi.se> wrote:
>
> > Synchronous update of state and a requirement that more than half of the
> > zookeepers be alive (and in sync) make it impossible to have a "split
> > brain" situation, i.e. when you partition a network and get, let's say, 3
> > alive on one side and 2 on the other.
> >
> > In this case the 2-node side stops serving requests since it's not in the
> > majority.
> >
> >
> >
> >
> >
> >
> >
> >
> > 2015-03-03 13:15 GMT+01:00 Aman Tandon <am...@gmail.com>:
> >
> > > But how do they handle the failure?
> > >
> > > With Regards
> > > Aman Tandon
> > >
> > > On Tue, Mar 3, 2015 at 5:17 PM, O. Klein <kl...@octoweb.nl> wrote:
> > >
> > > > ZooKeeper requires a majority of servers to be available. For example,
> > > > a five-machine ZooKeeper ensemble can handle the failure of two
> > > > machines. That's why odd numbers are recommended.
> > > >
> > > >
> > > >
> > > >
> > >
> >
>

Re: Help needed to understand zookeeper in solrcloud

Posted by Aman Tandon <am...@gmail.com>.
Thanks svante.

What if, in a cluster of 5 zookeepers, only 1 zookeeper goes down - can a
zookeeper election still occur with 4 (an even number of) zookeepers alive?

With Regards
Aman Tandon

On Tue, Mar 3, 2015 at 6:35 PM, svante karlsson <sa...@csi.se> wrote:

> Synchronous update of state and a requirement that more than half of the
> zookeepers be alive (and in sync) make it impossible to have a "split
> brain" situation, i.e. when you partition a network and get, let's say, 3
> alive on one side and 2 on the other.
>
> In this case the 2-node side stops serving requests since it's not in the
> majority.
>
>
>
>
>
>
>
>
> 2015-03-03 13:15 GMT+01:00 Aman Tandon <am...@gmail.com>:
>
> > But how do they handle the failure?
> >
> > With Regards
> > Aman Tandon
> >
> > On Tue, Mar 3, 2015 at 5:17 PM, O. Klein <kl...@octoweb.nl> wrote:
> >
> > > ZooKeeper requires a majority of servers to be available. For example,
> > > a five-machine ZooKeeper ensemble can handle the failure of two
> > > machines. That's why odd numbers are recommended.
> > >
> > >
> > >
> > >
> >
>

Re: Help needed to understand zookeeper in solrcloud

Posted by svante karlsson <sa...@csi.se>.
Synchronous update of state and a requirement that more than half of the
zookeepers be alive (and in sync) make it impossible to have a "split
brain" situation, i.e. when you partition a network and get, let's say, 3
alive on one side and 2 on the other.

In this case the 2-node side stops serving requests since it's not in the
majority.
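
A tiny Python illustration of that partition example (just the majority
check, not ZooKeeper code; the variable names are made up):

    ensemble = 5
    for side in (3, 2):                   # the two sides of the partition
        serving = side > ensemble // 2    # a strict majority is required
        print(side, "nodes:", "keeps serving" if serving else "stops serving")
    # 3 nodes: keeps serving
    # 2 nodes: stops serving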








2015-03-03 13:15 GMT+01:00 Aman Tandon <am...@gmail.com>:

> But how do they handle the failure?
>
> With Regards
> Aman Tandon
>
> On Tue, Mar 3, 2015 at 5:17 PM, O. Klein <kl...@octoweb.nl> wrote:
>
> > ZooKeeper requires a majority of servers to be available. For example, a
> > five-machine ZooKeeper ensemble can handle the failure of two machines.
> > That's why odd numbers are recommended.
> >
> >
> >
> >
>

Re: Help needed to understand zookeeper in solrcloud

Posted by Aman Tandon <am...@gmail.com>.
But how do they handle the failure?

With Regards
Aman Tandon

On Tue, Mar 3, 2015 at 5:17 PM, O. Klein <kl...@octoweb.nl> wrote:

> ZooKeeper requires a majority of servers to be available. For example, a
> five-machine ZooKeeper ensemble can handle the failure of two machines.
> That's why odd numbers are recommended.
>
>
>
>

Re: Help needed to understand zookeeper in solrcloud

Posted by "O. Klein" <kl...@octoweb.nl>.
ZooKeeper requires a majority of servers to be available. For example, a
five-machine ZooKeeper ensemble can handle the failure of two machines.
That's why odd numbers are recommended.
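
A quick Python check of that arithmetic (illustrative only, not ZooKeeper
code):

    n = 5
    majority = n // 2 + 1    # 3 servers must be up and in sync
    print(n - majority)      # 2 -> a five-node ensemble survives two failures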



--
View this message in context: http://lucene.472066.n3.nabble.com/Help-needed-to-understand-zookeeper-in-solrcloud-tp4190631p4190633.html
Sent from the Solr - User mailing list archive at Nabble.com.