Posted to user@zookeeper.apache.org by Arindam Mukherjee <ar...@gmail.com> on 2013/11/11 17:19:04 UTC

Staggered start of nodes in an ensemble

Of late I have been bombarding this list with questions and am getting
a little frustrated at my lack of understanding of ZooKeeper. I am up
against deadlines that look infeasible now. :(

Anyway, having failed badly to get going with 3.5.0 (you might see a
question earlier today), I decided, for the purposes of a PoC, to
create a static 3.4.6 configuration: at most 3 VMs, of which 2 are
always up. Initially 1 and 2 would be up. The config lists all 3
servers, server.1 through server.3.

I brought up the servers on 1 and 2, in that order, using "zkServer.sh
start-foreground". However, they failed to connect and form a quorum,
and the foreground server process showed an increasing timeout. Then I
brought up 3, at which point 2 and 3 were able to connect.

Next I started zkCli.sh on 1, and it failed to connect to any box.
However, the clients started on 2 and 3 connected cleanly. I am at a
bit of a loss to explain this and need help.

Thanks.
Arindam
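
For context on why servers 1 and 2 alone should have been enough: by
default ZooKeeper needs a strict majority of the servers listed in the
config to form a quorum. A minimal Python sketch of that arithmetic
follows; the function names are illustrative and not part of ZooKeeper.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest number of servers that forms a strict majority."""
    return ensemble_size // 2 + 1


def has_quorum(ensemble_size: int, servers_connected: int) -> bool:
    """True if the connected servers are enough to elect a leader."""
    return servers_connected >= quorum_size(ensemble_size)
```

With 3 servers listed, any 2 that can reach each other are enough. So
1 and 2 failing while 2 and 3 succeed suggests a problem specific to
server 1 (configuration or connectivity) rather than the start order.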

Re: Staggered start of nodes in an ensemble

Posted by Arindam Mukherjee <ar...@gmail.com>.
Almost certainly a firewall issue. Flushed iptables and stopped the
service to get things working. Will try dynamic reconfig with 3.5
later.

Thanks guys.

On Tue, Nov 12, 2013 at 10:56 AM, Patrick Hunt <ph...@apache.org> wrote:
> Probably goes without saying but you cleared out the datadir when
> switching from 3.5 to 3.4? I don't think anyone has tested 3.4 code
> reading a datadir generated by a 3.5 server yet....
>
> Patrick
>
> On Mon, Nov 11, 2013 at 9:19 PM, Patrick Hunt <ph...@apache.org> wrote:
>> You are using the same configuration on all three nodes, is that
>> right? Permissions for the dataDir are set properly on all three?
>>
>> Have you verified that the "myid" file is correct, and corresponds
>> correctly, to each of the 3 servers? (.193 should have myid of 1, .194
>> myid of 2, .195 myid of 3)
>>
>> I'd also suggest that you use 20 & 10 as the init and sync limits respectively.
>>
>> Patrick
>>
>> On Mon, Nov 11, 2013 at 9:09 PM, Arindam Mukherjee
>> <ar...@gmail.com> wrote:
>>> On Tue, Nov 12, 2013 at 8:43 AM, kishore g <g....@gmail.com> wrote:
>>>> can you provide the configurations (zoo.cfg) on each server. From the
>>>> description, your server 1 seems to be incorrectly configured.
>>>>
>>>
>>> This is the configuration:
>>>
>>> tickTime=2000
>>> dataDir=/var/zookeeper
>>> initLimit=5
>>> syncLimit=2
>>> clientPort=2181
>>> server.1=172.31.68.193:2888:3888
>>> server.2=172.31.68.194:2888:3888
>>> server.3=172.31.68.195:2888:3888
>>>
>>> Thanks.

Re: Staggered start of nodes in an ensemble

Posted by Patrick Hunt <ph...@apache.org>.
Probably goes without saying but you cleared out the datadir when
switching from 3.5 to 3.4? I don't think anyone has tested 3.4 code
reading a datadir generated by a 3.5 server yet....

Patrick

On Mon, Nov 11, 2013 at 9:19 PM, Patrick Hunt <ph...@apache.org> wrote:
> You are using the same configuration on all three nodes, is that
> right? Permissions for the dataDir are set properly on all three?
>
> Have you verified that the "myid" file is correct, and corresponds
> correctly, to each of the 3 servers? (.193 should have myid of 1, .194
> myid of 2, .195 myid of 3)
>
> I'd also suggest that you use 20 & 10 as the init and sync limits respectively.
>
> Patrick
>
> On Mon, Nov 11, 2013 at 9:09 PM, Arindam Mukherjee
> <ar...@gmail.com> wrote:
>> On Tue, Nov 12, 2013 at 8:43 AM, kishore g <g....@gmail.com> wrote:
>>> can you provide the configurations (zoo.cfg) on each server. From the
>>> description, your server 1 seems to be incorrectly configured.
>>>
>>
>> This is the configuration:
>>
>> tickTime=2000
>> dataDir=/var/zookeeper
>> initLimit=5
>> syncLimit=2
>> clientPort=2181
>> server.1=172.31.68.193:2888:3888
>> server.2=172.31.68.194:2888:3888
>> server.3=172.31.68.195:2888:3888
>>
>> Thanks.

Re: Staggered start of nodes in an ensemble

Posted by Patrick Hunt <ph...@apache.org>.
You are using the same configuration on all three nodes, is that
right? Permissions for the dataDir are set properly on all three?

Have you verified that the "myid" file on each of the 3 servers is
correct and matches that server's entry in the config? (.193 should
have myid of 1, .194 myid of 2, .195 myid of 3)
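
One way to run the myid check described above on each box is sketched
below in Python. It assumes the myid file sits directly in dataDir and
that the server.N lines use the host:port:port form shown in this
thread; the helper names and the local_address argument are
hypothetical.

```python
import re
from pathlib import Path

# Matches lines such as: server.1=172.31.68.193:2888:3888
SERVER_LINE = re.compile(r"^server\.(\d+)=([^:\s]+):\d+:\d+")


def expected_ids(zoo_cfg_text: str) -> dict:
    """Map each server address in zoo.cfg to its server.N id."""
    ids = {}
    for line in zoo_cfg_text.splitlines():
        match = SERVER_LINE.match(line.strip())
        if match:
            ids[match.group(2)] = int(match.group(1))
    return ids


def myid_matches(zoo_cfg_text: str, local_address: str, data_dir: str) -> bool:
    """Compare dataDir/myid with the id zoo.cfg assigns to local_address."""
    expected = expected_ids(zoo_cfg_text).get(local_address)
    myid_path = Path(data_dir) / "myid"
    if expected is None or not myid_path.is_file():
        return False
    return myid_path.read_text().strip() == str(expected)
```

On .193, for example, myid_matches(cfg_text, "172.31.68.193",
"/var/zookeeper") should be True only if /var/zookeeper/myid contains 1.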

I'd also suggest that you use 20 & 10 as the init and sync limits respectively.

Patrick

On Mon, Nov 11, 2013 at 9:09 PM, Arindam Mukherjee
<ar...@gmail.com> wrote:
> On Tue, Nov 12, 2013 at 8:43 AM, kishore g <g....@gmail.com> wrote:
>> can you provide the configurations (zoo.cfg) on each server. From the
>> description, your server 1 seems to be incorrectly configured.
>>
>
> This is the configuration:
>
> tickTime=2000
> dataDir=/var/zookeeper
> initLimit=5
> syncLimit=2
> clientPort=2181
> server.1=172.31.68.193:2888:3888
> server.2=172.31.68.194:2888:3888
> server.3=172.31.68.195:2888:3888
>
> Thanks.

Re: Staggered start of nodes in an ensemble

Posted by Benjamin Reed <br...@apache.org>.
And what did the server logs say (probably over and over again) when
servers 1 and 2 were up?


On Mon, Nov 11, 2013 at 9:09 PM, Arindam Mukherjee <
arindam.mukerjee@gmail.com> wrote:

> On Tue, Nov 12, 2013 at 8:43 AM, kishore g <g....@gmail.com> wrote:
> > can you provide the configurations (zoo.cfg) on each server. From the
> > description, your server 1 seems to be incorrectly configured.
> >
>
> This is the configuration:
>
> tickTime=2000
> dataDir=/var/zookeeper
> initLimit=5
> syncLimit=2
> clientPort=2181
> server.1=172.31.68.193:2888:3888
> server.2=172.31.68.194:2888:3888
> server.3=172.31.68.195:2888:3888
>
> Thanks.
>

Re: Staggered start of nodes in an ensemble

Posted by Arindam Mukherjee <ar...@gmail.com>.
On Tue, Nov 12, 2013 at 8:43 AM, kishore g <g....@gmail.com> wrote:
> can you provide the configurations (zoo.cfg) on each server. From the
> description, your server 1 seems to be incorrectly configured.
>

This is the configuration:

tickTime=2000
dataDir=/var/zookeeper
initLimit=5
syncLimit=2
clientPort=2181
server.1=172.31.68.193:2888:3888
server.2=172.31.68.194:2888:3888
server.3=172.31.68.195:2888:3888

Thanks.
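
For reference, initLimit and syncLimit are counted in ticks, so their
real duration depends on tickTime. The Python sketch below converts
them to seconds; the function name is illustrative and the parsing
only handles simple key=value lines like the ones above.

```python
def limits_in_seconds(zoo_cfg_text: str) -> dict:
    """Convert initLimit and syncLimit from ticks to seconds."""
    settings = {}
    for line in zoo_cfg_text.splitlines():
        key, sep, value = line.strip().partition("=")
        if sep and key in ("tickTime", "initLimit", "syncLimit"):
            settings[key] = int(value)
    tick_seconds = settings["tickTime"] / 1000  # tickTime is in milliseconds
    return {
        "initLimit": settings["initLimit"] * tick_seconds,
        "syncLimit": settings["syncLimit"] * tick_seconds,
    }
```

With tickTime=2000, the posted values (5 and 2) come to 10 and 4
seconds, while the 20 and 10 suggested elsewhere in this thread come
to 40 and 20 seconds.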

Re: Staggered start of nodes in an ensemble

Posted by Patrick Hunt <ph...@apache.org>.
You might also try this utility for generating the configs:

https://github.com/phunt/zkconf

see the example

$ zkconf.py --servers "host1.com,host2.com,168.1.1.1" ~/zookeeper_trunk test3servers

you'd probably want something similar

Patrick

On Mon, Nov 11, 2013 at 8:45 PM, Patrick Hunt <ph...@apache.org> wrote:
> You might also ensure that the vms can communicate over the ports
> you're using. No firewall say. Have you looked at the logs at all?
> They might shed more light as well.
>
> Patrick
>
> On Mon, Nov 11, 2013 at 7:13 PM, kishore g <g....@gmail.com> wrote:
>> can you provide the configurations (zoo.cfg) on each server. From the
>> description, your server 1 seems to be incorrectly configured.
>>
>> thanks,
>> Kishore G
>>
>>
>> On Mon, Nov 11, 2013 at 8:19 AM, Arindam Mukherjee <
>> arindam.mukerjee@gmail.com> wrote:
>>
>>> Of late I have been bombarding this list with questions and am getting
>>> a little frustrated at my lack of understanding of ZooKeeper. I am up
>>> against deadlines that look infeasible now. :(
>>>
>>> Anyway, having majorly failed to get going with 3.5.0 (you might see a
>>> question earlier today), I decided, for the purposes of a PoC to
>>> create a static 3.4.6 configuration. Say, create max 3 VMs, of which 2
>>> are always up. Initially 1 and 2 would be up. I would be listing 3
>>> servers in the config - server.1 through server.3.
>>>
>>> I brought up the servers on 1 and 2 in that order using "zkServer.sh
>>> start-foreground". However they failed to connect and form a quorum
>>> and the foreground server process showed an increasing timeout. Then I
>>> brought up 3 at which point 2 and 3 were able to connect.
>>>
>>> Next I started zkCli.sh from 1 - it failed to connect to any box.
>>> However the ones on 2 and 3 connected neatly. I am at a little loss to
>>> explain this. Need help.
>>>
>>> Thanks.
>>> Arindam
>>>

Re: Staggered start of nodes in an ensemble

Posted by Patrick Hunt <ph...@apache.org>.
You might also ensure that the VMs can communicate over the ports
you're using. No firewall, say. Have you looked at the logs at all?
They might shed more light as well.
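
A quick way to test this from each VM is sketched below in Python,
using only the standard socket module. The addresses and ports are the
ones from the configuration posted in this thread; the helper names
are illustrative.

```python
import socket

HOSTS = ["172.31.68.193", "172.31.68.194", "172.31.68.195"]
PORTS = [2181, 2888, 3888]  # client, quorum, and leader-election ports


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def check_ensemble(hosts, ports):
    """Return {(host, port): reachable} for every host/port combination."""
    return {(h, p): port_open(h, p) for h in hosts for p in ports}
```

Run check_ensemble(HOSTS, PORTS) on each VM while the servers are up.
Keep in mind that 2888 is normally expected to accept connections only
on the current leader, so a refusal there from a follower is not by
itself a sign of a firewall. A timeout or refusal on 3888 or 2181 for
a running server is more suspicious.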

Patrick

On Mon, Nov 11, 2013 at 7:13 PM, kishore g <g....@gmail.com> wrote:
> can you provide the configurations (zoo.cfg) on each server. From the
> description, your server 1 seems to be incorrectly configured.
>
> thanks,
> Kishore G
>
>
> On Mon, Nov 11, 2013 at 8:19 AM, Arindam Mukherjee <
> arindam.mukerjee@gmail.com> wrote:
>
>> Of late I have been bombarding this list with questions and am getting
>> a little frustrated at my lack of understanding of ZooKeeper. I am up
>> against deadlines that look infeasible now. :(
>>
>> Anyway, having majorly failed to get going with 3.5.0 (you might see a
>> question earlier today), I decided, for the purposes of a PoC to
>> create a static 3.4.6 configuration. Say, create max 3 VMs, of which 2
>> are always up. Initially 1 and 2 would be up. I would be listing 3
>> servers in the config - server.1 through server.3.
>>
>> I brought up the servers on 1 and 2 in that order using "zkServer.sh
>> start-foreground". However they failed to connect and form a quorum
>> and the foreground server process showed an increasing timeout. Then I
>> brought up 3 at which point 2 and 3 were able to connect.
>>
>> Next I started zkCli.sh from 1 - it failed to connect to any box.
>> However the ones on 2 and 3 connected neatly. I am at a little loss to
>> explain this. Need help.
>>
>> Thanks.
>> Arindam
>>

Re: Staggered start of nodes in an ensemble

Posted by kishore g <g....@gmail.com>.
Can you provide the configuration (zoo.cfg) from each server? From the
description, your server 1 seems to be incorrectly configured.

thanks,
Kishore G


On Mon, Nov 11, 2013 at 8:19 AM, Arindam Mukherjee <
arindam.mukerjee@gmail.com> wrote:

> Of late I have been bombarding this list with questions and am getting
> a little frustrated at my lack of understanding of ZooKeeper. I am up
> against deadlines that look infeasible now. :(
>
> Anyway, having majorly failed to get going with 3.5.0 (you might see a
> question earlier today), I decided, for the purposes of a PoC to
> create a static 3.4.6 configuration. Say, create max 3 VMs, of which 2
> are always up. Initially 1 and 2 would be up. I would be listing 3
> servers in the config - server.1 through server.3.
>
> I brought up the servers on 1 and 2 in that order using "zkServer.sh
> start-foreground". However they failed to connect and form a quorum
> and the foreground server process showed an increasing timeout. Then I
> brought up 3 at which point 2 and 3 were able to connect.
>
> Next I started zkCli.sh from 1 - it failed to connect to any box.
> However the ones on 2 and 3 connected neatly. I am at a little loss to
> explain this. Need help.
>
> Thanks.
> Arindam
>