Posted to user@zookeeper.apache.org by Avinash P <ma...@yahoo.com> on 2013/10/16 08:01:07 UTC

Problem with leader election

Hello there!

We're seeing a specific problem on one of our ZooKeeper clusters. When the nodes come up, they attempt leader election, fail, and then fall back to standalone mode. This is happening on every node of this cluster, which was perfectly functional until probably yesterday.

So here are the settings:
ClientPort: 2181
ConnectPort: 2888
ElectionPort: 3888

We had a functioning quorum of 5 ZooKeeper nodes, all of which are now running in standalone mode. To rule out a firewall issue, I ran nc -l 3888 on one of the servers and connected to it from another, and that works fine.

One problem I see is that when the ZooKeeper cluster comes up, it listens on port 3888 only for a short time. After that, a telnet to port 3888 returns connection refused (nothing is listening on port 3888, as confirmed by sudo netstat).

Logs show a lot of java.net.SocketTimeoutException: connect timed out

And all nodes seem to be running in standalone mode.

Thanks in advance for replying

Regards
Avinash
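
The nc and telnet checks described above can be repeated across every peer and both quorum ports with a short script. This is only a minimal sketch, not a ZooKeeper tool: the names probe and probe_all are made up for illustration, and the ports are the 2888/3888 values from the settings above. It separates "refused" (the host answered but nothing is listening, which matches the telnet result) from "timeout" (no answer at all, which is closer to the SocketTimeoutException in the logs and is more typical of a dropped packet or a wrong address).

```python
import socket


def probe(host, port, timeout=3.0):
    """Try one TCP connect and classify the outcome as a short string."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # Host is reachable, but no process is listening on this port.
        return "refused"
    except socket.timeout:
        # No reply within the timeout: filtered traffic or unreachable address.
        return "timeout"
    except OSError as exc:
        # Anything else, e.g. a hostname that does not resolve.
        return f"error: {exc}"


def probe_all(hosts, ports=(2888, 3888), timeout=3.0):
    """Probe every (host, port) pair and return {(host, port): result}."""
    return {(h, p): probe(h, p, timeout) for h in hosts for p in ports}
```

Calling probe_all(["zk1.example.com", "zk2.example.com"]) from each node in turn (the hostnames are placeholders) gives a quick picture of which peers are actually listening on the election port at that moment.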

Re: RE: Problem with leader election

Posted by Avinash P <ma...@yahoo.com>.
Yes, I see it in the logs. The difference between a healthy cluster and this one seems to be that in the healthy one ZooKeeper is listening on port 3888.


RE: Problem with leader election

Posted by Flavio Junqueira <fp...@yahoo.com>.
You say that leader election fails. Does it mean that it even runs? Can you
see it in the logs?

I'm not sure if I'm missing anything, but I don't think that servers are
supposed to go into standalone mode.

-Flavio


Re: Problem with leader election

Posted by Avinash P <ma...@yahoo.com>.
Additionally, they are responsive to both read and write operations. I've scaled the ensemble down to two nodes to isolate the problem.
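
One way to confirm which mode each server believes it is in is the stat four-letter command on the client port; its output includes a "Mode:" line (standalone, leader, or follower). The following is a minimal sketch, assuming the client port 2181 from the settings in this thread. The helper names four_letter and server_mode are illustrative and not part of any ZooKeeper client library.

```python
import socket


def four_letter(host, cmd, port=2181, timeout=3.0):
    """Send a four-letter command (e.g. "stat") and return the full reply text."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(cmd.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", "replace")


def server_mode(stat_output):
    """Extract the value of the "Mode:" line, or None if there is none."""
    for line in stat_output.splitlines():
        if line.startswith("Mode:"):
            return line.split(":", 1)[1].strip()
    return None
```

For example, server_mode(four_letter("localhost", "stat")) on each node should say "leader" or "follower" in a healthy ensemble, so a value of "standalone" would confirm the symptom described here.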



Re: Problem with leader election

Posted by Avinash P <ma...@yahoo.com>.
Hi Flavio,

Thank you for your quick response. We're running 3.4.5, and here's the configuration:
server.1=<hostname-1>\:2888\:3888
clientPort=2181
dataDir=/mnt/data/zookeeper
syncLimit=5
server.2=<hostname-2>\:2888\:3888
tickTime=2000
initLimit=10
dataLogDir=/mnt/data/zookeeper

Please let me know if you'd like any more information

Regards
Avinash
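
Since the servers end up in standalone mode, it may be worth checking how many server.N entries each process actually gets from the file it was started with. As far as I understand the 3.4.x startup path, a config that yields no usable server.N entries (a lone entry is ignored) is run as a standalone server, but treat that as an assumption to verify against the startup log. The sketch below is a simplified stand-in for Java's properties parsing, handling only the "\:" style of escaping seen in the config above; parse_cfg and quorum_servers are hypothetical names.

```python
import re

SERVER_KEY = re.compile(r"^server\.(\d+)$")


def parse_cfg(text):
    """Parse key=value lines into a dict (simplified; not full Properties syntax)."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue  # skip blank lines and comments
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a key=value line
        # Drop the backslash in escapes such as "\:" so the value reads host:port:port.
        props[key.strip()] = re.sub(r"\\(.)", r"\1", value.strip())
    return props


def quorum_servers(props):
    """Return {server id: "host:quorumPort:electionPort"} for all server.N keys."""
    servers = {}
    for key, value in props.items():
        match = SERVER_KEY.match(key)
        if match:
            servers[int(match.group(1))] = value
    return servers
```

Running quorum_servers(parse_cfg(text)) on the contents of each node's config file should list both servers for the two-node configuration above; an empty or single-entry result on any node would be the first thing to look into.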



RE: Problem with leader election

Posted by Flavio Junqueira <fp...@yahoo.com>.
Avinash,

Which version are you using? When you say that they are falling back to
standalone mode, are you saying that they are responsive to operations? Are
they in read-only mode?

Also, would you be ok with posting the whole configuration?

Thanks,
-Flavio
