You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@zookeeper.apache.org by "Vishal K (JIRA)" <ji...@apache.org> on 2010/07/19 17:51:57 UTC

[jira] Created: (ZOOKEEPER-822) Leader election taking a long time to complete

Leader election taking a long time  to complete
-----------------------------------------------

                 Key: ZOOKEEPER-822
                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-822
             Project: Zookeeper
          Issue Type: Bug
          Components: quorum
    Affects Versions: 3.3.0
            Reporter: Vishal K
            Priority: Blocker


Created a 3 node cluster.

1 Fail the ZK leader
2. Let leader election finish. Restart the leader and let it join the 
3. Repeat 

After a few rounds leader election takes anywhere 25- 60 seconds to finish. Note- we didn't have any ZK clients and no new znodes were created.

zoo.cfg is shown below:

#Mon Jul 19 12:15:10 UTC 2010
server.1=192.168.4.12\:2888\:3888
server.0=192.168.4.11\:2888\:3888
clientPort=2181
dataDir=/var/zookeeper
syncLimit=2
server.2=192.168.4.13\:2888\:3888
initLimit=5
tickTime=2000

I have attached logs from two nodes that took a long time to form the cluster after failing the leader. The leader was down anyways so logs from that node shouldn't matter.
Look for "START HERE". Logs after that point should be of our interest.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (ZOOKEEPER-822) Leader election taking a long time to complete

Posted by "Benjamin Reed (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12915796#action_12915796 ] 

Benjamin Reed commented on ZOOKEEPER-822:
-----------------------------------------

looks good overall flavio. just a quick questions: i notice that operations on senderWorkerMap in initiateConnection are not synchronized. senderWorkerMap is concurrent, but there could be a race between the get, put, and vsw.finish if initiateConnection is called concurrently for the same sid. right?

also you need to add a blurb to the config doc for the timeout system variable, which should be "zookeeper.cnxtimeout" so that it can be set from the configuration file.

> Leader election taking a long time  to complete
> -----------------------------------------------
>
>                 Key: ZOOKEEPER-822
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-822
>             Project: Zookeeper
>          Issue Type: Bug
>          Components: quorum
>    Affects Versions: 3.3.0
>            Reporter: Vishal K
>            Assignee: Vishal K
>            Priority: Blocker
>             Fix For: 3.3.2, 3.4.0
>
>         Attachments: 822.tar.gz, rhel.tar.gz, test_zookeeper_1.log, test_zookeeper_2.log, zk_leader_election.tar.gz, zookeeper-3.4.0.tar.gz, ZOOKEEPER-822-3.3.2.patch, ZOOKEEPER-822-3.3.2.patch, ZOOKEEPER-822-3.3.2.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch_v1
>
>
> Created a 3 node cluster.
> 1 Fail the ZK leader
> 2. Let leader election finish. Restart the leader and let it join the 
> 3. Repeat 
> After a few rounds leader election takes anywhere 25- 60 seconds to finish. Note- we didn't have any ZK clients and no new znodes were created.
> zoo.cfg is shown below:
> #Mon Jul 19 12:15:10 UTC 2010
> server.1=192.168.4.12\:2888\:3888
> server.0=192.168.4.11\:2888\:3888
> clientPort=2181
> dataDir=/var/zookeeper
> syncLimit=2
> server.2=192.168.4.13\:2888\:3888
> initLimit=5
> tickTime=2000
> I have attached logs from two nodes that took a long time to form the cluster after failing the leader. The leader was down anyways so logs from that node shouldn't matter.
> Look for "START HERE". Logs after that point should be of our interest.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (ZOOKEEPER-822) Leader election taking a long time to complete

Posted by "Benjamin Reed (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/ZOOKEEPER-822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Benjamin Reed updated ZOOKEEPER-822:
------------------------------------

    Hadoop Flags: [Reviewed]

+1 looks good. ready to commit.

> Leader election taking a long time  to complete
> -----------------------------------------------
>
>                 Key: ZOOKEEPER-822
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-822
>             Project: Zookeeper
>          Issue Type: Bug
>          Components: quorum
>    Affects Versions: 3.3.0
>            Reporter: Vishal K
>            Assignee: Vishal K
>            Priority: Blocker
>             Fix For: 3.3.2, 3.4.0
>
>         Attachments: 822.tar.gz, rhel.tar.gz, test_zookeeper_1.log, test_zookeeper_2.log, zk_leader_election.tar.gz, zookeeper-3.4.0.tar.gz, ZOOKEEPER-822-3.3.2.patch, ZOOKEEPER-822-3.3.2.patch, ZOOKEEPER-822-3.3.2.patch, ZOOKEEPER-822-3.3.2.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch_v1
>
>
> Created a 3 node cluster.
> 1 Fail the ZK leader
> 2. Let leader election finish. Restart the leader and let it join the 
> 3. Repeat 
> After a few rounds leader election takes anywhere 25- 60 seconds to finish. Note- we didn't have any ZK clients and no new znodes were created.
> zoo.cfg is shown below:
> #Mon Jul 19 12:15:10 UTC 2010
> server.1=192.168.4.12\:2888\:3888
> server.0=192.168.4.11\:2888\:3888
> clientPort=2181
> dataDir=/var/zookeeper
> syncLimit=2
> server.2=192.168.4.13\:2888\:3888
> initLimit=5
> tickTime=2000
> I have attached logs from two nodes that took a long time to form the cluster after failing the leader. The leader was down anyways so logs from that node shouldn't matter.
> Look for "START HERE". Logs after that point should be of our interest.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (ZOOKEEPER-822) Leader election taking a long time to complete

Posted by "Ivan Kelly (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12892761#action_12892761 ] 

Ivan Kelly commented on ZOOKEEPER-822:
--------------------------------------

Hi Vishal,

The logs in zk_leader_election.tar.gz seem to be from different runs. node0 starts at 2010-07-22 17:33:54,166, node1 at 2010-07-22 22:21:11,979 and node2 at 2010-07-22 22:22:17,249. Are the clocks on the machine in sync?

-Ivan

> Leader election taking a long time  to complete
> -----------------------------------------------
>
>                 Key: ZOOKEEPER-822
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-822
>             Project: Zookeeper
>          Issue Type: Bug
>          Components: quorum
>    Affects Versions: 3.3.0
>            Reporter: Vishal K
>            Priority: Blocker
>         Attachments: test_zookeeper_1.log, test_zookeeper_2.log, zk_leader_election.tar.gz
>
>
> Created a 3 node cluster.
> 1 Fail the ZK leader
> 2. Let leader election finish. Restart the leader and let it join the 
> 3. Repeat 
> After a few rounds leader election takes anywhere 25- 60 seconds to finish. Note- we didn't have any ZK clients and no new znodes were created.
> zoo.cfg is shown below:
> #Mon Jul 19 12:15:10 UTC 2010
> server.1=192.168.4.12\:2888\:3888
> server.0=192.168.4.11\:2888\:3888
> clientPort=2181
> dataDir=/var/zookeeper
> syncLimit=2
> server.2=192.168.4.13\:2888\:3888
> initLimit=5
> tickTime=2000
> I have attached logs from two nodes that took a long time to form the cluster after failing the leader. The leader was down anyways so logs from that node shouldn't matter.
> Look for "START HERE". Logs after that point should be of our interest.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (ZOOKEEPER-822) Leader election taking a long time to complete

Posted by "Flavio Junqueira (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/ZOOKEEPER-822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Flavio Junqueira updated ZOOKEEPER-822:
---------------------------------------

    Status: Open  (was: Patch Available)

> Leader election taking a long time  to complete
> -----------------------------------------------
>
>                 Key: ZOOKEEPER-822
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-822
>             Project: Zookeeper
>          Issue Type: Bug
>          Components: quorum
>    Affects Versions: 3.3.0
>            Reporter: Vishal K
>            Assignee: Vishal K
>            Priority: Blocker
>             Fix For: 3.3.2, 3.4.0
>
>         Attachments: 822.tar.gz, rhel.tar.gz, test_zookeeper_1.log, test_zookeeper_2.log, zk_leader_election.tar.gz, zookeeper-3.4.0.tar.gz, ZOOKEEPER-822-3.3.2.patch, ZOOKEEPER-822-3.3.2.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch_v1
>
>
> Created a 3 node cluster.
> 1 Fail the ZK leader
> 2. Let leader election finish. Restart the leader and let it join the 
> 3. Repeat 
> After a few rounds leader election takes anywhere 25- 60 seconds to finish. Note- we didn't have any ZK clients and no new znodes were created.
> zoo.cfg is shown below:
> #Mon Jul 19 12:15:10 UTC 2010
> server.1=192.168.4.12\:2888\:3888
> server.0=192.168.4.11\:2888\:3888
> clientPort=2181
> dataDir=/var/zookeeper
> syncLimit=2
> server.2=192.168.4.13\:2888\:3888
> initLimit=5
> tickTime=2000
> I have attached logs from two nodes that took a long time to form the cluster after failing the leader. The leader was down anyways so logs from that node shouldn't matter.
> Look for "START HERE". Logs after that point should be of our interest.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (ZOOKEEPER-822) Leader election taking a long time to complete

Posted by "Flavio Junqueira (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/ZOOKEEPER-822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Flavio Junqueira updated ZOOKEEPER-822:
---------------------------------------

    Attachment: ZOOKEEPER-822-3.3.2.patch

> Leader election taking a long time  to complete
> -----------------------------------------------
>
>                 Key: ZOOKEEPER-822
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-822
>             Project: Zookeeper
>          Issue Type: Bug
>          Components: quorum
>    Affects Versions: 3.3.0
>            Reporter: Vishal K
>            Assignee: Vishal K
>            Priority: Blocker
>             Fix For: 3.3.2, 3.4.0
>
>         Attachments: 822.tar.gz, rhel.tar.gz, test_zookeeper_1.log, test_zookeeper_2.log, zk_leader_election.tar.gz, zookeeper-3.4.0.tar.gz, ZOOKEEPER-822-3.3.2.patch, ZOOKEEPER-822-3.3.2.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch_v1
>
>
> Created a 3 node cluster.
> 1 Fail the ZK leader
> 2. Let leader election finish. Restart the leader and let it join the 
> 3. Repeat 
> After a few rounds leader election takes anywhere 25- 60 seconds to finish. Note- we didn't have any ZK clients and no new znodes were created.
> zoo.cfg is shown below:
> #Mon Jul 19 12:15:10 UTC 2010
> server.1=192.168.4.12\:2888\:3888
> server.0=192.168.4.11\:2888\:3888
> clientPort=2181
> dataDir=/var/zookeeper
> syncLimit=2
> server.2=192.168.4.13\:2888\:3888
> initLimit=5
> tickTime=2000
> I have attached logs from two nodes that took a long time to form the cluster after failing the leader. The leader was down anyways so logs from that node shouldn't matter.
> Look for "START HERE". Logs after that point should be of our interest.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (ZOOKEEPER-822) Leader election taking a long time to complete

Posted by "Ivan Kelly (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12896983#action_12896983 ] 

Ivan Kelly commented on ZOOKEEPER-822:
--------------------------------------

Actually, ignore that. Read it wrong.

> Leader election taking a long time  to complete
> -----------------------------------------------
>
>                 Key: ZOOKEEPER-822
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-822
>             Project: Zookeeper
>          Issue Type: Bug
>          Components: quorum
>    Affects Versions: 3.3.0
>            Reporter: Vishal K
>            Priority: Blocker
>         Attachments: 822.tar.gz, test_zookeeper_1.log, test_zookeeper_2.log, zk_leader_election.tar.gz
>
>
> Created a 3 node cluster.
> 1 Fail the ZK leader
> 2. Let leader election finish. Restart the leader and let it join the 
> 3. Repeat 
> After a few rounds leader election takes anywhere 25- 60 seconds to finish. Note- we didn't have any ZK clients and no new znodes were created.
> zoo.cfg is shown below:
> #Mon Jul 19 12:15:10 UTC 2010
> server.1=192.168.4.12\:2888\:3888
> server.0=192.168.4.11\:2888\:3888
> clientPort=2181
> dataDir=/var/zookeeper
> syncLimit=2
> server.2=192.168.4.13\:2888\:3888
> initLimit=5
> tickTime=2000
> I have attached logs from two nodes that took a long time to form the cluster after failing the leader. The leader was down anyways so logs from that node shouldn't matter.
> Look for "START HERE". Logs after that point should be of our interest.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (ZOOKEEPER-822) Leader election taking a long time to complete

Posted by "Flavio Junqueira (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/ZOOKEEPER-822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Flavio Junqueira updated ZOOKEEPER-822:
---------------------------------------

    Status: Open  (was: Patch Available)

> Leader election taking a long time  to complete
> -----------------------------------------------
>
>                 Key: ZOOKEEPER-822
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-822
>             Project: Zookeeper
>          Issue Type: Bug
>          Components: quorum
>    Affects Versions: 3.3.0
>            Reporter: Vishal K
>            Assignee: Vishal K
>            Priority: Blocker
>             Fix For: 3.3.2, 3.4.0
>
>         Attachments: 822.tar.gz, rhel.tar.gz, test_zookeeper_1.log, test_zookeeper_2.log, zk_leader_election.tar.gz, zookeeper-3.4.0.tar.gz, ZOOKEEPER-822-3.3.2.patch, ZOOKEEPER-822-3.3.2.patch, ZOOKEEPER-822-3.3.2.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch_v1
>
>
> Created a 3 node cluster.
> 1 Fail the ZK leader
> 2. Let leader election finish. Restart the leader and let it join the 
> 3. Repeat 
> After a few rounds leader election takes anywhere 25- 60 seconds to finish. Note- we didn't have any ZK clients and no new znodes were created.
> zoo.cfg is shown below:
> #Mon Jul 19 12:15:10 UTC 2010
> server.1=192.168.4.12\:2888\:3888
> server.0=192.168.4.11\:2888\:3888
> clientPort=2181
> dataDir=/var/zookeeper
> syncLimit=2
> server.2=192.168.4.13\:2888\:3888
> initLimit=5
> tickTime=2000
> I have attached logs from two nodes that took a long time to form the cluster after failing the leader. The leader was down anyways so logs from that node shouldn't matter.
> Look for "START HERE". Logs after that point should be of our interest.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (ZOOKEEPER-822) Leader election taking a long time to complete

Posted by "Flavio Junqueira (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/ZOOKEEPER-822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Flavio Junqueira updated ZOOKEEPER-822:
---------------------------------------

    Status: Open  (was: Patch Available)

Going to submit patches that introduce a system property.

> Leader election taking a long time  to complete
> -----------------------------------------------
>
>                 Key: ZOOKEEPER-822
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-822
>             Project: Zookeeper
>          Issue Type: Bug
>          Components: quorum
>    Affects Versions: 3.3.0
>            Reporter: Vishal K
>            Assignee: Vishal K
>            Priority: Blocker
>             Fix For: 3.3.2, 3.4.0
>
>         Attachments: 822.tar.gz, rhel.tar.gz, test_zookeeper_1.log, test_zookeeper_2.log, zk_leader_election.tar.gz, zookeeper-3.4.0.tar.gz, ZOOKEEPER-822-3.3.2.patch, ZOOKEEPER-822-3.3.2.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch_v1
>
>
> Created a 3 node cluster.
> 1 Fail the ZK leader
> 2. Let leader election finish. Restart the leader and let it join the 
> 3. Repeat 
> After a few rounds leader election takes anywhere 25- 60 seconds to finish. Note- we didn't have any ZK clients and no new znodes were created.
> zoo.cfg is shown below:
> #Mon Jul 19 12:15:10 UTC 2010
> server.1=192.168.4.12\:2888\:3888
> server.0=192.168.4.11\:2888\:3888
> clientPort=2181
> dataDir=/var/zookeeper
> syncLimit=2
> server.2=192.168.4.13\:2888\:3888
> initLimit=5
> tickTime=2000
> I have attached logs from two nodes that took a long time to form the cluster after failing the leader. The leader was down anyways so logs from that node shouldn't matter.
> Look for "START HERE". Logs after that point should be of our interest.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (ZOOKEEPER-822) Leader election taking a long time to complete

Posted by "Vishal K (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12896061#action_12896061 ] 

Vishal K commented on ZOOKEEPER-822:
------------------------------------

Hi Ivan.

Were the logs of any help? I might be worth having 3 VMs and rebooting the leader instead of shutting down the interface. We have seen this on all of our dev cluster. Al tough all the dev clusters are based on same VM images. So it won't be fair to claim that the problem was seen on different set of machines. I will try with the latest trunk and let you know the result. What FLE changes do you think would have fixed this problem?

Thanks.

-Vishal




> Leader election taking a long time  to complete
> -----------------------------------------------
>
>                 Key: ZOOKEEPER-822
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-822
>             Project: Zookeeper
>          Issue Type: Bug
>          Components: quorum
>    Affects Versions: 3.3.0
>            Reporter: Vishal K
>            Priority: Blocker
>         Attachments: 822.tar.gz, test_zookeeper_1.log, test_zookeeper_2.log, zk_leader_election.tar.gz
>
>
> Created a 3 node cluster.
> 1 Fail the ZK leader
> 2. Let leader election finish. Restart the leader and let it join the 
> 3. Repeat 
> After a few rounds leader election takes anywhere 25- 60 seconds to finish. Note- we didn't have any ZK clients and no new znodes were created.
> zoo.cfg is shown below:
> #Mon Jul 19 12:15:10 UTC 2010
> server.1=192.168.4.12\:2888\:3888
> server.0=192.168.4.11\:2888\:3888
> clientPort=2181
> dataDir=/var/zookeeper
> syncLimit=2
> server.2=192.168.4.13\:2888\:3888
> initLimit=5
> tickTime=2000
> I have attached logs from two nodes that took a long time to form the cluster after failing the leader. The leader was down anyways so logs from that node shouldn't matter.
> Look for "START HERE". Logs after that point should be of our interest.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (ZOOKEEPER-822) Leader election taking a long time to complete

Posted by "Flavio Paiva Junqueira (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12890199#action_12890199 ] 

Flavio Paiva Junqueira commented on ZOOKEEPER-822:
--------------------------------------------------

Hi Vishal, Do you think you can uploaded all three log files for a problematic run? We would like to put it on loggraph to visualize what's going on there. It sounds like it is somehow related to the VM reboots, I don't know why yet.

> Leader election taking a long time  to complete
> -----------------------------------------------
>
>                 Key: ZOOKEEPER-822
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-822
>             Project: Zookeeper
>          Issue Type: Bug
>          Components: quorum
>    Affects Versions: 3.3.0
>            Reporter: Vishal K
>            Priority: Blocker
>         Attachments: test_zookeeper_1.log, test_zookeeper_2.log
>
>
> Created a 3 node cluster.
> 1 Fail the ZK leader
> 2. Let leader election finish. Restart the leader and let it join the 
> 3. Repeat 
> After a few rounds leader election takes anywhere 25- 60 seconds to finish. Note- we didn't have any ZK clients and no new znodes were created.
> zoo.cfg is shown below:
> #Mon Jul 19 12:15:10 UTC 2010
> server.1=192.168.4.12\:2888\:3888
> server.0=192.168.4.11\:2888\:3888
> clientPort=2181
> dataDir=/var/zookeeper
> syncLimit=2
> server.2=192.168.4.13\:2888\:3888
> initLimit=5
> tickTime=2000
> I have attached logs from two nodes that took a long time to form the cluster after failing the leader. The leader was down anyways so logs from that node shouldn't matter.
> Look for "START HERE". Logs after that point should be of our interest.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (ZOOKEEPER-822) Leader election taking a long time to complete

Posted by "Vishal K (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900401#action_12900401 ] 

Vishal K commented on ZOOKEEPER-822:
------------------------------------

Hi Flavio,

Ah! my trunk is quite old then. But I don't think it is necessary to run with the latest code for debugging this issue.

I have identified one problem in WorkerSender.process(). This function calls manager.toSend() whicih calls connectOne. connectOne does a blocking connect (which takes order of minutes to return if a node is down). Thus, WorkerSender.run() will  block and not send any successive notifications to other nodes.

Let met know what you think

I tired with adding timeouts to connectOne, but I am running into similar issue somewhere else. So that didnt fix the problem





> Leader election taking a long time  to complete
> -----------------------------------------------
>
>                 Key: ZOOKEEPER-822
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-822
>             Project: Zookeeper
>          Issue Type: Bug
>          Components: quorum
>    Affects Versions: 3.3.0
>            Reporter: Vishal K
>            Priority: Blocker
>         Attachments: 822.tar.gz, rhel.tar.gz, test_zookeeper_1.log, test_zookeeper_2.log, zk_leader_election.tar.gz
>
>
> Created a 3 node cluster.
> 1 Fail the ZK leader
> 2. Let leader election finish. Restart the leader and let it join the 
> 3. Repeat 
> After a few rounds leader election takes anywhere 25- 60 seconds to finish. Note- we didn't have any ZK clients and no new znodes were created.
> zoo.cfg is shown below:
> #Mon Jul 19 12:15:10 UTC 2010
> server.1=192.168.4.12\:2888\:3888
> server.0=192.168.4.11\:2888\:3888
> clientPort=2181
> dataDir=/var/zookeeper
> syncLimit=2
> server.2=192.168.4.13\:2888\:3888
> initLimit=5
> tickTime=2000
> I have attached logs from two nodes that took a long time to form the cluster after failing the leader. The leader was down anyways so logs from that node shouldn't matter.
> Look for "START HERE". Logs after that point should be of our interest.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (ZOOKEEPER-822) Leader election taking a long time to complete

Posted by "Vishal K (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12909556#action_12909556 ] 

Vishal K commented on ZOOKEEPER-822:
------------------------------------

Hi Flavio,

As I mentioned earlier, this is a temporary patch until the selector based approach (non-blocking IO)  is ready.

In general, what is the concern with the current fix? There will be only one thread running at a time. The thread just makes sure that we can bound the connection time. This patch is working well for us as a temporary fix. Apart from the overhead of starting a thread I don't see anything wrong with the fix.

Again, given that this bug is a blocker for us, we certainly cannot wait until the non-blocking implementation  is done and released.

Thanks.
-Vishal

> Leader election taking a long time  to complete
> -----------------------------------------------
>
>                 Key: ZOOKEEPER-822
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-822
>             Project: Zookeeper
>          Issue Type: Bug
>          Components: quorum
>    Affects Versions: 3.3.0
>            Reporter: Vishal K
>            Assignee: Vishal K
>            Priority: Blocker
>             Fix For: 3.3.2, 3.4.0
>
>         Attachments: 822.tar.gz, rhel.tar.gz, test_zookeeper_1.log, test_zookeeper_2.log, zk_leader_election.tar.gz, zookeeper-3.4.0.tar.gz, ZOOKEEPER-822.patch_v1
>
>
> Created a 3 node cluster.
> 1 Fail the ZK leader
> 2. Let leader election finish. Restart the leader and let it join the 
> 3. Repeat 
> After a few rounds leader election takes anywhere 25- 60 seconds to finish. Note- we didn't have any ZK clients and no new znodes were created.
> zoo.cfg is shown below:
> #Mon Jul 19 12:15:10 UTC 2010
> server.1=192.168.4.12\:2888\:3888
> server.0=192.168.4.11\:2888\:3888
> clientPort=2181
> dataDir=/var/zookeeper
> syncLimit=2
> server.2=192.168.4.13\:2888\:3888
> initLimit=5
> tickTime=2000
> I have attached logs from two nodes that took a long time to form the cluster after failing the leader. The leader was down anyways so logs from that node shouldn't matter.
> Look for "START HERE". Logs after that point should be of our interest.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (ZOOKEEPER-822) Leader election taking a long time to complete

Posted by "Vishal K (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12909560#action_12909560 ] 

Vishal K commented on ZOOKEEPER-822:
------------------------------------

I agree with Mahadev. 

> Leader election taking a long time  to complete
> -----------------------------------------------
>
>                 Key: ZOOKEEPER-822
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-822
>             Project: Zookeeper
>          Issue Type: Bug
>          Components: quorum
>    Affects Versions: 3.3.0
>            Reporter: Vishal K
>            Assignee: Vishal K
>            Priority: Blocker
>             Fix For: 3.3.2, 3.4.0
>
>         Attachments: 822.tar.gz, rhel.tar.gz, test_zookeeper_1.log, test_zookeeper_2.log, zk_leader_election.tar.gz, zookeeper-3.4.0.tar.gz, ZOOKEEPER-822.patch_v1
>
>
> Created a 3 node cluster.
> 1 Fail the ZK leader
> 2. Let leader election finish. Restart the leader and let it join the 
> 3. Repeat 
> After a few rounds leader election takes anywhere 25- 60 seconds to finish. Note- we didn't have any ZK clients and no new znodes were created.
> zoo.cfg is shown below:
> #Mon Jul 19 12:15:10 UTC 2010
> server.1=192.168.4.12\:2888\:3888
> server.0=192.168.4.11\:2888\:3888
> clientPort=2181
> dataDir=/var/zookeeper
> syncLimit=2
> server.2=192.168.4.13\:2888\:3888
> initLimit=5
> tickTime=2000
> I have attached logs from two nodes that took a long time to form the cluster after failing the leader. The leader was down anyways so logs from that node shouldn't matter.
> Look for "START HERE". Logs after that point should be of our interest.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (ZOOKEEPER-822) Leader election taking a long time to complete

Posted by "Ivan Kelly (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12889907#action_12889907 ] 

Ivan Kelly commented on ZOOKEEPER-822:
--------------------------------------

Could you try putting the logs through loggraph (in zookeeper/src/contrib)? Perhaps a graphical view will give some insight?

> Leader election taking a long time  to complete
> -----------------------------------------------
>
>                 Key: ZOOKEEPER-822
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-822
>             Project: Zookeeper
>          Issue Type: Bug
>          Components: quorum
>    Affects Versions: 3.3.0
>            Reporter: Vishal K
>            Priority: Blocker
>         Attachments: test_zookeeper_1.log, test_zookeeper_2.log
>
>
> Created a 3 node cluster.
> 1 Fail the ZK leader
> 2. Let leader election finish. Restart the leader and let it join the 
> 3. Repeat 
> After a few rounds leader election takes anywhere 25- 60 seconds to finish. Note- we didn't have any ZK clients and no new znodes were created.
> zoo.cfg is shown below:
> #Mon Jul 19 12:15:10 UTC 2010
> server.1=192.168.4.12\:2888\:3888
> server.0=192.168.4.11\:2888\:3888
> clientPort=2181
> dataDir=/var/zookeeper
> syncLimit=2
> server.2=192.168.4.13\:2888\:3888
> initLimit=5
> tickTime=2000
> I have attached logs from two nodes that took a long time to form the cluster after failing the leader. The leader was down anyways so logs from that node shouldn't matter.
> Look for "START HERE". Logs after that point should be of our interest.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (ZOOKEEPER-822) Leader election taking a long time to complete

Posted by "Vishal K (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/ZOOKEEPER-822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vishal K updated ZOOKEEPER-822:
-------------------------------

    Attachment: ZOOKEEPER-822.patch_v1

Hi,

I have attached the diff of my changes. It is a pretty simple fix. My  intention was to have a fix with minimal code changes.

connectOne waits for 2 seconds for a connection to establish. Right now the value is hard-coded. I was planning to add a new property that would specify the amount of time to wait for a connect request. If you think that the proposed changes are good enough, I will make the change for using the property value and resubmit a second patch. 

Let me know.

-Vishal

> Leader election taking a long time  to complete
> -----------------------------------------------
>
>                 Key: ZOOKEEPER-822
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-822
>             Project: Zookeeper
>          Issue Type: Bug
>          Components: quorum
>    Affects Versions: 3.3.0
>            Reporter: Vishal K
>            Assignee: Vishal K
>            Priority: Blocker
>             Fix For: 3.3.2, 3.4.0
>
>         Attachments: 822.tar.gz, rhel.tar.gz, test_zookeeper_1.log, test_zookeeper_2.log, zk_leader_election.tar.gz, zookeeper-3.4.0.tar.gz, ZOOKEEPER-822.patch_v1
>
>
> Created a 3 node cluster.
> 1 Fail the ZK leader
> 2. Let leader election finish. Restart the leader and let it join the 
> 3. Repeat 
> After a few rounds leader election takes anywhere 25- 60 seconds to finish. Note- we didn't have any ZK clients and no new znodes were created.
> zoo.cfg is shown below:
> #Mon Jul 19 12:15:10 UTC 2010
> server.1=192.168.4.12\:2888\:3888
> server.0=192.168.4.11\:2888\:3888
> clientPort=2181
> dataDir=/var/zookeeper
> syncLimit=2
> server.2=192.168.4.13\:2888\:3888
> initLimit=5
> tickTime=2000
> I have attached logs from two nodes that took a long time to form the cluster after failing the leader. The leader was down anyways so logs from that node shouldn't matter.
> Look for "START HERE". Logs after that point should be of our interest.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (ZOOKEEPER-822) Leader election taking a long time to complete

Posted by "Vishal K (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/ZOOKEEPER-822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vishal K updated ZOOKEEPER-822:
-------------------------------

    Attachment: 822.tar.gz

attaching new logs.

> Leader election taking a long time  to complete
> -----------------------------------------------
>
>                 Key: ZOOKEEPER-822
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-822
>             Project: Zookeeper
>          Issue Type: Bug
>          Components: quorum
>    Affects Versions: 3.3.0
>            Reporter: Vishal K
>            Priority: Blocker
>         Attachments: 822.tar.gz, test_zookeeper_1.log, test_zookeeper_2.log, zk_leader_election.tar.gz
>
>
> Created a 3 node cluster.
> 1 Fail the ZK leader
> 2. Let leader election finish. Restart the leader and let it join the 
> 3. Repeat 
> After a few rounds leader election takes anywhere 25- 60 seconds to finish. Note- we didn't have any ZK clients and no new znodes were created.
> zoo.cfg is shown below:
> #Mon Jul 19 12:15:10 UTC 2010
> server.1=192.168.4.12\:2888\:3888
> server.0=192.168.4.11\:2888\:3888
> clientPort=2181
> dataDir=/var/zookeeper
> syncLimit=2
> server.2=192.168.4.13\:2888\:3888
> initLimit=5
> tickTime=2000
> I have attached logs from two nodes that took a long time to form the cluster after failing the leader. The leader was down anyways so logs from that node shouldn't matter.
> Look for "START HERE". Logs after that point should be of our interest.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (ZOOKEEPER-822) Leader election taking a long time to complete

Posted by "Vishal K (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12912483#action_12912483 ] 

Vishal K commented on ZOOKEEPER-822:
------------------------------------

Hi Flavio,

Thanks. I will take a look at the patches.

-Vishal

> Leader election taking a long time  to complete
> -----------------------------------------------
>
>                 Key: ZOOKEEPER-822
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-822
>             Project: Zookeeper
>          Issue Type: Bug
>          Components: quorum
>    Affects Versions: 3.3.0
>            Reporter: Vishal K
>            Assignee: Vishal K
>            Priority: Blocker
>             Fix For: 3.3.2, 3.4.0
>
>         Attachments: 822.tar.gz, rhel.tar.gz, test_zookeeper_1.log, test_zookeeper_2.log, zk_leader_election.tar.gz, zookeeper-3.4.0.tar.gz, ZOOKEEPER-822-3.3.2.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch_v1
>
>
> Created a 3 node cluster.
> 1 Fail the ZK leader
> 2. Let leader election finish. Restart the leader and let it join the 
> 3. Repeat 
> After a few rounds leader election takes anywhere 25- 60 seconds to finish. Note- we didn't have any ZK clients and no new znodes were created.
> zoo.cfg is shown below:
> #Mon Jul 19 12:15:10 UTC 2010
> server.1=192.168.4.12\:2888\:3888
> server.0=192.168.4.11\:2888\:3888
> clientPort=2181
> dataDir=/var/zookeeper
> syncLimit=2
> server.2=192.168.4.13\:2888\:3888
> initLimit=5
> tickTime=2000
> I have attached logs from two nodes that took a long time to form the cluster after failing the leader. The leader was down anyways so logs from that node shouldn't matter.
> Look for "START HERE". Logs after that point should be of our interest.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (ZOOKEEPER-822) Leader election taking a long time to complete

Posted by "Vishal K (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/ZOOKEEPER-822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vishal K updated ZOOKEEPER-822:
-------------------------------

    Attachment: test_zookeeper_2.log
                test_zookeeper_1.log

Attaching logs from two nodes that took too long to complete leader election. 

> Leader election taking a long time  to complete
> -----------------------------------------------
>
>                 Key: ZOOKEEPER-822
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-822
>             Project: Zookeeper
>          Issue Type: Bug
>          Components: quorum
>    Affects Versions: 3.3.0
>            Reporter: Vishal K
>            Priority: Blocker
>         Attachments: test_zookeeper_1.log, test_zookeeper_2.log
>
>
> Created a 3 node cluster.
> 1 Fail the ZK leader
> 2. Let leader election finish. Restart the leader and let it join the 
> 3. Repeat 
> After a few rounds leader election takes anywhere 25- 60 seconds to finish. Note- we didn't have any ZK clients and no new znodes were created.
> zoo.cfg is shown below:
> #Mon Jul 19 12:15:10 UTC 2010
> server.1=192.168.4.12\:2888\:3888
> server.0=192.168.4.11\:2888\:3888
> clientPort=2181
> dataDir=/var/zookeeper
> syncLimit=2
> server.2=192.168.4.13\:2888\:3888
> initLimit=5
> tickTime=2000
> I have attached logs from two nodes that took a long time to form the cluster after failing the leader. The leader was down anyways so logs from that node shouldn't matter.
> Look for "START HERE". Logs after that point should be of our interest.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (ZOOKEEPER-822) Leader election taking a long time to complete

Posted by "Flavio Junqueira (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/ZOOKEEPER-822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Flavio Junqueira updated ZOOKEEPER-822:
---------------------------------------

    Attachment: ZOOKEEPER-822-3.3.2.patch

Attaching patch for 3.3.2.

> Leader election taking a long time  to complete
> -----------------------------------------------
>
>                 Key: ZOOKEEPER-822
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-822
>             Project: Zookeeper
>          Issue Type: Bug
>          Components: quorum
>    Affects Versions: 3.3.0
>            Reporter: Vishal K
>            Assignee: Vishal K
>            Priority: Blocker
>             Fix For: 3.3.2, 3.4.0
>
>         Attachments: 822.tar.gz, rhel.tar.gz, test_zookeeper_1.log, test_zookeeper_2.log, zk_leader_election.tar.gz, zookeeper-3.4.0.tar.gz, ZOOKEEPER-822-3.3.2.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch_v1
>
>
> Created a 3 node cluster.
> 1 Fail the ZK leader
> 2. Let leader election finish. Restart the leader and let it join the 
> 3. Repeat 
> After a few rounds leader election takes anywhere 25- 60 seconds to finish. Note- we didn't have any ZK clients and no new znodes were created.
> zoo.cfg is shown below:
> #Mon Jul 19 12:15:10 UTC 2010
> server.1=192.168.4.12\:2888\:3888
> server.0=192.168.4.11\:2888\:3888
> clientPort=2181
> dataDir=/var/zookeeper
> syncLimit=2
> server.2=192.168.4.13\:2888\:3888
> initLimit=5
> tickTime=2000
> I have attached logs from two nodes that took a long time to form the cluster after failing the leader. The leader was down anyways so logs from that node shouldn't matter.
> Look for "START HERE". Logs after that point should be of our interest.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (ZOOKEEPER-822) Leader election taking a long time to complete

Posted by "Ivan Kelly (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12896043#action_12896043 ] 

Ivan Kelly commented on ZOOKEEPER-822:
--------------------------------------

I've tried to repro this with 3 zookeepers running on the same machine, and 3 zookeepers running on virtual machine and I cannot get it to repro. I was taking out the leader by shutting down the network interface. Have you been able to repro this on another set of machines other than the ones you first observed it on?

Also, could you try this with the latest trunk? Some improvements were made around the FLE which may shed some more light.

> Leader election taking a long time  to complete
> -----------------------------------------------
>
>                 Key: ZOOKEEPER-822
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-822
>             Project: Zookeeper
>          Issue Type: Bug
>          Components: quorum
>    Affects Versions: 3.3.0
>            Reporter: Vishal K
>            Priority: Blocker
>         Attachments: 822.tar.gz, test_zookeeper_1.log, test_zookeeper_2.log, zk_leader_election.tar.gz
>
>
> Created a 3 node cluster.
> 1 Fail the ZK leader
> 2. Let leader election finish. Restart the leader and let it join the 
> 3. Repeat 
> After a few rounds leader election takes anywhere 25- 60 seconds to finish. Note- we didn't have any ZK clients and no new znodes were created.
> zoo.cfg is shown below:
> #Mon Jul 19 12:15:10 UTC 2010
> server.1=192.168.4.12\:2888\:3888
> server.0=192.168.4.11\:2888\:3888
> clientPort=2181
> dataDir=/var/zookeeper
> syncLimit=2
> server.2=192.168.4.13\:2888\:3888
> initLimit=5
> tickTime=2000
> I have attached logs from two nodes that took a long time to form the cluster after failing the leader. The leader was down anyways so logs from that node shouldn't matter.
> Look for "START HERE". Logs after that point should be of our interest.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (ZOOKEEPER-822) Leader election taking a long time to complete

Posted by "Patrick Hunt (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12892830#action_12892830 ] 

Patrick Hunt commented on ZOOKEEPER-822:
----------------------------------------

Ivan, do the clocks need to be in sync? Perhaps you should use the xid (cxid) instead?

> Leader election taking a long time  to complete
> -----------------------------------------------
>
>                 Key: ZOOKEEPER-822
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-822
>             Project: Zookeeper
>          Issue Type: Bug
>          Components: quorum
>    Affects Versions: 3.3.0
>            Reporter: Vishal K
>            Priority: Blocker
>         Attachments: test_zookeeper_1.log, test_zookeeper_2.log, zk_leader_election.tar.gz
>
>
> Created a 3 node cluster.
> 1 Fail the ZK leader
> 2. Let leader election finish. Restart the leader and let it join the 
> 3. Repeat 
> After a few rounds leader election takes anywhere 25- 60 seconds to finish. Note- we didn't have any ZK clients and no new znodes were created.
> zoo.cfg is shown below:
> #Mon Jul 19 12:15:10 UTC 2010
> server.1=192.168.4.12\:2888\:3888
> server.0=192.168.4.11\:2888\:3888
> clientPort=2181
> dataDir=/var/zookeeper
> syncLimit=2
> server.2=192.168.4.13\:2888\:3888
> initLimit=5
> tickTime=2000
> I have attached logs from two nodes that took a long time to form the cluster after failing the leader. The leader was down anyways so logs from that node shouldn't matter.
> Look for "START HERE". Logs after that point should be of our interest.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (ZOOKEEPER-822) Leader election taking a long time to complete

Posted by "Flavio Junqueira (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/ZOOKEEPER-822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Flavio Junqueira updated ZOOKEEPER-822:
---------------------------------------

    Attachment: ZOOKEEPER-822.patch

I believe the patch I'm attaching achieves the same goal and is even simpler, but I'd like to make sure it suits your needs, Vishal.

If you agree with the modifications, I can work on a test. I was also thinking that the 2-second timeout you used before is too low and I've raised to 5 seconds. But, instead of trying to argue which value is ideal, I'd rather use a system property and use a default value of at least 5 seconds.

I also commit to redesigning QuorumCnxManager for either 3.4.0 or 4.0.0 to use a selector or some other approach we agree upon. I've been wanting to do it for a while anyway, and I actually thought there was a jira open for it... Maybe not, I can't find it right now. 

> Leader election taking a long time  to complete
> -----------------------------------------------
>
>                 Key: ZOOKEEPER-822
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-822
>             Project: Zookeeper
>          Issue Type: Bug
>          Components: quorum
>    Affects Versions: 3.3.0
>            Reporter: Vishal K
>            Assignee: Vishal K
>            Priority: Blocker
>             Fix For: 3.3.2, 3.4.0
>
>         Attachments: 822.tar.gz, rhel.tar.gz, test_zookeeper_1.log, test_zookeeper_2.log, zk_leader_election.tar.gz, zookeeper-3.4.0.tar.gz, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch_v1
>
>
> Created a 3 node cluster.
> 1 Fail the ZK leader
> 2. Let leader election finish. Restart the leader and let it join the 
> 3. Repeat 
> After a few rounds leader election takes anywhere 25- 60 seconds to finish. Note- we didn't have any ZK clients and no new znodes were created.
> zoo.cfg is shown below:
> #Mon Jul 19 12:15:10 UTC 2010
> server.1=192.168.4.12\:2888\:3888
> server.0=192.168.4.11\:2888\:3888
> clientPort=2181
> dataDir=/var/zookeeper
> syncLimit=2
> server.2=192.168.4.13\:2888\:3888
> initLimit=5
> tickTime=2000
> I have attached logs from two nodes that took a long time to form the cluster after failing the leader. The leader was down anyways so logs from that node shouldn't matter.
> Look for "START HERE". Logs after that point should be of our interest.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (ZOOKEEPER-822) Leader election taking a long time to complete

Posted by "Flavio Junqueira (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/ZOOKEEPER-822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Flavio Junqueira updated ZOOKEEPER-822:
---------------------------------------

    Attachment: ZOOKEEPER-822.patch

> Leader election taking a long time  to complete
> -----------------------------------------------
>
>                 Key: ZOOKEEPER-822
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-822
>             Project: Zookeeper
>          Issue Type: Bug
>          Components: quorum
>    Affects Versions: 3.3.0
>            Reporter: Vishal K
>            Assignee: Vishal K
>            Priority: Blocker
>             Fix For: 3.3.2, 3.4.0
>
>         Attachments: 822.tar.gz, rhel.tar.gz, test_zookeeper_1.log, test_zookeeper_2.log, zk_leader_election.tar.gz, zookeeper-3.4.0.tar.gz, ZOOKEEPER-822-3.3.2.patch, ZOOKEEPER-822-3.3.2.patch, ZOOKEEPER-822-3.3.2.patch, ZOOKEEPER-822-3.3.2.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch_v1
>
>
> Created a 3 node cluster.
> 1 Fail the ZK leader
> 2. Let leader election finish. Restart the leader and let it join the 
> 3. Repeat 
> After a few rounds leader election takes anywhere 25- 60 seconds to finish. Note- we didn't have any ZK clients and no new znodes were created.
> zoo.cfg is shown below:
> #Mon Jul 19 12:15:10 UTC 2010
> server.1=192.168.4.12\:2888\:3888
> server.0=192.168.4.11\:2888\:3888
> clientPort=2181
> dataDir=/var/zookeeper
> syncLimit=2
> server.2=192.168.4.13\:2888\:3888
> initLimit=5
> tickTime=2000
> I have attached logs from two nodes that took a long time to form the cluster after failing the leader. The leader was down anyways so logs from that node shouldn't matter.
> Look for "START HERE". Logs after that point should be of our interest.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (ZOOKEEPER-822) Leader election taking a long time to complete

Posted by "Vishal K (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12894576#action_12894576 ] 

Vishal K commented on ZOOKEEPER-822:
------------------------------------


I have attached new logs. I don't use ntp, but all l the nodes should be at the most a few seconds apart. 

I have marked start and end of the faulty election. look at  zookeeper-192.168.10.3-log and search for "vishal", 

Note - it is super easy to reproduce the bug. Create a 3 node cluster and reboot the leader (or shutdown the network interface). You may need to repeat the test several times. 

If you do a clean shutdown of the leader (zkServer.sh stop), then you won't see this bug. I feel that there is something releated to TCP timeout/ session management of failed node that is causing this problem.

> Leader election taking a long time  to complete
> -----------------------------------------------
>
>                 Key: ZOOKEEPER-822
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-822
>             Project: Zookeeper
>          Issue Type: Bug
>          Components: quorum
>    Affects Versions: 3.3.0
>            Reporter: Vishal K
>            Priority: Blocker
>         Attachments: 822.tar.gz, test_zookeeper_1.log, test_zookeeper_2.log, zk_leader_election.tar.gz
>
>
> Created a 3 node cluster.
> 1 Fail the ZK leader
> 2. Let leader election finish. Restart the leader and let it join the 
> 3. Repeat 
> After a few rounds leader election takes anywhere 25- 60 seconds to finish. Note- we didn't have any ZK clients and no new znodes were created.
> zoo.cfg is shown below:
> #Mon Jul 19 12:15:10 UTC 2010
> server.1=192.168.4.12\:2888\:3888
> server.0=192.168.4.11\:2888\:3888
> clientPort=2181
> dataDir=/var/zookeeper
> syncLimit=2
> server.2=192.168.4.13\:2888\:3888
> initLimit=5
> tickTime=2000
> I have attached logs from two nodes that took a long time to form the cluster after failing the leader. The leader was down anyways so logs from that node shouldn't matter.
> Look for "START HERE". Logs after that point should be of our interest.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (ZOOKEEPER-822) Leader election taking a long time to complete

Posted by "Ivan Kelly (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12896980#action_12896980 ] 

Ivan Kelly commented on ZOOKEEPER-822:
--------------------------------------

Could this be related?

https://issues.apache.org/jira/browse/ZOOKEEPER-785

> Leader election taking a long time  to complete
> -----------------------------------------------
>
>                 Key: ZOOKEEPER-822
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-822
>             Project: Zookeeper
>          Issue Type: Bug
>          Components: quorum
>    Affects Versions: 3.3.0
>            Reporter: Vishal K
>            Priority: Blocker
>         Attachments: 822.tar.gz, test_zookeeper_1.log, test_zookeeper_2.log, zk_leader_election.tar.gz
>
>
> Created a 3 node cluster.
> 1 Fail the ZK leader
> 2. Let leader election finish. Restart the leader and let it join the 
> 3. Repeat 
> After a few rounds leader election takes anywhere 25- 60 seconds to finish. Note- we didn't have any ZK clients and no new znodes were created.
> zoo.cfg is shown below:
> #Mon Jul 19 12:15:10 UTC 2010
> server.1=192.168.4.12\:2888\:3888
> server.0=192.168.4.11\:2888\:3888
> clientPort=2181
> dataDir=/var/zookeeper
> syncLimit=2
> server.2=192.168.4.13\:2888\:3888
> initLimit=5
> tickTime=2000
> I have attached logs from two nodes that took a long time to form the cluster after failing the leader. The leader was down anyways so logs from that node shouldn't matter.
> Look for "START HERE". Logs after that point should be of our interest.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (ZOOKEEPER-822) Leader election taking a long time to complete

Posted by "Flavio Junqueira (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12905836#action_12905836 ] 

Flavio Junqueira commented on ZOOKEEPER-822:
--------------------------------------------

{quote}
1. Blocking connects and accepts:
You are right, when the node is down TCP timeouts rule.

a) The first problem is in manager.toSend(). This invokes connectOne(), which does a blocking connect. While testing, I changed the code so that connectOne() starts a new thread called AsyncConnct(). AsyncConnect.run() does a socketChannel.connect(). After starting AsyncConnect, connectOne starts a timer. connectOne continues with normal operations if the connection is established before the timer expires, otherwise, when the timer expires it interrupts AsyncConnect() thread and returns. In this way, I can have an upper bound on the amount of time we need to wait for connect to succeed. Of course, this was a quick fix for my testing. Ideally, we should use Selector to do non-blocking connects/accepts. I am planning to do that later once we at least have a quick fix for the problem and consensus from others for the real fix (this problem is big blocker for us). Note that it is OK to do blocking IO in SenderWorker and RecvWorker threads since they block IO to the respective pe!
 er.
{quote}

As I commented before, it might be ok to make it asynchronous, especially if we have a way of checking that there is an attempt to establish a connection in progress. 
I'm also still intrigued about why this is a problem for you. I haven't seen any of this being a problem before, which of course doesn't mean we shouldn't fix it. It would be nice to understand what's special about your setup or if others have seen similar problems and I missed the reports.

{quote}
b) The blocking IO problem is not just restricted to connectOne(), but also in receiveConnection(). The Listener thread calls receiveConnection() for each incoming connection request. receiveConnection does blocking IO to get peer's info (s.read(msgBuffer)). Worse, it invokes connectOne() back to the peer that had sent the connection request. All of this is happening from the Listener. In short, if a peer fails after initiating a connection, the Listener thread won't be able to accept connections from other peers, because it would be stuck in read() or connetOne(). Also the code has an inherent cycle. initiateConnection() and receiveConnection() will have to be very carefully synchronized otherwise, we could run into deadlocks. This code is going to be difficult to maintain/modify.
{quote}

If I remember correctly, we currently synchronize connectOne and make all connection establishments through connectOne so that we make sure that we do one at a time. My understanding is that this should reduce the number of rounds of attempts to establish connections, perhaps at the cost of a longer delay in some runs. 

{quote}
2. Buggy senderWorkerMap handling:
The code that manages senderWorkerMap is very buggy. It is causing multiple election rounds. While debugging I found that sometimes after FLE a node will have its sendWorkerMap empty even if it has SenderWorker and RecvWorker threads for each peer.
{quote}

I don't think that having multiple rounds is bad; in fact, I think it is unavoidable using reasonable timeout values. The second part, however, sounds like a problem we should fix. 

{quote}
a) The receiveConnection() method calls the finish() method, which removes an entry from the map. Additionally, the thread itself calls finish() which could remove the newly added entry from the map. In short, receiveConnection is causing the exact condition that you mentioned above.
{quote}

I thought that we were increasing the intervals between notifications, and if so I believe the case you mention above should not happen more than a few times. Now, to fix it, it sounds like we need to check that the finish call is removing the correct object in sendWorkerMap. That is, obj.finish() should remove obj and do nothing if the SendWorker object in sendWorkerMap is a different one. What do you think?

{quote}
b) Apart from the bug in finish(), receiveConnection is making an entry in senderWorkerMap at the wrong place. Here's the buggy code:
SendWorker vsw = senderWorkerMap.get(sid);
senderWorkerMap.put(sid, sw);
if(vsw != null)
vsw.finish();
It makes an entry for the new thread and then calls finish, which causes the new thread to be removed from the Map. The old thread will also get terminated since finish() will interrupt the thread.
{quote}

See my comment above. Perhaps I should wait to see your proposed modifications, but I wonder if works to check that we are removing the correct SendWorker object.

{quote}
3. Race condition in receiveConnection and initiateConnection:

In theory, two peers can keep disconnecting each other's connection.

Example:
T0: Peer 0 initiates a connection (request 1)
T1: Peer 1 receives connection from peer 0
T2: Peer 1 calls receiveConnection()
T2: Peer 0 closes connection to Peer 1 because its ID is lower.
T3: Peer 0 re-initiates connection to Peer 1 from manger.toSend() (request 2)
T3: Peer 1 terminates older connection to peer 0
T4: Peer 1 calls connectOne() which starts new sendWorker threads for peer 0
T5: Peer 1 kills connection created in T3 because it receives another (request 2) connect request from 0

The problem here is that while Peer 0 is accepting a connection from Peer 1 it can also be initiating a connection to Peer 1. So if they hit the right frequencies they could sit in a connect/disconnect loop and cause multiple rounds of leader election.

I think the cause here is again blocking connects()/accepts(). A peer starts to take action (to kill existing threads and start new threads) as soon as a connection is established at the TCP level. That is, it does not give us any control to synchronized connect and accepts. We could use non-blocking connects and accepts. This will allow us to a) tell a thread to not initiate a connection because the listener is about to accept a connection from the remote peer (use isAcceptable() and isConnectable()methods of SelectionKey) and b) prevent a thread from initiating multiple connect request to the same peer. It will simplify synchronization.
{quote}

Even though it is true that you could have an infinite run where the two processes bump into each other forever, we increase the interval between new batches of notifications, so it should eventually stop, and in my experience it typically doesn't go beyond two rounds. However, I agree that if there were a way of verifying that there is a connection establishment in progress we could stop earlier. It sounds like a good idea to give it a try.

> Leader election taking a long time  to complete
> -----------------------------------------------
>
>                 Key: ZOOKEEPER-822
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-822
>             Project: Zookeeper
>          Issue Type: Bug
>          Components: quorum
>    Affects Versions: 3.3.0
>            Reporter: Vishal K
>            Priority: Blocker
>         Attachments: 822.tar.gz, rhel.tar.gz, test_zookeeper_1.log, test_zookeeper_2.log, zk_leader_election.tar.gz, zookeeper-3.4.0.tar.gz
>
>
> Created a 3 node cluster.
> 1 Fail the ZK leader
> 2. Let leader election finish. Restart the leader and let it join the 
> 3. Repeat 
> After a few rounds leader election takes anywhere 25- 60 seconds to finish. Note- we didn't have any ZK clients and no new znodes were created.
> zoo.cfg is shown below:
> #Mon Jul 19 12:15:10 UTC 2010
> server.1=192.168.4.12\:2888\:3888
> server.0=192.168.4.11\:2888\:3888
> clientPort=2181
> dataDir=/var/zookeeper
> syncLimit=2
> server.2=192.168.4.13\:2888\:3888
> initLimit=5
> tickTime=2000
> I have attached logs from two nodes that took a long time to form the cluster after failing the leader. The leader was down anyways so logs from that node shouldn't matter.
> Look for "START HERE". Logs after that point should be of our interest.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (ZOOKEEPER-822) Leader election taking a long time to complete

Posted by "Vishal K (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12902988#action_12902988 ] 

Vishal K commented on ZOOKEEPER-822:
------------------------------------

The fix for problem 1 and 2 above eliminates the bug. I will have a patch out soon.

> Leader election taking a long time  to complete
> -----------------------------------------------
>
>                 Key: ZOOKEEPER-822
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-822
>             Project: Zookeeper
>          Issue Type: Bug
>          Components: quorum
>    Affects Versions: 3.3.0
>            Reporter: Vishal K
>            Priority: Blocker
>         Attachments: 822.tar.gz, rhel.tar.gz, test_zookeeper_1.log, test_zookeeper_2.log, zk_leader_election.tar.gz, zookeeper-3.4.0.tar.gz
>
>
> Created a 3 node cluster.
> 1 Fail the ZK leader
> 2. Let leader election finish. Restart the leader and let it join the 
> 3. Repeat 
> After a few rounds leader election takes anywhere 25- 60 seconds to finish. Note- we didn't have any ZK clients and no new znodes were created.
> zoo.cfg is shown below:
> #Mon Jul 19 12:15:10 UTC 2010
> server.1=192.168.4.12\:2888\:3888
> server.0=192.168.4.11\:2888\:3888
> clientPort=2181
> dataDir=/var/zookeeper
> syncLimit=2
> server.2=192.168.4.13\:2888\:3888
> initLimit=5
> tickTime=2000
> I have attached logs from two nodes that took a long time to form the cluster after failing the leader. The leader was down anyways so logs from that node shouldn't matter.
> Look for "START HERE". Logs after that point should be of our interest.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (ZOOKEEPER-822) Leader election taking a long time to complete

Posted by "Flavio Junqueira (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900165#action_12900165 ] 

Flavio Junqueira commented on ZOOKEEPER-822:
--------------------------------------------

Vishal, You don't seem to be using trunk code. The current trunk code would report notifications using the following format when report level info is enabled:

{noformat}
LOG.info("Notification: " + n.leader + " (n.leader), " + n.zxid +
                " (n.zxid), " + n.epoch + " (n.round), " + n.state +
                " (n.state), " + n.sid + " (n.sid), " + self.getPeerState() +
                " (my state)");
{noformat}

And I'm seeing the following in the excerpt above:

{noformat}
Notification: 0, 34359738368, 4, 0, LOOKING, LOOKING, 0
{noformat}

Also, it would be great if we could use loggraph to visualize what is going on in your situation. 

> Leader election taking a long time  to complete
> -----------------------------------------------
>
>                 Key: ZOOKEEPER-822
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-822
>             Project: Zookeeper
>          Issue Type: Bug
>          Components: quorum
>    Affects Versions: 3.3.0
>            Reporter: Vishal K
>            Priority: Blocker
>         Attachments: 822.tar.gz, rhel.tar.gz, test_zookeeper_1.log, test_zookeeper_2.log, zk_leader_election.tar.gz
>
>
> Created a 3 node cluster.
> 1 Fail the ZK leader
> 2. Let leader election finish. Restart the leader and let it join the 
> 3. Repeat 
> After a few rounds leader election takes anywhere 25- 60 seconds to finish. Note- we didn't have any ZK clients and no new znodes were created.
> zoo.cfg is shown below:
> #Mon Jul 19 12:15:10 UTC 2010
> server.1=192.168.4.12\:2888\:3888
> server.0=192.168.4.11\:2888\:3888
> clientPort=2181
> dataDir=/var/zookeeper
> syncLimit=2
> server.2=192.168.4.13\:2888\:3888
> initLimit=5
> tickTime=2000
> I have attached logs from two nodes that took a long time to form the cluster after failing the leader. The leader was down anyways so logs from that node shouldn't matter.
> Look for "START HERE". Logs after that point should be of our interest.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (ZOOKEEPER-822) Leader election taking a long time to complete

Posted by "Vishal K (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913307#action_12913307 ] 

Vishal K commented on ZOOKEEPER-822:
------------------------------------

Hi Flavio,

+1. Looks good. I remember looking at the socket.connect() method, but I don't remember why I ruled it out in the favor of thread.
Minor point - missing space before "error" in LOG.warn("Connection broken: for id " + sid + "my id = " + self.getId() + "error..).

Thank you.

-Vishal



> Leader election taking a long time  to complete
> -----------------------------------------------
>
>                 Key: ZOOKEEPER-822
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-822
>             Project: Zookeeper
>          Issue Type: Bug
>          Components: quorum
>    Affects Versions: 3.3.0
>            Reporter: Vishal K
>            Assignee: Vishal K
>            Priority: Blocker
>             Fix For: 3.3.2, 3.4.0
>
>         Attachments: 822.tar.gz, rhel.tar.gz, test_zookeeper_1.log, test_zookeeper_2.log, zk_leader_election.tar.gz, zookeeper-3.4.0.tar.gz, ZOOKEEPER-822-3.3.2.patch, ZOOKEEPER-822-3.3.2.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch_v1
>
>
> Created a 3 node cluster.
> 1 Fail the ZK leader
> 2. Let leader election finish. Restart the leader and let it join the 
> 3. Repeat 
> After a few rounds leader election takes anywhere 25- 60 seconds to finish. Note- we didn't have any ZK clients and no new znodes were created.
> zoo.cfg is shown below:
> #Mon Jul 19 12:15:10 UTC 2010
> server.1=192.168.4.12\:2888\:3888
> server.0=192.168.4.11\:2888\:3888
> clientPort=2181
> dataDir=/var/zookeeper
> syncLimit=2
> server.2=192.168.4.13\:2888\:3888
> initLimit=5
> tickTime=2000
> I have attached logs from two nodes that took a long time to form the cluster after failing the leader. The leader was down anyways so logs from that node shouldn't matter.
> Look for "START HERE". Logs after that point should be of our interest.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (ZOOKEEPER-822) Leader election taking a long time to complete

Posted by "Ivan Kelly (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12896063#action_12896063 ] 

Ivan Kelly commented on ZOOKEEPER-822:
--------------------------------------

They logs were of some help, but I don't understand what's happening. I looks like multiple nodes are claiming leadership at the same time, but that can't be right.

The FLE changes won't fix it, but they do log more information, so they may make it easier to see what is happening.

> Leader election taking a long time  to complete
> -----------------------------------------------
>
>                 Key: ZOOKEEPER-822
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-822
>             Project: Zookeeper
>          Issue Type: Bug
>          Components: quorum
>    Affects Versions: 3.3.0
>            Reporter: Vishal K
>            Priority: Blocker
>         Attachments: 822.tar.gz, test_zookeeper_1.log, test_zookeeper_2.log, zk_leader_election.tar.gz
>
>
> Created a 3 node cluster.
> 1 Fail the ZK leader
> 2. Let leader election finish. Restart the leader and let it join the 
> 3. Repeat 
> After a few rounds leader election takes anywhere 25- 60 seconds to finish. Note- we didn't have any ZK clients and no new znodes were created.
> zoo.cfg is shown below:
> #Mon Jul 19 12:15:10 UTC 2010
> server.1=192.168.4.12\:2888\:3888
> server.0=192.168.4.11\:2888\:3888
> clientPort=2181
> dataDir=/var/zookeeper
> syncLimit=2
> server.2=192.168.4.13\:2888\:3888
> initLimit=5
> tickTime=2000
> I have attached logs from two nodes that took a long time to form the cluster after failing the leader. The leader was down anyways so logs from that node shouldn't matter.
> Look for "START HERE". Logs after that point should be of our interest.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (ZOOKEEPER-822) Leader election taking a long time to complete

Posted by "Vishal K (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12905528#action_12905528 ] 

Vishal K commented on ZOOKEEPER-822:
------------------------------------

Hi Flavio,

I was planning to send out a mail explaining the problems in the FLE implementation that I have found so far. For now, I will put the info here. We can create new JIRAs if needed. I am waiting to hear back from our legal department to resolve copyright issues so that I can share my fixes as well.

1. Blocking connects and accepts:
You are right, when the node is down TCP timeouts rule.

a) The first problem is in manager.toSend(). This invokes connectOne(), which does a blocking connect. While testing, I changed the code so that connectOne() starts a new thread called AsyncConnct(). AsyncConnect.run() does a socketChannel.connect(). After starting AsyncConnect, connectOne starts a timer. connectOne continues with normal operations if the connection is established before the timer expires, otherwise, when the timer expires it interrupts AsyncConnect() thread and returns. In this way, I can have an upper bound on the amount of time we need to wait for connect to succeed. Of course, this was a quick fix for my testing. Ideally, we should use Selector to do non-blocking connects/accepts. I am planning to do that later once we at least have a quick fix for the problem and consensus from others for the real fix (this problem is big blocker for us). Note that it is OK to do blocking IO in SenderWorker and RecvWorker threads since they block IO to the respective pe!
 er.

b) The blocking IO problem is not just restricted to connectOne(), but also in receiveConnection(). The Listener thread calls receiveConnection() for each incoming connection request. receiveConnection does blocking IO to get peer's info (s.read(msgBuffer)). Worse, it invokes connectOne() back to the peer that had sent the connection request. All of this is happening from the Listener. In short, if a peer fails after initiating a connection, the Listener thread won't be able to accept connections from other peers, because it would be stuck in read() or connetOne(). Also the code has an inherent cycle. initiateConnection() and receiveConnection() will have to be very carefully synchronized otherwise, we could run into deadlocks. This code is going to be difficult to maintain/modify.

2. Buggy senderWorkerMap handling:
The code that manages senderWorkerMap is very buggy. It is causing multiple election rounds. While debugging I found that sometimes after FLE a node will have its sendWorkerMap empty even if it has SenderWorker and RecvWorker threads for each peer.

a) The receiveConnection() method calls the finish() method, which removes an entry from the map. Additionally, the thread itself calls finish() which could remove the newly added entry from the map. In short, receiveConnection is causing the exact condition that you mentioned above.

b) Apart from the bug in finish(), receiveConnection is making an entry in senderWorkerMap at the wrong place. Here's the buggy code:
            SendWorker vsw = senderWorkerMap.get(sid);
            senderWorkerMap.put(sid, sw);
            if(vsw != null)
                vsw.finish();
It makes an entry for the new thread and then calls finish, which causes the new thread to be removed from the Map. The old thread will also get terminated since finish() will interrupt the thread.

3. Race condition in receiveConnection and initiateConnection:

*In theory*, two peers can keep disconnecting each other's connection.

Example:
T0: Peer 0 initiates a connection (request 1)
                                                                                               T1: Peer 1 receives connection from peer 0
                                                                                               T2: Peer 1 calls receiveConnection()
T2: Peer 0 closes connection to Peer 1 because its ID is lower.
T3: Peer 0 re-initiates connection to Peer 1 from manger.toSend() (request 2)
T3: Peer 1 terminates older connection to peer 0
T4: Peer 1 calls connectOne() which starts new sendWorker threads for peer 0
T5: Peer 1 kills connection created in T3 because it receives another (request 2) connect request from 0

The problem here is that while Peer 0 is accepting a connection from Peer 1 it can also be initiating a connection to Peer 1. So if they hit the right frequencies they could sit in a connect/disconnect loop and cause multiple rounds of leader election.

I think the cause here is again blocking connects()/accepts(). A peer starts to take action (to kill existing threads and start new threads) as soon as a connection is established at the *TCP level*. That is, it does not give us any control to synchronized connect and accepts. We could use non-blocking connects and accepts. This will allow us to a) tell a thread to not initiate a connection because the listener is about to accept a connection from the remote peer (use isAcceptable() and isConnectable()methods of SelectionKey) and b) prevent a thread from initiating multiple connect request to the same peer. It will simplify synchronization.

Any thoughts?

-Vishal

> Leader election taking a long time  to complete
> -----------------------------------------------
>
>                 Key: ZOOKEEPER-822
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-822
>             Project: Zookeeper
>          Issue Type: Bug
>          Components: quorum
>    Affects Versions: 3.3.0
>            Reporter: Vishal K
>            Priority: Blocker
>         Attachments: 822.tar.gz, rhel.tar.gz, test_zookeeper_1.log, test_zookeeper_2.log, zk_leader_election.tar.gz, zookeeper-3.4.0.tar.gz
>
>
> Created a 3 node cluster.
> 1 Fail the ZK leader
> 2. Let leader election finish. Restart the leader and let it join the 
> 3. Repeat 
> After a few rounds leader election takes anywhere 25- 60 seconds to finish. Note- we didn't have any ZK clients and no new znodes were created.
> zoo.cfg is shown below:
> #Mon Jul 19 12:15:10 UTC 2010
> server.1=192.168.4.12\:2888\:3888
> server.0=192.168.4.11\:2888\:3888
> clientPort=2181
> dataDir=/var/zookeeper
> syncLimit=2
> server.2=192.168.4.13\:2888\:3888
> initLimit=5
> tickTime=2000
> I have attached logs from two nodes that took a long time to form the cluster after failing the leader. The leader was down anyways so logs from that node shouldn't matter.
> Look for "START HERE". Logs after that point should be of our interest.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (ZOOKEEPER-822) Leader election taking a long time to complete

Posted by "Mahadev konar (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/ZOOKEEPER-822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mahadev konar updated ZOOKEEPER-822:
------------------------------------

         Assignee: Vishal K
    Fix Version/s: 3.3.2
                   3.4.0

Marking this for 3.3.2, to see if we want this included in 3.3.2.



> Leader election taking a long time  to complete
> -----------------------------------------------
>
>                 Key: ZOOKEEPER-822
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-822
>             Project: Zookeeper
>          Issue Type: Bug
>          Components: quorum
>    Affects Versions: 3.3.0
>            Reporter: Vishal K
>            Assignee: Vishal K
>            Priority: Blocker
>             Fix For: 3.3.2, 3.4.0
>
>         Attachments: 822.tar.gz, rhel.tar.gz, test_zookeeper_1.log, test_zookeeper_2.log, zk_leader_election.tar.gz, zookeeper-3.4.0.tar.gz
>
>
> Created a 3 node cluster.
> 1 Fail the ZK leader
> 2. Let leader election finish. Restart the leader and let it join the 
> 3. Repeat 
> After a few rounds leader election takes anywhere 25- 60 seconds to finish. Note- we didn't have any ZK clients and no new znodes were created.
> zoo.cfg is shown below:
> #Mon Jul 19 12:15:10 UTC 2010
> server.1=192.168.4.12\:2888\:3888
> server.0=192.168.4.11\:2888\:3888
> clientPort=2181
> dataDir=/var/zookeeper
> syncLimit=2
> server.2=192.168.4.13\:2888\:3888
> initLimit=5
> tickTime=2000
> I have attached logs from two nodes that took a long time to form the cluster after failing the leader. The leader was down anyways so logs from that node shouldn't matter.
> Look for "START HERE". Logs after that point should be of our interest.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (ZOOKEEPER-822) Leader election taking a long time to complete

Posted by "Vishal K (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12899983#action_12899983 ] 

Vishal K commented on ZOOKEEPER-822:
------------------------------------

While going through the code yesterday,  I found two potential problems that I though might be worth reporting in the context of this bug.

1. In FastLeaderElection.java
    /**
     * Check if all queues are empty, indicating that all messages have been delivered.
     */
    boolean haveDelivered() {
        for (ArrayBlockingQueue<ByteBuffer> queue : queueSendMap.values()) {
            LOG.debug("Queue size: " + queue.size());
            if (queue.size() == 0)
                return true;
        }

        return false;
    }

the haveDelivered()  function returns true without checking if rest of the queus are empty.

2. QuorumCnxManager.connectAll() function connects to one peer at a time and it uses a blocking connect (SocketChannle.open). I added a timeout to the SocketChannel.open and that did not fix the problem. 

> Leader election taking a long time  to complete
> -----------------------------------------------
>
>                 Key: ZOOKEEPER-822
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-822
>             Project: Zookeeper
>          Issue Type: Bug
>          Components: quorum
>    Affects Versions: 3.3.0
>            Reporter: Vishal K
>            Priority: Blocker
>         Attachments: 822.tar.gz, rhel.tar.gz, test_zookeeper_1.log, test_zookeeper_2.log, zk_leader_election.tar.gz
>
>
> Created a 3 node cluster.
> 1 Fail the ZK leader
> 2. Let leader election finish. Restart the leader and let it join the 
> 3. Repeat 
> After a few rounds leader election takes anywhere 25- 60 seconds to finish. Note- we didn't have any ZK clients and no new znodes were created.
> zoo.cfg is shown below:
> #Mon Jul 19 12:15:10 UTC 2010
> server.1=192.168.4.12\:2888\:3888
> server.0=192.168.4.11\:2888\:3888
> clientPort=2181
> dataDir=/var/zookeeper
> syncLimit=2
> server.2=192.168.4.13\:2888\:3888
> initLimit=5
> tickTime=2000
> I have attached logs from two nodes that took a long time to form the cluster after failing the leader. The leader was down anyways so logs from that node shouldn't matter.
> Look for "START HERE". Logs after that point should be of our interest.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (ZOOKEEPER-822) Leader election taking a long time to complete

Posted by "Flavio Junqueira (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/ZOOKEEPER-822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Flavio Junqueira updated ZOOKEEPER-822:
---------------------------------------

    Attachment: ZOOKEEPER-822-3.3.2.patch

> Leader election taking a long time  to complete
> -----------------------------------------------
>
>                 Key: ZOOKEEPER-822
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-822
>             Project: Zookeeper
>          Issue Type: Bug
>          Components: quorum
>    Affects Versions: 3.3.0
>            Reporter: Vishal K
>            Assignee: Vishal K
>            Priority: Blocker
>             Fix For: 3.3.2, 3.4.0
>
>         Attachments: 822.tar.gz, rhel.tar.gz, test_zookeeper_1.log, test_zookeeper_2.log, zk_leader_election.tar.gz, zookeeper-3.4.0.tar.gz, ZOOKEEPER-822-3.3.2.patch, ZOOKEEPER-822-3.3.2.patch, ZOOKEEPER-822-3.3.2.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch_v1
>
>
> Created a 3 node cluster.
> 1 Fail the ZK leader
> 2. Let leader election finish. Restart the leader and let it join the 
> 3. Repeat 
> After a few rounds leader election takes anywhere 25- 60 seconds to finish. Note- we didn't have any ZK clients and no new znodes were created.
> zoo.cfg is shown below:
> #Mon Jul 19 12:15:10 UTC 2010
> server.1=192.168.4.12\:2888\:3888
> server.0=192.168.4.11\:2888\:3888
> clientPort=2181
> dataDir=/var/zookeeper
> syncLimit=2
> server.2=192.168.4.13\:2888\:3888
> initLimit=5
> tickTime=2000
> I have attached logs from two nodes that took a long time to form the cluster after failing the leader. The leader was down anyways so logs from that node shouldn't matter.
> Look for "START HERE". Logs after that point should be of our interest.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (ZOOKEEPER-822) Leader election taking a long time to complete

Posted by "Patrick Hunt (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/ZOOKEEPER-822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Hunt updated ZOOKEEPER-822:
-----------------------------------

    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

Committed to trunk/33 - thanks Vishal and everyone who pushed this through!

> Leader election taking a long time  to complete
> -----------------------------------------------
>
>                 Key: ZOOKEEPER-822
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-822
>             Project: Zookeeper
>          Issue Type: Bug
>          Components: quorum
>    Affects Versions: 3.3.0
>            Reporter: Vishal K
>            Assignee: Vishal K
>            Priority: Blocker
>             Fix For: 3.3.2, 3.4.0
>
>         Attachments: 822.tar.gz, rhel.tar.gz, test_zookeeper_1.log, test_zookeeper_2.log, zk_leader_election.tar.gz, zookeeper-3.4.0.tar.gz, ZOOKEEPER-822-3.3.2.patch, ZOOKEEPER-822-3.3.2.patch, ZOOKEEPER-822-3.3.2.patch, ZOOKEEPER-822-3.3.2.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch_v1
>
>
> Created a 3 node cluster.
> 1 Fail the ZK leader
> 2. Let leader election finish. Restart the leader and let it join the 
> 3. Repeat 
> After a few rounds leader election takes anywhere 25- 60 seconds to finish. Note- we didn't have any ZK clients and no new znodes were created.
> zoo.cfg is shown below:
> #Mon Jul 19 12:15:10 UTC 2010
> server.1=192.168.4.12\:2888\:3888
> server.0=192.168.4.11\:2888\:3888
> clientPort=2181
> dataDir=/var/zookeeper
> syncLimit=2
> server.2=192.168.4.13\:2888\:3888
> initLimit=5
> tickTime=2000
> I have attached logs from two nodes that took a long time to form the cluster after failing the leader. The leader was down anyways so logs from that node shouldn't matter.
> Look for "START HERE". Logs after that point should be of our interest.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (ZOOKEEPER-822) Leader election taking a long time to complete

Posted by "Vishal K (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12899156#action_12899156 ] 

Vishal K commented on ZOOKEEPER-822:
------------------------------------

Hi Ivan,

Can you describe me your setup?

My setup info:
- 3 ESX boxes
- 1 SLES 11 VMs on each
- Cluster of 3 nodes

I hit this problem consistently after rebooting the leader.

Thanks.
-Vishal


> Leader election taking a long time  to complete
> -----------------------------------------------
>
>                 Key: ZOOKEEPER-822
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-822
>             Project: Zookeeper
>          Issue Type: Bug
>          Components: quorum
>    Affects Versions: 3.3.0
>            Reporter: Vishal K
>            Priority: Blocker
>         Attachments: 822.tar.gz, test_zookeeper_1.log, test_zookeeper_2.log, zk_leader_election.tar.gz
>
>
> Created a 3 node cluster.
> 1 Fail the ZK leader
> 2. Let leader election finish. Restart the leader and let it join the 
> 3. Repeat 
> After a few rounds leader election takes anywhere 25- 60 seconds to finish. Note- we didn't have any ZK clients and no new znodes were created.
> zoo.cfg is shown below:
> #Mon Jul 19 12:15:10 UTC 2010
> server.1=192.168.4.12\:2888\:3888
> server.0=192.168.4.11\:2888\:3888
> clientPort=2181
> dataDir=/var/zookeeper
> syncLimit=2
> server.2=192.168.4.13\:2888\:3888
> initLimit=5
> tickTime=2000
> I have attached logs from two nodes that took a long time to form the cluster after failing the leader. The leader was down anyways so logs from that node shouldn't matter.
> Look for "START HERE". Logs after that point should be of our interest.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (ZOOKEEPER-822) Leader election taking a long time to complete

Posted by "Vishal K (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/ZOOKEEPER-822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vishal K updated ZOOKEEPER-822:
-------------------------------

    Attachment: zookeeper-3.4.0.tar.gz

Hi,

To rule out any setup specific issues (VM/ESX/ect), I tried to reproduce this problem with 3 physical machines (no VMs). I was able to reproduce the same problem after 15 reboots or so. It took a little more number of reboots on the physical setup. I used the latest code this time. I  have attached the logs. 

I am suspecting 3 problems:
1. blocking connect in WorkerSender.process as described in my earlier comments.
2. SendWorker.run() calls finish at the end. This could result in finish() getting called twice (e.g., finish called from receiveConnection), thus, causing  senderWorkerMap.remove(sid) called twice and removing an entry that should be removed.
3. Race condition that causes one of the peers to disconnect other peer's connection (receiveConnection/initiateConnection issue). I am still verifying/investigating the this.

In logs of server id 0 and 1, search for ***REBOOTING LEADER*** from the bottom. This line marks the start of faulty FLE.

-Vishal

> Leader election taking a long time  to complete
> -----------------------------------------------
>
>                 Key: ZOOKEEPER-822
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-822
>             Project: Zookeeper
>          Issue Type: Bug
>          Components: quorum
>    Affects Versions: 3.3.0
>            Reporter: Vishal K
>            Priority: Blocker
>         Attachments: 822.tar.gz, rhel.tar.gz, test_zookeeper_1.log, test_zookeeper_2.log, zk_leader_election.tar.gz, zookeeper-3.4.0.tar.gz
>
>
> Created a 3 node cluster.
> 1 Fail the ZK leader
> 2. Let leader election finish. Restart the leader and let it join the 
> 3. Repeat 
> After a few rounds leader election takes anywhere 25- 60 seconds to finish. Note- we didn't have any ZK clients and no new znodes were created.
> zoo.cfg is shown below:
> #Mon Jul 19 12:15:10 UTC 2010
> server.1=192.168.4.12\:2888\:3888
> server.0=192.168.4.11\:2888\:3888
> clientPort=2181
> dataDir=/var/zookeeper
> syncLimit=2
> server.2=192.168.4.13\:2888\:3888
> initLimit=5
> tickTime=2000
> I have attached logs from two nodes that took a long time to form the cluster after failing the leader. The leader was down anyways so logs from that node shouldn't matter.
> Look for "START HERE". Logs after that point should be of our interest.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (ZOOKEEPER-822) Leader election taking a long time to complete

Posted by "Flavio Junqueira (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/ZOOKEEPER-822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Flavio Junqueira updated ZOOKEEPER-822:
---------------------------------------

    Attachment: ZOOKEEPER-822-3.3.2.patch

> Leader election taking a long time  to complete
> -----------------------------------------------
>
>                 Key: ZOOKEEPER-822
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-822
>             Project: Zookeeper
>          Issue Type: Bug
>          Components: quorum
>    Affects Versions: 3.3.0
>            Reporter: Vishal K
>            Assignee: Vishal K
>            Priority: Blocker
>             Fix For: 3.3.2, 3.4.0
>
>         Attachments: 822.tar.gz, rhel.tar.gz, test_zookeeper_1.log, test_zookeeper_2.log, zk_leader_election.tar.gz, zookeeper-3.4.0.tar.gz, ZOOKEEPER-822-3.3.2.patch, ZOOKEEPER-822-3.3.2.patch, ZOOKEEPER-822-3.3.2.patch, ZOOKEEPER-822-3.3.2.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch_v1
>
>
> Created a 3 node cluster.
> 1 Fail the ZK leader
> 2. Let leader election finish. Restart the leader and let it join the 
> 3. Repeat 
> After a few rounds leader election takes anywhere 25- 60 seconds to finish. Note- we didn't have any ZK clients and no new znodes were created.
> zoo.cfg is shown below:
> #Mon Jul 19 12:15:10 UTC 2010
> server.1=192.168.4.12\:2888\:3888
> server.0=192.168.4.11\:2888\:3888
> clientPort=2181
> dataDir=/var/zookeeper
> syncLimit=2
> server.2=192.168.4.13\:2888\:3888
> initLimit=5
> tickTime=2000
> I have attached logs from two nodes that took a long time to form the cluster after failing the leader. The leader was down anyways so logs from that node shouldn't matter.
> Look for "START HERE". Logs after that point should be of our interest.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (ZOOKEEPER-822) Leader election taking a long time to complete

Posted by "Flavio Junqueira (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/ZOOKEEPER-822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Flavio Junqueira updated ZOOKEEPER-822:
---------------------------------------

    Attachment: ZOOKEEPER-822.patch

> Leader election taking a long time  to complete
> -----------------------------------------------
>
>                 Key: ZOOKEEPER-822
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-822
>             Project: Zookeeper
>          Issue Type: Bug
>          Components: quorum
>    Affects Versions: 3.3.0
>            Reporter: Vishal K
>            Assignee: Vishal K
>            Priority: Blocker
>             Fix For: 3.3.2, 3.4.0
>
>         Attachments: 822.tar.gz, rhel.tar.gz, test_zookeeper_1.log, test_zookeeper_2.log, zk_leader_election.tar.gz, zookeeper-3.4.0.tar.gz, ZOOKEEPER-822-3.3.2.patch, ZOOKEEPER-822-3.3.2.patch, ZOOKEEPER-822-3.3.2.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch_v1
>
>
> Created a 3 node cluster.
> 1 Fail the ZK leader
> 2. Let leader election finish. Restart the leader and let it join the 
> 3. Repeat 
> After a few rounds leader election takes anywhere 25- 60 seconds to finish. Note- we didn't have any ZK clients and no new znodes were created.
> zoo.cfg is shown below:
> #Mon Jul 19 12:15:10 UTC 2010
> server.1=192.168.4.12\:2888\:3888
> server.0=192.168.4.11\:2888\:3888
> clientPort=2181
> dataDir=/var/zookeeper
> syncLimit=2
> server.2=192.168.4.13\:2888\:3888
> initLimit=5
> tickTime=2000
> I have attached logs from two nodes that took a long time to form the cluster after failing the leader. The leader was down anyways so logs from that node shouldn't matter.
> Look for "START HERE". Logs after that point should be of our interest.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (ZOOKEEPER-822) Leader election taking a long time to complete

Posted by "Vishal K (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900108#action_12900108 ] 

Vishal K commented on ZOOKEEPER-822:
------------------------------------

I am suspecting that one of the node (10.17.119.101) is not sending the notification to the other node. sendNotifications() is called to send notification to all peers. This functions enteres the notification in sendqueue. However, either the entry was not put in the queue (sendqueue.offer failed) or the thread that polls sendqueue did not wake up. I am not sure what the cause is yet.

I had added extra debug messages. Three messages are of main interest:

1. in sendNotifications():  Print "IN FLE sending notification to server id = 1" for each server. Also print  "proposedLeader, proposedZxid, logicalclock"
2. In FastLeaderElection.lookForLeader() print "Updating proposa" before calling upgradeProposal if (totalOrderPredicate(n.leader, n.zxid, proposedZxid) is true
3. in WorkerSender.process(), log -  LOG.info("WorkSender.process() QUEUEING m.state= " + m.state + " m.leader=" + m.leader + " m.sid=" + m.sid);

Suppporting log entries from 10.17.119.101-zookeeper.log.  I  have added description inline..
------------------
2010-08-18 14:53:56,451 - INFO  [QuorumPeer:/0:0:0:0:0:0:0:0:2181:FastLeaderElection@475] - IN FLE sending notification to server id = 1
2010-08-18 14:53:56,451 - INFO  [QuorumPeer:/0:0:0:0:0:0:0:0:2181:FastLeaderElection@476] - proposedLeader, proposedZxid, logicalclock 0343597383684
2010-08-18 14:53:56,452 - INFO  [WorkerSender Thread:FastLeaderElection$Messenger$WorkerSender@352] - WorkSender.process() QUEUEING m.state= LOOKING m.leader=0 m.sid=1
2010-08-18 14:53:56,452 - DEBUG [WorkerSender Thread:QuorumCnxManager@347] - Opening channel to server 1
2010-08-18 14:53:56,453 - INFO  [QuorumPeer:/0:0:0:0:0:0:0:0:2181:FastLeaderElection@475] - IN FLE sending notification to server id = 2
2010-08-18 14:53:56,453 - INFO  [QuorumPeer:/0:0:0:0:0:0:0:0:2181:FastLeaderElection@476] - proposedLeader, proposedZxid, logicalclock 0343597383684
2010-08-18 14:53:56,453 - INFO  [QuorumPeer:/0:0:0:0:0:0:0:0:2181:FastLeaderElection@690] - Notification: 1, 34359738368, 4, 0, LOOKING, LOOKING, 1
2010-08-18 14:53:56,454 - DEBUG [QuorumPeer:/0:0:0:0:0:0:0:0:2181:FastLeaderElection@496] - id: 1, proposed id: 0, zxid: 34359738368, proposed zxid: 34359738368
2010-08-18 14:53:56,454 - INFO  [QuorumPeer:/0:0:0:0:0:0:0:0:2181:FastLeaderElection@710] - Updating proposal
2010-08-18 14:53:56,454 - INFO  [WorkerSender Thread:QuorumCnxManager@162] - Have smaller server identifier, so dropping the connection: (1, 0)
2010-08-18 14:53:56,455 - INFO  [WorkerSender Thread:FastLeaderElection$Messenger$WorkerSender@352] - WorkSender.process() QUEUEING m.state= LOOKING m.leader=0 m.sid=2
2010-08-18 14:53:56,458 - DEBUG [WorkerSender Thread:QuorumCnxManager@347] - Opening channel to server 2
2010-08-18 14:53:56,458 - WARN  [Thread-19:QuorumCnxManager$RecvWorker@659] - Connection broken:
java.io.IOException: Channel eof
        at org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:631)
2010-08-18 14:53:56,459 - INFO  [QuorumPeer:/0:0:0:0:0:0:0:0:2181:FastLeaderElection@475] - IN FLE sending notification to server id = 0

2010-08-18 14:53:56,460 - INFO  [QuorumPeer:/0:0:0:0:0:0:0:0:2181:FastLeaderElection@476] - proposedLeader, proposedZxid, logicalclock 1343597383684  

***** The above line shows that this node (server 0) is going to vote for 1. see - proposedLeader, proposedZxid, logicalclock 1 34359738368 4  Forgot to add spaces in the message  :-) ****


2010-08-18 14:53:56,460 - DEBUG [Thread-1:QuorumCnxManager$Listener@446] - Connection request /10.17.119.102:41597
2010-08-18 14:53:56,461 - DEBUG [Thread-1:QuorumCnxManager$Listener@449] - Connection request: 0
2010-08-18 14:53:56,461 - DEBUG [Thread-1:QuorumCnxManager$SendWorker@505] - Address of remote peer: 1
2010-08-18 14:53:56,461 - INFO  [QuorumPeer:/0:0:0:0:0:0:0:0:2181:FastLeaderElection@475] - IN FLE sending notification to server id = 1
2010-08-18 14:53:56,462 - INFO  [QuorumPeer:/0:0:0:0:0:0:0:0:2181:FastLeaderElection@476] - proposedLeader, proposedZxid, logicalclock 1343597383684 

***** Above, server 0 queued a notification to be sent to server 1. The notfication is saying that it accepts 1 as the leader. But the notification never got sent. process() was not called at all from WorkerSender. Its almost as if the notification was never entered in sendqueue (in sendNotifications). *****

2010-08-18 14:53:56,462 - INFO  [QuorumPeer:/0:0:0:0:0:0:0:0:2181:FastLeaderElection@475] - IN FLE sending notification to server id = 2
2010-08-18 14:53:56,462 - INFO  [QuorumPeer:/0:0:0:0:0:0:0:0:2181:FastLeaderElection@476] - proposedLeader, proposedZxid, logicalclock 1343597383684
2010-08-18 14:53:56,463 - DEBUG [QuorumPeer:/0:0:0:0:0:0:0:0:2181:FastLeaderElection@716] - Adding vote: From = 1, Proposed leader = 1, Porposed zxid = 34359738368, Proposed epoch = 4
2010-08-18 14:53:56,463 - INFO  [QuorumPeer:/0:0:0:0:0:0:0:0:2181:FastLeaderElection@690] - Notification: 0, 34359738368, 4, 0, LOOKING, LOOKING, 0
2010-08-18 14:53:56,463 - DEBUG [QuorumPeer:/0:0:0:0:0:0:0:0:2181:FastLeaderElection@496] - id: 0, proposed id: 1, zxid: 34359738368, proposed zxid: 34359738368
2010-08-18 14:53:56,463 - DEBUG [QuorumPeer:/0:0:0:0:0:0:0:0:2181:FastLeaderElection@716] - Adding vote: From = 0, Proposed leader = 0, Porposed zxid = 34359738368, Proposed epoch = 4


> Leader election taking a long time  to complete
> -----------------------------------------------
>
>                 Key: ZOOKEEPER-822
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-822
>             Project: Zookeeper
>          Issue Type: Bug
>          Components: quorum
>    Affects Versions: 3.3.0
>            Reporter: Vishal K
>            Priority: Blocker
>         Attachments: 822.tar.gz, rhel.tar.gz, test_zookeeper_1.log, test_zookeeper_2.log, zk_leader_election.tar.gz
>
>
> Created a 3 node cluster.
> 1 Fail the ZK leader
> 2. Let leader election finish. Restart the leader and let it join the 
> 3. Repeat 
> After a few rounds leader election takes anywhere 25- 60 seconds to finish. Note- we didn't have any ZK clients and no new znodes were created.
> zoo.cfg is shown below:
> #Mon Jul 19 12:15:10 UTC 2010
> server.1=192.168.4.12\:2888\:3888
> server.0=192.168.4.11\:2888\:3888
> clientPort=2181
> dataDir=/var/zookeeper
> syncLimit=2
> server.2=192.168.4.13\:2888\:3888
> initLimit=5
> tickTime=2000
> I have attached logs from two nodes that took a long time to form the cluster after failing the leader. The leader was down anyways so logs from that node shouldn't matter.
> Look for "START HERE". Logs after that point should be of our interest.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (ZOOKEEPER-822) Leader election taking a long time to complete

Posted by "Flavio Junqueira (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12909434#action_12909434 ] 

Flavio Junqueira commented on ZOOKEEPER-822:
--------------------------------------------

Hi Vishal, I have taken a look at your patch. As I said before, it sounds good to me to make SocketChannel non-blocking, but I don't like very much the approach of creating one thread per connection attempt. Instead, I was thinking that we should try to use a selector. What do you think?

> Leader election taking a long time  to complete
> -----------------------------------------------
>
>                 Key: ZOOKEEPER-822
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-822
>             Project: Zookeeper
>          Issue Type: Bug
>          Components: quorum
>    Affects Versions: 3.3.0
>            Reporter: Vishal K
>            Assignee: Vishal K
>            Priority: Blocker
>             Fix For: 3.3.2, 3.4.0
>
>         Attachments: 822.tar.gz, rhel.tar.gz, test_zookeeper_1.log, test_zookeeper_2.log, zk_leader_election.tar.gz, zookeeper-3.4.0.tar.gz, ZOOKEEPER-822.patch_v1
>
>
> Created a 3 node cluster.
> 1 Fail the ZK leader
> 2. Let leader election finish. Restart the leader and let it join the 
> 3. Repeat 
> After a few rounds leader election takes anywhere 25- 60 seconds to finish. Note- we didn't have any ZK clients and no new znodes were created.
> zoo.cfg is shown below:
> #Mon Jul 19 12:15:10 UTC 2010
> server.1=192.168.4.12\:2888\:3888
> server.0=192.168.4.11\:2888\:3888
> clientPort=2181
> dataDir=/var/zookeeper
> syncLimit=2
> server.2=192.168.4.13\:2888\:3888
> initLimit=5
> tickTime=2000
> I have attached logs from two nodes that took a long time to form the cluster after failing the leader. The leader was down anyways so logs from that node shouldn't matter.
> Look for "START HERE". Logs after that point should be of our interest.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (ZOOKEEPER-822) Leader election taking a long time to complete

Posted by "Vishal K (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/ZOOKEEPER-822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vishal K updated ZOOKEEPER-822:
-------------------------------

    Attachment: rhel.tar.gz

Hi Ivan,

Per your suggetion, I have reproduced the same problem on Red Hat Enterprise Linux Server release 5.5.
(3 RHEL VMs on 3 ESX).

I used the code from trunk (may be a week old but that shouldn't matter). 
The leader election with the latest code is taking way more time to finish (2 mins 43 seconds).

The reproduction procedure was the same :
- reboot -n the leader.
- let the rebooted node come up
- start zookeeper on the rebooted node
- repeat until we run into the issue

Look for the last round of election (marked ****** Rebooting LEADER) in 10.17.119.101-zookeeper.log and 10.17.119.102-zookeeper.log.


Thanks.
-Vishal



> Leader election taking a long time  to complete
> -----------------------------------------------
>
>                 Key: ZOOKEEPER-822
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-822
>             Project: Zookeeper
>          Issue Type: Bug
>          Components: quorum
>    Affects Versions: 3.3.0
>            Reporter: Vishal K
>            Priority: Blocker
>         Attachments: 822.tar.gz, rhel.tar.gz, test_zookeeper_1.log, test_zookeeper_2.log, zk_leader_election.tar.gz
>
>
> Created a 3 node cluster.
> 1 Fail the ZK leader
> 2. Let leader election finish. Restart the leader and let it join the 
> 3. Repeat 
> After a few rounds leader election takes anywhere 25- 60 seconds to finish. Note- we didn't have any ZK clients and no new znodes were created.
> zoo.cfg is shown below:
> #Mon Jul 19 12:15:10 UTC 2010
> server.1=192.168.4.12\:2888\:3888
> server.0=192.168.4.11\:2888\:3888
> clientPort=2181
> dataDir=/var/zookeeper
> syncLimit=2
> server.2=192.168.4.13\:2888\:3888
> initLimit=5
> tickTime=2000
> I have attached logs from two nodes that took a long time to form the cluster after failing the leader. The leader was down anyways so logs from that node shouldn't matter.
> Look for "START HERE". Logs after that point should be of our interest.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (ZOOKEEPER-822) Leader election taking a long time to complete

Posted by "Vishal K (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900898#action_12900898 ] 

Vishal K commented on ZOOKEEPER-822:
------------------------------------

Correction:
2. SendWorker.run() calls finish at the end. This could result in finish() getting called twice (e.g., finish called from receiveConnection), thus, causing senderWorkerMap.remove(sid) called twice and removing an entry that should *not* be removed.

> Leader election taking a long time  to complete
> -----------------------------------------------
>
>                 Key: ZOOKEEPER-822
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-822
>             Project: Zookeeper
>          Issue Type: Bug
>          Components: quorum
>    Affects Versions: 3.3.0
>            Reporter: Vishal K
>            Priority: Blocker
>         Attachments: 822.tar.gz, rhel.tar.gz, test_zookeeper_1.log, test_zookeeper_2.log, zk_leader_election.tar.gz, zookeeper-3.4.0.tar.gz
>
>
> Created a 3 node cluster.
> 1 Fail the ZK leader
> 2. Let leader election finish. Restart the leader and let it join the 
> 3. Repeat 
> After a few rounds leader election takes anywhere 25- 60 seconds to finish. Note- we didn't have any ZK clients and no new znodes were created.
> zoo.cfg is shown below:
> #Mon Jul 19 12:15:10 UTC 2010
> server.1=192.168.4.12\:2888\:3888
> server.0=192.168.4.11\:2888\:3888
> clientPort=2181
> dataDir=/var/zookeeper
> syncLimit=2
> server.2=192.168.4.13\:2888\:3888
> initLimit=5
> tickTime=2000
> I have attached logs from two nodes that took a long time to form the cluster after failing the leader. The leader was down anyways so logs from that node shouldn't matter.
> Look for "START HERE". Logs after that point should be of our interest.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (ZOOKEEPER-822) Leader election taking a long time to complete

Posted by "Flavio Junqueira (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/ZOOKEEPER-822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Flavio Junqueira updated ZOOKEEPER-822:
---------------------------------------

    Status: Patch Available  (was: Open)

Thanks for the comments, Ben. I have modified zookeeperAdmin and added the "zookeeper." prefix to the code.

Regarding your question, initiateConnection is called from two methods: testInitiateConnection (used only in tests) and connectOne. connectOne is synchronized. Do you still see an issue?

> Leader election taking a long time  to complete
> -----------------------------------------------
>
>                 Key: ZOOKEEPER-822
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-822
>             Project: Zookeeper
>          Issue Type: Bug
>          Components: quorum
>    Affects Versions: 3.3.0
>            Reporter: Vishal K
>            Assignee: Vishal K
>            Priority: Blocker
>             Fix For: 3.3.2, 3.4.0
>
>         Attachments: 822.tar.gz, rhel.tar.gz, test_zookeeper_1.log, test_zookeeper_2.log, zk_leader_election.tar.gz, zookeeper-3.4.0.tar.gz, ZOOKEEPER-822-3.3.2.patch, ZOOKEEPER-822-3.3.2.patch, ZOOKEEPER-822-3.3.2.patch, ZOOKEEPER-822-3.3.2.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch_v1
>
>
> Created a 3 node cluster.
> 1 Fail the ZK leader
> 2. Let leader election finish. Restart the leader and let it join the 
> 3. Repeat 
> After a few rounds leader election takes anywhere 25- 60 seconds to finish. Note- we didn't have any ZK clients and no new znodes were created.
> zoo.cfg is shown below:
> #Mon Jul 19 12:15:10 UTC 2010
> server.1=192.168.4.12\:2888\:3888
> server.0=192.168.4.11\:2888\:3888
> clientPort=2181
> dataDir=/var/zookeeper
> syncLimit=2
> server.2=192.168.4.13\:2888\:3888
> initLimit=5
> tickTime=2000
> I have attached logs from two nodes that took a long time to form the cluster after failing the leader. The leader was down anyways so logs from that node shouldn't matter.
> Look for "START HERE". Logs after that point should be of our interest.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (ZOOKEEPER-822) Leader election taking a long time to complete

Posted by "Vishal K (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12889902#action_12889902 ] 

Vishal K commented on ZOOKEEPER-822:
------------------------------------

I would like that add that the problem is highly reproducible.

> Leader election taking a long time  to complete
> -----------------------------------------------
>
>                 Key: ZOOKEEPER-822
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-822
>             Project: Zookeeper
>          Issue Type: Bug
>          Components: quorum
>    Affects Versions: 3.3.0
>            Reporter: Vishal K
>            Priority: Blocker
>         Attachments: test_zookeeper_1.log, test_zookeeper_2.log
>
>
> Created a 3 node cluster.
> 1 Fail the ZK leader
> 2. Let leader election finish. Restart the leader and let it join the 
> 3. Repeat 
> After a few rounds leader election takes anywhere 25- 60 seconds to finish. Note- we didn't have any ZK clients and no new znodes were created.
> zoo.cfg is shown below:
> #Mon Jul 19 12:15:10 UTC 2010
> server.1=192.168.4.12\:2888\:3888
> server.0=192.168.4.11\:2888\:3888
> clientPort=2181
> dataDir=/var/zookeeper
> syncLimit=2
> server.2=192.168.4.13\:2888\:3888
> initLimit=5
> tickTime=2000
> I have attached logs from two nodes that took a long time to form the cluster after failing the leader. The leader was down anyways so logs from that node shouldn't matter.
> Look for "START HERE". Logs after that point should be of our interest.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (ZOOKEEPER-822) Leader election taking a long time to complete

Posted by "Flavio Junqueira (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12906872#action_12906872 ] 

Flavio Junqueira commented on ZOOKEEPER-822:
--------------------------------------------

I'll have a look at it, Vishal. Thanks for posting it.

> Leader election taking a long time  to complete
> -----------------------------------------------
>
>                 Key: ZOOKEEPER-822
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-822
>             Project: Zookeeper
>          Issue Type: Bug
>          Components: quorum
>    Affects Versions: 3.3.0
>            Reporter: Vishal K
>            Assignee: Vishal K
>            Priority: Blocker
>             Fix For: 3.3.2, 3.4.0
>
>         Attachments: 822.tar.gz, rhel.tar.gz, test_zookeeper_1.log, test_zookeeper_2.log, zk_leader_election.tar.gz, zookeeper-3.4.0.tar.gz, ZOOKEEPER-822.patch_v1
>
>
> Created a 3 node cluster.
> 1 Fail the ZK leader
> 2. Let leader election finish. Restart the leader and let it join the 
> 3. Repeat 
> After a few rounds leader election takes anywhere 25- 60 seconds to finish. Note- we didn't have any ZK clients and no new znodes were created.
> zoo.cfg is shown below:
> #Mon Jul 19 12:15:10 UTC 2010
> server.1=192.168.4.12\:2888\:3888
> server.0=192.168.4.11\:2888\:3888
> clientPort=2181
> dataDir=/var/zookeeper
> syncLimit=2
> server.2=192.168.4.13\:2888\:3888
> initLimit=5
> tickTime=2000
> I have attached logs from two nodes that took a long time to form the cluster after failing the leader. The leader was down anyways so logs from that node shouldn't matter.
> Look for "START HERE". Logs after that point should be of our interest.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (ZOOKEEPER-822) Leader election taking a long time to complete

Posted by "Flavio Junqueira (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12904684#action_12904684 ] 

Flavio Junqueira commented on ZOOKEEPER-822:
--------------------------------------------

Hi VIshal, Good catches:

1- It sounds right that blocking the connection establishment might increase the time to election unnecessarily when the other party is not up. Here is my interpretation. If the machine is up but the the zk server is not running, then we simply get a connection failure and move on. The same doesn't happen when the the machine is down, since we need to wait for the connection establishment to time out;
2- It sounds right that a connection can be dropped erroneously due to a race, but I don't see in which case it can cause the election time to increase substantially, unless the race is triggered multiple times in a row. A server will try to connect upon every new notification, and a server only calls SendWorker.finish() in receiveNotification if it has a higher identifier. In this case, it creates a new connection immediately after, so it would need a previous connection being dropped right before to have the case you're describing;
3- Servers with higher identifiers decline connection requests from servers with lower identifiers; it is part of the protocol. Is this what you're referring to?


> Leader election taking a long time  to complete
> -----------------------------------------------
>
>                 Key: ZOOKEEPER-822
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-822
>             Project: Zookeeper
>          Issue Type: Bug
>          Components: quorum
>    Affects Versions: 3.3.0
>            Reporter: Vishal K
>            Priority: Blocker
>         Attachments: 822.tar.gz, rhel.tar.gz, test_zookeeper_1.log, test_zookeeper_2.log, zk_leader_election.tar.gz, zookeeper-3.4.0.tar.gz
>
>
> Created a 3 node cluster.
> 1 Fail the ZK leader
> 2. Let leader election finish. Restart the leader and let it join the 
> 3. Repeat 
> After a few rounds leader election takes anywhere 25- 60 seconds to finish. Note- we didn't have any ZK clients and no new znodes were created.
> zoo.cfg is shown below:
> #Mon Jul 19 12:15:10 UTC 2010
> server.1=192.168.4.12\:2888\:3888
> server.0=192.168.4.11\:2888\:3888
> clientPort=2181
> dataDir=/var/zookeeper
> syncLimit=2
> server.2=192.168.4.13\:2888\:3888
> initLimit=5
> tickTime=2000
> I have attached logs from two nodes that took a long time to form the cluster after failing the leader. The leader was down anyways so logs from that node shouldn't matter.
> Look for "START HERE". Logs after that point should be of our interest.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (ZOOKEEPER-822) Leader election taking a long time to complete

Posted by "Vishal K (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12906854#action_12906854 ] 

Vishal K commented on ZOOKEEPER-822:
------------------------------------

Hi Flavio,

> I think we need some time to converge on problems and fixes. 

I don't think it would take a long time to converge. I think the patch that I attached is quite simple. After adding a new property for timeout we should be good to go.

> My understanding is that we want to have 3.3.2 out soon, and my feeling is that this is not a blocker for 3.3.2 given Vishal's description and our experience with the system so far, but it would be good to hear from Vishal.

>From our earlier email exchanges I have a feeling that in most cases FLE was tested by restarting the ZooKeeper service (and not by rebooting/shutting down the host). I am a bit concerned that enough time may not have been spent in testing/reproducing this problem. In my opinion, this fix should go in 3.3.2. I know for sure that we won't be able to use the next release as is without this fix.

Thanks.
-Vishal

> Leader election taking a long time  to complete
> -----------------------------------------------
>
>                 Key: ZOOKEEPER-822
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-822
>             Project: Zookeeper
>          Issue Type: Bug
>          Components: quorum
>    Affects Versions: 3.3.0
>            Reporter: Vishal K
>            Assignee: Vishal K
>            Priority: Blocker
>             Fix For: 3.3.2, 3.4.0
>
>         Attachments: 822.tar.gz, rhel.tar.gz, test_zookeeper_1.log, test_zookeeper_2.log, zk_leader_election.tar.gz, zookeeper-3.4.0.tar.gz, ZOOKEEPER-822.patch_v1
>
>
> Created a 3 node cluster.
> 1 Fail the ZK leader
> 2. Let leader election finish. Restart the leader and let it join the 
> 3. Repeat 
> After a few rounds leader election takes anywhere 25- 60 seconds to finish. Note- we didn't have any ZK clients and no new znodes were created.
> zoo.cfg is shown below:
> #Mon Jul 19 12:15:10 UTC 2010
> server.1=192.168.4.12\:2888\:3888
> server.0=192.168.4.11\:2888\:3888
> clientPort=2181
> dataDir=/var/zookeeper
> syncLimit=2
> server.2=192.168.4.13\:2888\:3888
> initLimit=5
> tickTime=2000
> I have attached logs from two nodes that took a long time to form the cluster after failing the leader. The leader was down anyways so logs from that node shouldn't matter.
> Look for "START HERE". Logs after that point should be of our interest.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (ZOOKEEPER-822) Leader election taking a long time to complete

Posted by "Flavio Junqueira (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/ZOOKEEPER-822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Flavio Junqueira updated ZOOKEEPER-822:
---------------------------------------

    Status: Patch Available  (was: Open)

Thanks for reviewing it, Vishal. I have fixed the LOG.warn you pointed out and uploaded new patch files.

> Leader election taking a long time  to complete
> -----------------------------------------------
>
>                 Key: ZOOKEEPER-822
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-822
>             Project: Zookeeper
>          Issue Type: Bug
>          Components: quorum
>    Affects Versions: 3.3.0
>            Reporter: Vishal K
>            Assignee: Vishal K
>            Priority: Blocker
>             Fix For: 3.3.2, 3.4.0
>
>         Attachments: 822.tar.gz, rhel.tar.gz, test_zookeeper_1.log, test_zookeeper_2.log, zk_leader_election.tar.gz, zookeeper-3.4.0.tar.gz, ZOOKEEPER-822-3.3.2.patch, ZOOKEEPER-822-3.3.2.patch, ZOOKEEPER-822-3.3.2.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch_v1
>
>
> Created a 3 node cluster.
> 1 Fail the ZK leader
> 2. Let leader election finish. Restart the leader and let it join the 
> 3. Repeat 
> After a few rounds leader election takes anywhere 25- 60 seconds to finish. Note- we didn't have any ZK clients and no new znodes were created.
> zoo.cfg is shown below:
> #Mon Jul 19 12:15:10 UTC 2010
> server.1=192.168.4.12\:2888\:3888
> server.0=192.168.4.11\:2888\:3888
> clientPort=2181
> dataDir=/var/zookeeper
> syncLimit=2
> server.2=192.168.4.13\:2888\:3888
> initLimit=5
> tickTime=2000
> I have attached logs from two nodes that took a long time to form the cluster after failing the leader. The leader was down anyways so logs from that node shouldn't matter.
> Look for "START HERE". Logs after that point should be of our interest.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (ZOOKEEPER-822) Leader election taking a long time to complete

Posted by "Vishal K (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900111#action_12900111 ] 

Vishal K commented on ZOOKEEPER-822:
------------------------------------

at line 852 in 10.17.119.101-zookeeper.log WorkSender finally finds something in sendqueue and starts sending the notification to server 1.

> Leader election taking a long time  to complete
> -----------------------------------------------
>
>                 Key: ZOOKEEPER-822
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-822
>             Project: Zookeeper
>          Issue Type: Bug
>          Components: quorum
>    Affects Versions: 3.3.0
>            Reporter: Vishal K
>            Priority: Blocker
>         Attachments: 822.tar.gz, rhel.tar.gz, test_zookeeper_1.log, test_zookeeper_2.log, zk_leader_election.tar.gz
>
>
> Created a 3 node cluster.
> 1 Fail the ZK leader
> 2. Let leader election finish. Restart the leader and let it join the 
> 3. Repeat 
> After a few rounds leader election takes anywhere 25- 60 seconds to finish. Note- we didn't have any ZK clients and no new znodes were created.
> zoo.cfg is shown below:
> #Mon Jul 19 12:15:10 UTC 2010
> server.1=192.168.4.12\:2888\:3888
> server.0=192.168.4.11\:2888\:3888
> clientPort=2181
> dataDir=/var/zookeeper
> syncLimit=2
> server.2=192.168.4.13\:2888\:3888
> initLimit=5
> tickTime=2000
> I have attached logs from two nodes that took a long time to form the cluster after failing the leader. The leader was down anyways so logs from that node shouldn't matter.
> Look for "START HERE". Logs after that point should be of our interest.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (ZOOKEEPER-822) Leader election taking a long time to complete

Posted by "Vishal K (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/ZOOKEEPER-822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vishal K updated ZOOKEEPER-822:
-------------------------------

    Attachment: zk_leader_election.tar.gz

Hi Flavio,

I have attached the logs. Hope this helps. Do you have more info on loggraph (doc, etc)?

Thanks.

> Leader election taking a long time  to complete
> -----------------------------------------------
>
>                 Key: ZOOKEEPER-822
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-822
>             Project: Zookeeper
>          Issue Type: Bug
>          Components: quorum
>    Affects Versions: 3.3.0
>            Reporter: Vishal K
>            Priority: Blocker
>         Attachments: test_zookeeper_1.log, test_zookeeper_2.log, zk_leader_election.tar.gz
>
>
> Created a 3 node cluster.
> 1 Fail the ZK leader
> 2. Let leader election finish. Restart the leader and let it join the 
> 3. Repeat 
> After a few rounds leader election takes anywhere 25- 60 seconds to finish. Note- we didn't have any ZK clients and no new znodes were created.
> zoo.cfg is shown below:
> #Mon Jul 19 12:15:10 UTC 2010
> server.1=192.168.4.12\:2888\:3888
> server.0=192.168.4.11\:2888\:3888
> clientPort=2181
> dataDir=/var/zookeeper
> syncLimit=2
> server.2=192.168.4.13\:2888\:3888
> initLimit=5
> tickTime=2000
> I have attached logs from two nodes that took a long time to form the cluster after failing the leader. The leader was down anyways so logs from that node shouldn't matter.
> Look for "START HERE". Logs after that point should be of our interest.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (ZOOKEEPER-822) Leader election taking a long time to complete

Posted by "Hudson (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12918878#action_12918878 ] 

Hudson commented on ZOOKEEPER-822:
----------------------------------

Integrated in ZooKeeper-trunk #959 (See [https://hudson.apache.org/hudson/job/ZooKeeper-trunk/959/])
    ZOOKEEPER-822. Leader election taking a long time to complete


> Leader election taking a long time  to complete
> -----------------------------------------------
>
>                 Key: ZOOKEEPER-822
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-822
>             Project: Zookeeper
>          Issue Type: Bug
>          Components: quorum
>    Affects Versions: 3.3.0
>            Reporter: Vishal K
>            Assignee: Vishal K
>            Priority: Blocker
>             Fix For: 3.3.2, 3.4.0
>
>         Attachments: 822.tar.gz, rhel.tar.gz, test_zookeeper_1.log, test_zookeeper_2.log, zk_leader_election.tar.gz, zookeeper-3.4.0.tar.gz, ZOOKEEPER-822-3.3.2.patch, ZOOKEEPER-822-3.3.2.patch, ZOOKEEPER-822-3.3.2.patch, ZOOKEEPER-822-3.3.2.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch_v1
>
>
> Created a 3 node cluster.
> 1 Fail the ZK leader
> 2. Let leader election finish. Restart the leader and let it join the 
> 3. Repeat 
> After a few rounds leader election takes anywhere 25- 60 seconds to finish. Note- we didn't have any ZK clients and no new znodes were created.
> zoo.cfg is shown below:
> #Mon Jul 19 12:15:10 UTC 2010
> server.1=192.168.4.12\:2888\:3888
> server.0=192.168.4.11\:2888\:3888
> clientPort=2181
> dataDir=/var/zookeeper
> syncLimit=2
> server.2=192.168.4.13\:2888\:3888
> initLimit=5
> tickTime=2000
> I have attached logs from two nodes that took a long time to form the cluster after failing the leader. The leader was down anyways so logs from that node shouldn't matter.
> Look for "START HERE". Logs after that point should be of our interest.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (ZOOKEEPER-822) Leader election taking a long time to complete

Posted by "Flavio Junqueira (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900010#action_12900010 ] 

Flavio Junqueira commented on ZOOKEEPER-822:
--------------------------------------------

Hi Vishal, Thanks for reporting. Here are some quick comments:

Issue 1: I think that just the javadoc message is incorrect. We really just want to check that some process has received notifications.

Issue2: The connection will eventually timeout if not established, so setting a different value should not make a difference. The point about blocking connect is a good one. I think it is worthwhile creating a jira for it.  

> Leader election taking a long time  to complete
> -----------------------------------------------
>
>                 Key: ZOOKEEPER-822
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-822
>             Project: Zookeeper
>          Issue Type: Bug
>          Components: quorum
>    Affects Versions: 3.3.0
>            Reporter: Vishal K
>            Priority: Blocker
>         Attachments: 822.tar.gz, rhel.tar.gz, test_zookeeper_1.log, test_zookeeper_2.log, zk_leader_election.tar.gz
>
>
> Created a 3 node cluster.
> 1 Fail the ZK leader
> 2. Let leader election finish. Restart the leader and let it join the 
> 3. Repeat 
> After a few rounds leader election takes anywhere 25- 60 seconds to finish. Note- we didn't have any ZK clients and no new znodes were created.
> zoo.cfg is shown below:
> #Mon Jul 19 12:15:10 UTC 2010
> server.1=192.168.4.12\:2888\:3888
> server.0=192.168.4.11\:2888\:3888
> clientPort=2181
> dataDir=/var/zookeeper
> syncLimit=2
> server.2=192.168.4.13\:2888\:3888
> initLimit=5
> tickTime=2000
> I have attached logs from two nodes that took a long time to form the cluster after failing the leader. The leader was down anyways so logs from that node shouldn't matter.
> Look for "START HERE". Logs after that point should be of our interest.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (ZOOKEEPER-822) Leader election taking a long time to complete

Posted by "Flavio Junqueira (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/ZOOKEEPER-822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Flavio Junqueira updated ZOOKEEPER-822:
---------------------------------------

    Attachment: ZOOKEEPER-822.patch

> Leader election taking a long time  to complete
> -----------------------------------------------
>
>                 Key: ZOOKEEPER-822
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-822
>             Project: Zookeeper
>          Issue Type: Bug
>          Components: quorum
>    Affects Versions: 3.3.0
>            Reporter: Vishal K
>            Assignee: Vishal K
>            Priority: Blocker
>             Fix For: 3.3.2, 3.4.0
>
>         Attachments: 822.tar.gz, rhel.tar.gz, test_zookeeper_1.log, test_zookeeper_2.log, zk_leader_election.tar.gz, zookeeper-3.4.0.tar.gz, ZOOKEEPER-822-3.3.2.patch, ZOOKEEPER-822-3.3.2.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch_v1
>
>
> Created a 3 node cluster.
> 1 Fail the ZK leader
> 2. Let leader election finish. Restart the leader and let it join the 
> 3. Repeat 
> After a few rounds leader election takes anywhere 25- 60 seconds to finish. Note- we didn't have any ZK clients and no new znodes were created.
> zoo.cfg is shown below:
> #Mon Jul 19 12:15:10 UTC 2010
> server.1=192.168.4.12\:2888\:3888
> server.0=192.168.4.11\:2888\:3888
> clientPort=2181
> dataDir=/var/zookeeper
> syncLimit=2
> server.2=192.168.4.13\:2888\:3888
> initLimit=5
> tickTime=2000
> I have attached logs from two nodes that took a long time to form the cluster after failing the leader. The leader was down anyways so logs from that node shouldn't matter.
> Look for "START HERE". Logs after that point should be of our interest.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (ZOOKEEPER-822) Leader election taking a long time to complete

Posted by "Flavio Junqueira (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/ZOOKEEPER-822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Flavio Junqueira updated ZOOKEEPER-822:
---------------------------------------

    Status: Patch Available  (was: Open)

I have added a system property called "cnxtimeout" to change the timeout value in QuorumCnxManager. Tests pass for me.

> Leader election taking a long time  to complete
> -----------------------------------------------
>
>                 Key: ZOOKEEPER-822
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-822
>             Project: Zookeeper
>          Issue Type: Bug
>          Components: quorum
>    Affects Versions: 3.3.0
>            Reporter: Vishal K
>            Assignee: Vishal K
>            Priority: Blocker
>             Fix For: 3.3.2, 3.4.0
>
>         Attachments: 822.tar.gz, rhel.tar.gz, test_zookeeper_1.log, test_zookeeper_2.log, zk_leader_election.tar.gz, zookeeper-3.4.0.tar.gz, ZOOKEEPER-822-3.3.2.patch, ZOOKEEPER-822-3.3.2.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch_v1
>
>
> Created a 3 node cluster.
> 1 Fail the ZK leader
> 2. Let leader election finish. Restart the leader and let it join the 
> 3. Repeat 
> After a few rounds leader election takes anywhere 25- 60 seconds to finish. Note- we didn't have any ZK clients and no new znodes were created.
> zoo.cfg is shown below:
> #Mon Jul 19 12:15:10 UTC 2010
> server.1=192.168.4.12\:2888\:3888
> server.0=192.168.4.11\:2888\:3888
> clientPort=2181
> dataDir=/var/zookeeper
> syncLimit=2
> server.2=192.168.4.13\:2888\:3888
> initLimit=5
> tickTime=2000
> I have attached logs from two nodes that took a long time to form the cluster after failing the leader. The leader was down anyways so logs from that node shouldn't matter.
> Look for "START HERE". Logs after that point should be of our interest.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (ZOOKEEPER-822) Leader election taking a long time to complete

Posted by "Flavio Junqueira (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/ZOOKEEPER-822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Flavio Junqueira updated ZOOKEEPER-822:
---------------------------------------

    Attachment: ZOOKEEPER-822.patch

> Leader election taking a long time  to complete
> -----------------------------------------------
>
>                 Key: ZOOKEEPER-822
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-822
>             Project: Zookeeper
>          Issue Type: Bug
>          Components: quorum
>    Affects Versions: 3.3.0
>            Reporter: Vishal K
>            Assignee: Vishal K
>            Priority: Blocker
>             Fix For: 3.3.2, 3.4.0
>
>         Attachments: 822.tar.gz, rhel.tar.gz, test_zookeeper_1.log, test_zookeeper_2.log, zk_leader_election.tar.gz, zookeeper-3.4.0.tar.gz, ZOOKEEPER-822-3.3.2.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch_v1
>
>
> Created a 3 node cluster.
> 1 Fail the ZK leader
> 2. Let leader election finish. Restart the leader and let it join the 
> 3. Repeat 
> After a few rounds leader election takes anywhere 25- 60 seconds to finish. Note- we didn't have any ZK clients and no new znodes were created.
> zoo.cfg is shown below:
> #Mon Jul 19 12:15:10 UTC 2010
> server.1=192.168.4.12\:2888\:3888
> server.0=192.168.4.11\:2888\:3888
> clientPort=2181
> dataDir=/var/zookeeper
> syncLimit=2
> server.2=192.168.4.13\:2888\:3888
> initLimit=5
> tickTime=2000
> I have attached logs from two nodes that took a long time to form the cluster after failing the leader. The leader was down anyways so logs from that node shouldn't matter.
> Look for "START HERE". Logs after that point should be of our interest.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (ZOOKEEPER-822) Leader election taking a long time to complete

Posted by "Flavio Junqueira (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/ZOOKEEPER-822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Flavio Junqueira updated ZOOKEEPER-822:
---------------------------------------

    Attachment: ZOOKEEPER-822.patch

I'm adding a test to the patch. It tries to send a message to an address for which a connection request receives no response, so it has to timeout. The test then checks that the amount of time elapsed is less than 6s (the timeout value is hardcoded 5s). Raising the timeout from 5s to say 7s makes the test fail.

> Leader election taking a long time  to complete
> -----------------------------------------------
>
>                 Key: ZOOKEEPER-822
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-822
>             Project: Zookeeper
>          Issue Type: Bug
>          Components: quorum
>    Affects Versions: 3.3.0
>            Reporter: Vishal K
>            Assignee: Vishal K
>            Priority: Blocker
>             Fix For: 3.3.2, 3.4.0
>
>         Attachments: 822.tar.gz, rhel.tar.gz, test_zookeeper_1.log, test_zookeeper_2.log, zk_leader_election.tar.gz, zookeeper-3.4.0.tar.gz, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch_v1
>
>
> Created a 3 node cluster.
> 1 Fail the ZK leader
> 2. Let leader election finish. Restart the leader and let it join the 
> 3. Repeat 
> After a few rounds leader election takes anywhere 25- 60 seconds to finish. Note- we didn't have any ZK clients and no new znodes were created.
> zoo.cfg is shown below:
> #Mon Jul 19 12:15:10 UTC 2010
> server.1=192.168.4.12\:2888\:3888
> server.0=192.168.4.11\:2888\:3888
> clientPort=2181
> dataDir=/var/zookeeper
> syncLimit=2
> server.2=192.168.4.13\:2888\:3888
> initLimit=5
> tickTime=2000
> I have attached logs from two nodes that took a long time to form the cluster after failing the leader. The leader was down anyways so logs from that node shouldn't matter.
> Look for "START HERE". Logs after that point should be of our interest.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (ZOOKEEPER-822) Leader election taking a long time to complete

Posted by "Flavio Paiva Junqueira (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12891582#action_12891582 ] 

Flavio Paiva Junqueira commented on ZOOKEEPER-822:
--------------------------------------------------

Thanks for the logs, Vishal. The jira discussing loggraph is ZOOKEEPER-773.

> Leader election taking a long time  to complete
> -----------------------------------------------
>
>                 Key: ZOOKEEPER-822
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-822
>             Project: Zookeeper
>          Issue Type: Bug
>          Components: quorum
>    Affects Versions: 3.3.0
>            Reporter: Vishal K
>            Priority: Blocker
>         Attachments: test_zookeeper_1.log, test_zookeeper_2.log, zk_leader_election.tar.gz
>
>
> Created a 3 node cluster.
> 1 Fail the ZK leader
> 2. Let leader election finish. Restart the leader and let it join the 
> 3. Repeat 
> After a few rounds leader election takes anywhere 25- 60 seconds to finish. Note- we didn't have any ZK clients and no new znodes were created.
> zoo.cfg is shown below:
> #Mon Jul 19 12:15:10 UTC 2010
> server.1=192.168.4.12\:2888\:3888
> server.0=192.168.4.11\:2888\:3888
> clientPort=2181
> dataDir=/var/zookeeper
> syncLimit=2
> server.2=192.168.4.13\:2888\:3888
> initLimit=5
> tickTime=2000
> I have attached logs from two nodes that took a long time to form the cluster after failing the leader. The leader was down anyways so logs from that node shouldn't matter.
> Look for "START HERE". Logs after that point should be of our interest.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (ZOOKEEPER-822) Leader election taking a long time to complete

Posted by "Mahadev konar (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12909558#action_12909558 ] 

Mahadev konar commented on ZOOKEEPER-822:
-----------------------------------------

visha, flavio,
  If there is just one thread running at one point in time, then its ok. Also, I am really worried about the code structure in LeaderElection.java. Its ok to have a temporary fix, but it would be great to see some commitment from someone on doing it right in 3.4. 

> Leader election taking a long time  to complete
> -----------------------------------------------
>
>                 Key: ZOOKEEPER-822
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-822
>             Project: Zookeeper
>          Issue Type: Bug
>          Components: quorum
>    Affects Versions: 3.3.0
>            Reporter: Vishal K
>            Assignee: Vishal K
>            Priority: Blocker
>             Fix For: 3.3.2, 3.4.0
>
>         Attachments: 822.tar.gz, rhel.tar.gz, test_zookeeper_1.log, test_zookeeper_2.log, zk_leader_election.tar.gz, zookeeper-3.4.0.tar.gz, ZOOKEEPER-822.patch_v1
>
>
> Created a 3 node cluster.
> 1 Fail the ZK leader
> 2. Let leader election finish. Restart the leader and let it join the 
> 3. Repeat 
> After a few rounds leader election takes anywhere 25- 60 seconds to finish. Note- we didn't have any ZK clients and no new znodes were created.
> zoo.cfg is shown below:
> #Mon Jul 19 12:15:10 UTC 2010
> server.1=192.168.4.12\:2888\:3888
> server.0=192.168.4.11\:2888\:3888
> clientPort=2181
> dataDir=/var/zookeeper
> syncLimit=2
> server.2=192.168.4.13\:2888\:3888
> initLimit=5
> tickTime=2000
> I have attached logs from two nodes that took a long time to form the cluster after failing the leader. The leader was down anyways so logs from that node shouldn't matter.
> Look for "START HERE". Logs after that point should be of our interest.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (ZOOKEEPER-822) Leader election taking a long time to complete

Posted by "Mahadev konar (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12909592#action_12909592 ] 

Mahadev konar commented on ZOOKEEPER-822:
-----------------------------------------

vishal,
 I was expecting some commitment from you for making it use a selector :).

> Leader election taking a long time  to complete
> -----------------------------------------------
>
>                 Key: ZOOKEEPER-822
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-822
>             Project: Zookeeper
>          Issue Type: Bug
>          Components: quorum
>    Affects Versions: 3.3.0
>            Reporter: Vishal K
>            Assignee: Vishal K
>            Priority: Blocker
>             Fix For: 3.3.2, 3.4.0
>
>         Attachments: 822.tar.gz, rhel.tar.gz, test_zookeeper_1.log, test_zookeeper_2.log, zk_leader_election.tar.gz, zookeeper-3.4.0.tar.gz, ZOOKEEPER-822.patch_v1
>
>
> Created a 3 node cluster.
> 1 Fail the ZK leader
> 2. Let leader election finish. Restart the leader and let it join the 
> 3. Repeat 
> After a few rounds leader election takes anywhere 25- 60 seconds to finish. Note- we didn't have any ZK clients and no new znodes were created.
> zoo.cfg is shown below:
> #Mon Jul 19 12:15:10 UTC 2010
> server.1=192.168.4.12\:2888\:3888
> server.0=192.168.4.11\:2888\:3888
> clientPort=2181
> dataDir=/var/zookeeper
> syncLimit=2
> server.2=192.168.4.13\:2888\:3888
> initLimit=5
> tickTime=2000
> I have attached logs from two nodes that took a long time to form the cluster after failing the leader. The leader was down anyways so logs from that node shouldn't matter.
> Look for "START HERE". Logs after that point should be of our interest.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (ZOOKEEPER-822) Leader election taking a long time to complete

Posted by "Flavio Paiva Junqueira (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12889906#action_12889906 ] 

Flavio Paiva Junqueira commented on ZOOKEEPER-822:
--------------------------------------------------

Vishal, I can't reproduce your problem. I just tried twice to kill the leader and rejoin it 20 times each, and I can't see the problem you're mentioning.  I wonder if there is anything special about your setup. I also can see in your logs lots of exceptions related to connections, and as a first cut, it sounds like this is preventing the severs from exchanging notifications, and therefore the delay. 

Two minor comments: your log file for server 2 does not contain "START HERE" and each file duplicates every message.



> Leader election taking a long time  to complete
> -----------------------------------------------
>
>                 Key: ZOOKEEPER-822
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-822
>             Project: Zookeeper
>          Issue Type: Bug
>          Components: quorum
>    Affects Versions: 3.3.0
>            Reporter: Vishal K
>            Priority: Blocker
>         Attachments: test_zookeeper_1.log, test_zookeeper_2.log
>
>
> Created a 3 node cluster.
> 1 Fail the ZK leader
> 2. Let leader election finish. Restart the leader and let it join the 
> 3. Repeat 
> After a few rounds leader election takes anywhere 25- 60 seconds to finish. Note- we didn't have any ZK clients and no new znodes were created.
> zoo.cfg is shown below:
> #Mon Jul 19 12:15:10 UTC 2010
> server.1=192.168.4.12\:2888\:3888
> server.0=192.168.4.11\:2888\:3888
> clientPort=2181
> dataDir=/var/zookeeper
> syncLimit=2
> server.2=192.168.4.13\:2888\:3888
> initLimit=5
> tickTime=2000
> I have attached logs from two nodes that took a long time to form the cluster after failing the leader. The leader was down anyways so logs from that node shouldn't matter.
> Look for "START HERE". Logs after that point should be of our interest.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (ZOOKEEPER-822) Leader election taking a long time to complete

Posted by "Flavio Junqueira (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/ZOOKEEPER-822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Flavio Junqueira updated ZOOKEEPER-822:
---------------------------------------

    Status: Patch Available  (was: Open)

> Leader election taking a long time  to complete
> -----------------------------------------------
>
>                 Key: ZOOKEEPER-822
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-822
>             Project: Zookeeper
>          Issue Type: Bug
>          Components: quorum
>    Affects Versions: 3.3.0
>            Reporter: Vishal K
>            Assignee: Vishal K
>            Priority: Blocker
>             Fix For: 3.3.2, 3.4.0
>
>         Attachments: 822.tar.gz, rhel.tar.gz, test_zookeeper_1.log, test_zookeeper_2.log, zk_leader_election.tar.gz, zookeeper-3.4.0.tar.gz, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch_v1
>
>
> Created a 3 node cluster.
> 1 Fail the ZK leader
> 2. Let leader election finish. Restart the leader and let it join the 
> 3. Repeat 
> After a few rounds leader election takes anywhere 25- 60 seconds to finish. Note- we didn't have any ZK clients and no new znodes were created.
> zoo.cfg is shown below:
> #Mon Jul 19 12:15:10 UTC 2010
> server.1=192.168.4.12\:2888\:3888
> server.0=192.168.4.11\:2888\:3888
> clientPort=2181
> dataDir=/var/zookeeper
> syncLimit=2
> server.2=192.168.4.13\:2888\:3888
> initLimit=5
> tickTime=2000
> I have attached logs from two nodes that took a long time to form the cluster after failing the leader. The leader was down anyways so logs from that node shouldn't matter.
> Look for "START HERE". Logs after that point should be of our interest.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (ZOOKEEPER-822) Leader election taking a long time to complete

Posted by "Vishal K (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12890052#action_12890052 ] 

Vishal K commented on ZOOKEEPER-822:
------------------------------------

Hi Flavio,

I have Zookeeper servers running in a VM. To kill ZK server, I power off a VM. On the other hand, I tried several times killing ZK process and restarting it and I did not see any issues.
So there is something about the reboot that is causing this problem (TCP session not getting cleaned-up?).

I don't see many connection exceptions in the log.

Once the leader election starts  we start seeing "Notification time out" messages.

However, before this we do see that the connection was established (show below):

2010-07-19 14:40:52,562 - DEBUG [WorkerSender Thread:QuorumCnxManager@366] - There is a connection already for server 0
2010-07-19 14:40:52,563 - DEBUG [WorkerSender Thread:QuorumCnxManager@346] - Opening channel to server 2

Do you still think this is a communication problem?

> Leader election taking a long time  to complete
> -----------------------------------------------
>
>                 Key: ZOOKEEPER-822
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-822
>             Project: Zookeeper
>          Issue Type: Bug
>          Components: quorum
>    Affects Versions: 3.3.0
>            Reporter: Vishal K
>            Priority: Blocker
>         Attachments: test_zookeeper_1.log, test_zookeeper_2.log
>
>
> Created a 3 node cluster.
> 1 Fail the ZK leader
> 2. Let leader election finish. Restart the leader and let it join the 
> 3. Repeat 
> After a few rounds leader election takes anywhere 25- 60 seconds to finish. Note- we didn't have any ZK clients and no new znodes were created.
> zoo.cfg is shown below:
> #Mon Jul 19 12:15:10 UTC 2010
> server.1=192.168.4.12\:2888\:3888
> server.0=192.168.4.11\:2888\:3888
> clientPort=2181
> dataDir=/var/zookeeper
> syncLimit=2
> server.2=192.168.4.13\:2888\:3888
> initLimit=5
> tickTime=2000
> I have attached logs from two nodes that took a long time to form the cluster after failing the leader. The leader was down anyways so logs from that node shouldn't matter.
> Look for "START HERE". Logs after that point should be of our interest.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (ZOOKEEPER-822) Leader election taking a long time to complete

Posted by "Ivan Kelly (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12893121#action_12893121 ] 

Ivan Kelly commented on ZOOKEEPER-822:
--------------------------------------

currently the timestamp is used as the zxid isn't always available in the message logs. But yes, zxid would be more desirable. Perhaps I can extract that from the context some other way. 



> Leader election taking a long time  to complete
> -----------------------------------------------
>
>                 Key: ZOOKEEPER-822
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-822
>             Project: Zookeeper
>          Issue Type: Bug
>          Components: quorum
>    Affects Versions: 3.3.0
>            Reporter: Vishal K
>            Priority: Blocker
>         Attachments: test_zookeeper_1.log, test_zookeeper_2.log, zk_leader_election.tar.gz
>
>
> Created a 3 node cluster.
> 1 Fail the ZK leader
> 2. Let leader election finish. Restart the leader and let it join the 
> 3. Repeat 
> After a few rounds leader election takes anywhere 25- 60 seconds to finish. Note- we didn't have any ZK clients and no new znodes were created.
> zoo.cfg is shown below:
> #Mon Jul 19 12:15:10 UTC 2010
> server.1=192.168.4.12\:2888\:3888
> server.0=192.168.4.11\:2888\:3888
> clientPort=2181
> dataDir=/var/zookeeper
> syncLimit=2
> server.2=192.168.4.13\:2888\:3888
> initLimit=5
> tickTime=2000
> I have attached logs from two nodes that took a long time to form the cluster after failing the leader. The leader was down anyways so logs from that node shouldn't matter.
> Look for "START HERE". Logs after that point should be of our interest.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (ZOOKEEPER-822) Leader election taking a long time to complete

Posted by "Flavio Junqueira (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12906346#action_12906346 ] 

Flavio Junqueira commented on ZOOKEEPER-822:
--------------------------------------------

I think we need some time to converge on problems and fixes. My understanding is that we want to have 3.3.2 out soon, and my feeling is that this is not a blocker for 3.3.2 given Vishal's description and our experience with the system so far, but it would be good to hear from Vishal.

> Leader election taking a long time  to complete
> -----------------------------------------------
>
>                 Key: ZOOKEEPER-822
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-822
>             Project: Zookeeper
>          Issue Type: Bug
>          Components: quorum
>    Affects Versions: 3.3.0
>            Reporter: Vishal K
>            Assignee: Vishal K
>            Priority: Blocker
>             Fix For: 3.3.2, 3.4.0
>
>         Attachments: 822.tar.gz, rhel.tar.gz, test_zookeeper_1.log, test_zookeeper_2.log, zk_leader_election.tar.gz, zookeeper-3.4.0.tar.gz
>
>
> Created a 3 node cluster.
> 1 Fail the ZK leader
> 2. Let leader election finish. Restart the leader and let it join the 
> 3. Repeat 
> After a few rounds leader election takes anywhere 25- 60 seconds to finish. Note- we didn't have any ZK clients and no new znodes were created.
> zoo.cfg is shown below:
> #Mon Jul 19 12:15:10 UTC 2010
> server.1=192.168.4.12\:2888\:3888
> server.0=192.168.4.11\:2888\:3888
> clientPort=2181
> dataDir=/var/zookeeper
> syncLimit=2
> server.2=192.168.4.13\:2888\:3888
> initLimit=5
> tickTime=2000
> I have attached logs from two nodes that took a long time to form the cluster after failing the leader. The leader was down anyways so logs from that node shouldn't matter.
> Look for "START HERE". Logs after that point should be of our interest.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.