Posted to user@zookeeper.apache.org by Samuel Rash <ra...@fb.com> on 2011/02/17 22:08:49 UTC

ephemeral node problem

Hi,

We are running ZooKeeper 3.3.2 and have seen what appears to be a problem with ephemeral nodes.  We create about 2000 persistent nodes (leaves) in a hierarchy.  Under each of these, we run a leader election with ephemeral nodes (~40).  This results in about 80,000 total ephemeral nodes.  During restart of our system, the leader elections can churn a bit as hosts remove themselves, electing new leaders, which then themselves may withdraw from the election.  In one such restart, we saw an election get 'stuck'.  Upon investigating, one node had its session expired (indicated in the zk logs), but one of its ephemeral nodes was still left.  We took down the processes holding ephemeral nodes, and this node still remained.
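
For anyone trying to confirm the same symptom, here is a rough sketch of the check described above: an ephemeral node counts as orphaned when its owning session is no longer live. It assumes you have already collected each znode's ephemeralOwner (from its Stat) and the set of live session ids by some other means. The function name, paths, and session ids below are hypothetical and not taken from our setup.

```python
from typing import Dict, Iterable, List


def find_orphaned_ephemerals(
    ephemeral_owner_by_path: Dict[str, int],
    live_session_ids: Iterable[int],
) -> List[str]:
    """Return paths whose owning session is no longer live, sorted by path."""
    live = set(live_session_ids)
    return sorted(
        path
        for path, owner in ephemeral_owner_by_path.items()
        # An ephemeralOwner of 0 marks a persistent znode, so skip it.
        if owner != 0 and owner not in live
    )


# Hypothetical snapshot: paths and session ids are invented for illustration.
snapshot = {
    "/app/leaf-0001/candidate-a": 0x12E3A1,
    "/app/leaf-0001/candidate-b": 0x12E3A2,
    "/app/leaf-0001": 0,
}
orphans = find_orphaned_ephemerals(snapshot, live_session_ids=[0x12E3A2])
print(orphans)
```

Any path this returns is an ephemeral node that outlived its session, which is the state we saw after the restart.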

Are there any known bugs in ZooKeeper that might result in this?  It does not appear to happen under our normal load.

thx,
-sr



Sam Rash
rash@fb.com

Re: ephemeral node problem

Posted by Samuel Rash <ra...@fb.com>.
yes!

this is what we saw in our zk logs (CancelledKeyException)

Sam Rash
rash@fb.com





On 2/17/11 1:29 PM, "Mahadev Konar" <ma...@apache.org> wrote:

>You probably ran into
>
>https://issues.apache.org/jira/browse/ZOOKEEPER-919
>?
>
>If yes, 3.3.3 has the fix! It is due to be released next week.
>
>thanks
>mahadev


Re: ephemeral node problem

Posted by Mahadev Konar <ma...@apache.org>.
You probably ran into https://issues.apache.org/jira/browse/ZOOKEEPER-919?

If yes, 3.3.3 has the fix! It is due to be released next week.

thanks
mahadev