Posted to user@zookeeper.apache.org by Cameron McKenzie <mc...@gmail.com> on 2014/05/08 12:15:39 UTC

Ephemeral node bound to a session that times out while ZK has no quorum

Guys,
I've noticed a weird problem with ephemeral nodes not being cleaned up if
the session they are tied to times out while ZooKeeper does not have a
quorum. The situation is basically as follows:

3 node cluster
-Client connects to cluster and creates an ephemeral node
-Two nodes die, so quorum is lost
-Some time passes (longer than the session timeout negotiated for the
client that created the ephemeral node)
-One (or both) of the dead nodes comes back and a quorum is re-formed.
-The ephemeral node tied to the session which should have timed out still
exists
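
A note on the "negotiated" timeout in the steps above: the value the session ends up with is not necessarily the one the client asked for. As far as I know, the server clamps the requested timeout between minSessionTimeout and maxSessionTimeout, which default to 2x and 20x tickTime. A minimal sketch of that arithmetic follows. The function name and the 2000 ms tickTime are assumptions for illustration, not values from this thread.

```python
def negotiate_session_timeout(requested_ms, tick_time_ms=2000,
                              min_timeout_ms=None, max_timeout_ms=None):
    """Clamp a requested session timeout into the server's allowed range."""
    # Documented defaults: 2 ticks minimum, 20 ticks maximum.
    if min_timeout_ms is None:
        min_timeout_ms = 2 * tick_time_ms
    if max_timeout_ms is None:
        max_timeout_ms = 20 * tick_time_ms
    return max(min_timeout_ms, min(requested_ms, max_timeout_ms))


if __name__ == "__main__":
    # Hypothetical requests against a server with tickTime=2000.
    for requested in (1000, 30000, 60000):
        print(requested, "->", negotiate_session_timeout(requested))
```

So when measuring "longer than the session timeout" in a test like this one, it is worth using the negotiated value reported by the client rather than the requested one.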

Re: Ephemeral node bound to a session that times out while ZK has no quorum

Posted by Cameron McKenzie <mc...@gmail.com>.
Thanks Flavio,
This probably explains the situation, but I will have to check the logs
again to be sure. It seemed like the ephemeral node didn't get cleaned up
for an extended period of time even though the client had established a new
connection. Could possibly be some weirdness where the old session was
still alive because it hadn't been closed down properly, but this seems
unlikely.

Anyway, thanks for the link.
cheers


On Fri, May 16, 2014 at 8:13 PM, FPJ <fp...@yahoo.com> wrote:

> Hi Cameron,
>
> The last point of the FAQ might clarify why the ephemerals are not getting
> deleted when the cluster is coming back up:
>
> https://cwiki.apache.org/confluence/display/ZOOKEEPER/FAQ
>
> -Flavio

RE: Ephemeral node bound to a session that times out while ZK has no quorum

Posted by FPJ <fp...@yahoo.com>.
Hi Cameron,

The last point of the FAQ might clarify why the ephemerals are not getting deleted when the cluster is coming back up: 

https://cwiki.apache.org/confluence/display/ZOOKEEPER/FAQ

-Flavio


Re: Ephemeral node bound to a session that times out while ZK has no quorum

Posted by Cameron McKenzie <mc...@gmail.com>.
After a few more trials, unfortunately it seems completely random as to how
long the ephemeral nodes are sticking around. Sometimes it's minutes,
sometimes they're cleaned up in a matter of seconds after startup...



Re: Ephemeral node bound to a session that times out while ZK has no quorum

Posted by Cameron McKenzie <mc...@gmail.com>.
hey Michi,
I'll have to double check the logs to see if the client got a session
expired event, but I would presume so because the ephemeral nodes lying
around had a different session ID. I guess it's a possibility that the old
connection stayed open, and a new one was also created, but I don't believe
this to be the case.
cheers


On Thu, May 15, 2014 at 12:41 PM, Michi Mutsuzaki <mi...@cs.stanford.edu> wrote:

> Hi Cameron,
>
> Did the client get the session expired event? Sessions don't expire
> during quorum loss, and I'm guessing the session got revalidated when
> the cluster reformed a quorum.

Re: Ephemeral node bound to a session that times out while ZK has no quorum

Posted by Michi Mutsuzaki <mi...@cs.stanford.edu>.
Hi Cameron,

Did the client get the session expired event? Sessions don't expire
during quorum loss, and I'm guessing the session got revalidated when
the cluster reformed a quorum.
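
The behaviour described above can be pictured with a toy model. This is only a sketch under the assumption that session expiry is evaluated by the leader and that every known session gets a fresh timeout window when a new leader is elected. The class and method names are made up for illustration and are not ZooKeeper's actual implementation.

```python
class ToySessionTracker:
    """Toy model: session expiry is only evaluated while a quorum exists."""

    def __init__(self, timeout_ms):
        self.timeout_ms = timeout_ms
        self.has_quorum = True
        self.now_ms = 0
        self.deadline_ms = timeout_ms  # time at which the session would expire
        self.expired = False

    def heartbeat(self):
        # A client ping extends the deadline, but only a live quorum can record it.
        if self.has_quorum and not self.expired:
            self.deadline_ms = self.now_ms + self.timeout_ms

    def lose_quorum(self):
        self.has_quorum = False

    def regain_quorum(self):
        # Assumed behaviour: a new leader restarts the timeout of every known session.
        self.has_quorum = True
        if not self.expired:
            self.deadline_ms = self.now_ms + self.timeout_ms

    def advance(self, ms):
        self.now_ms += ms
        # With no leader, nobody is running the expiry check.
        if self.has_quorum and self.now_ms >= self.deadline_ms:
            self.expired = True


if __name__ == "__main__":
    session = ToySessionTracker(timeout_ms=30_000)
    session.lose_quorum()
    session.advance(20 * 60 * 1000)   # 20 minutes without a quorum
    session.regain_quorum()
    print(session.expired)            # False: the outage did not expire the session
    session.advance(30_000)           # the client never comes back
    print(session.expired)            # True: expired one full timeout after quorum returned
```

In this model the ephemeral node would stay around for at least one full session timeout after the quorum returns, and longer if the original client reconnects and keeps the old session alive.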



Re: Ephemeral node bound to a session that times out while ZK has no quorum

Posted by Cameron McKenzie <mc...@gmail.com>.
Sorry, bashed send prematurely!

Guys,
I've noticed a weird problem with ephemeral nodes not being cleaned up if
the session they are tied to times out while ZooKeeper does not have a
quorum. The situation is basically as follows:

3 node cluster
-Client connects to cluster and creates an ephemeral node
-Two nodes die, so quorum is lost
-Some time passes (longer than the session timeout negotiated for the
client that created the ephemeral node)
-One (or both) of the dead nodes comes back and a quorum is re-formed.
-The ephemeral node tied to the session which should have timed out still
exists and never seems to get cleaned up.
-If I telnet in on port 2181 and 'dump', then I can see that ZK seems to
think that the session is still active and associated with the ephemeral
node in question.
-It seems to stay in this state for some extended period of time (20+
minutes). Interestingly, when I happened to fire up zkCli.sh I could see
that the node was still there, but after I exited, the node seemed to
disappear shortly afterwards. So, I wonder if the session established by
zkCli.sh ending somehow triggered the cleanup of this rogue ephemeral node?
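
The telnet 'dump' check in the steps above can also be scripted. What follows is a minimal sketch using only the Python standard library. The function names, the localhost default and the 5 second timeout are assumptions for illustration. As far as I know, 'dump' only gives useful output on the leader, and ZooKeeper releases from 3.5.3 onwards require four-letter commands to be allowed via 4lw.commands.whitelist.

```python
import socket


def four_letter_word(sock, command="dump"):
    """Send a four-letter command on a connected socket and return the reply text."""
    sock.sendall(command.encode("ascii"))
    chunks = []
    while True:
        data = sock.recv(4096)
        if not data:  # the server closes the connection once it has replied
            break
        chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def query_server(host="localhost", port=2181, command="dump", timeout_s=5.0):
    """Connect to a ZooKeeper server's client port and run one command."""
    with socket.create_connection((host, port), timeout=timeout_s) as sock:
        return four_letter_word(sock, command)
```

Calling query_server("localhost", 2181, "dump") against each server in turn and saving the replies before and after the quorum is lost makes it easier to compare which session IDs still own ephemeral nodes.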

Has anyone experienced this issue before? I understand that it's a bit of an
edge case, but I'm running across it quite frequently when testing changes to
the size of the ZK cluster.

I've thought of a few workarounds for the issue, but I'd like to know if
it's a known issue.

Any help appreciated!
cheers


