Posted to user@zookeeper.apache.org by Jeff Widman <je...@jeffwidman.com> on 2018/01/09 20:38:31 UTC

Why are ephemeral nodes written to disk?

Ephemeral nodes only exist for the life of the client session.

As far as I understand, by definition, a client session ends when the
entire zookeeper ensemble goes down.

So I would expect that ephemeral nodes are only written to memory, not
disk. The ephemeral nodes would be sync'd across machines as a client
session can span multiple connections if a single zk server fails, but once
the ensemble is down there is no need to recover the ephemeral nodes from
disk.

However, when I looked at a zookeeper ensemble that is 99% ephemeral nodes,
I see a bunch of disk I/O from the zookeeper processes. So it appears that
ephemeral nodes are still written to disk...

Why is this?

-- 

*Jeff Widman*
jeffwidman.com <http://www.jeffwidman.com/> | 740-WIDMAN-J (943-6265)
<><

Re: Why are ephemeral nodes written to disk?

Posted by Patrick Hunt <ph...@apache.org>.
NP, sorry for the slow response but I've been out on vacation the past few
weeks. ;-)

Regards,

Patrick

On Wed, Jan 17, 2018 at 3:39 PM, Jeff Widman <je...@jeffwidman.com> wrote:

> Thank you. I did not realize sessions could continue even if the ensemble
> was shut down.

Re: Why are ephemeral nodes written to disk?

Posted by Jeff Widman <je...@jeffwidman.com>.
Thank you. I did not realize sessions could continue even if the ensemble
was shut down.

Re: Why are ephemeral nodes written to disk?

Posted by Patrick Hunt <ph...@apache.org>.
On Tue, Jan 9, 2018 at 12:38 PM, Jeff Widman <je...@jeffwidman.com> wrote:

> Ephemeral nodes only exist for the life of the client session.
>
> As far as I understand, by definition, a client session ends when the
> entire zookeeper ensemble goes down.
>
> So I would expect that ephemeral nodes are only written to memory, not
> disk. The ephemeral nodes would be sync'd across machines as a client
> session can span multiple connections if a single zk server fails, but once
> the ensemble is down there is no need to recover the ephemeral nodes from
> disk.
>
> However, when I looked at a zookeeper ensemble that is 99% ephemeral nodes,
> I see a bunch of disk I/O from the zookeeper processes. So it appears that
> ephemeral nodes are still written to disk...
>
> Why is this?
>

Ephemeral znodes are treated just like persistent znodes in the sense that
a quorum of servers needs to agree to any change. As such, the znode is
written to the transaction log.
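
To illustrate the point above, here is a minimal toy model in Python. It is
not ZooKeeper's actual code: ToyServer, txn_log, tree, and ephemeral_owner are
hypothetical names chosen for this sketch. It only shows the idea that a
create is appended to a log before it is applied in memory, whether or not the
znode is ephemeral, so replaying the log after a restart brings ephemeral
znodes back too.

```python
from dataclasses import dataclass, field


@dataclass
class ToyServer:
    """Toy model of one server: an append-only log plus an in-memory tree."""

    txn_log: list = field(default_factory=list)  # stands in for the on-disk log
    tree: dict = field(default_factory=dict)     # stands in for the in-memory znodes

    def create(self, path, data, ephemeral_owner=None):
        # Log first, then apply. The same path is taken whether or not
        # the znode is ephemeral (ephemeral_owner is a hypothetical field).
        txn = ("create", path, data, ephemeral_owner)
        self.txn_log.append(txn)
        self._apply(txn)

    def _apply(self, txn):
        _, path, data, owner = txn
        self.tree[path] = (data, owner)

    def restart(self):
        # Simulate a restart: memory is lost, then the log is replayed.
        self.tree = {}
        for txn in self.txn_log:
            self._apply(txn)


server = ToyServer()
server.create("/config", b"v1")                            # persistent znode
server.create("/workers/w1", b"", ephemeral_owner=0x1234)  # ephemeral znode
server.restart()
print(sorted(server.tree))  # ['/config', '/workers/w1']
```

In this sketch the ephemeral znode survives the restart only because its
create was logged, which matches the disk I/O observed in the question.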

"a client session ends when the entire zookeeper ensemble goes down"

is not correct. A client session ends either when a client closes its
session explicitly or when the ZK quorum leader decides that the session has
expired (which is based on the negotiated session timeout). Only while a
leader is active can a session be expired (or closed, for that matter). When
you shut down an ensemble, the sessions are maintained. If you were to, for
example, shut down an ensemble for an hour and then restart it, the sessions
would still be active. The clock would "reset" when the new leader was
elected. If the client session is still active, the session would continue
and any ephemeral znodes would still exist.
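
The session rules described above can be sketched as a small toy model in
Python. This is an illustration under stated assumptions, not ZooKeeper's
implementation: ToySessionTracker and all of its methods are hypothetical
names, time is passed in as plain seconds, and the session table is assumed to
survive the shutdown.

```python
class ToySessionTracker:
    """Toy model: sessions can only be expired while a leader is active."""

    def __init__(self):
        self.leader_active = False
        self.sessions = {}  # session id -> (timeout_s, last_heard_s)

    def elect_leader(self, now):
        # A new leader gives every known session a fresh timeout window,
        # which models the clock being "reset" after an election.
        self.leader_active = True
        self.sessions = {
            sid: (timeout, now) for sid, (timeout, _) in self.sessions.items()
        }

    def shutdown(self):
        # The ensemble goes down; the session table is kept.
        self.leader_active = False

    def open_session(self, sid, timeout, now):
        self.sessions[sid] = (timeout, now)

    def heartbeat(self, sid, now):
        timeout, _ = self.sessions[sid]
        self.sessions[sid] = (timeout, now)

    def expire_stale(self, now):
        # Without an active leader nothing can be expired.
        if not self.leader_active:
            return []
        expired = [
            sid for sid, (timeout, last) in self.sessions.items()
            if now - last > timeout
        ]
        for sid in expired:
            del self.sessions[sid]
        return expired


tracker = ToySessionTracker()
tracker.elect_leader(now=0)
tracker.open_session("s1", timeout=30, now=0)
tracker.shutdown()
print(tracker.expire_stale(now=3600))  # [] since no leader is active
tracker.elect_leader(now=3600)         # restart an hour later
print(tracker.expire_stale(now=3610))  # [] since the clock was reset
print(tracker.expire_stale(now=3700))  # ['s1'] after no heartbeat for 100 s
```

Had the client called heartbeat within 30 seconds of the new election, the
session in this sketch would have continued, and with it any ephemeral znodes
it owned.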

Patrick

