Posted to dev@zookeeper.apache.org by "박영근 (Alex)" <al...@nexr.com> on 2011/09/21 15:16:06 UTC

Creating a znode with SEQUENTIAL_EPHEMERAL mode becomes corrupt in case of unstable network

Hi all,

I ran into a problem creating a znode in EPHEMERAL_SEQUENTIAL mode under
unstable network conditions.

Although the client never received a reply saying that the sequential node
had been created, the ensemble does have the znode; I confirmed this with the
ZooKeeper dashboard (https://github.com/phunt/zookeeper_dashboard).

When the client receives a DISCONNECTED event, it tries to reconnect.
The session timeout is 30 seconds.

The unstable network is simulated as follows:

The Grinder agent sends requests to create znodes with
CreateMode.EPHEMERAL_SEQUENTIAL.
The ZK ensemble has three servers.
Each server's NIC is repeatedly taken down and brought back up:
the NIC of server1 goes down every minute, stays down for 9 seconds, then comes back up;
the NIC of server2 goes down every 2 minutes, stays down for 9 seconds, then comes back up;
the NIC of server3 goes down every 3 minutes, stays down for 9 seconds, then comes back up.

Does anyone have an idea, or know of a related issue?

Thanks in advance.

Alex

Re: Creating a znode with SEQUENTIAL_EPHEMERAL mode becomes corrupt in case of unstable network

Posted by "박영근 (Alex)" <al...@nexr.com>.
Ted, Camille,

Thanks for your replies.

Our ReadWriteLock, which runs on our analytics platform, relies on creating
znodes in EPHEMERAL_SEQUENTIAL mode.

This problem has caused hangs, so we need to look for another
solution.
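A likely reason a lost create hangs a lock: in ZooKeeper's lock recipes, a client holds the lock only when no node it must wait on has a lower sequence number. If the first create committed but its reply was lost, and the session survived the outage (plausible here, with 9-second NIC outages against a 30-second session timeout), the orphaned ephemeral node lives as long as the session does, so the retried node waits on it indefinitely. The ReadWriteLock code isn't shown in the thread, so this minimal Python sketch assumes a simplified write-lock rule (lowest sequence number wins); `seq`, `holds_lock`, `predecessor` and the node names are hypothetical, not part of any ZooKeeper client API.

```python
from typing import List, Optional


def seq(name: str) -> int:
    """Sequence number ZooKeeper appends to a sequential node (10 zero-padded digits)."""
    return int(name[-10:])


def holds_lock(children: List[str], mine: str) -> bool:
    """Simplified write-lock rule: the lock belongs to the lowest sequence number."""
    return min(children, key=seq) == mine


def predecessor(children: List[str], mine: str) -> Optional[str]:
    """The node `mine` waits on: the highest sequence number below its own, if any."""
    lower = [c for c in children if seq(c) < seq(mine)]
    return max(lower, key=seq) if lower else None


# Hypothetical scenario: our first create committed as write-0000000007, but the
# reply was lost. The session survived, so that ephemeral node is still alive.
orphan = "write-0000000007"
# The retry succeeded, and this is the only name the client was told about.
mine = "write-0000000008"
children = [orphan, mine]

print(holds_lock(children, mine))   # False
print(predecessor(children, mine))  # prints write-0000000007 (our own orphan)
```

Nothing deletes the orphan while the session is alive, so a watch on the predecessor never fires. Only once the orphan disappears (for example, when the session finally expires) does `holds_lock([mine], mine)` return True.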

In any case, I will look into the related issues.

Thanks,
Alex

Re: Creating a znode with SEQUENTIAL_EPHEMERAL mode becomes corrupt in case of unstable network

Posted by Ted Dunning <te...@gmail.com>.
If you cannot tolerate this sort of situation, then the only solution is
typically to avoid sequential ephemerals.  The problem is that in the
presence of a flaky network you cannot always tell if a failed create
actually created the znode in question.  This is because the network may
have failed after the create succeeded, but before you got the result.  In
that case, since this is a sequential ephemeral, you can't know if your file
got created because you don't even know the name.  Moreover, scanning
doesn't help because if you could scan, you probably could have used a fixed
unique name in the first place.
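The race described above can be reproduced with a toy model: the server commits the create, the reply is lost, and a blind retry produces a second node while the client only ever learns the second name. This Python sketch uses a hypothetical in-memory `FakeEnsemble` and `ConnectionLoss` exception in place of a real ZooKeeper client; it models only sequential creates under one parent, and `naive_create` is the retry pattern to avoid.

```python
import itertools
from typing import List


class ConnectionLoss(Exception):
    """Stand-in for a connection-loss error: the request's outcome is unknown."""


class FakeEnsemble:
    """Hypothetical in-memory model of one parent znode; not a real client API."""

    def __init__(self) -> None:
        self.children: List[str] = []
        self._counter = itertools.count()
        self.drop_next_reply = False

    def create_sequential(self, prefix: str) -> str:
        name = f"{prefix}{next(self._counter):010d}"
        self.children.append(name)      # the write is committed on the server...
        if self.drop_next_reply:
            self.drop_next_reply = False
            raise ConnectionLoss()      # ...but the reply never reaches the client
        return name


def naive_create(ens: FakeEnsemble, prefix: str) -> str:
    """Blindly retry on connection loss, as a naive client might."""
    while True:
        try:
            return ens.create_sequential(prefix)
        except ConnectionLoss:
            continue


ens = FakeEnsemble()
ens.drop_next_reply = True
mine = naive_create(ens, "write-")
print(mine)          # write-0000000001
print(ens.children)  # ['write-0000000000', 'write-0000000001']
```

The return value names only `write-0000000001`; the committed `write-0000000000` exists on the server but was never reported back, which matches the node that showed up in the dashboard.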

There is a very long-standing proposed (nearly complete) solution for this
that requires some difficult coding.  See
https://issues.apache.org/jira/browse/ZOOKEEPER-22

RE: Creating a znode with SEQUENTIAL_EPHEMERAL mode becomes corrupt in case of unstable network

Posted by "Fournier, Camille F." <Ca...@gs.com>.
This is expected. When the network becomes unstable, it is the responsibility of the client writer to handle disconnected events appropriately and to check whether the node writes they attempted around the time of those events actually succeeded. This makes writing a "generic" client for ZK very difficult (search the mailing list for zkclient and you'll find a bunch of convos on this topic). Fortunately, many things that rely on EPHEMERAL_SEQUENTIAL nodes can tolerate some duplication of data, so it's often not a huge problem.
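One way to do the verification described above, similar to the approach ZooKeeper's recipes documentation describes for locks, is to embed a client-generated GUID in the node name. After a connection loss the client lists the parent's children and looks for its GUID before retrying, so a create that already landed is adopted instead of duplicated; the sequence suffix still provides ordering. A minimal Python sketch, assuming hypothetical `create` and `list_children` callables in place of a real client's sequential-ephemeral create and getChildren; it is not the API of any ZooKeeper client library.

```python
import uuid
from typing import Callable, List, Optional


class ConnectionLoss(Exception):
    """Stand-in for a recoverable connection-loss error (outcome unknown)."""


def find_mine(children: List[str], guid: str) -> Optional[str]:
    """Return the child whose name embeds our GUID, if the earlier create landed."""
    matches = [c for c in children if guid in c]
    return matches[0] if matches else None


def create_with_recovery(create: Callable[[str], str],
                         list_children: Callable[[], List[str]],
                         base: str) -> str:
    """Create <base><guid>-<seq>, checking for an earlier success before retrying.

    `create(prefix)` and `list_children()` are hypothetical callables. Errors
    raised by `list_children` propagate to the caller in this sketch.
    """
    guid = uuid.uuid4().hex
    prefix = f"{base}{guid}-"
    while True:
        try:
            return create(prefix)
        except ConnectionLoss:
            # The create may or may not have been applied: look before retrying.
            existing = find_mine(list_children(), guid)
            if existing is not None:
                return existing


# Usage with a tiny fake parent node whose first reply is lost (hypothetical).
children: List[str] = []
replies_to_drop = [True]


def fake_create(prefix: str) -> str:
    name = f"{prefix}{len(children):010d}"  # toy counter, not ZooKeeper's
    children.append(name)                   # committed on the "server"
    if replies_to_drop and replies_to_drop.pop():
        raise ConnectionLoss()              # reply lost in transit
    return name


mine = create_with_recovery(fake_create, lambda: list(children), "write-")
print(len(children))        # 1
print(mine == children[0])  # True
```

When ordering nodes for a lock, compare only the numeric suffix rather than the whole name, since the GUIDs differ between clients.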

C

-----Original Message-----
From: 박영근(Alex) [mailto:alex.park@nexr.com] 
Sent: Wednesday, September 21, 2011 9:16 AM
To: dev@zookeeper.apache.org
Cc: user@zookeeper.apache.org
Subject: Creating a znode with SEQUENTIAL_EPHEMERAL mode becomes corrupt in case of unstable network
