Posted to dev@bookkeeper.apache.org by Sam Just <sj...@salesforce.com> on 2017/06/26 23:34:53 UTC

Write retry issues with ZooKeeperClient

BookKeeper has a wrapper class for the ZooKeeper client called
ZooKeeperClient.  Its purpose appears to be to transparently perform
retries when ZooKeeper returns ConnectionLoss on an operation due to a
Disconnect event.

The trouble is that a write which received a ConnectionLoss error as a
return value may actually have succeeded.  Once ZooKeeperClient retries,
it'll get back NodeExists and propagate that error to the caller, even
though the write succeeded and the node did not exist beforehand.

It seems as though the same issue would hold for setData and delete calls
which use the version argument -- you could get a spurious BadVersion.
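
To make the failure mode concrete, here is a minimal sketch (not the actual
ZooKeeperClient code; the class and method names are illustrative) of how a
blind retry-on-ConnectionLoss turns a successful create into a spurious
NodeExists:

    import org.apache.zookeeper.CreateMode;
    import org.apache.zookeeper.KeeperException;
    import org.apache.zookeeper.ZooDefs;
    import org.apache.zookeeper.ZooKeeper;

    class RetryingCreateSketch {
        // If the first attempt reached the quorum before the connection
        // dropped, the retry observes the znode we just created and the
        // caller sees NodeExists even though nobody else wrote the node.
        static String createWithRetry(ZooKeeper zk, String path, byte[] data)
                throws KeeperException, InterruptedException {
            while (true) {
                try {
                    return zk.create(path, data, ZooDefs.Ids.OPEN_ACL_UNSAFE,
                                     CreateMode.PERSISTENT);
                } catch (KeeperException.ConnectionLossException e) {
                    // The create may or may not have been applied; we cannot
                    // tell from here, so this loop just tries again.
                }
            }
        }
    }

The same reasoning applies to the versioned setData and delete calls: the
retried operation can come back BadVersion because the first attempt already
changed the version.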

I believe I've reproduced the variant with a spurious NodeExists.  It
manifested as a spurious BKLedgerExistException when running against a
3-instance ZooKeeper cluster with dm-delay under the ZooKeeper instance
storage to force Disconnect events by injecting write delays.  This seems
to make sense, as AbstractZkLedgerManager.createLedgerMetadata uses
ZkUtils.asyncCreateFullPathOptimistic to create the metadata node and
appears to depend on the create atomicity to avoid two writers overwriting
each other's nodes.

AbstractZkLedgerManager.writeLedger would seem to have the same problem
with its dependence on using setData with the version argument to avoid
lost updates.

Am I missing something in this analysis?  It seems to me that this behavior
could actually be problematic during periods of spotty connectivity to the
ZooKeeper cluster.

Thanks!
-Sam

Re: Write retry issues with ZooKeeperClient

Posted by Sijie Guo <gu...@gmail.com>.
I agree. For create ledger, it is easy to handle - the zk ledger manager
can catch such an exception and create a new ledger with a fresh id. For
create ledger adv, the ledger id is supplied by the application, so the
application is responsible for handling the LedgerExists exception.
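
Roughly something like this at the znode level (a sketch only -- the id
generation and path layout here are made up, not the actual ledger manager
code):

    import java.util.function.Supplier;
    import org.apache.zookeeper.CreateMode;
    import org.apache.zookeeper.KeeperException;
    import org.apache.zookeeper.ZooDefs;
    import org.apache.zookeeper.ZooKeeper;

    class CreateLedgerSketch {
        // On NodeExists, pick a fresh ledger id and try again instead of
        // surfacing LedgerExists to the caller. For the adv path the id is
        // fixed by the application, so there the exception must be
        // propagated instead.
        static long createLedgerNode(ZooKeeper zk, byte[] metadata,
                                     Supplier<Long> nextLedgerId)
                throws KeeperException, InterruptedException {
            while (true) {
                long ledgerId = nextLedgerId.get();
                try {
                    zk.create("/ledgers/L" + ledgerId, metadata,
                              ZooDefs.Ids.OPEN_ACL_UNSAFE,
                              CreateMode.PERSISTENT);
                    return ledgerId;
                } catch (KeeperException.NodeExistsException e) {
                    // Either the id is genuinely taken, or our own earlier
                    // attempt succeeded before a ConnectionLoss; either way
                    // a new id is fine for the non-adv create path.
                }
            }
        }
    }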

Thoughts?

On Jun 27, 2017 6:45 AM, "Venkateswara Rao Jujjuri" <ju...@gmail.com>
wrote:

This is nothing different in any network based system. Like nfs. So we need
to have some kind of logic on the client side to make reasonable
assumption. May not be perfect for negative testing.

JV

On Mon, Jun 26, 2017 at 11:19 PM Sijie Guo <gu...@gmail.com> wrote:

> Hi Sam,
>
> Let's assume there is no such retry logic. How do you expect to handle
> this situation?
>
> Can your application try to create a new ledger or catch NodeExists
> exception?
>
> - Sijie
>
> On Mon, Jun 26, 2017 at 5:49 PM, Sam Just <sj...@salesforce.com> wrote:
>
> > Hmm, curator seems to have essentially the same problem:
> > https://issues.apache.org/jira/browse/CURATOR-268
> > I'm not sure there's a good way to solve this transparently -- the right
> > answer is probably to plumb the ConnectionLoss event through ZooKeeperClient
> > for interested callers, audit for metadata users where we depend on
> > atomicity, and update each one to handle it appropriately.
> > -Sam
--
Sent from iPhone

Re: Write retry issues with ZooKeeperClient

Posted by Sijie Guo <gu...@gmail.com>.
On Tue, Jun 27, 2017 at 4:27 PM, Sam Just <sj...@salesforce.com> wrote:

> Should EXPIRED be considered a recoverable error for retry purposes?
> Retrying in that case would mean that operations which might have been
> submitted under the assumption that ephemeral nodes were still present
> would be retried after the ephemeral nodes disappeared.  Don't all users
> have special handling for EXPIRED anyway?
>

If a request fails with EXPIRED, that means the request was never
attempted, so it is a recoverable error to retry.  It also means the
session that the request used has expired, so all ephemeral nodes
associated with that session will be deleted -- it is a safe situation for
retries.
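
In other words, something roughly like this (a sketch only -- zk() and
reconnect() are hypothetical helpers, not existing ZooKeeperClient methods):

    import org.apache.zookeeper.KeeperException;
    import org.apache.zookeeper.ZooKeeper;

    abstract class ExpiredRetrySketch {
        // Hypothetical helpers: zk() returns the current handle,
        // reconnect() builds a fresh one after session expiry.
        abstract ZooKeeper zk();
        abstract void reconnect() throws InterruptedException;

        // SessionExpired means the request was never applied and the expired
        // session's ephemeral nodes are already gone, so rebuilding the
        // handle and retrying is safe.
        byte[] getDataWithExpiryRetry(String path)
                throws KeeperException, InterruptedException {
            while (true) {
                try {
                    return zk().getData(path, false, null);
                } catch (KeeperException.SessionExpiredException e) {
                    reconnect();
                }
            }
        }
    }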



Re: Write retry issues with ZooKeeperClient

Posted by Sam Just <sj...@salesforce.com>.
Should EXPIRED be considered a recoverable error for retry purposes?
Retrying in that case would mean that operations which might have been
submitted under the assumption that ephemeral nodes were still present
would be retried after the ephemeral nodes disappeared.  Don't all users
have special handling for EXPIRED anyway?
-Sam


Re: Write retry issues with ZooKeeperClient

Posted by Venkateswara Rao Jujjuri <ju...@gmail.com>.
Current retry logic is transparent to the type of the operation and is down
the stack, which has no understanding of the operation type. If we want to
differentiate between idempotent and non-idempotent operations, we may need
to insert that logic into the retry logic. The other option is to handle
retries at a higher level where we have operational context. Both seem a
little messy. It looks like we can take care of the ephemeral node issue as
explained above. For ledger nodes and metadata updates we need to come up
with the possible failures and the ramifications of getting NodeExists;
then we can figure out how to handle that.
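
For illustration, pushing that distinction into the retry predicate might
look roughly like this (a sketch only; the operation classification below is
assumed, not how ZooKeeperClient is structured today):

    import org.apache.zookeeper.KeeperException;

    class RetryPolicySketch {
        enum OpKind { IDEMPOTENT, NON_IDEMPOTENT }

        // Unconditional reads/creates/deletes can be retried on
        // ConnectionLoss; conditional (versioned) writes surface the error
        // to the caller instead.
        static boolean shouldRetry(OpKind kind, KeeperException e) {
            if (!(e instanceof KeeperException.ConnectionLossException)) {
                return false;
            }
            return kind == OpKind.IDEMPOTENT;
        }
    }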




-- 
Jvrao
---
First they ignore you, then they laugh at you, then they fight you, then
you win. - Mahatma Gandhi

Re: Write retry issues with ZooKeeperClient

Posted by Sijie Guo <gu...@gmail.com>.
On Tue, Jun 27, 2017 at 2:36 PM, Sam Just <sj...@salesforce.com> wrote:

> That's the part I'm unsure of.  I know it has handling for BadVersion, I
> don't know whether it's still correct if you get a BadVersion even though
> your write actually succeeded, I need to look into that more.  Also, I'm
> not at all suggesting we handle this in the retry loop, I'm suggesting
> that for non-idempotent writes, we should not retry in ZooKeeperClient and
> instead propagate the ConnectionLoss error and let the caller deal with it.

+1 I agree with no retrying non-idempotent writes.



Re: Write retry issues with ZooKeeperClient

Posted by Sam Just <sj...@salesforce.com>.
On Tue, Jun 27, 2017 at 2:29 PM, Sijie Guo <gu...@gmail.com> wrote:

> On Tue, Jun 27, 2017 at 10:18 AM, Sam Just <sj...@salesforce.com> wrote:
>
> > JV: What do you mean by "May not be perfect for negative testing"?
> >
> > I don't think there's anything inevitable about this particular class of
> > behavior.  ZK could have chosen to avoid this problem entirely by doing
> > duplicate op detection server-side with a per-session transaction log.
> >
> > Since it doesn't, we'll need to solve it ourselves.
> > ZkLedgerUnderreplicationManager relies on either getting success or
> > NodeExists on the ephemeral lock node to determine whether a particular
> > ReplicationWorker is responsible for replicating a particular ledger.  I
> > haven't reproduced this one, but it seems to me that two workers could
> > both try to replicate the same ledger and *both get NodeExists* on the lock
> > node.  This would leave the ledger locked for replication until whichever
> > one actually wrote the node restarts.
> >
> > Users like the above are pretty easy to fix.  One option would be to simply
> > include with the write a payload including a nonce for the client.  Upon a
> > ConnectionLoss event, we read the node and determine whether we "won".  I
> > think ledger creation probably falls into the same category, the metadata
> > could include an identifier for the creator.
> >
>
> for ephemeral znode, you don't have to add extra payload.
>
> in the retry logic,
>
> - catch NodeExists exception
> - call exists to check the znode. you can get the Stat (
> https://zookeeper.apache.org/doc/r3.4.6/api/org/apache/zookeeper/data/Stat.html#getEphemeralOwner()
> ) of this znode.
> - you can compare the Stat#getEphemeralOwner with the client's current
> session id. if they match, the node is created by this session, otherwise
> this node is created by another session.
>
>
Ah, I'll do that then, much easier.
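
Something roughly like this, I take it (a sketch using the synchronous ZK
API; the class and method names are just illustrative):

    import org.apache.zookeeper.KeeperException;
    import org.apache.zookeeper.ZooKeeper;
    import org.apache.zookeeper.data.Stat;

    class EphemeralOwnerCheckSketch {
        // After a NodeExists on an ephemeral create, check whether the node
        // was in fact created by this session (e.g. by our own pre-retry
        // attempt) by comparing the znode's ephemeral owner to our session id.
        static boolean createdByThisSession(ZooKeeper zk, String path)
                throws KeeperException, InterruptedException {
            Stat stat = zk.exists(path, false);
            if (stat == null) {
                // The node disappeared again (the owner's session may have
                // expired in the meantime).
                return false;
            }
            return stat.getEphemeralOwner() == zk.getSessionId();
        }
    }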


>
> >
> > I'm less sure about AbstractZkLedgerManager.writeLedgerMetadata.  It
> > relies on setData success/BadVersion atomicity to ensure consistent rmw.
> > It's certainly solvable by including a version vector, but I think there's
> > probably a simpler way to do it.  I'm not sure under what circumstances
> > there even *can* be racing writes to the metadata -- is the only case that
> > matters concurrent updates between the writer and a single replication
> > worker?
> >
>
> I don't think it can be addressed by a version vector, there is no public
> *history* about who changed what version.
> But the question is why do you need to address this #writeLedgerMetadata
> inside the ZooKeeperClient retry loop. I think there is already a logic on
> #writeLedgerMetadata to resolve conflicts due to BadVersion.
> The LedgerHandle will re-read the ledger metadata when encountering version
> conflicts.
>
>
That's the part I'm unsure of.  I know it has handling for BadVersion, I
don't know whether it's still correct if you get a BadVersion even though
your write actually succeeded, I need to look into that more.  Also, I'm not
at all suggesting we handle this in the retry loop, I'm suggesting that for
non-idempotent writes, we should not retry in ZooKeeperClient and instead
propagate the ConnectionLoss error and let the caller deal with it.
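
Concretely, for a conditional write the wrapper would just let the error
through, roughly like this (a sketch only, not existing ZooKeeperClient
code):

    import org.apache.zookeeper.KeeperException;
    import org.apache.zookeeper.ZooKeeper;
    import org.apache.zookeeper.data.Stat;

    class NoRetryConditionalWriteSketch {
        // Versioned setData: on ConnectionLoss we cannot know whether the
        // version was bumped, so propagate the error instead of retrying and
        // let the caller re-read the metadata and decide.
        static Stat setDataNoRetry(ZooKeeper zk, String path, byte[] data,
                                   int expectedVersion)
                throws KeeperException, InterruptedException {
            try {
                return zk.setData(path, data, expectedVersion);
            } catch (KeeperException.ConnectionLossException e) {
                // A blind retry here could yield a spurious BadVersion.
                throw e;
            }
        }
    }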
-Sam


>
> >
> > The outline is probably:
> > 1) Add a way to do retries automatically on *unconditional* creates and
> > deletes (there are some users that don't care).
> > 2) Audit and modify existing users to either use the methods introduced in
> > 1 or handle the ConnectionLoss events explicitly.
> > 3) Switch ZooKeeperClient to not retry conditional writes, instead
> > propagating the ConnectionLoss event; that should make its semantics match
> > the vanilla ZK client.
> > -Sam
> >
> > On Tue, Jun 27, 2017 at 8:59 AM, Sijie Guo <gu...@gmail.com> wrote:
> >
> > > I agree in current callback model, we only propagate error code to the
> > > outer client, so we lose the information about the detail cause.
> > >
> > > But I think we also tried to map some zk error codes to bk error codes.
> > >
> > > NoNode -> NoLedger
> > > NodeExists -> LedgerExists
> > > ··· (other codes)
> > >
> > > Also we tried to hide zk metadata related implementation behind
> > interfaces,
> > > so some of the errors should be handled by the zk ledger manager
> > > implementation before propagating to the client.
> > >
> > > Sijie
> > >
> > > On Jun 27, 2017 7:32 AM, "Enrico Olivelli - Diennea" <
> > > enrico.olivelli@diennea.com> wrote:
> > >
> > > > On Tue, 27/06/2017 at 13.45 +0000, Venkateswara Rao Jujjuri wrote:
> > > >
> > > > This is no different from any other network-based system, like NFS. So we
> > > > need to have some kind of logic on the client side to make reasonable
> > > > assumptions. May not be perfect for negative testing.
> > > >
> > > >
> > > > Many times I wanted to have some "exception cause" on BKException,
> > > > especially for ZK issues.
> > > > The way we use only int error codes hides the root causes of the error.
> > > > BookKeeper client writes to the log, but the "cause" cannot be reported to
> > > > higher level logs, and sometimes this is annoying.
> > > > In the future I would like to add more details on errors.
> > > >
> > > > -- Enrico
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > JV
> > > >
> > > > On Mon, Jun 26, 2017 at 11:19 PM Sijie Guo <guosijie@gmail.com
> <mailto:
> > > guo
> > > > sijie@gmail.com>> wrote:
> > > >
> > > >
> > > >
> > > > Hi Sam,
> > > >
> > > > Let's assume there is no such retry logic. How do you expect to
> handle
> > > this
> > > > situation?
> > > >
> > > > Can your application try to create a new ledger or catch NodeExists
> > > > exception?
> > > >
> > > > - Sijie
> > > >
> > > > On Mon, Jun 26, 2017 at 5:49 PM, Sam Just <sjust@salesforce.com
> > <mailto:s
> > > > just@salesforce.com>> wrote:
> > > >
> > > >
> > > >
> > > > Hmm, curator seems to have essentially the same problem:
> > > > https://issues.apache.org/jira/browse/CURATOR-268
> > > > I'm not sure there's a good way to solve this transparently -- the
> > right
> > > > answer is
> > > > probably to plumb the ConnectionLoss event through ZooKeeperClient
> for
> > > > interested callers, audit for metadata users where we depend on
> > > >
> > > >
> > > > atomicity,
> > > >
> > > >
> > > > and update each one to handle it appropriately.
> > > > -Sam
> > > >
> > > > On Mon, Jun 26, 2017 at 4:34 PM, Sam Just <sjust@salesforce.com
> > <mailto:s
> > > > just@salesforce.com>> wrote:
> > > >
> > > >
> > > >
> > > > BookKeeper has a wrapper class for the ZooKeeper client called
> > > > ZooKeeperClient.
> > > > Its purpose appears to be to transparently perform retries in the
> case
> > > >
> > > >
> > > > that
> > > >
> > > >
> > > > ZooKeeper returns ConnectionLoss on an operation due to a Disconnect
> > > >
> > > >
> > > > event.
> > > >
> > > >
> > > >
> > > > The trouble is that it's possible that a write which received a
> > > > ConnectionLoss
> > > > error as a return value actually succeeded.  Once ZooKeeperClient
> > > >
> > > >
> > > > retries,
> > > >
> > > >
> > > > it'll
> > > > get back NodeExists and propagate that error to the caller, even
> though
> > > >
> > > >
> > > > the
> > > >
> > > >
> > > > write succeeded and the node in fact did not exist.
> > > >
> > > > It seems as though the same issue would hold for setData and delete
> > > >
> > > >
> > > >
> > > >
> > > > calls
> > > >
> > > >
> > > >
> > > >
> > > > which
> > > > use the version argument -- you could get a spurious BadVersion.
> > > >
> > > > I believe I've reproduced the variant with a spurious NodeExists.  It
> > > > manifested as a spurious BKLedgerExistException when running against
> a
> > > >
> > > >
> > > >
> > > >
> > > > 3
> > > >
> > > >
> > > >
> > > >
> > > > instance ZooKeeper cluster with dm-delay under the ZooKeeper instance
> > > > storage
> > > > to force Disconnect events by injecting write delays.  This seems to
> > > >
> > > >
> > > >
> > > >
> > > > make
> > > >
> > > >
> > > >
> > > >
> > > > sense
> > > > as AbstractZkLedgerManager.createLedgerMetadata uses
> > > > ZkUtils.asyncCreateFullPathOptimistic to create the metadata node
> and
> > > > appears
> > > > to depend on the create atomicity to avoid two writers overwriting
> each
> > > > other's
> > > > nodes.
> > > >
> > > > AbstractZkLedgerManager.writeLedger would seem to have the same
> > problem
> > > > with
> > > > its dependence on using setData with the version argument to avoid
> lost
> > > > updates.
> > > >
> > > > Am I missing something in this analysis?  It seems to me that
> behavior
> > > > could
> > > > be actually problematic during periods of spotty connectivity to the
> > > > ZooKeeper cluster.
> > > >
> > > > Thanks!
> > > > -Sam
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > >
> >
>

Re: Write retry issues with ZooKeeperClient

Posted by Sijie Guo <gu...@gmail.com>.
On Tue, Jun 27, 2017 at 10:18 AM, Sam Just <sj...@salesforce.com> wrote:

> JV: What do you mean by "May not be perfect for negative testing"?
>
> I don't think there's anything inevitable about this particular class of
> behavior.  ZK could have chosen to avoid this problem entirely by doing
> duplicate op detection server-side with a per-session transaction log.
>
> Since it doesn't, we'll need to solve it ourselves.
> ZkLedgerUnderreplicationManager relies on either getting success or
> NodeExists on the ephemeral lock node to determine whether a particular
> ReplicationWorker is responsible for replicating a particular ledger.  I
> haven't reproduced this one, but it seems to me that two workers could both
> try to replicate the same ledger and *both get NodeExists* on the lock
> node.  This would leave the ledger locked for replication until whichever
> one actually wrote the node restarts.
>
> Users like the above are pretty easy to fix.  One option would be to simply
> include with the write a payload including a nonce for the client.  Upon a
> ConnectionLoss event, we read the node and determine whether we "won".  I
> think ledger creation probably falls into the same category, the metadata
> could include an identifier for the creator.
>

for an ephemeral znode, you don't have to add an extra payload.

in the retry logic,

- catch NodeExists exception
- call exists to check the znode. you can get the Stat (
https://zookeeper.apache.org/doc/r3.4.6/api/org/apache/zookeeper/data/Stat.html#getEphemeralOwner()
) of this znode.
- you can compare the Stat#getEphemeralOwner with the client's current
session id. If they match, the node was created by this session; otherwise
it was created by another session.
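
Roughly like this untested sketch (plain ZooKeeper 3.4 API; the method name is
just for illustration):

    // Returns true if this session owns the ephemeral lock znode, whether the
    // create succeeded now or on an earlier attempt that got ConnectionLoss.
    // Uses org.apache.zookeeper.{ZooKeeper, CreateMode, KeeperException, ZooDefs}
    // and org.apache.zookeeper.data.Stat.
    static boolean acquireEphemeralLock(ZooKeeper zk, String lockPath, byte[] data)
            throws KeeperException, InterruptedException {
        try {
            zk.create(lockPath, data, ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
            return true;
        } catch (KeeperException.NodeExistsException nee) {
            Stat stat = zk.exists(lockPath, false);
            // If the znode vanished in between, the caller can simply retry.
            return stat != null && stat.getEphemeralOwner() == zk.getSessionId();
        }
    }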


>
> I'm less sure about AbstractZkLedgerManager.writeLedgerMetadata.  It
> relies
> on setData success/BadVersion atomicity to ensure consistent rmw.

It's
> certainly solvable by including a version vector, but I think there's
> probably a simpler way to do it.  I'm not sure under what circumstances
> there even *can* be racing writes to the metadata -- is the only case that
> matters concurrent updates between the writer and a single replication
> worker?
>

I don't think it can be addressed by a version vector; there is no public
*history* of who changed which version.
But the question is why you need to address this #writeLedgerMetadata
inside the ZooKeeperClient retry loop. I think there is already logic in
#writeLedgerMetadata to resolve conflicts due to BadVersion.
The LedgerHandle will re-read the ledger metadata when it encounters version
conflicts.
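
Conceptually the conflict resolution is just a read/modify/write loop like the
simplified sketch below (not the actual LedgerHandle code):

    // Re-read on BadVersion and reapply the modification.
    // Uses org.apache.zookeeper.{ZooKeeper, KeeperException}, data.Stat and
    // java.util.function.UnaryOperator.
    static byte[] readModifyWrite(ZooKeeper zk, String path, UnaryOperator<byte[]> modify)
            throws KeeperException, InterruptedException {
        while (true) {
            Stat stat = new Stat();
            byte[] current = zk.getData(path, false, stat);
            byte[] updated = modify.apply(current);
            try {
                zk.setData(path, updated, stat.getVersion());
                return updated;
            } catch (KeeperException.BadVersionException bve) {
                // someone else wrote in between; re-read and try again
            }
        }
    }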


>
> The outline is probably:
> 1) Add a way to do retries automatically on *unconditional* creates and
> deletes (there are some users that don't care).
> 2) Audit and modify existing users to either use the methods introduced in
> 1 or handle the ConnectionLoss events explicitly.
> 3) Switch ZooKeeperClient to not retry conditional writes, instead
> propagating the ConnectionLoss event; that should make its semantics match
> the vanilla ZK client.
> -Sam
>
> On Tue, Jun 27, 2017 at 8:59 AM, Sijie Guo <gu...@gmail.com> wrote:
>
> > I agree that in the current callback model we only propagate an error code
> > to the outer client, so we lose the information about the detailed cause.
> >
> > But I think we also tried to map some zk error codes to bk error codes.
> >
> > NoNode -> NoLedger
> > NodeExists -> LedgerExists
> > ··· (other codes)
> >
> > Also we tried to hide zk metadata related implementation behind
> interfaces,
> > so some of the errors should be handled by the zk ledger manager
> > implementation before propagating to the client.
> >
> > Sijie
> >
> > On Jun 27, 2017 7:32 AM, "Enrico Olivelli - Diennea" <
> > enrico.olivelli@diennea.com> wrote:
> >
> > > On Tue, 27/06/2017 at 13.45 +0000, Venkateswara Rao Jujjuri wrote:
> > >
> > > This is no different from any other network-based system, like NFS. So we
> > > need to have some kind of logic on the client side to make reasonable
> > > assumptions. May not be perfect for negative testing.
> > >
> > >
> > > Many times I wanted to have some "exception cause" on BKException,
> > > especially for ZK issues.
> > > The way we use only int error codes hides the root causes of the error.
> > > BookKeeper client writes to the log, but the "cause" cannot be reported to
> > > higher level logs, and sometimes this is annoying.
> > > In the future I would like to add more details on errors.
> > >
> > > -- Enrico
> > >
> > >
> > >
> > >
> > >
> > > JV
> > >
> > > On Mon, Jun 26, 2017 at 11:19 PM Sijie Guo <guosijie@gmail.com<mailto:
> > guo
> > > sijie@gmail.com>> wrote:
> > >
> > >
> > >
> > > Hi Sam,
> > >
> > > Let's assume there is no such retry logic. How do you expect to handle
> > this
> > > situation?
> > >
> > > Can your application try to create a new ledger or catch NodeExists
> > > exception?
> > >
> > > - Sijie
> > >
> > > On Mon, Jun 26, 2017 at 5:49 PM, Sam Just <sjust@salesforce.com
> <mailto:s
> > > just@salesforce.com>> wrote:
> > >
> > >
> > >
> > > Hmm, curator seems to have essentially the same problem:
> > > https://issues.apache.org/jira/browse/CURATOR-268
> > > I'm not sure there's a good way to solve this transparently -- the
> right
> > > answer is
> > > probably to plumb the ConnectionLoss event through ZooKeeperClient for
> > > interested callers, audit for metadata users where we depend on
> > >
> > >
> > > atomicity,
> > >
> > >
> > > and update each one to handle it appropriately.
> > > -Sam
> > >
> > > On Mon, Jun 26, 2017 at 4:34 PM, Sam Just <sjust@salesforce.com
> <mailto:s
> > > just@salesforce.com>> wrote:
> > >
> > >
> > >
> > > BookKeeper has a wrapper class for the ZooKeeper client called
> > > ZooKeeperClient.
> > > Its purpose appears to be to transparently perform retries in the case
> > >
> > >
> > > that
> > >
> > >
> > > ZooKeeper returns ConnectionLoss on an operation due to a Disconnect
> > >
> > >
> > > event.
> > >
> > >
> > >
> > > The trouble is that it's possible that a write which received a
> > > ConnectionLoss
> > > error as a return value actually succeeded.  Once ZooKeeperClient
> > >
> > >
> > > retries,
> > >
> > >
> > > it'll
> > > get back NodeExists and propagate that error to the caller, even though
> > >
> > >
> > > the
> > >
> > >
> > > write succeeded and the node in fact did not exist.
> > >
> > > It seems as though the same issue would hold for setData and delete
> > >
> > >
> > >
> > >
> > > calls
> > >
> > >
> > >
> > >
> > > which
> > > use the version argument -- you could get a spurious BadVersion.
> > >
> > > I believe I've reproduced the variant with a spurious NodeExists.  It
> > > manifested as a spurious BKLedgerExistException when running against a
> > >
> > >
> > >
> > >
> > > 3
> > >
> > >
> > >
> > >
> > > instance ZooKeeper cluster with dm-delay under the ZooKeeper instance
> > > storage
> > > to force Disconnect events by injecting write delays.  This seems to
> > >
> > >
> > >
> > >
> > > make
> > >
> > >
> > >
> > >
> > > sense
> > > as AbstractZkLedgerManager.createLedgerMetadata uses
> > > ZkUtils.asyncCreateFullPathOptimistic to create the metadata node and
> > > appears
> > > to depend on the create atomicity to avoid two writers overwriting each
> > > other's
> > > nodes.
> > >
> > > AbstractZkLedgerManager.writeLedger would seem to have the same
> problem
> > > with
> > > its dependence on using setData with the version argument to avoid lost
> > > updates.
> > >
> > > Am I missing something in this analysis?  It seems to me that behavior
> > > could
> > > be actually problematic during periods of spotty connectivity to the
> > > ZooKeeper cluster.
> > >
> > > Thanks!
> > > -Sam
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> >
>

Re: Write retry issues with ZooKeeperClient

Posted by Sam Just <sj...@salesforce.com>.
JV: What do you mean by "May not be perfect for negative testing"?

I don't think there's anything inevitable about this particular class of
behavior.  ZK could have chosen to avoid this problem entirely by doing
duplicate op detection server-side with a per-session transaction log.

Since it doesn't, we'll need to solve it ourselves.
ZkLedgerUnderreplicationManager relies on either getting success or
NodeExists on the ephemeral lock node to determine whether a particular
ReplicationWorker is responsible for replicating a particular ledger.  I
haven't reproduced this one, but it seems to me that two workers could both
try to replicate the same ledger and *both get NodeExists* on the lock
node.  This would leave the ledger locked for replication until whichever
one actually wrote the node restarts.

Users like the above are pretty easy to fix.  One option would be to simply
include with the write a payload including a nonce for the client.  Upon a
ConnectionLoss event, we read the node and determine whether we "won".  I
think ledger creation probably falls into the same category, the metadata
could include an identifier for the creator.
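
Something like this hypothetical sketch (not existing code; it assumes the
read after a ConnectionLoss is itself retried once the session reconnects):

    // The znode payload carries a per-client nonce so that, after a
    // ConnectionLoss or a NodeExists produced by a blind retry, we can read the
    // node back and tell whether the surviving create was ours.
    static boolean createAndCheckWinner(ZooKeeper zk, String path, byte[] noncePayload)
            throws KeeperException, InterruptedException {
        try {
            zk.create(path, noncePayload, ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
            return true;
        } catch (KeeperException.NodeExistsException
                | KeeperException.ConnectionLossException e) {
            byte[] existing = zk.getData(path, false, null);
            // we "won" iff our nonce is what ended up in the node
            return java.util.Arrays.equals(existing, noncePayload);
        }
    }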

I'm less sure about AbstractZkLedgerManager.writeLedgerMetadata.  It relies
on setData success/BadVersion atomicity to ensure consistent rmw.  It's
certainly solvable by including a version vector, but I think there's
probably a simpler way to do it.  I'm not sure under what circumstances
there even *can* be racing writes to the metadata -- is the only case that
matters concurrent updates between the writer and a single replication
worker?

The outline is probably:
1) Add a way to do retries automatically on *unconditional* creates and
deletes (there are some users that don't care).
2) Audit and modify existing users to either use the methods introduced in
1 or handle the ConnectionLoss events explicitly.
3) Switch ZooKeeperClient to not retry conditional writes, instead
propagating the ConnectionLoss event; that should make its semantics match
the vanilla ZK client.  (A rough sketch of (1) is below.)
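
For (1), the kind of helper I have in mind is roughly this (names made up,
plain ZooKeeper API):

    // Safe to retry blindly after ConnectionLoss because "already exists" is an
    // acceptable outcome for an *unconditional* create.
    static void createIgnoringNodeExists(ZooKeeper zk, String path, byte[] data)
            throws KeeperException, InterruptedException {
        while (true) {
            try {
                zk.create(path, data, ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
                return;
            } catch (KeeperException.NodeExistsException nee) {
                return;  // fine -- created by our earlier attempt or by someone else
            } catch (KeeperException.ConnectionLossException cle) {
                // retry; the operation is idempotent from the caller's point of view
            }
        }
    }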
-Sam

On Tue, Jun 27, 2017 at 8:59 AM, Sijie Guo <gu...@gmail.com> wrote:

> I agree that in the current callback model we only propagate an error code
> to the outer client, so we lose the information about the detailed cause.
>
> But I think we also tried to map some zk error codes to bk error codes.
>
> NoNode -> NoLedger
> NodeExists -> LedgerExists
> ··· (other codes)
>
> Also we tried to hide zk metadata related implementation behind interfaces,
> so some of the errors should be handled by the zk ledger manager
> implementation before propagating to the client.
>
> Sijie
>
> On Jun 27, 2017 7:32 AM, "Enrico Olivelli - Diennea" <
> enrico.olivelli@diennea.com> wrote:
>
> > On Tue, 27/06/2017 at 13.45 +0000, Venkateswara Rao Jujjuri wrote:
> >
> > This is no different from any other network-based system, like NFS. So we
> > need to have some kind of logic on the client side to make reasonable
> > assumptions. May not be perfect for negative testing.
> >
> >
> > Many times I wanted to have some "exception cause" on BKException,
> > especially for ZK issues.
> > The way we use only int error codes hides the root causes of the error.
> > BookKeeper client writes to the log, but the "cause" cannot be reported to
> > higher level logs, and sometimes this is annoying.
> > In the future I would like to add more details on errors.
> >
> > -- Enrico
> >
> >
> >
> >
> >
> > JV
> >
> > On Mon, Jun 26, 2017 at 11:19 PM Sijie Guo <guosijie@gmail.com<mailto:
> guo
> > sijie@gmail.com>> wrote:
> >
> >
> >
> > Hi Sam,
> >
> > Let's assume there is no such retry logic. How do you expect to handle
> this
> > situation?
> >
> > Can your application try to create a new ledger or catch NodeExists
> > exception?
> >
> > - Sijie
> >
> > On Mon, Jun 26, 2017 at 5:49 PM, Sam Just <sjust@salesforce.com<mailto:s
> > just@salesforce.com>> wrote:
> >
> >
> >
> > Hmm, curator seems to have essentially the same problem:
> > https://issues.apache.org/jira/browse/CURATOR-268
> > I'm not sure there's a good way to solve this transparently -- the right
> > answer is
> > probably to plumb the ConnectionLoss event through ZooKeeperClient for
> > interested callers, audit for metadata users where we depend on
> >
> >
> > atomicity,
> >
> >
> > and update each one to handle it appropriately.
> > -Sam
> >
> > On Mon, Jun 26, 2017 at 4:34 PM, Sam Just <sjust@salesforce.com<mailto:s
> > just@salesforce.com>> wrote:
> >
> >
> >
> > BookKeeper has a wrapper class for the ZooKeeper client called
> > ZooKeeperClient.
> > Its purpose appears to be to transparently perform retries in the case
> >
> >
> > that
> >
> >
> > ZooKeeper returns ConnectionLoss on an operation due to a Disconnect
> >
> >
> > event.
> >
> >
> >
> > The trouble is that it's possible that a write which received a
> > ConnectionLoss
> > error as a return value actually succeeded.  Once ZooKeeperClient
> >
> >
> > retries,
> >
> >
> > it'll
> > get back NodeExists and propagate that error to the caller, even though
> >
> >
> > the
> >
> >
> > write succeeded and the node in fact did not exist.
> >
> > It seems as though the same issue would hold for setData and delete
> >
> >
> >
> >
> > calls
> >
> >
> >
> >
> > which
> > use the version argument -- you could get a spurious BadVersion.
> >
> > I believe I've reproduced the variant with a spurious NodeExists.  It
> > manifested as a spurious BKLedgerExistException when running against a
> >
> >
> >
> >
> > 3
> >
> >
> >
> >
> > instance ZooKeeper cluster with dm-delay under the ZooKeeper instance
> > storage
> > to force Disconnect events by injecting write delays.  This seems to
> >
> >
> >
> >
> > make
> >
> >
> >
> >
> > sense
> > as AbstractZkLedgerManager.createLedgerMetadata uses
> > ZkUtils.asyncCreateFullPathOptimistic to create the metadata node and
> > appears
> > to depend on the create atomicity to avoid two writers overwriting each
> > other's
> > nodes.
> >
> > AbstractZkLedgerManager.writeLedger would seem to have the same problem
> > with
> > its dependence on using setData with the version argument to avoid lost
> > updates.
> >
> > Am I missing something in this analysis?  It seems to me that behavior
> > could
> > be actually problematic during periods of spotty connectivity to the
> > ZooKeeper cluster.
> >
> > Thanks!
> > -Sam
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
>

Re: Write retry issues with ZooKeeperClient

Posted by Sijie Guo <gu...@gmail.com>.
I agree that in the current callback model we only propagate an error code to
the outer client, so we lose the information about the detailed cause.

But I think we also tried to map some zk error codes to bk error codes.

NoNode -> NoLedger
NodeExists -> LedgerExists
··· (other codes)

Also we tried to hide zk metadata related implementation behind interfaces,
so some of the errors should be handled by the zk ledger manager
implementation before propagating to the client.
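
As an illustration only (the real mapping lives in the zk ledger manager
implementations, and the code names below follow BKException.Code only
approximately):

    static int zkToBkErrorCode(KeeperException.Code zkCode) {
        switch (zkCode) {
            case NONODE:
                return BKException.Code.NoSuchLedgerExistsException;
            case NODEEXISTS:
                return BKException.Code.LedgerExistException;
            default:
                return BKException.Code.ZKException;  // generic fallback
        }
    }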

Sijie

On Jun 27, 2017 7:32 AM, "Enrico Olivelli - Diennea" <
enrico.olivelli@diennea.com> wrote:

> On Tue, 27/06/2017 at 13.45 +0000, Venkateswara Rao Jujjuri wrote:
>
> This is no different from any other network-based system, like NFS. So we need
> to have some kind of logic on the client side to make reasonable
> assumptions. May not be perfect for negative testing.
>
>
> Many times I wanted to have some "exception cause" on BKException,
> especially for ZK issues.
> The way we use only int error codes hides the root causes of the error.
> BookKeeper client writes to the log, but the "cause" cannot be reported to
> higher level logs, and sometimes this is annoying.
> In the future I would like to add more details on errors.
>
> -- Enrico
>
>
>
>
>
> JV
>
> On Mon, Jun 26, 2017 at 11:19 PM Sijie Guo <guosijie@gmail.com<mailto:guo
> sijie@gmail.com>> wrote:
>
>
>
> Hi Sam,
>
> Let's assume there is no such retry logic. How do you expect to handle this
> situation?
>
> Can your application try to create a new ledger or catch NodeExists
> exception?
>
> - Sijie
>
> On Mon, Jun 26, 2017 at 5:49 PM, Sam Just <sjust@salesforce.com<mailto:s
> just@salesforce.com>> wrote:
>
>
>
> Hmm, curator seems to have essentially the same problem:
> https://issues.apache.org/jira/browse/CURATOR-268
> I'm not sure there's a good way to solve this transparently -- the right
> answer is
> probably to plumb the ConnectionLoss event through ZooKeeperClient for
> interested callers, audit for metadata users where we depend on
>
>
> atomicity,
>
>
> and update each one to handle it appropriately.
> -Sam
>
> On Mon, Jun 26, 2017 at 4:34 PM, Sam Just <sjust@salesforce.com<mailto:s
> just@salesforce.com>> wrote:
>
>
>
> BookKeeper has a wrapper class for the ZooKeeper client called
> ZooKeeperClient.
> Its purpose appears to be to transparently perform retries in the case
>
>
> that
>
>
> ZooKeeper returns ConnectionLoss on an operation due to a Disconnect
>
>
> event.
>
>
>
> The trouble is that it's possible that a write which received a
> ConnectionLoss
> error as a return value actually succeeded.  Once ZooKeeperClient
>
>
> retries,
>
>
> it'll
> get back NodeExists and propagate that error to the caller, even though
>
>
> the
>
>
> write succeeded and the node in fact did not exist.
>
> It seems as though the same issue would hold for setData and delete
>
>
>
>
> calls
>
>
>
>
> which
> use the version argument -- you could get a spurious BadVersion.
>
> I believe I've reproduced the variant with a spurious NodeExists.  It
> manifested as a spurious BKLedgerExistException when running against a
>
>
>
>
> 3
>
>
>
>
> instance ZooKeeper cluster with dm-delay under the ZooKeeper instance
> storage
> to force Disconnect events by injecting write delays.  This seems to
>
>
>
>
> make
>
>
>
>
> sense
> as AbstractZkLedgerManager.createLedgerMetadata uses
> ZkUtils.asyncCreateFullPathOptimistic to create the metadata node and
> appears
> to depend on the create atomicity to avoid two writers overwriting each
> other's
> nodes.
>
> AbstractZkLedgerManager.writeLedger would seem to have the same problem
> with
> its dependence on using setData with the version argument to avoid lost
> updates.
>
> Am I missing something in this analysis?  It seems to me that behavior
> could
> be actually problematic during periods of spotty connectivity to the
> ZooKeeper cluster.
>
> Thanks!
> -Sam
>
>
>
>
>
>
>
>
>
>

Re: Write retry issues with ZooKeeperClient

Posted by Enrico Olivelli - Diennea <en...@diennea.com>.
On Tue, 27/06/2017 at 13.45 +0000, Venkateswara Rao Jujjuri wrote:

This is no different from any other network-based system, like NFS. So we need
to have some kind of logic on the client side to make reasonable
assumptions. May not be perfect for negative testing.


Many times I wanted to have some "exception cause" on BKException, especially for ZK issues.
The way we use only int error codes hides the root causes of the error.
BookKeeper client writes to the log, but the "cause" cannot be reported to higher level logs, and sometimes this is annoying.
In the future I would like to add more details on errors.
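
Roughly what I have in mind (hypothetical usage, not what the code does today):

    // Let the metadata layer attach the underlying ZooKeeper exception as the
    // cause, so higher-level logs can report it.
    static BKException wrapZkError(KeeperException ke) {
        BKException bke = BKException.create(BKException.Code.ZKException);
        bke.initCause(ke);  // preserve the root cause for callers' logs
        return bke;
    }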

-- Enrico





JV

On Mon, Jun 26, 2017 at 11:19 PM Sijie Guo <gu...@gmail.com>> wrote:



Hi Sam,

Let's assume there is no such retry logic. How do you expect to handle this
situation?

Can your application try to create a new ledger or catch NodeExists
exception?

- Sijie

On Mon, Jun 26, 2017 at 5:49 PM, Sam Just <sj...@salesforce.com>> wrote:



Hmm, curator seems to have essentially the same problem:
https://issues.apache.org/jira/browse/CURATOR-268
I'm not sure there's a good way to solve this transparently -- the right
answer is
probably to plumb the ConnectionLoss event through ZooKeeperClient for
interested callers, audit for metadata users where we depend on


atomicity,


and update each one to handle it appropriately.
-Sam

On Mon, Jun 26, 2017 at 4:34 PM, Sam Just <sj...@salesforce.com>> wrote:



BookKeeper has a wrapper class for the ZooKeeper client called
ZooKeeperClient.
Its purpose appears to be to transparently perform retries in the case


that


ZooKeeper returns ConnectionLoss on an operation due to a Disconnect


event.



The trouble is that it's possible that a write which received a
ConnectionLoss
error as a return value actually succeeded.  Once ZooKeeperClient


retries,


it'll
get back NodeExists and propagate that error to the caller, even though


the


write succeeded and the node in fact did not exist.

It seems as though the same issue would hold for setData and delete




calls




which
use the version argument -- you could get a spurious BadVersion.

I believe I've reproduced the variant with a spurious NodeExists.  It
manifested as a spurious BKLedgerExistException when running against a




3




instance ZooKeeper cluster with dm-delay under the ZooKeeper instance
storage
to force Disconnect events by injecting write delays.  This seems to




make




sense
as AbstractZkLedgerManager.createLedgerMetadata uses
ZkUtils.asyncCreateFullPathOptimistic to create the metadata node and
appears
to depend on the create atomicity to avoid two writers overwriting each
other's
nodes.

AbstractZkLedgerManager.writeLedger would seem to have the same problem
with
its dependence on using setData with the version argument to avoid lost
updates.

Am I missing something in this analysis?  It seems to me that behavior
could
be actually problematic during periods of spotty connectivity to the
ZooKeeper cluster.

Thanks!
-Sam










Re: Write retry issues with ZooKeeperClient

Posted by Venkateswara Rao Jujjuri <ju...@gmail.com>.
This is no different from any other network-based system, like NFS. So we need
to have some kind of logic on the client side to make reasonable
assumptions. May not be perfect for negative testing.

JV

On Mon, Jun 26, 2017 at 11:19 PM Sijie Guo <gu...@gmail.com> wrote:

> Hi Sam,
>
> Let's assume there is no such retry logic. How do you expect to handle this
> situation?
>
> Can your application try to create a new ledger or catch NodeExists
> exception?
>
> - Sijie
>
> On Mon, Jun 26, 2017 at 5:49 PM, Sam Just <sj...@salesforce.com> wrote:
>
> > Hmm, curator seems to have essentially the same problem:
> > https://issues.apache.org/jira/browse/CURATOR-268
> > I'm not sure there's a good way to solve this transparently -- the right
> > answer is
> > probably to plumb the ConnectionLoss event through ZooKeeperClient for
> > interested callers, audit for metadata users where we depend on
> atomicity,
> > and update each one to handle it appropriately.
> > -Sam
> >
> > On Mon, Jun 26, 2017 at 4:34 PM, Sam Just <sj...@salesforce.com> wrote:
> >
> > > BookKeeper has a wrapper class for the ZooKeeper client called
> > > ZooKeeperClient.
> > > Its purpose appears to be to transparently perform retries in the case
> > that
> > > ZooKeeper returns ConnectionLoss on an operation due to a Disconnect
> > event.
> > >
> > > The trouble is that it's possible that a write which received a
> > > ConnectionLoss
> > > error as a return value actually succeeded.  Once ZooKeeperClient
> > retries,
> > > it'll
> > > get back NodeExists and propagate that error to the caller, even though
> > the
> > > write succeeded and the node in fact did not exist.
> > >
> > > It seems as though the same issue would hold for setData and delete
> calls
> > > which
> > > use the version argument -- you could get a spurious BadVersion.
> > >
> > > I believe I've reproduced the variant with a spurious NodeExists.  It
> > > manifested as a spurious BKLedgerExistException when running against a
> 3
> > > instance ZooKeeper cluster with dm-delay under the ZooKeeper instance
> > > storage
> > > to force Disconnect events by injecting write delays.  This seems to
> make
> > > sense
> > > as AbstractZkLedgerManager.createLedgerMetadata uses
> > > ZkUtils.asyncCreateFullPathOptimistic to create the metadata node and
> > > appears
> > > to depend on the create atomicity to avoid two writers overwriting each
> > > other's
> > > nodes.
> > >
> > > AbstractZkLedgerManager.writeLedger would seem to have the same problem
> > > with
> > > its dependence on using setData with the version argument to avoid lost
> > > updates.
> > >
> > > Am I missing something in this analysis?  It seems to me that behavior
> > > could
> > > be actually problematic during periods of spotty connectivity to the
> > > ZooKeeper cluster.
> > >
> > > Thanks!
> > > -Sam
> > >
> >
>

Re: Write retry issues with ZooKeeperClient

Posted by Sijie Guo <gu...@gmail.com>.
Hi Sam,

Let's assume there is no such retry logic. How do you expect to handle this
situation?

Can your application try to create a new ledger or catch NodeExists
exception?

- Sijie

On Mon, Jun 26, 2017 at 5:49 PM, Sam Just <sj...@salesforce.com> wrote:

> Hmm, curator seems to have essentially the same problem:
> https://issues.apache.org/jira/browse/CURATOR-268
> I'm not sure there's a good way to solve this transparently -- the right
> answer is
> probably to plumb the ConnectionLoss event through ZooKeeperClient for
> interested callers, audit for metadata users where we depend on atomicity,
> and update each one to handle it appropriately.
> -Sam
>
> On Mon, Jun 26, 2017 at 4:34 PM, Sam Just <sj...@salesforce.com> wrote:
>
> > BookKeeper has a wrapper class for the ZooKeeper client called
> > ZooKeeperClient.
> > Its purpose appears to be to transparently perform retries in the case
> that
> > ZooKeeper returns ConnectionLoss on an operation due to a Disconnect
> event.
> >
> > The trouble is that it's possible that a write which received a
> > ConnectionLoss
> > error as a return value actually succeeded.  Once ZooKeeperClient
> retries,
> > it'll
> > get back NodeExists and propagate that error to the caller, even though
> the
> > write succeeded and the node in fact did not exist.
> >
> > It seems as though the same issue would hold for setData and delete calls
> > which
> > use the version argument -- you could get a spurious BadVersion.
> >
> > I believe I've reproduced the variant with a spurious NodeExists.  It
> > manifested as a spurious BKLedgerExistException when running against a 3
> > instance ZooKeeper cluster with dm-delay under the ZooKeeper instance
> > storage
> > to force Disconnect events by injecting write delays.  This seems to make
> > sense
> > as AbstractZkLedgerManager.createLedgerMetadata uses
> > ZkUtils.asyncCreateFullPathOptimistic to create the metadata node and
> > appears
> > to depend on the create atomicity to avoid two writers overwriting each
> > other's
> > nodes.
> >
> > AbstractZkLedgerManager.writeLedger would seem to have the same problem
> > with
> > its dependence on using setData with the version argument to avoid lost
> > updates.
> >
> > Am I missing something in this analysis?  It seems to me that behavior
> > could
> > be actually problematic during periods of spotty connectivity to the
> > ZooKeeper cluster.
> >
> > Thanks!
> > -Sam
> >
>

Re: Write retry issues with ZooKeeperClient

Posted by Sam Just <sj...@salesforce.com>.
Hmm, curator seems to have essentially the same problem:
https://issues.apache.org/jira/browse/CURATOR-268
I'm not sure there's a good way to solve this transparently -- the right
answer is
probably to plumb the ConnectionLoss event through ZooKeeperClient for
interested callers, audit for metadata users where we depend on atomicity,
and update each one to handle it appropriately.
-Sam

On Mon, Jun 26, 2017 at 4:34 PM, Sam Just <sj...@salesforce.com> wrote:

> BookKeeper has a wrapper class for the ZooKeeper client called
> ZooKeeperClient.
> Its purpose appears to be to transparently perform retries in the case that
> ZooKeeper returns ConnectionLoss on an operation due to a Disconnect event.
>
> The trouble is that it's possible that a write which received a
> ConnectionLoss
> error as a return value actually succeeded.  Once ZooKeeperClient retries,
> it'll
> get back NodeExists and propagate that error to the caller, even though the
> write succeeded and the node in fact did not exist.
>
> It seems as though the same issue would hold for setData and delete calls
> which
> use the version argument -- you could get a spurious BadVersion.
>
> I believe I've reproduced the variant with a spurious NodeExists.  It
> manifested as a spurious BKLedgerExistException when running against a 3
> instance ZooKeeper cluster with dm-delay under the ZooKeeper instance
> storage
> to force Disconnect events by injecting write delays.  This seems to make
> sense
> as AbstractZkLedgerManager.createLedgerMetadata uses
> ZkUtils.asyncCreateFullPathOptimistic to create the metadata node and
> appears
> to depend on the create atomicity to avoid two writers overwriting each
> other's
> nodes.
>
> AbstractZkLedgerManager.writeLedger would seem to have the same problem
> with
> its dependence on using setData with the version argument to avoid lost
> updates.
>
> Am I missing something in this analysis?  It seems to me that behavior
> could
> be actually problematic during periods of spotty connectivity to the
> ZooKeeper cluster.
>
> Thanks!
> -Sam
>