Posted to user@zookeeper.apache.org by tison <wa...@gmail.com> on 2022/06/16 13:02:01 UTC

Distinguish BadVersion a real exception or false positive for retry

Hi ZooKeepers,

While investigating this issue[1], I noticed a possible problem if I write
a program like this:

while (true) {
    try {
        zk.setData(path, data, version); // (1)
        break;
    } catch (KeeperException.ConnectionLossException e) {
        // retry
    }
}

...then (1) may throw KeeperException.BadVersionException even though an
earlier attempt actually succeeded on the server side (the server just
could not send the response because the connection was lost).

Is this analysis right? If so, is it possible to tell whether the operation
was actually applied on the server side?
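A minimal, self-contained sketch of the scenario described above. It does not use the real ZooKeeper client: `SimZNode`, `BadVersion`, and `ConnectionLoss` are hypothetical stand-ins that model a version-conditional `setData` whose reply can be lost after the write was already applied.

```java
// Hypothetical stand-ins for KeeperException.BadVersionException and
// KeeperException.ConnectionLossException; not the real ZooKeeper classes.
class BadVersion extends Exception {}
class ConnectionLoss extends Exception {}

// Hypothetical in-memory znode: setData succeeds only when expectedVersion
// equals the current version, and each successful write bumps the version.
final class SimZNode {
    String data;
    int version;
    boolean dropReply; // if set, the next applied write reports ConnectionLoss

    SimZNode(String data, int version) {
        this.data = data;
        this.version = version;
    }

    int setData(String newData, int expectedVersion) throws BadVersion, ConnectionLoss {
        if (expectedVersion != version) {
            throw new BadVersion();
        }
        data = newData; // the write is applied on the "server"...
        version++;
        if (dropReply) {
            dropReply = false;
            throw new ConnectionLoss(); // ...but the reply never reaches the client
        }
        return version;
    }
}

final class NaiveRetry {
    // Same shape as the loop in the post: retry only on ConnectionLoss.
    static String run(SimZNode node, String newData, int version) {
        while (true) {
            try {
                node.setData(newData, version);
                return "OK";
            } catch (ConnectionLoss e) {
                // retry with the same (now stale) expected version
            } catch (BadVersion e) {
                return "BadVersion";
            }
        }
    }
}
```

With `dropReply` set, `NaiveRetry.run(node, "new", 0)` on a node at version 0 returns "BadVersion" while `node.data` is already "new" at version 1, which is the false positive in question.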

Best,
tison.

[1] https://github.com/apache/pulsar/issues/13954

Re: Distinguish BadVersion a real exception or false positive for retry

Posted by tison <wa...@gmail.com>.
Thanks. I'll go through the application to see how the local view is built
and whether I can handle this case correctly. If there are concurrent
updates, it seems impossible to determine whether the operation ever
succeeded.

On Thu, Jun 16, 2022 at 21:57, Enrico Olivelli <eo...@gmail.com> wrote:

> In any case, you cannot rely on the state of the znode after a
> ConnectionLoss; whether or not the change reached the server does not
> matter much. You have to revalidate your local view of the data.
>
> In this case, both BadVersion and ConnectionLoss mean that you are not
> up-to-date.
>
> Enrico
-- 
Best,
tison.

Re: Distinguish BadVersion a real exception or false positive for retry

Posted by Enrico Olivelli <eo...@gmail.com>.
On Thu, Jun 16, 2022 at 15:35, tison <wa...@gmail.com> wrote:
>
> > If you see BadVersionException, then the action must not have been
>
> From the ZK code, it seems that the comparison is about equality.
>
> Is the following possible?
>
> T0: setData(path, data, v0)
> T1: ConnectionLoss, but setData succeeded on the server, and thus the
> version changed
> T2: client receives ConnectionLoss and retries
> T3: client gets BadVersionException, but the data was actually changed as
> proposed

In any case, you cannot rely on the state of the znode after a
ConnectionLoss; whether or not the change reached the server does not
matter much. You have to revalidate your local view of the data.

In this case, both BadVersion and ConnectionLoss mean that you are not
up-to-date.
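One way to read "revalidate your local view" is a read-modify-write loop that, after either ConnectionLoss or BadVersion, re-reads the znode and recomputes the new value instead of retrying with the old version. This sketch is only safe for idempotent updates (here a hypothetical "add a member to a comma-separated set"), and the znode, snapshot, and exception classes are in-memory stand-ins, not ZooKeeper's API.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.function.UnaryOperator;

// Hypothetical stand-ins for the ZooKeeper exceptions; not the real classes.
class BadVersion extends Exception {}
class ConnectionLoss extends Exception {}

// Hypothetical (data, version) pair, similar to getData() plus Stat.getVersion().
final class Snapshot {
    final String data;
    final int version;
    Snapshot(String data, int version) { this.data = data; this.version = version; }
}

// Hypothetical in-memory znode with version-conditional writes.
final class SimZNode {
    private String data;
    private int version;
    private boolean dropReply;

    SimZNode(String data) { this.data = data; }

    // Apply the next successful write but report ConnectionLoss instead of success.
    void loseNextReply() { dropReply = true; }

    Snapshot get() { return new Snapshot(data, version); }

    void set(String newData, int expectedVersion) throws BadVersion, ConnectionLoss {
        if (expectedVersion != version) throw new BadVersion();
        data = newData;
        version++;
        if (dropReply) {
            dropReply = false;
            throw new ConnectionLoss();
        }
    }
}

final class Revalidate {
    // On any failure, re-read and recompute from the fresh state instead of
    // retrying with a stale version. Assumes fn is idempotent: fn(fn(x)) == fn(x).
    static String update(SimZNode node, UnaryOperator<String> fn) {
        while (true) {
            Snapshot view = node.get();              // refresh the local view
            String next = fn.apply(view.data);
            if (next.equals(view.data)) return next; // already in the desired state
            try {
                node.set(next, view.version);
                return next;
            } catch (ConnectionLoss | BadVersion e) {
                // unknown outcome or stale view: loop and revalidate
            }
        }
    }

    // Example idempotent update: add a member to a comma-separated set.
    static UnaryOperator<String> addMember(String member) {
        return s -> {
            Set<String> members = new LinkedHashSet<>();
            for (String m : s.split(",")) {
                if (!m.isEmpty()) members.add(m);
            }
            members.add(member);
            return String.join(",", members);
        };
    }
}
```

If the first write lands but its reply is lost, the second pass sees "b" already present and stops, so the member is not added twice. A non-idempotent update such as "increment a counter" would be applied twice under the same sequence.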

Enrico


>
> > You have to read the znode again and compare the version,
>
> Yeah... This is still a best-effort check. If there are multiple writers,
> even if I read the znode's version and it doesn't match, I don't know
> whether my write ever succeeded.
>
> Best,
> tison.

Re: Distinguish BadVersion a real exception or false positive for retry

Posted by tison <wa...@gmail.com>.
> If you see BadVersionException, then the action must not have been

From the ZK code, it seems that the comparison is about equality.

Is the following possible?

T0: setData(path, data, v0)
T1: ConnectionLoss, but setData succeeded on the server, and thus the
version changed
T2: client receives ConnectionLoss and retries
T3: client gets BadVersionException, but the data was actually changed as
proposed
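A small replay of the T0 to T3 timeline, assuming the server compares the expected version for exact equality (the `setData` Javadoc describes the call as succeeding when the given version matches the node's version, with -1 matching any version). `VersionCheck` is a hypothetical model, not ZooKeeper code.

```java
final class VersionCheck {
    // Exact-equality check as discussed above (no "at least" comparison);
    // an expected version of -1 skips the check altogether.
    static boolean matches(int expectedVersion, int currentVersion) {
        return expectedVersion == -1 || expectedVersion == currentVersion;
    }

    // Replays T0..T3 for a znode at version v0 and returns what each attempt sees.
    static String[] replay(int v0) {
        int current = v0;
        String first;
        if (matches(v0, current)) {
            current++; // T1: applied on the server, reply lost
            first = "applied, reply lost";
        } else {
            first = "BadVersion";
        }
        // T2/T3: the retry still carries v0, but the znode is now at v0 + 1
        String retry = matches(v0, current) ? "applied" : "BadVersion";
        return new String[] {first, retry};
    }
}
```

`VersionCheck.replay(5)` yields "applied, reply lost" followed by "BadVersion": the retry fails precisely because the first attempt succeeded.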

> You have to read the znode again and compare the version,

Yeah... This is still a best-effort check. If there are multiple writers,
even if I read the znode's version and it doesn't match, I don't know
whether my write ever succeeded.
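A sketch of why the re-read is only best effort with multiple writers: two different histories can leave the znode in the same observable state. The model is hypothetical and only counts writes that actually landed, each bumping the version by one.

```java
import java.util.List;

final class Ambiguity {
    // Hypothetical model: apply the writes that actually landed, starting from
    // version v0, and return what a later read would report as "data@vN".
    static String finalView(int v0, List<String> landedWrites) {
        String data = "initial";
        int version = v0;
        for (String w : landedWrites) {
            data = w;
            version++; // each applied write bumps the version by one
        }
        return data + "@v" + version;
    }
}
```

In one history our write "ours" landed and another client then wrote "x"; in the other, our first attempt never arrived, another client wrote "y" (so our retry got BadVersion), and then "x". Starting from version 3, both `finalView(3, List.of("ours", "x"))` and `finalView(3, List.of("y", "x"))` report "x@v5", so a reader cannot tell them apart.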

Best,
tison.


On Thu, Jun 16, 2022 at 21:08, Enrico Olivelli <eo...@gmail.com> wrote:

> Tison,
>
>
> If you see BadVersionException, then the action must not have been
> successful on the server.
> If you see ConnectionLossException, then you know nothing about the outcome.
>
> You have to read the znode again and compare the version,
> but please remember that in the meantime (after your read response
> leaves the server), someone could have changed the znode.
>
> I hope that helps
>
> Enrico
>
>

Re: Distinguish BadVersion a real exception or false positive for retry

Posted by Enrico Olivelli <eo...@gmail.com>.
Tison,

On Thu, Jun 16, 2022 at 15:04, tison <wa...@gmail.com> wrote:
>
> Hi ZooKeepers,
>
> While investigating this issue[1], I noticed a possible problem if I write
> a program like this:
>
> while (true) {
>     try {
>         zk.setData(path, data, version); // (1)
>         break;
>     } catch (KeeperException.ConnectionLossException e) {
>         // retry
>     }
> }
>
> ...then (1) may throw KeeperException.BadVersionException even though an
> earlier attempt actually succeeded on the server side (the server just
> could not send the response because the connection was lost).

If you see BadVersionException, then the action must not have been
successful on the server.
If you see ConnectionLossException, then you know nothing about the outcome.
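Following this rule, a retry loop can at least report which case it is in rather than treating every BadVersion as a definite failure. A sketch, with hypothetical `Attempt`, `BadVersion`, and `ConnectionLoss` types standing in for a wrapped `zk.setData(path, data, version)` call and the corresponding `KeeperException` subclasses:

```java
final class Outcomes {
    enum Outcome { APPLIED, REJECTED, UNKNOWN }

    // Hypothetical stand-ins for the ZooKeeper exceptions; not the real classes.
    static final class BadVersion extends Exception {}
    static final class ConnectionLoss extends Exception {}

    // Hypothetical single attempt, e.g. a lambda wrapping zk.setData(path, data, version).
    interface Attempt {
        void run() throws BadVersion, ConnectionLoss;
    }

    static Outcome setWithRetry(Attempt attempt, int maxAttempts) {
        boolean sawConnectionLoss = false;
        for (int i = 0; i < maxAttempts; i++) {
            try {
                attempt.run();
                return Outcome.APPLIED;
            } catch (BadVersion e) {
                // Before any ConnectionLoss, BadVersion means this write was not applied.
                // After one, an earlier attempt of ours may be what bumped the version.
                return sawConnectionLoss ? Outcome.UNKNOWN : Outcome.REJECTED;
            } catch (ConnectionLoss e) {
                sawConnectionLoss = true; // outcome of this attempt is unknown; retry
            }
        }
        return Outcome.UNKNOWN; // gave up while the outcome was still unknown
    }
}
```

REJECTED is returned only when no earlier attempt could have been applied; UNKNOWN is the signal to re-read the znode.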

>
> Is this analysis right? If so, is it possible to tell whether the operation
> was actually applied on the server side?

You have to read the znode again and compare the version,
but please remember that in the meantime (after your read response
leaves the server), someone could have changed the znode.
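A best-effort version of that check, as a sketch: compare a fresh read against what we tried to write. It assumes the znode was not deleted and re-created in between (that resets the version), and, as noted above, its verdict can be stale as soon as it is returned. `ReReadCheck` is a hypothetical helper.

```java
import java.util.Arrays;

final class ReReadCheck {
    enum Verdict { NOT_YET_APPLIED, PROBABLY_APPLIED, UNKNOWN }

    // Best-effort verdict after an ambiguous setData(path, newData, expectedVersion),
    // given a fresh read of (currentData, currentVersion).
    static Verdict check(byte[] newData, int expectedVersion,
                         byte[] currentData, int currentVersion) {
        if (currentVersion == expectedVersion) {
            // As of this read, nobody (including us) has written since expectedVersion.
            return Verdict.NOT_YET_APPLIED;
        }
        if (currentVersion == expectedVersion + 1 && Arrays.equals(newData, currentData)) {
            // Consistent with our write having landed, but another client could have
            // written the same bytes, and the znode may change right after this read.
            return Verdict.PROBABLY_APPLIED;
        }
        // Several writes happened, or someone else's data is there: cannot tell.
        return Verdict.UNKNOWN;
    }
}
```

With the real client, the inputs could come from `Stat stat = new Stat(); byte[] current = zk.getData(path, false, stat);` followed by `stat.getVersion()`.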

I hope that helps

Enrico


>
> Best,
> tison.
>
> [1] https://github.com/apache/pulsar/issues/13954