Posted to user@zookeeper.apache.org by Ishaaq Chandy <is...@gmail.com> on 2011/10/08 03:47:45 UTC

puzzling BadVersionException

Hi all,

We're seeing a puzzling error. Here's the scenario:

1. We have a single thread that wakes up every two seconds (give or take)
and does some work
2. As part of that work it updates a node on ZK. When it does this it first
gets the Stat of the existing node and uses the version retrieved from it to
update the value.
3. There are no other processes updating the node

The code goes something like this:
    final Stat stat = zooKeeper.exists(path, false);
    // do some other work here to create the path if it does not exist - this
    // code only ever gets called once
    zooKeeper.setData(path, value, stat.getVersion());

What we're seeing is that every so often (once every 5 minutes or so?) that
setData() call fails with a BadVersionException. This is very unexpected
because, as I mentioned previously, this thread is the sole updater of that
node.

One possibility I am considering is that we are using the wrong number of
ZKs in our cluster, i.e. 2 nodes. I am wondering if 2 is the worst possible
number of nodes for ZK, as there is no way to resolve a disagreement.

Another possibility is that we are using an old version of ZK (3.2.2).
Perhaps there is a known bug in it? Though I see nothing related to this in
the release notes for subsequent versions.

Thoughts/suggestions?

Thanks,
Ishaaq

Re: puzzling BadVersionException

Posted by Ted Dunning <te...@gmail.com>.
Sounds like you may want to look into the multi operation if you have many
inputs being processed to form a single output.
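For context, ZooKeeper.multi() arrived in ZooKeeper 3.4.0, so it is not
available on the 3.2.2 servers discussed in this thread. It takes a list of
Op instances, such as Op.check(path, version) and Op.setData(path, data,
version), and applies them atomically: either every op succeeds or none is
applied. The sketch below only models that all-or-nothing behaviour in
memory. MultiModel and Write are hypothetical names, not the real API; a
write with a null value stands in for a pure version check, and a false
return stands in for the exception the real call throws.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model of multi()'s all-or-nothing semantics; not ZooKeeper's API.
class MultiModel {
    // One conditional op: path, new value (null = version check only), expected version (-1 = any).
    static final class Write {
        final String path;
        final String value;
        final int expectedVersion;

        Write(String path, String value, int expectedVersion) {
            this.path = path;
            this.value = value;
            this.expectedVersion = expectedVersion;
        }
    }

    final Map<String, String> data = new HashMap<>();
    final Map<String, Integer> versions = new HashMap<>();

    void create(String path, String value) {
        data.put(path, value);
        versions.put(path, 0); // new znodes start at data version 0
    }

    // Apply ops in order against a working copy; commit only if every check passes.
    synchronized boolean multi(List<Write> ops) {
        Map<String, String> nextData = new HashMap<>(data);
        Map<String, Integer> nextVersions = new HashMap<>(versions);
        for (Write w : ops) {
            Integer v = nextVersions.get(w.path);
            if (v == null || (w.expectedVersion != -1 && w.expectedVersion != v)) {
                return false; // nothing from this batch is committed
            }
            if (w.value != null) { // a null value models a pure check (like Op.check)
                nextData.put(w.path, w.value);
                nextVersions.put(w.path, v + 1);
            }
        }
        data.clear();
        data.putAll(nextData); // commit the whole batch at once
        versions.clear();
        versions.putAll(nextVersions);
        return true;
    }
}
```

Checking the versions of the inputs and writing the output in one batch
means the output is only updated if none of the inputs changed since they
were read.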

On Tue, Oct 11, 2011 at 5:55 AM, Ishaaq Chandy <is...@gmail.com> wrote:

> [...]

Re: puzzling BadVersionException

Posted by Ishaaq Chandy <is...@gmail.com>.
Ok, false alarm: the problem was a misconfiguration in our code that was
causing multiple processes to update that znode, whereas only one should
have.

Apologies for wasting your time.

Ishaaq

On 11 October 2011 13:09, Ishaaq Chandy <is...@gmail.com> wrote:

> [...]

Re: puzzling BadVersionException

Posted by Ishaaq Chandy <is...@gmail.com>.
Technically we don't need the contents, as we're going to overwrite them
anyway; we're just asserting that we're the only one writing to that node.
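One way to assert sole ownership without a separate exists() call, sketched
here as an assumption rather than anything proposed in the thread: the real
setData() returns the znode's new Stat, so its getVersion() is exactly the
version the next write should expect. Carrying that value forward means any
write by another client is detected on the following cycle. By contrast,
re-reading the version with exists() before each write only catches a
foreign write that lands in the short window between the two calls, which
may be part of why the failure showed up only a few times an hour.
SoleWriter and its nested Node are hypothetical in-memory stand-ins, and
IllegalStateException stands in for KeeperException.BadVersionException.

```java
// Hypothetical model: a writer that carries the expected version forward
// instead of re-reading it, so any foreign write is caught on its next call.
class SoleWriter {
    // Minimal stand-in for one znode; the payload is omitted since only the version matters.
    static final class Node {
        int version = 0;

        synchronized int setData(int expected) {
            if (expected != -1 && expected != version) {
                throw new IllegalStateException(
                    "BadVersion: expected " + expected + ", actual " + version);
            }
            return ++version; // models setData(...).getVersion()
        }
    }

    private final Node node;
    private int expectedVersion;

    SoleWriter(Node node, int initialVersion) {
        this.node = node;
        this.expectedVersion = initialVersion;
    }

    // True if the write landed; false if some other writer touched the node since our last write.
    boolean write() {
        try {
            expectedVersion = node.setData(expectedVersion);
            return true;
        } catch (IllegalStateException badVersion) {
            return false;
        }
    }
}
```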

Was just checking if it is a known issue - clearly not, so I'll continue
investigating our code.

Thanks,
Ishaaq

On 11 October 2011 12:21, Ted Dunning <te...@gmail.com> wrote:

> [...]

Re: puzzling BadVersionException

Posted by Ted Dunning <te...@gmail.com>.
Why do you get the version in the first place without getting the contents?

If you don't have the contents, what is the point of enforcing a version?
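The question points at the usual purpose of a version check: optimistic
read-modify-write. Read the data and its version together, compute the new
value from what was read, write back conditioned on that version, and retry
if someone else got there first. With the real client the read would be
getData(path, false, stat) and the write setData(path, newValue,
stat.getVersion()), with KeeperException.BadVersionException signalling the
conflict. The Counter class below is a hypothetical in-memory model of that
loop, where a false return from write() plays the part of the exception.

```java
import java.util.function.IntUnaryOperator;

// Hypothetical in-memory model of an optimistic read-modify-write loop.
class Counter {
    private int value = 0;
    private int version = 0;

    // Models getData(path, false, stat): value and version are read together.
    synchronized int[] read() {
        return new int[] {value, version};
    }

    // Models setData(path, data, expectedVersion); false stands in for BadVersionException.
    synchronized boolean write(int newValue, int expectedVersion) {
        if (expectedVersion != version) {
            return false;
        }
        value = newValue;
        version++;
        return true;
    }

    // Retry until the conditional write lands on the version that was read.
    int update(IntUnaryOperator f) {
        while (true) {
            int[] snapshot = read();
            int next = f.applyAsInt(snapshot[0]);
            if (write(next, snapshot[1])) {
                return next;
            }
        }
    }

    static int concurrentIncrements(int threads, int perThread) {
        Counter c = new Counter();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    c.update(x -> x + 1);
                }
            });
            workers[t].start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return c.read()[0];
    }
}
```

Four threads each applying 1000 increments through update() end at 4000,
because every write that loses a race is retried against fresh data rather
than silently overwriting it.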

On Mon, Oct 10, 2011 at 8:26 AM, Ishaaq Chandy <is...@gmail.com> wrote:

> [...]

Re: puzzling BadVersionException

Posted by Ishaaq Chandy <is...@gmail.com>.
Thanks Mahadev,
Yup, I am aware that 2 is a particularly bad cluster size, and hopefully we
will fix that soon. I was just hoping that it was somehow the cause of the
problem. My conjecture was, e.g., that if the two ZK servers disagree about
the version, there is no way to decide who is correct without a third,
tie-breaker server.

But, if you say that is not the case, then I need to keep looking (sigh).

I am pretty sure that only one thread is touching that znode. We put in some
trace logging to try and pinpoint the problem and noticed that every time we
get the BadVersionException, the actual version on the znode is one more than
we expected based on the previous exists() call.
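The trace-log symptom, an actual version exactly one higher than the one
exists() returned, is what a single write by another client between this
thread's exists() and setData() would produce. The sketch below replays that
cycle against a hypothetical in-memory node (InterleavingReplay is
illustrative, not the ZooKeeper API) and re-reads the version after the
rejected write. Without the foreign write, a lone writer's exists()/setData()
pair never fails.

```java
// Hypothetical in-memory replay of one exists()/setData() cycle; not the ZooKeeper API.
class InterleavingReplay {
    private int version = 0;

    int exists() {                      // models exists(path, false).getVersion()
        return version;
    }

    boolean setData(int expected) {     // false models a BadVersionException
        if (expected != version) {
            return false;
        }
        version++;
        return true;
    }

    // Writer A runs exists() then setData(); if foreignWrite is true, writer B
    // updates the node in between. Returns {expected, actual} when A's write
    // is rejected, or null when it succeeds.
    static int[] replay(int priorWrites, boolean foreignWrite) {
        InterleavingReplay node = new InterleavingReplay();
        for (int i = 0; i < priorWrites; i++) {
            node.setData(node.exists());
        }
        int expectedByA = node.exists();
        if (foreignWrite) {
            node.setData(node.exists());            // B: the unexpected second writer
        }
        if (node.setData(expectedByA)) {
            return null;
        }
        return new int[] {expectedByA, node.exists()}; // re-read after the failure
    }
}
```

Here replay(5, true) returns {5, 6} and replay(5, false) returns null.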

As I said, this code gets called once every 2 seconds (or thereabouts). It
seems to fail with a BadVersionException about 3 times an hour (on average).

By the way, not sure if it is relevant, but the reason we are using 2 nodes
in the cluster, and the reason their version is 3.2.2, is that they are the
ZKs that come embedded inside HBase (we're running 2 HBase regionservers).
I've been meaning to pull them out and run them standalone but just haven't
got around to it (yet).

Ishaaq

On 10 October 2011 17:35, Mahadev Konar <ma...@hortonworks.com> wrote:

> [...]

Re: puzzling BadVersionException

Posted by Mahadev Konar <ma...@hortonworks.com>.
Ishaaq,
Two ZK servers is definitely not the right number for running a ZK
service, but that is no reason to get a BadVersionException. For more
information on sizing a ZK ensemble, take a look at:

http://zookeeper.apache.org/doc/r3.3.3/zookeeperAdmin.html
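The sizing guidance comes down to majority quorums: an ensemble of n servers
needs floor(n/2) + 1 of them to acknowledge each write, so it can lose n
minus that many servers and keep going. Because every committed write needs
a majority, two servers cannot each commit a different version of a znode;
a 2-server ensemble instead has a quorum of 2 and stops accepting writes if
either server is lost. The helper below is just that arithmetic.

```java
// Majority-quorum arithmetic for an ensemble of n servers.
class Quorum {
    static int quorumSize(int n) {
        return n / 2 + 1;               // strict majority
    }

    static int toleratedFailures(int n) {
        return n - quorumSize(n);       // servers that can be lost while a majority remains
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 5; n++) {
            System.out.println(n + " servers: quorum " + quorumSize(n)
                + ", tolerates " + toleratedFailures(n) + " failure(s)");
        }
    }
}
```

Run on its own, it shows that 2 servers tolerate no more failures than 1
(zero), and 4 no more than 3 (one), which is why odd-sized ensembles are the
usual recommendation.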

As for the version on the znode, can you try reading the version when
setData() throws a BadVersionException?

Also, is there any chance that the znode is deleted and then re-created at
the same path?
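A delete followed by a re-create would also cause a BadVersionException,
but with a different signature: a newly created znode starts again at data
version 0, so the version read after the failure would drop back to 0 rather
than be one higher than expected (the pattern Ishaaq's trace logging
showed). PathStore and the path /app/state are hypothetical, used only to
model this in memory.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory model of per-path data versions; not the ZooKeeper API.
class PathStore {
    private final Map<String, Integer> versions = new HashMap<>();

    void create(String path) { versions.put(path, 0); }          // new znodes start at version 0
    void delete(String path) { versions.remove(path); }
    Integer exists(String path) { return versions.get(path); }   // null when absent

    // Returns null on success, or the actual version when the check fails.
    Integer setData(String path, int expected) {
        Integer actual = versions.get(path);
        if (actual == null) {
            throw new IllegalStateException("NoNode: " + path);
        }
        if (expected != actual) {
            return actual;                                        // models BadVersionException
        }
        versions.put(path, actual + 1);
        return null;
    }

    static Integer actualAfterRecreate() {
        PathStore s = new PathStore();
        s.create("/app/state");
        s.setData("/app/state", 0);
        s.setData("/app/state", 1);                 // version is now 2
        int expected = s.exists("/app/state");      // writer reads 2
        s.delete("/app/state");                     // someone deletes...
        s.create("/app/state");                     // ...and re-creates the path
        return s.setData("/app/state", expected);   // stale write sees version 0
    }
}
```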

I don't think we have seen this version issue in any release, so I'd be
inclined to say that there could be something in your code that's making
changes to the znode before you set the data.

Hope that helps
thanks
mahadev

On Fri, Oct 7, 2011 at 6:47 PM, Ishaaq Chandy <is...@gmail.com> wrote:
> [...]