Posted to user@zookeeper.apache.org by oo4load <c....@gmail.com> on 2018/06/18 15:04:48 UTC

dynamic config file number

I had a problem getting dynamic reconfig to work on new / clean clusters
when I copied the zoo.cfg and zoo.cfg.dynamic.(number) files over from an
older installation.


Here's what happens:

[zk: localhost:2181(CONNECTED) 2] config
server.1=srv5703h:2888:3888:participant;0.0.0.0:2181
server.2=srv5703k:2888:3888:participant;0.0.0.0:2181
server.3=srv5704y:2888:3888:participant;0.0.0.0:2181
version=1f001cc8d5

[zk: localhost:2181(CONNECTED) 3] reconfig -remove 3
Committed new configuration:
server.1=srv5703h:2888:3888:participant;0.0.0.0:2181
server.2=srv5703k:2888:3888:participant;0.0.0.0:2181
server.3=srv5704y:2888:3888:participant;0.0.0.0:2181
version=1f001cc8d5


As you can see, the config version doesn't change, and server.3 is still
listed. If you check the filesystem, a ".next" file with the new config is
created on each ZooKeeper server, but it seems like it's never committed.

-rw-r-----. 1 prof prof 282 Jun 18 12:39 zoo.cfg
-rw-r-----. 1 prof prof 159 Jun 18 15:25 zoo.cfg.dynamic.1f001cc8d5
-rw-r-----. 1 prof prof 123 Jun 18 15:26 zoo.cfg.dynamic.next


On the ZooKeeper servers where the reconfig command was NOT run, the logs
show the following message:
2018-06-18 15:26:56,491 [myid:3] - INFO  [ProcessThread(sid:3
cport:-1)::PrepRequestProcessor@476] - Incremental reconfig
2018-06-18 15:26:56,493 [myid:3] - ERROR [ProcessThread(sid:3
cport:-1)::QuorumPeer@1460] - setLastSeenQuorumVerifier called with stale
config 4294967306. Current version: 133145872597


After growing a ton of grey hairs we figured out that a new cluster must
start with an "unnumbered" dynamic config file, and that copying over an
existing numbered config always fails. Can anyone explain why that is?

Thanks,

Chris




Re: dynamic config file number

Posted by Alexander Shraer <sh...@gmail.com>.
The way it was implemented, the version (which shows up in your output as
version=1f001cc8d5) is not stored inside the dynamic config file; it is
part of the file name. It corresponds to the zxid at which the
configuration was committed.
You should never change it manually or copy it from a different cluster.
Instead, start either with a static config file, which will then be
converted to a dynamic one automatically, or with an un-numbered dynamic
one, as you suggest.
https://zookeeper.apache.org/doc/r3.5.3-beta/zookeeperReconfig.html#sc_reconfig_file
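
To make the file-name convention concrete, here is a minimal Python sketch
that splits a dynamic config file name into its base name and the version
encoded in the suffix. The helper name split_dynamic_name and the regular
expression are assumptions made for illustration; they are not part of
ZooKeeper, and the sketch only assumes the suffix is the hex version shown
in this thread.

```python
import re
from typing import Optional, Tuple

# Hypothetical pattern: "<anything>.dynamic.<hex digits>".
# Names such as "zoo.cfg.dynamic.next" do not match, because "next"
# is not a hex number, so they are reported as carrying no version.
_VERSIONED = re.compile(r"^(?P<base>.+\.dynamic)\.(?P<version>[0-9a-fA-F]+)$")


def split_dynamic_name(filename: str) -> Tuple[str, Optional[int]]:
    """Return (base_name, version) for a dynamic config file name.

    version is None when the name has no hex version suffix.
    """
    match = _VERSIONED.match(filename)
    if match is None:
        return filename, None
    # The suffix is hexadecimal, so parse it with base 16.
    return match.group("base"), int(match.group("version"), 16)


if __name__ == "__main__":
    for name in ("zoo.cfg.dynamic.1f001cc8d5", "zoo.cfg.dynamic"):
        print(name, "->", split_dynamic_name(name))
```

Under these assumptions, a file copied from an old cluster could be spotted
by a non-None version, and the base name is the un-numbered name to start a
fresh cluster with.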

I don't remember exactly, but I'm guessing that when a server boots, it
uses the version in the file name to bootstrap its config info.
Then, when you reconfig, the zxid of the reconfig (which is also the
version of the new config) is lower than the config version your cluster
already has. The new cluster has probably committed fewer ops than the
previous one, so its zxid is smaller. The reconfig therefore fails with an
error that the config is stale, i.e. it has a lower zxid / version than
the one the server already has.
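
The two numbers in the error message fit this explanation. A zxid is a
64-bit value with the epoch in the high 32 bits and a counter in the low
32 bits, so the decimal values from the log can be decoded as in the
sketch below. The function names are hypothetical, and is_stale is only a
simplified model of the comparison, not the actual QuorumPeer code.

```python
from typing import Tuple


def split_zxid(zxid: int) -> Tuple[int, int]:
    """Split a 64-bit zxid into (epoch, counter)."""
    return zxid >> 32, zxid & 0xFFFFFFFF


def is_stale(proposed_version: int, current_version: int) -> bool:
    """Simplified model: a config is stale if its version is lower
    than the version the server already holds."""
    return proposed_version < current_version


if __name__ == "__main__":
    proposed = 4294967306    # "stale config" value from the log
    current = 133145872597   # "Current version" value from the log

    print(hex(proposed), split_zxid(proposed))
    print(hex(current), split_zxid(current))
    print("stale:", is_stale(proposed, current))
```

The current version, 133145872597, is 0x1f001cc8d5 in hex, which is exactly
the suffix of the copied file (epoch 31). The reconfig on the fresh cluster
was proposed at 0x10000000a (epoch 1, counter 10), which is far lower, so
it is rejected as stale.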


Alex
