You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@zookeeper.apache.org by Patrick Hunt <ph...@apache.org> on 2010/10/06 22:45:53 UTC

Re: znode inconsistencies across ZooKeeper servers

Vishal the attachment seems to be getting removed by the list daemon (I
don't have it), can you create a JIRA and attach? Also this is a good
question for the ppl on zookeeper-user. (ccing)

You are aware that ephemeral znodes are tied to the session? And that
sessions only expire after the session timeout period? At which time any
znodes created during that session are then deleted. The fact that you are
"kill"ing your client process leads me to believe that you are not closing
the session cleanly (meaning that it will eventually expire after the
session timeout period), in which case the ephemeral znodes _should_
reappear when A is restarted and successfully rejoins the cluster. (at least
until the session timeout is exceeded)

Patrick

On Tue, Oct 5, 2010 at 11:04 AM, Vishal K <vi...@gmail.com> wrote:

> Hi,
>
> I have a 3 node ZK cluster (A, B, C). On one of the the nodes (node A), I
> have a ZK client running that connects to the local server and creates an
> ephemeral znode to indicate clients on other nodes that it is online.
>
> I have test script that reboots the zookeeper server as well as client on
> A. The test does a getstat on the ephemeral znode created by the client on
> A. I am seeing that the view of znodes on A is different from the other 2
> nodes. I can tell this from the session ID that the client gets after
> reconnecting to the local ZK server.
>
> So the test is simple:
> - kill zookeeper server and client process
> - wait for a few seconds
> - do zkCli.sh stat ... > test.out
>
> What I am seeing is that the ephemeral znode with old zxid, time, and
> session ID is reappearing on node A. I have attached the output of 3
> consecutive getstat requests of the test (see client_getstat.out). Notice
> that the third output is the same as the first one. That is, the old
> ephemeral znode reappeared at A. However, both B and C are showing the
> latest znode with correct time, zxid and session ID (output not attached).
>
> After this point, all following getstat requests on A are showing the old
> znode. Whereas, B and C show the correct znode every time the client on A
> comes online. This is something very perplexing. Earlier I thought this was
> a bug in my client implementation. But the test shows that the ZK server on
> A after reboot is out of sync with rest of the servers.
>
> The stat command to each server shows that the servers are in sync as far
> as zxid's are concerned (see stat.out). So there is something wrong with A's
> local database that is causing this problem.
>
> Has anyone seen this before? I will be doing more debugging in the next few
> days. Comments/suggestions for further debugging are welcomed.
>
> -Vishal
>
>
>

Re: znode inconsistencies across ZooKeeper servers

Posted by Vishal K <vi...@gmail.com>.
Sure,  I will reproduce it with debug enabled and create a JIRA. Thanks.

On Thu, Oct 7, 2010 at 12:23 PM, Patrick Hunt <ph...@apache.org> wrote:

> Vishal, this sounds like a bug in ZK to me. Can you create a JIRA with this
> description, your configuration files from all servers, and the log files
> from all servers during the time of the incident? If you could run the
> servers in DEBUG level logging during the time you reproduce the issue that
> would probably help:
> https://issues.apache.org/jira/browse/ZOOKEEPER
>
> Thanks!
>
> Patrick
>
>
> On Wed, Oct 6, 2010 at 2:57 PM, Vishal K <vi...@gmail.com> wrote:
>
>> Hi Patrick,
>>
>> You are correct, the test restarts both ZooKeeper server and the client.
>> The
>> client opens a new connection after restarting. So we would expect that
>> the
>> ephmeral znode (/foo) to expire after the session timeout. However, the
>> client with the new session creates the ephemeral znode (/foo) again after
>> it reboots (it sets a watch for /foo and recreates /foo if it is deleted
>> or
>> doesn't exist). The client is not reusing the session ID. What I expect to
>> see is that the older /foo should expire after which a new /foo should get
>> created. Is my expectation correct?
>>
>> What confuses me is the following output of 3 successive getstat /foo
>> requests on A (the zxid, time and owner fields).  Notice that the older
>> znode reappeared.
>> At the same time when I do getstat at B and C, I see the newer /foo.
>>
>> log4j:WARN No appenders could be found for logger
>> (org.apache.zookeeper.ZooKeeper).
>> log4j:WARN Please initialize the log4j system properly.
>> cZxid = 0x1000005ef
>> ctime = Tue Oct 05 15:00:50 UTC 2010
>> mZxid = 0x1000005ef
>> mtime = Tue Oct 05 15:00:50 UTC 2010
>> pZxid = 0x1000005ef
>> cversion = 0
>> dataVersion = 0
>> aclVersion = 0
>> ephemeralOwner = 0x2b7ce57ce40000
>> dataLength = 54
>> numChildren = 0
>>
>> log4j:WARN No appenders could be found for logger
>> (org.apache.zookeeper.ZooKeeper).
>> log4j:WARN Please initialize the log4j system properly.
>> cZxid = 0x100000607
>> ctime = Tue Oct 05 15:01:07 UTC 2010
>> mZxid = 0x100000607
>> mtime = Tue Oct 05 15:01:07 UTC 2010
>> pZxid = 0x100000607
>> cversion = 0
>> dataVersion = 0
>> aclVersion = 0
>> ephemeralOwner = 0x2b7ce5bda40000
>> dataLength = 54
>> numChildren = 0
>>
>> log4j:WARN No appenders could be found for logger
>> (org.apache.zookeeper.ZooKeeper).
>> log4j:WARN Please initialize the log4j system properly.
>> cZxid = 0x1000005ef
>> ctime = Tue Oct 05 15:00:50 UTC 2010
>> mZxid = 0x1000005ef
>> mtime = Tue Oct 05 15:00:50 UTC 2010
>> pZxid = 0x1000005ef
>> cversion = 0
>> dataVersion = 0
>> aclVersion = 0
>> ephemeralOwner = 0x2b7ce57ce40000
>> dataLength = 54
>> numChildren = 0
>>
>> Thanks for your help.
>>
>> -Vishal
>>
>> On Wed, Oct 6, 2010 at 4:45 PM, Patrick Hunt <ph...@apache.org> wrote:
>>
>> > Vishal the attachment seems to be getting removed by the list daemon (I
>> > don't have it), can you create a JIRA and attach? Also this is a good
>> > question for the ppl on zookeeper-user. (ccing)
>> >
>> > You are aware that ephemeral znodes are tied to the session? And that
>> > sessions only expire after the session timeout period? At which time any
>> > znodes created during that session are then deleted. The fact that you
>> are
>> > "kill"ing your client process leads me to believe that you are not
>> closing
>> > the session cleanly (meaning that it will eventually expire after the
>> > session timeout period), in which case the ephemeral znodes _should_
>> > reappear when A is restarted and successfully rejoins the cluster. (at
>> > least
>> > until the session timeout is exceeded)
>> >
>> > Patrick
>> >
>> > On Tue, Oct 5, 2010 at 11:04 AM, Vishal K <vi...@gmail.com> wrote:
>> >
>> > > Hi,
>> > >
>> > > I have a 3 node ZK cluster (A, B, C). On one of the the nodes (node
>> A), I
>> > > have a ZK client running that connects to the local server and creates
>> an
>> > > ephemeral znode to indicate clients on other nodes that it is online.
>> > >
>> > > I have test script that reboots the zookeeper server as well as client
>> on
>> > > A. The test does a getstat on the ephemeral znode created by the
>> client
>> > on
>> > > A. I am seeing that the view of znodes on A is different from the
>> other 2
>> > > nodes. I can tell this from the session ID that the client gets after
>> > > reconnecting to the local ZK server.
>> > >
>> > > So the test is simple:
>> > > - kill zookeeper server and client process
>> > > - wait for a few seconds
>> > > - do zkCli.sh stat ... > test.out
>> > >
>> > > What I am seeing is that the ephemeral znode with old zxid, time, and
>> > > session ID is reappearing on node A. I have attached the output of 3
>> > > consecutive getstat requests of the test (see client_getstat.out).
>> Notice
>> > > that the third output is the same as the first one. That is, the old
>> > > ephemeral znode reappeared at A. However, both B and C are showing the
>> > > latest znode with correct time, zxid and session ID (output not
>> > attached).
>> > >
>> > > After this point, all following getstat requests on A are showing the
>> old
>> > > znode. Whereas, B and C show the correct znode every time the client
>> on A
>> > > comes online. This is something very perplexing. Earlier I thought
>> this
>> > was
>> > > a bug in my client implementation. But the test shows that the ZK
>> server
>> > on
>> > > A after reboot is out of sync with rest of the servers.
>> > >
>> > > The stat command to each server shows that the servers are in sync as
>> far
>> > > as zxid's are concerned (see stat.out). So there is something wrong
>> with
>> > A's
>> > > local database that is causing this problem.
>> > >
>> > > Has anyone seen this before? I will be doing more debugging in the
>> next
>> > few
>> > > days. Comments/suggestions for further debugging are welcomed.
>> > >
>> > > -Vishal
>> > >
>> > >
>> > >
>> >
>>
>
>

Re: znode inconsistencies across ZooKeeper servers

Posted by Vishal K <vi...@gmail.com>.
Sure,  I will reproduce it with debug enabled and create a JIRA. Thanks.

On Thu, Oct 7, 2010 at 12:23 PM, Patrick Hunt <ph...@apache.org> wrote:

> Vishal, this sounds like a bug in ZK to me. Can you create a JIRA with this
> description, your configuration files from all servers, and the log files
> from all servers during the time of the incident? If you could run the
> servers in DEBUG level logging during the time you reproduce the issue that
> would probably help:
> https://issues.apache.org/jira/browse/ZOOKEEPER
>
> Thanks!
>
> Patrick
>
>
> On Wed, Oct 6, 2010 at 2:57 PM, Vishal K <vi...@gmail.com> wrote:
>
>> Hi Patrick,
>>
>> You are correct, the test restarts both ZooKeeper server and the client.
>> The
>> client opens a new connection after restarting. So we would expect that
>> the
>> ephmeral znode (/foo) to expire after the session timeout. However, the
>> client with the new session creates the ephemeral znode (/foo) again after
>> it reboots (it sets a watch for /foo and recreates /foo if it is deleted
>> or
>> doesn't exist). The client is not reusing the session ID. What I expect to
>> see is that the older /foo should expire after which a new /foo should get
>> created. Is my expectation correct?
>>
>> What confuses me is the following output of 3 successive getstat /foo
>> requests on A (the zxid, time and owner fields).  Notice that the older
>> znode reappeared.
>> At the same time when I do getstat at B and C, I see the newer /foo.
>>
>> log4j:WARN No appenders could be found for logger
>> (org.apache.zookeeper.ZooKeeper).
>> log4j:WARN Please initialize the log4j system properly.
>> cZxid = 0x1000005ef
>> ctime = Tue Oct 05 15:00:50 UTC 2010
>> mZxid = 0x1000005ef
>> mtime = Tue Oct 05 15:00:50 UTC 2010
>> pZxid = 0x1000005ef
>> cversion = 0
>> dataVersion = 0
>> aclVersion = 0
>> ephemeralOwner = 0x2b7ce57ce40000
>> dataLength = 54
>> numChildren = 0
>>
>> log4j:WARN No appenders could be found for logger
>> (org.apache.zookeeper.ZooKeeper).
>> log4j:WARN Please initialize the log4j system properly.
>> cZxid = 0x100000607
>> ctime = Tue Oct 05 15:01:07 UTC 2010
>> mZxid = 0x100000607
>> mtime = Tue Oct 05 15:01:07 UTC 2010
>> pZxid = 0x100000607
>> cversion = 0
>> dataVersion = 0
>> aclVersion = 0
>> ephemeralOwner = 0x2b7ce5bda40000
>> dataLength = 54
>> numChildren = 0
>>
>> log4j:WARN No appenders could be found for logger
>> (org.apache.zookeeper.ZooKeeper).
>> log4j:WARN Please initialize the log4j system properly.
>> cZxid = 0x1000005ef
>> ctime = Tue Oct 05 15:00:50 UTC 2010
>> mZxid = 0x1000005ef
>> mtime = Tue Oct 05 15:00:50 UTC 2010
>> pZxid = 0x1000005ef
>> cversion = 0
>> dataVersion = 0
>> aclVersion = 0
>> ephemeralOwner = 0x2b7ce57ce40000
>> dataLength = 54
>> numChildren = 0
>>
>> Thanks for your help.
>>
>> -Vishal
>>
>> On Wed, Oct 6, 2010 at 4:45 PM, Patrick Hunt <ph...@apache.org> wrote:
>>
>> > Vishal the attachment seems to be getting removed by the list daemon (I
>> > don't have it), can you create a JIRA and attach? Also this is a good
>> > question for the ppl on zookeeper-user. (ccing)
>> >
>> > You are aware that ephemeral znodes are tied to the session? And that
>> > sessions only expire after the session timeout period? At which time any
>> > znodes created during that session are then deleted. The fact that you
>> are
>> > "kill"ing your client process leads me to believe that you are not
>> closing
>> > the session cleanly (meaning that it will eventually expire after the
>> > session timeout period), in which case the ephemeral znodes _should_
>> > reappear when A is restarted and successfully rejoins the cluster. (at
>> > least
>> > until the session timeout is exceeded)
>> >
>> > Patrick
>> >
>> > On Tue, Oct 5, 2010 at 11:04 AM, Vishal K <vi...@gmail.com> wrote:
>> >
>> > > Hi,
>> > >
>> > > I have a 3 node ZK cluster (A, B, C). On one of the the nodes (node
>> A), I
>> > > have a ZK client running that connects to the local server and creates
>> an
>> > > ephemeral znode to indicate clients on other nodes that it is online.
>> > >
>> > > I have test script that reboots the zookeeper server as well as client
>> on
>> > > A. The test does a getstat on the ephemeral znode created by the
>> client
>> > on
>> > > A. I am seeing that the view of znodes on A is different from the
>> other 2
>> > > nodes. I can tell this from the session ID that the client gets after
>> > > reconnecting to the local ZK server.
>> > >
>> > > So the test is simple:
>> > > - kill zookeeper server and client process
>> > > - wait for a few seconds
>> > > - do zkCli.sh stat ... > test.out
>> > >
>> > > What I am seeing is that the ephemeral znode with old zxid, time, and
>> > > session ID is reappearing on node A. I have attached the output of 3
>> > > consecutive getstat requests of the test (see client_getstat.out).
>> Notice
>> > > that the third output is the same as the first one. That is, the old
>> > > ephemeral znode reappeared at A. However, both B and C are showing the
>> > > latest znode with correct time, zxid and session ID (output not
>> > attached).
>> > >
>> > > After this point, all following getstat requests on A are showing the
>> old
>> > > znode. Whereas, B and C show the correct znode every time the client
>> on A
>> > > comes online. This is something very perplexing. Earlier I thought
>> this
>> > was
>> > > a bug in my client implementation. But the test shows that the ZK
>> server
>> > on
>> > > A after reboot is out of sync with rest of the servers.
>> > >
>> > > The stat command to each server shows that the servers are in sync as
>> far
>> > > as zxid's are concerned (see stat.out). So there is something wrong
>> with
>> > A's
>> > > local database that is causing this problem.
>> > >
>> > > Has anyone seen this before? I will be doing more debugging in the
>> next
>> > few
>> > > days. Comments/suggestions for further debugging are welcomed.
>> > >
>> > > -Vishal
>> > >
>> > >
>> > >
>> >
>>
>
>

Re: znode inconsistencies across ZooKeeper servers

Posted by Patrick Hunt <ph...@apache.org>.
Vishal, this sounds like a bug in ZK to me. Can you create a JIRA with this
description, your configuration files from all servers, and the log files
from all servers during the time of the incident? If you could run the
servers in DEBUG level logging during the time you reproduce the issue that
would probably help:
https://issues.apache.org/jira/browse/ZOOKEEPER

Thanks!

Patrick

On Wed, Oct 6, 2010 at 2:57 PM, Vishal K <vi...@gmail.com> wrote:

> Hi Patrick,
>
> You are correct, the test restarts both ZooKeeper server and the client.
> The
> client opens a new connection after restarting. So we would expect that the
> ephmeral znode (/foo) to expire after the session timeout. However, the
> client with the new session creates the ephemeral znode (/foo) again after
> it reboots (it sets a watch for /foo and recreates /foo if it is deleted or
> doesn't exist). The client is not reusing the session ID. What I expect to
> see is that the older /foo should expire after which a new /foo should get
> created. Is my expectation correct?
>
> What confuses me is the following output of 3 successive getstat /foo
> requests on A (the zxid, time and owner fields).  Notice that the older
> znode reappeared.
> At the same time when I do getstat at B and C, I see the newer /foo.
>
> log4j:WARN No appenders could be found for logger
> (org.apache.zookeeper.ZooKeeper).
> log4j:WARN Please initialize the log4j system properly.
> cZxid = 0x1000005ef
> ctime = Tue Oct 05 15:00:50 UTC 2010
> mZxid = 0x1000005ef
> mtime = Tue Oct 05 15:00:50 UTC 2010
> pZxid = 0x1000005ef
> cversion = 0
> dataVersion = 0
> aclVersion = 0
> ephemeralOwner = 0x2b7ce57ce40000
> dataLength = 54
> numChildren = 0
>
> log4j:WARN No appenders could be found for logger
> (org.apache.zookeeper.ZooKeeper).
> log4j:WARN Please initialize the log4j system properly.
> cZxid = 0x100000607
> ctime = Tue Oct 05 15:01:07 UTC 2010
> mZxid = 0x100000607
> mtime = Tue Oct 05 15:01:07 UTC 2010
> pZxid = 0x100000607
> cversion = 0
> dataVersion = 0
> aclVersion = 0
> ephemeralOwner = 0x2b7ce5bda40000
> dataLength = 54
> numChildren = 0
>
> log4j:WARN No appenders could be found for logger
> (org.apache.zookeeper.ZooKeeper).
> log4j:WARN Please initialize the log4j system properly.
> cZxid = 0x1000005ef
> ctime = Tue Oct 05 15:00:50 UTC 2010
> mZxid = 0x1000005ef
> mtime = Tue Oct 05 15:00:50 UTC 2010
> pZxid = 0x1000005ef
> cversion = 0
> dataVersion = 0
> aclVersion = 0
> ephemeralOwner = 0x2b7ce57ce40000
> dataLength = 54
> numChildren = 0
>
> Thanks for your help.
>
> -Vishal
>
> On Wed, Oct 6, 2010 at 4:45 PM, Patrick Hunt <ph...@apache.org> wrote:
>
> > Vishal the attachment seems to be getting removed by the list daemon (I
> > don't have it), can you create a JIRA and attach? Also this is a good
> > question for the ppl on zookeeper-user. (ccing)
> >
> > You are aware that ephemeral znodes are tied to the session? And that
> > sessions only expire after the session timeout period? At which time any
> > znodes created during that session are then deleted. The fact that you
> are
> > "kill"ing your client process leads me to believe that you are not
> closing
> > the session cleanly (meaning that it will eventually expire after the
> > session timeout period), in which case the ephemeral znodes _should_
> > reappear when A is restarted and successfully rejoins the cluster. (at
> > least
> > until the session timeout is exceeded)
> >
> > Patrick
> >
> > On Tue, Oct 5, 2010 at 11:04 AM, Vishal K <vi...@gmail.com> wrote:
> >
> > > Hi,
> > >
> > > I have a 3 node ZK cluster (A, B, C). On one of the the nodes (node A),
> I
> > > have a ZK client running that connects to the local server and creates
> an
> > > ephemeral znode to indicate clients on other nodes that it is online.
> > >
> > > I have test script that reboots the zookeeper server as well as client
> on
> > > A. The test does a getstat on the ephemeral znode created by the client
> > on
> > > A. I am seeing that the view of znodes on A is different from the other
> 2
> > > nodes. I can tell this from the session ID that the client gets after
> > > reconnecting to the local ZK server.
> > >
> > > So the test is simple:
> > > - kill zookeeper server and client process
> > > - wait for a few seconds
> > > - do zkCli.sh stat ... > test.out
> > >
> > > What I am seeing is that the ephemeral znode with old zxid, time, and
> > > session ID is reappearing on node A. I have attached the output of 3
> > > consecutive getstat requests of the test (see client_getstat.out).
> Notice
> > > that the third output is the same as the first one. That is, the old
> > > ephemeral znode reappeared at A. However, both B and C are showing the
> > > latest znode with correct time, zxid and session ID (output not
> > attached).
> > >
> > > After this point, all following getstat requests on A are showing the
> old
> > > znode. Whereas, B and C show the correct znode every time the client on
> A
> > > comes online. This is something very perplexing. Earlier I thought this
> > was
> > > a bug in my client implementation. But the test shows that the ZK
> server
> > on
> > > A after reboot is out of sync with rest of the servers.
> > >
> > > The stat command to each server shows that the servers are in sync as
> far
> > > as zxid's are concerned (see stat.out). So there is something wrong
> with
> > A's
> > > local database that is causing this problem.
> > >
> > > Has anyone seen this before? I will be doing more debugging in the next
> > few
> > > days. Comments/suggestions for further debugging are welcomed.
> > >
> > > -Vishal
> > >
> > >
> > >
> >
>

Re: znode inconsistencies across ZooKeeper servers

Posted by Patrick Hunt <ph...@apache.org>.
Vishal, this sounds like a bug in ZK to me. Can you create a JIRA with this
description, your configuration files from all servers, and the log files
from all servers during the time of the incident? If you could run the
servers in DEBUG level logging during the time you reproduce the issue that
would probably help:
https://issues.apache.org/jira/browse/ZOOKEEPER

Thanks!

Patrick

On Wed, Oct 6, 2010 at 2:57 PM, Vishal K <vi...@gmail.com> wrote:

> Hi Patrick,
>
> You are correct, the test restarts both ZooKeeper server and the client.
> The
> client opens a new connection after restarting. So we would expect that the
> ephmeral znode (/foo) to expire after the session timeout. However, the
> client with the new session creates the ephemeral znode (/foo) again after
> it reboots (it sets a watch for /foo and recreates /foo if it is deleted or
> doesn't exist). The client is not reusing the session ID. What I expect to
> see is that the older /foo should expire after which a new /foo should get
> created. Is my expectation correct?
>
> What confuses me is the following output of 3 successive getstat /foo
> requests on A (the zxid, time and owner fields).  Notice that the older
> znode reappeared.
> At the same time when I do getstat at B and C, I see the newer /foo.
>
> log4j:WARN No appenders could be found for logger
> (org.apache.zookeeper.ZooKeeper).
> log4j:WARN Please initialize the log4j system properly.
> cZxid = 0x1000005ef
> ctime = Tue Oct 05 15:00:50 UTC 2010
> mZxid = 0x1000005ef
> mtime = Tue Oct 05 15:00:50 UTC 2010
> pZxid = 0x1000005ef
> cversion = 0
> dataVersion = 0
> aclVersion = 0
> ephemeralOwner = 0x2b7ce57ce40000
> dataLength = 54
> numChildren = 0
>
> log4j:WARN No appenders could be found for logger
> (org.apache.zookeeper.ZooKeeper).
> log4j:WARN Please initialize the log4j system properly.
> cZxid = 0x100000607
> ctime = Tue Oct 05 15:01:07 UTC 2010
> mZxid = 0x100000607
> mtime = Tue Oct 05 15:01:07 UTC 2010
> pZxid = 0x100000607
> cversion = 0
> dataVersion = 0
> aclVersion = 0
> ephemeralOwner = 0x2b7ce5bda40000
> dataLength = 54
> numChildren = 0
>
> log4j:WARN No appenders could be found for logger
> (org.apache.zookeeper.ZooKeeper).
> log4j:WARN Please initialize the log4j system properly.
> cZxid = 0x1000005ef
> ctime = Tue Oct 05 15:00:50 UTC 2010
> mZxid = 0x1000005ef
> mtime = Tue Oct 05 15:00:50 UTC 2010
> pZxid = 0x1000005ef
> cversion = 0
> dataVersion = 0
> aclVersion = 0
> ephemeralOwner = 0x2b7ce57ce40000
> dataLength = 54
> numChildren = 0
>
> Thanks for your help.
>
> -Vishal
>
> On Wed, Oct 6, 2010 at 4:45 PM, Patrick Hunt <ph...@apache.org> wrote:
>
> > Vishal the attachment seems to be getting removed by the list daemon (I
> > don't have it), can you create a JIRA and attach? Also this is a good
> > question for the ppl on zookeeper-user. (ccing)
> >
> > You are aware that ephemeral znodes are tied to the session? And that
> > sessions only expire after the session timeout period? At which time any
> > znodes created during that session are then deleted. The fact that you
> are
> > "kill"ing your client process leads me to believe that you are not
> closing
> > the session cleanly (meaning that it will eventually expire after the
> > session timeout period), in which case the ephemeral znodes _should_
> > reappear when A is restarted and successfully rejoins the cluster. (at
> > least
> > until the session timeout is exceeded)
> >
> > Patrick
> >
> > On Tue, Oct 5, 2010 at 11:04 AM, Vishal K <vi...@gmail.com> wrote:
> >
> > > Hi,
> > >
> > > I have a 3 node ZK cluster (A, B, C). On one of the the nodes (node A),
> I
> > > have a ZK client running that connects to the local server and creates
> an
> > > ephemeral znode to indicate clients on other nodes that it is online.
> > >
> > > I have test script that reboots the zookeeper server as well as client
> on
> > > A. The test does a getstat on the ephemeral znode created by the client
> > on
> > > A. I am seeing that the view of znodes on A is different from the other
> 2
> > > nodes. I can tell this from the session ID that the client gets after
> > > reconnecting to the local ZK server.
> > >
> > > So the test is simple:
> > > - kill zookeeper server and client process
> > > - wait for a few seconds
> > > - do zkCli.sh stat ... > test.out
> > >
> > > What I am seeing is that the ephemeral znode with old zxid, time, and
> > > session ID is reappearing on node A. I have attached the output of 3
> > > consecutive getstat requests of the test (see client_getstat.out).
> Notice
> > > that the third output is the same as the first one. That is, the old
> > > ephemeral znode reappeared at A. However, both B and C are showing the
> > > latest znode with correct time, zxid and session ID (output not
> > attached).
> > >
> > > After this point, all following getstat requests on A are showing the
> old
> > > znode. Whereas, B and C show the correct znode every time the client on
> A
> > > comes online. This is something very perplexing. Earlier I thought this
> > was
> > > a bug in my client implementation. But the test shows that the ZK
> server
> > on
> > > A after reboot is out of sync with rest of the servers.
> > >
> > > The stat command to each server shows that the servers are in sync as
> far
> > > as zxid's are concerned (see stat.out). So there is something wrong
> with
> > A's
> > > local database that is causing this problem.
> > >
> > > Has anyone seen this before? I will be doing more debugging in the next
> > few
> > > days. Comments/suggestions for further debugging are welcomed.
> > >
> > > -Vishal
> > >
> > >
> > >
> >
>

Re: znode inconsistencies across ZooKeeper servers

Posted by Vishal K <vi...@gmail.com>.
Hi Patrick,

You are correct, the test restarts both ZooKeeper server and the client. The
client opens a new connection after restarting. So we would expect that the
ephmeral znode (/foo) to expire after the session timeout. However, the
client with the new session creates the ephemeral znode (/foo) again after
it reboots (it sets a watch for /foo and recreates /foo if it is deleted or
doesn't exist). The client is not reusing the session ID. What I expect to
see is that the older /foo should expire after which a new /foo should get
created. Is my expectation correct?

What confuses me is the following output of 3 successive getstat /foo
requests on A (the zxid, time and owner fields).  Notice that the older
znode reappeared.
At the same time when I do getstat at B and C, I see the newer /foo.

log4j:WARN No appenders could be found for logger
(org.apache.zookeeper.ZooKeeper).
log4j:WARN Please initialize the log4j system properly.
cZxid = 0x1000005ef
ctime = Tue Oct 05 15:00:50 UTC 2010
mZxid = 0x1000005ef
mtime = Tue Oct 05 15:00:50 UTC 2010
pZxid = 0x1000005ef
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x2b7ce57ce40000
dataLength = 54
numChildren = 0

log4j:WARN No appenders could be found for logger
(org.apache.zookeeper.ZooKeeper).
log4j:WARN Please initialize the log4j system properly.
cZxid = 0x100000607
ctime = Tue Oct 05 15:01:07 UTC 2010
mZxid = 0x100000607
mtime = Tue Oct 05 15:01:07 UTC 2010
pZxid = 0x100000607
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x2b7ce5bda40000
dataLength = 54
numChildren = 0

log4j:WARN No appenders could be found for logger
(org.apache.zookeeper.ZooKeeper).
log4j:WARN Please initialize the log4j system properly.
cZxid = 0x1000005ef
ctime = Tue Oct 05 15:00:50 UTC 2010
mZxid = 0x1000005ef
mtime = Tue Oct 05 15:00:50 UTC 2010
pZxid = 0x1000005ef
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x2b7ce57ce40000
dataLength = 54
numChildren = 0

Thanks for your help.

-Vishal

On Wed, Oct 6, 2010 at 4:45 PM, Patrick Hunt <ph...@apache.org> wrote:

> Vishal the attachment seems to be getting removed by the list daemon (I
> don't have it), can you create a JIRA and attach? Also this is a good
> question for the ppl on zookeeper-user. (ccing)
>
> You are aware that ephemeral znodes are tied to the session? And that
> sessions only expire after the session timeout period? At which time any
> znodes created during that session are then deleted. The fact that you are
> "kill"ing your client process leads me to believe that you are not closing
> the session cleanly (meaning that it will eventually expire after the
> session timeout period), in which case the ephemeral znodes _should_
> reappear when A is restarted and successfully rejoins the cluster. (at
> least
> until the session timeout is exceeded)
>
> Patrick
>
> On Tue, Oct 5, 2010 at 11:04 AM, Vishal K <vi...@gmail.com> wrote:
>
> > Hi,
> >
> > I have a 3 node ZK cluster (A, B, C). On one of the the nodes (node A), I
> > have a ZK client running that connects to the local server and creates an
> > ephemeral znode to indicate clients on other nodes that it is online.
> >
> > I have test script that reboots the zookeeper server as well as client on
> > A. The test does a getstat on the ephemeral znode created by the client
> on
> > A. I am seeing that the view of znodes on A is different from the other 2
> > nodes. I can tell this from the session ID that the client gets after
> > reconnecting to the local ZK server.
> >
> > So the test is simple:
> > - kill zookeeper server and client process
> > - wait for a few seconds
> > - do zkCli.sh stat ... > test.out
> >
> > What I am seeing is that the ephemeral znode with old zxid, time, and
> > session ID is reappearing on node A. I have attached the output of 3
> > consecutive getstat requests of the test (see client_getstat.out). Notice
> > that the third output is the same as the first one. That is, the old
> > ephemeral znode reappeared at A. However, both B and C are showing the
> > latest znode with correct time, zxid and session ID (output not
> attached).
> >
> > After this point, all following getstat requests on A are showing the old
> > znode. Whereas, B and C show the correct znode every time the client on A
> > comes online. This is something very perplexing. Earlier I thought this
> was
> > a bug in my client implementation. But the test shows that the ZK server
> on
> > A after reboot is out of sync with rest of the servers.
> >
> > The stat command to each server shows that the servers are in sync as far
> > as zxid's are concerned (see stat.out). So there is something wrong with
> A's
> > local database that is causing this problem.
> >
> > Has anyone seen this before? I will be doing more debugging in the next
> few
> > days. Comments/suggestions for further debugging are welcomed.
> >
> > -Vishal
> >
> >
> >
>

Re: znode inconsistencies across ZooKeeper servers

Posted by Vishal K <vi...@gmail.com>.
Hi Patrick,

You are correct, the test restarts both ZooKeeper server and the client. The
client opens a new connection after restarting. So we would expect that the
ephmeral znode (/foo) to expire after the session timeout. However, the
client with the new session creates the ephemeral znode (/foo) again after
it reboots (it sets a watch for /foo and recreates /foo if it is deleted or
doesn't exist). The client is not reusing the session ID. What I expect to
see is that the older /foo should expire after which a new /foo should get
created. Is my expectation correct?

What confuses me is the following output of 3 successive getstat /foo
requests on A (the zxid, time and owner fields).  Notice that the older
znode reappeared.
At the same time when I do getstat at B and C, I see the newer /foo.

log4j:WARN No appenders could be found for logger
(org.apache.zookeeper.ZooKeeper).
log4j:WARN Please initialize the log4j system properly.
cZxid = 0x1000005ef
ctime = Tue Oct 05 15:00:50 UTC 2010
mZxid = 0x1000005ef
mtime = Tue Oct 05 15:00:50 UTC 2010
pZxid = 0x1000005ef
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x2b7ce57ce40000
dataLength = 54
numChildren = 0

log4j:WARN No appenders could be found for logger
(org.apache.zookeeper.ZooKeeper).
log4j:WARN Please initialize the log4j system properly.
cZxid = 0x100000607
ctime = Tue Oct 05 15:01:07 UTC 2010
mZxid = 0x100000607
mtime = Tue Oct 05 15:01:07 UTC 2010
pZxid = 0x100000607
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x2b7ce5bda40000
dataLength = 54
numChildren = 0

log4j:WARN No appenders could be found for logger
(org.apache.zookeeper.ZooKeeper).
log4j:WARN Please initialize the log4j system properly.
cZxid = 0x1000005ef
ctime = Tue Oct 05 15:00:50 UTC 2010
mZxid = 0x1000005ef
mtime = Tue Oct 05 15:00:50 UTC 2010
pZxid = 0x1000005ef
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x2b7ce57ce40000
dataLength = 54
numChildren = 0

Thanks for your help.

-Vishal

On Wed, Oct 6, 2010 at 4:45 PM, Patrick Hunt <ph...@apache.org> wrote:

> Vishal the attachment seems to be getting removed by the list daemon (I
> don't have it), can you create a JIRA and attach? Also this is a good
> question for the ppl on zookeeper-user. (ccing)
>
> You are aware that ephemeral znodes are tied to the session? And that
> sessions only expire after the session timeout period? At which time any
> znodes created during that session are then deleted. The fact that you are
> "kill"ing your client process leads me to believe that you are not closing
> the session cleanly (meaning that it will eventually expire after the
> session timeout period), in which case the ephemeral znodes _should_
> reappear when A is restarted and successfully rejoins the cluster. (at
> least
> until the session timeout is exceeded)
>
> Patrick
>
> On Tue, Oct 5, 2010 at 11:04 AM, Vishal K <vi...@gmail.com> wrote:
>
> > Hi,
> >
> > I have a 3 node ZK cluster (A, B, C). On one of the the nodes (node A), I
> > have a ZK client running that connects to the local server and creates an
> > ephemeral znode to indicate clients on other nodes that it is online.
> >
> > I have test script that reboots the zookeeper server as well as client on
> > A. The test does a getstat on the ephemeral znode created by the client
> on
> > A. I am seeing that the view of znodes on A is different from the other 2
> > nodes. I can tell this from the session ID that the client gets after
> > reconnecting to the local ZK server.
> >
> > So the test is simple:
> > - kill zookeeper server and client process
> > - wait for a few seconds
> > - do zkCli.sh stat ... > test.out
> >
> > What I am seeing is that the ephemeral znode with old zxid, time, and
> > session ID is reappearing on node A. I have attached the output of 3
> > consecutive getstat requests of the test (see client_getstat.out). Notice
> > that the third output is the same as the first one. That is, the old
> > ephemeral znode reappeared at A. However, both B and C are showing the
> > latest znode with correct time, zxid and session ID (output not
> attached).
> >
> > After this point, all following getstat requests on A are showing the old
> > znode. Whereas, B and C show the correct znode every time the client on A
> > comes online. This is something very perplexing. Earlier I thought this
> was
> > a bug in my client implementation. But the test shows that the ZK server
> on
> > A after reboot is out of sync with rest of the servers.
> >
> > The stat command to each server shows that the servers are in sync as far
> > as zxid's are concerned (see stat.out). So there is something wrong with
> A's
> > local database that is causing this problem.
> >
> > Has anyone seen this before? I will be doing more debugging in the next
> few
> > days. Comments/suggestions for further debugging are welcomed.
> >
> > -Vishal
> >
> >
> >
>