Posted to user@zookeeper.apache.org by Dr Hao He <he...@softtouchit.com> on 2010/08/11 08:32:30 UTC

How to handle "Node does not exist" error?

hi, All,

I have a 3-host cluster running ZooKeeper 3.2.2.  On one of the hosts, there are a number of nodes that I can "get" and "ls" using zkCli.sh.  However, when I try to "delete" any of them, I get a "Node does not exist" error.  Those nodes do not exist on the other two hosts.

Any idea how we should handle this type of error and what might have caused this problem?

Dr Hao He

XPE - the truly SOA platform

he@softtouchit.com
http://softtouchit.com
http://itunes.com/apps/Scanmobile
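
On the client side, one common way to handle a "Node does not exist" error on delete is to treat it as "already deleted", since the goal state has been reached. The following is only a sketch of that idea. NodeClient and the local NoNodeException are hypothetical stand-ins so that the example compiles without the ZooKeeper jar; with the real client the corresponding pieces would be ZooKeeper.delete(path, version) and KeeperException.NoNodeException. The queue path in main is also made up. Note that this does not explain or repair an inconsistency between servers; it only keeps a consumer from failing on it.

```java
import java.util.HashSet;
import java.util.Set;

// Sketch with stand-in types so that it compiles without the ZooKeeper jar.
class IdempotentDelete {

    // Hypothetical stand-in for KeeperException.NoNodeException.
    static class NoNodeException extends Exception {
        NoNodeException(String path) {
            super("Node does not exist: " + path);
        }
    }

    // Hypothetical stand-in for the delete call of a ZooKeeper client.
    interface NodeClient {
        void delete(String path) throws NoNodeException;
    }

    // Returns true if this call removed the node, false if it was already gone.
    static boolean deleteIfPresent(NodeClient client, String path) {
        try {
            client.delete(path);
            return true;
        } catch (NoNodeException e) {
            // Another reader may have deleted it first; nothing left to do.
            return false;
        }
    }

    // In-memory client used only to exercise the sketch.
    static class InMemoryClient implements NodeClient {
        final Set<String> nodes = new HashSet<>();

        @Override
        public void delete(String path) throws NoNodeException {
            if (!nodes.remove(path)) {
                throw new NoNodeException(path);
            }
        }
    }

    public static void main(String[] args) {
        InMemoryClient client = new InMemoryClient();
        String path = "/xpe/queues/q1/msgs/msg0000000001"; // hypothetical path
        client.nodes.add(path);
        System.out.println(deleteIfPresent(client, path)); // node existed
        System.out.println(deleteIfPresent(client, path)); // node already gone
    }
}
```

If a delete reports a missing node that "ls" on the same server still lists, as in this thread, the client-side handling above is not enough and the server state itself needs to be checked.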


Re: How to handle "Node does not exist" error?

Posted by Ted Dunning <te...@gmail.com>.
What do your servers have in their logs during startup?   Are you sure
you have them configured correctly?  Are the znodes ephemeral? Could
they have disappeared on their own?

Sent from my iPhone

On Aug 11, 2010, at 12:10 AM, Dr Hao He <he...@softtouchit.com> wrote:

> hi, Ted,
>
> Thanks for the reply.  Here is what I did:
>
> [zk: localhost:2181(CONNECTED) 0] ls /xpe/queues/3bd7851e79381ef4bfd1a5857b5e34c04e5159e5/msgs/msg0000002948
> []
> [zk: localhost:2181(CONNECTED) 1] ls /xpe/queues/3bd7851e79381ef4bfd1a5857b5e34c04e5159e5/msgs
> [msg0000002807, msg0000002700, msg0000002701, msg0000002804,  
> msg0000002704, msg0000002706, msg0000002601, msg0000001849,  
> msg0000001847, msg0000002508, msg0000002609, msg0000001841,  
> msg0000002607, msg0000002606, msg0000002604, msg0000002809,  
> msg0000002817, msg0000001633, msg0000002812, msg0000002814,  
> msg0000002711, msg0000002815, msg0000002713, msg0000002716,  
> msg0000001772, msg0000002811, msg0000001635, msg0000001774,  
> msg0000002515, msg0000002610, msg0000001838, msg0000002517,  
> msg0000002612, msg0000002519, msg0000001973, msg0000001835,  
> msg0000001974, msg0000002619, msg0000001831, msg0000002510,  
> msg0000002512, msg0000002615, msg0000002614, msg0000002617,  
> msg0000002104, msg0000002106, msg0000001769, msg0000001768,  
> msg0000002828, msg0000002822, msg0000001760, msg0000002820,  
> msg0000001963, msg0000001961, msg0000002110, msg0000002118,  
> msg0000002900, msg0000002836, msg0000001757, msg0000002907,  
> msg0000001753, msg0000001752, msg0000001755, msg0000001952,  
> msg0000001958, msg0000001852, msg0000001956, msg0000001854,  
> msg0000002749, msg0000001608, msg0000001609, msg0000002747,  
> msg0000002882, msg0000001743, msg0000002888, msg0000001605,  
> msg0000002885, msg0000001487, msg0000001746, msg0000002330,  
> msg0000001749, msg0000001488, msg0000001489, msg0000001881,  
> msg0000001491, msg0000002890, msg0000001889, msg0000002758,  
> msg0000002241, msg0000002892, msg0000002852, msg0000002759,  
> msg0000002898, msg0000002850, msg0000001733, msg0000002751,  
> msg0000001739, msg0000002753, msg0000002756, msg0000002332,  
> msg0000001872, msg0000002233, msg0000001721, msg0000001627,  
> msg0000001720, msg0000001625, msg0000001628, msg0000001629,  
> msg0000001729, msg0000002350, msg0000001727, msg0000002352,  
> msg0000001622, msg0000001726, msg0000001623, msg0000001723,  
> msg0000001724, msg0000001621, msg0000002736, msg0000002738,  
> msg0000002363, msg0000001717, msg0000002878, msg0000002362,  
> msg0000002361, msg0000001611, msg0000001894, msg0000002357,  
> msg0000002218, msg0000002358, msg0000002355, msg0000001895,  
> msg0000002356, msg0000001898, msg0000002354, msg0000001996,  
> msg0000001990, msg0000002093, msg0000002880, msg0000002576,  
> msg0000002579, msg0000002267, msg0000002266, msg0000002366,  
> msg0000001901, msg0000002365, msg0000001903, msg0000001799,  
> msg0000001906, msg0000002368, msg0000001597, msg0000002679,  
> msg0000002166, msg0000001595, msg0000002481, msg0000002482,  
> msg0000002373, msg0000002374, msg0000002371, msg0000001599,  
> msg0000002773, msg0000002274, msg0000002275, msg0000002270,  
> msg0000002583, msg0000002271, msg0000002580, msg0000002067,  
> msg0000002277, msg0000002278, msg0000002376, msg0000002180,  
> msg0000002467, msg0000002378, msg0000002182, msg0000002377,  
> msg0000002184, msg0000002379, msg0000002187, msg0000002186,  
> msg0000002665, msg0000002666, msg0000002381, msg0000002382,  
> msg0000002661, msg0000002662, msg0000002663, msg0000002385,  
> msg0000002284, msg0000002766, msg0000002282, msg0000002190,  
> msg0000002599, msg0000002054, msg0000002596, msg0000002453,  
> msg0000002459, msg0000002457, msg0000002456, msg0000002191,  
> msg0000002652, msg0000002395, msg0000002650, msg0000002656,  
> msg0000002655, msg0000002189, msg0000002047, msg0000002658,  
> msg0000002659, msg0000002796, msg0000002250, msg0000002255,  
> msg0000002589, msg0000002257, msg0000002061, msg0000002064,  
> msg0000002585, msg0000002258, msg0000002587, msg0000002444,  
> msg0000002446, msg0000002447, msg0000002450, msg0000002646,  
> msg0000001501, msg0000002591, msg0000002592, msg0000001503,  
> msg0000001506, msg0000002260, msg0000002594, msg0000002262,  
> msg0000002263, msg0000002264, msg0000002590, msg0000002132,  
> msg0000002130, msg0000002530, msg0000002931, msg0000001559,  
> msg0000001808, msg0000002024, msg0000001553, msg0000002939,  
> msg0000002937, msg0000001556, msg0000002935, msg0000002933,  
> msg0000002140, msg0000001937, msg0000002143, msg0000002520,  
> msg0000002522, msg0000002429, msg0000002524, msg0000002920,  
> msg0000002035, msg0000001561, msg0000002134, msg0000002138,  
> msg0000002925, msg0000002151, msg0000002287, msg0000002555,  
> msg0000002010, msg0000002002, msg0000002290, msg0000001537,  
> msg0000002005, msg0000002147, msg0000002145, msg0000002698,  
> msg0000001592, msg0000001810, msg0000002690, msg0000002691,  
> msg0000001911, msg0000001910, msg0000002693, msg0000001812,  
> msg0000001817, msg0000001547, msg0000002012, msg0000002015,  
> msg0000002941, msg0000001688, msg0000002018, msg0000002684,  
> msg0000002944, msg0000001540, msg0000002686, msg0000001541,  
> msg0000002946, msg0000002688, msg0000001584, msg0000002948]
>
> [zk: localhost:2181(CONNECTED) 7] delete /xpe/queues/3bd7851e79381ef4bfd1a5857b5e34c04e5159e5/msgs/msg0000002948
> Node does not exist: /xpe/queues/3bd7851e79381ef4bfd1a5857b5e34c04e5159e5/msgs/msg0000002948
>
> When I performed the same operations on another host, none of those
> nodes existed.
>
>
> Dr Hao He
>
> XPE - the truly SOA platform
>
> he@softtouchit.com
> http://softtouchit.com
> http://itunes.com/apps/Scanmobile
>
> On 11/08/2010, at 4:38 PM, Ted Dunning wrote:
>
>> Can you provide some more information?  The output of some of the four
>> letter commands and a transcript of what you are doing would be very
>> helpful.
>>
>> Also, there is no way for znodes to exist on one node of a properly
>> operating ZK cluster and not on either of the other two.  Something has to
>> be wrong and I would vote for operator error (not to cast aspersions, it is
>> just that humans like you and *me* make more errors than ZK does).
>>
>

Re: How to handle "Node does not exist" error?

Posted by Benjamin Reed <br...@yahoo-inc.com>.
i thought there was a jira about supporting embedded zookeeper. (i 
remember rejecting a patch to fix it. one of the problems is that we 
have a couple of places that do System.exit().) i can't seem to find it 
though.

one case that would be great for embedding is writing test cases, so i 
think it would be useful for that.

ben

On 08/12/2010 03:25 PM, Ted Dunning wrote:
> I am not saying that the API shouldn't support embedded ZK.
>
> I am just saying that it is almost always a bad idea.  It isn't that I am
> asking you to not do it, it is just that I am describing the experience I
> have had and that I have seen others have.  In a nutshell, embedding leads
> to problems and it isn't hard to see why.
>
> On Thu, Aug 12, 2010 at 3:02 PM, Vishal K<vi...@gmail.com>  wrote:
>
>    
>> 2. With respect to Ted's point about backward compatibility, I would
>> suggest to take an approach of having an API to support embedded ZK
>> instead of asking users to not embed ZK.
>>
>>      


Re: How to handle "Node does not exist" error?

Posted by Vishal K <vi...@gmail.com>.
Hi Dr Hao,

If you think this is not a configuration issue, then it would be a good idea
to open a jira. Thanks.

On Thu, Aug 12, 2010 at 8:42 PM, Ted Dunning <te...@gmail.com> wrote:

> On Thu, Aug 12, 2010 at 4:57 PM, Dr Hao He <he...@softtouchit.com> wrote:
>
> > hi, Ted,
> >
> > I am a little bit confused here.  So, is the node inconsistency problem
> > that Vishal and I have seen here most likely caused by configurations or
> > embedding?
> >
> > If it is the former, I'd appreciate if you can point out where those
> > silly mistakes have been made and the correct way to embed ZK.
> >
>
> I think it is likely due to misconfiguration, but I don't know what the
> issue is exactly.  I think that another poster suggested that you ape the
> normal ZK startup process more closely.  That sounds good but it may be
> incompatible with your goals of integrating all configuration into a single
> XML file and not using the normal ZK configuration process.
>
> Your thought about forking ZK is a good one since there are calls to
> System.exit() that could wreak havoc.
>
>
>
> > Although I agree with your comments about the architectural issues that
> > embedding may lead to and we are aware of those,  I do not agree that
> > embedding will always lead to those issues.
>
>
> I agree that embedding won't always lead to those issues and your
> application is a reasonable counter-example.  As is common, I think that
> the exception proves the rule since your system is really just another way
> to launch an independent ZK cluster rather than an example of ZK being
> embedded into an application.
>

Re: How to handle "Node does not exist" error?

Posted by Vishal K <vi...@gmail.com>.
In my case, I am pretty sure that the configuration was right. I will
reproduce it and post more info later. Thanks.

On Mon, Aug 16, 2010 at 1:08 PM, Patrick Hunt <ph...@apache.org> wrote:

> Try using the logs, stat command or JMX to verify that each ZK server is
> indeed a leader/follower as expected. You should have one leader and n-1
> followers. Verify that you don't have any "standalone" servers (this is the
> most frequent error I see - misconfiguration of a server such that it thinks
> it's a standalone server; I often see where a user has 3 standalone servers
> which they think is a single quorum, all of the servers will therefore be
> "inconsistent" to each other).
>
> Patrick

Re: How to handle "Node does not exist" error?

Posted by Patrick Hunt <ph...@apache.org>.
Try using the logs, stat command or JMX to verify that each ZK server is 
indeed a leader/follower as expected. You should have one leader and n-1 
followers. Verify that you don't have any "standalone" servers (this is 
the most frequent error I see - misconfiguration of a server such that 
it thinks it's a standalone server; I often see where a user has 3 
standalone servers which they think is a single quorum, all of the 
servers will therefore be "inconsistent" to each other).
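
The check described above can be scripted. The following is a minimal sketch, assuming the servers answer the "stat" four-letter command on the client port and that the reply contains a "Mode:" line (leader, follower or standalone). The host addresses and port are the ones from the XML posted in this thread; the class and method names are made up for illustration. Newer ZooKeeper releases may require four-letter commands to be enabled explicitly, so treat this as a starting point only.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Sketch: ask each server for its "stat" output and report the Mode line.
class StatModeCheck {

    // Sends a four-letter command and returns the raw text reply.
    static String fourLetter(String host, int port, String cmd) throws IOException {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), 3000);
            socket.setSoTimeout(3000);
            OutputStream out = socket.getOutputStream();
            out.write(cmd.getBytes(StandardCharsets.US_ASCII));
            out.flush();
            // The server closes the connection after replying, so read to EOF.
            InputStream in = socket.getInputStream();
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            byte[] chunk = new byte[1024];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buf.write(chunk, 0, n);
            }
            return buf.toString("US-ASCII");
        }
    }

    // Extracts the value of the "Mode:" line, or "unknown" if there is none.
    static String parseMode(String statOutput) {
        for (String line : statOutput.split("\\r?\\n")) {
            String trimmed = line.trim();
            if (trimmed.startsWith("Mode:")) {
                return trimmed.substring("Mode:".length()).trim();
            }
        }
        return "unknown";
    }

    public static void main(String[] args) {
        // Hosts taken from the XML configuration posted in this thread.
        String[] hosts = {"192.168.2.6", "192.168.2.3", "192.168.2.4"};
        for (String host : hosts) {
            try {
                String mode = parseMode(fourLetter(host, 2181, "stat"));
                System.out.println(host + " -> " + mode);
            } catch (IOException e) {
                System.out.println(host + " -> unreachable: " + e.getMessage());
            }
        }
    }
}
```

For a healthy 3-server ensemble one would expect one "leader" and two "follower" answers; any "standalone" answer points at the misconfiguration described above.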

Patrick

On 08/12/2010 05:42 PM, Ted Dunning wrote:
> On Thu, Aug 12, 2010 at 4:57 PM, Dr Hao He<he...@softtouchit.com>  wrote:
>
>> hi, Ted,
>>
>> I am a little bit confused here.  So, is the node inconsistency problem
>> that Vishal and I have seen here most likely caused by configurations or
>> embedding?
>>
>> If it is the former, I'd appreciate if you can point out where those silly
>> mistakes have been made and the correct way to embed ZK.
>>
>
> I think it is likely due to misconfiguration, but I don't know what the
> issue is exactly.  I think that another poster suggested that you ape the
> normal ZK startup process more closely.  That sounds good but it may be
> incompatible with your goals of integrating all configuration into a single
> XML file and not using the normal ZK configuration process.
>
> Your thought about forking ZK is a good one since there are calls to
> System.exit() that could wreak havoc.
>
>
>
>> Although I agree with your comments about the architectural issues that
>> embedding may lead to and we are aware of those,  I do not agree that
>> embedding will always lead to those issues.
>
>
> I agree that embedding won't always lead to those issues and your
> application is a reasonable counter-example.  As is common, I think that the
> exception proves the rule since your system is really just another way to
> launch an independent ZK cluster rather than an example of ZK being embedded
> into an application.
>

Re: How to handle "Node does not exist" error?

Posted by Ted Dunning <te...@gmail.com>.
On Thu, Aug 12, 2010 at 4:57 PM, Dr Hao He <he...@softtouchit.com> wrote:

> hi, Ted,
>
> I am a little bit confused here.  So, is the node inconsistency problem
> that Vishal and I have seen here most likely caused by configurations or
> embedding?
>
> If it is the former, I'd appreciate if you can point out where those silly
> mistakes have been made and the correct way to embed ZK.
>

I think it is likely due to misconfiguration, but I don't know what the
issue is exactly.  I think that another poster suggested that you ape the
normal ZK startup process more closely.  That sounds good but it may be
incompatible with your goals of integrating all configuration into a single
XML file and not using the normal ZK configuration process.
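
One way to stay closer to the normal startup process while keeping a single XML file is to generate a standard zoo.cfg from it. The sketch below renders the values posted in this thread into the usual key=value form. The class name and the dataDir path are assumptions for illustration. In the standard setup each server also needs a file named myid inside dataDir that contains its own server id.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: render ensemble settings into the text of a standard zoo.cfg.
class ZooCfgSketch {

    static String render(int tickTime, int initLimit, int syncLimit, int clientPort,
                         String dataDir, Map<Integer, String> members) {
        StringBuilder sb = new StringBuilder();
        sb.append("tickTime=").append(tickTime).append('\n');
        sb.append("initLimit=").append(initLimit).append('\n');
        sb.append("syncLimit=").append(syncLimit).append('\n');
        sb.append("clientPort=").append(clientPort).append('\n');
        sb.append("dataDir=").append(dataDir).append('\n');
        // One server.N line per ensemble member: host:peerPort:electionPort.
        for (Map.Entry<Integer, String> member : members.entrySet()) {
            sb.append("server.").append(member.getKey())
              .append('=').append(member.getValue()).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Member list copied from the XML document in this thread.
        Map<Integer, String> members = new LinkedHashMap<>();
        members.put(1, "192.168.2.6:2888:3888");
        members.put(2, "192.168.2.3:2888:3888");
        members.put(3, "192.168.2.4:2888:3888");
        // "/var/lib/zookeeper" is a hypothetical data directory.
        System.out.print(render(1000, 10, 5, 2181, "/var/lib/zookeeper", members));
    }
}
```

The same generated file can be used on all three hosts; only the content of the myid file differs per server.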

Your thought about forking ZK is a good one since there are calls to
System.exit() that could wreak havoc.



> Although I agree with your comments about the architectural issues that
> embedding may lead to and we are aware of those,  I do not agree that
> embedding will always lead to those issues.


I agree that embedding won't always lead to those issues and your
application is a reasonable counter-example.  As is common, I think that the
exception proves the rule since your system is really just another way to
launch an independent ZK cluster rather than an example of ZK being embedded
into an application.

Re: How to handle "Node does not exist" error?

Posted by Dr Hao He <he...@softtouchit.com>.
hi, Ted,

I am a little bit confused here.  So, is the node inconsistency problem that Vishal and I have seen here most likely caused by configurations or embedding?

If it is the former, I'd appreciate if you can point out where those silly mistakes have been made and the correct way to embed ZK.

Although I agree with your comments about the architectural issues that embedding may lead to, and we are aware of those, I do not agree that embedding will always lead to those issues.  In our case, we have a very simple orchestration framework that starts various services according to our XML configuration file, since we need to start them in the right sequence.  Architecturally, this is just like writing a set of Unix scripts to start ZK and other services.  The only difference is that we happen to implement it in Java and XML.  In short, the services coordinated by ZK do not embed ZK; the top-level orchestration layer does.  The only potential problem I see here is that we are running all those services in one JVM, but this can be easily changed.  We are going to try forking ZK out into a separate JVM to see if it makes any difference.
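
Forking ZK into its own JVM from a Java orchestration layer could look roughly like the sketch below. It starts the standard server entry point, org.apache.zookeeper.server.quorum.QuorumPeerMain, with a configuration file path as its only argument, so a System.exit() inside ZK only ends the child process. The class name, the classpath entries and the config path are assumptions for illustration and depend on the local installation.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Sketch: run the ZooKeeper server in a separate JVM instead of embedding it.
class ForkedZkLauncher {

    // Builds the command line for a child JVM running QuorumPeerMain.
    static List<String> command(String classpath, String configPath) {
        List<String> cmd = new ArrayList<>();
        // Reuse the java binary of the JVM that runs the orchestration layer.
        cmd.add(System.getProperty("java.home") + File.separator + "bin" + File.separator + "java");
        cmd.add("-cp");
        cmd.add(classpath);
        cmd.add("org.apache.zookeeper.server.quorum.QuorumPeerMain");
        cmd.add(configPath);
        return cmd;
    }

    // Starts the child JVM; its output goes to the parent's stdout/stderr.
    static Process start(String classpath, String configPath) throws IOException {
        ProcessBuilder pb = new ProcessBuilder(command(classpath, configPath));
        pb.inheritIO();
        return pb.start();
    }

    public static void main(String[] args) {
        // Hypothetical classpath and config location; adjust to the install.
        String classpath = "zookeeper.jar" + File.pathSeparator + "lib/*"
                + File.pathSeparator + "conf";
        // Only print the command here; call start(...) to actually launch.
        System.out.println(String.join(" ", command(classpath, "conf/zoo.cfg")));
    }
}
```

The orchestration layer can keep the returned Process to monitor or stop the server, which preserves the start-up ordering without sharing a JVM with ZK.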




Dr Hao He

XPE - the truly SOA platform

he@softtouchit.com
http://softtouchit.com
http://itunes.com/apps/Scanmobile

On 13/08/2010, at 8:25 AM, Ted Dunning wrote:

> I am not saying that the API shouldn't support embedded ZK.
> 
> I am just saying that it is almost always a bad idea.  It isn't that I am
> asking you to not do it, it is just that I am describing the experience I
> have had and that I have seen others have.  In a nutshell, embedding leads
> to problems and it isn't hard to see why.
> 
> On Thu, Aug 12, 2010 at 3:02 PM, Vishal K <vi...@gmail.com> wrote:
> 
>> 2. With respect to Ted's point about backward compatibility, I would
>> suggest to take an approach of having an API to support embedded ZK
>> instead of asking users to not embed ZK.
>> 


Re: How to handle "Node does not exist" error?

Posted by Ted Dunning <te...@gmail.com>.
I am not saying that the API shouldn't support embedded ZK.

I am just saying that it is almost always a bad idea.  It isn't that I am
asking you to not do it, it is just that I am describing the experience I
have had and that I have seen others have.  In a nutshell, embedding leads
to problems and it isn't hard to see why.

On Thu, Aug 12, 2010 at 3:02 PM, Vishal K <vi...@gmail.com> wrote:

> 2. With respect to Ted's point about backward compatibility, I would
> suggest to take an approach of having an API to support embedded ZK
> instead of asking users to not embed ZK.
>

Re: How to handle "Node does not exist" error?

Posted by Vishal K <vi...@gmail.com>.
Hi,

I don't intend to hijack Dr. Hao's email thread here, but I would like to
point out two things:

1. I use an embedded server as well, but I don't use any setters. We extend
QuorumPeerMain and call its initializeAndRun() function, so we are doing pretty
much the same thing that QuorumPeerMain does. However, note that I am
seeing the same problem (in ZK 3.3.0) as Dr Hao. I haven't debugged the
cause yet. I assumed that this was an error in my implementation (and it
could still be). Nevertheless, this could turn out to be a bug as well.

2. With respect to Ted's point about backward compatibility, I would suggest
providing an API to support embedded ZK instead of asking users not to
embed ZK.

-Vishal

On Thu, Aug 12, 2010 at 3:18 PM, Ted Dunning <te...@gmail.com> wrote:

> It doesn't.
>
> But running a ZK cluster that is incorrectly configured can cause this
> problem and configuring ZK using setters is likely to be subject to changes
> in what configuration is needed.  Thus, your style of code is more subject
> to decay over time than is nice.
>
> The rest of my comments detail *other* reasons why embedding a coordination
> layer in the code being coordinated is a bad idea.
>
> On Thu, Aug 12, 2010 at 6:33 AM, Vishal K <vi...@gmail.com> wrote:
>
> > Hi Ted,
> >
> > Can you explain why running ZK in embedded mode can cause znode
> > inconsistencies?
> > Thanks.
> >
> > -Vishal
> >
> > On Thu, Aug 12, 2010 at 12:01 AM, Ted Dunning <te...@gmail.com>
> > wrote:
> >
> > > > Try running the server in non-embedded mode.
> > > >
> > > > Also, you are assuming that you know everything about how to configure
> > > > the quorumPeer.  That is going to change and your code will break at
> > > > that time.  If you use a non-embedded cluster, this won't be a problem
> > > > and you will be able to upgrade ZK version without having to restart
> > > > your service.
> > > >
> > > > My own opinion is that running an embedded ZK is a serious architectural
> > > > error.  Since I don't know your particular situation, it might be
> > > > different, but there is an inherent contradiction involved in running a
> > > > coordination layer as part of the thing being coordinated.  Whatever
> > > > your software does, it isn't what ZK does.  As such, it is better to
> > > > factor out the ZK functionality and make it completely stable.  That
> > > > gives you a much simpler world and will make it easier for you to
> > > > troubleshoot your system.  The simple fact that you can't take down
> > > > your service without affecting the reliability of your ZK layer makes
> > > > this a very bad idea.
> > > >
> > > > The problems you are having now are only a preview of what this
> > > > architectural error leads to.  There will be more problems and many of
> > > > them are likely to be more subtle and lead to service interruptions and
> > > > lots of wasted time.
> > >
> > > On Wed, Aug 11, 2010 at 8:49 PM, Dr Hao He <he...@softtouchit.com> wrote:
> > >
> > > > hi, Ted and Mahadev,
> > > >
> > > >
> > > > Here are some more details about my setup:
> > > >
> > > > I run zookeeper in the embedded mode with the following code:
> > > >
> > > >                                        quorumPeer = new QuorumPeer();
> > > >
> > > >  quorumPeer.setClientPort(getClientPort());
> > > >                                        quorumPeer.setTxnFactory(new
> > > > FileTxnSnapLog(new File(getDataLogDir()), new File(getDataDir())));
> > > >
> > > >  quorumPeer.setQuorumPeers(getServers());
> > > >
> > > >  quorumPeer.setElectionType(getElectionAlg());
> > > >
> >  quorumPeer.setMyid(getServerId());
> > > >
> > > >  quorumPeer.setTickTime(getTickTime());
> > > >
> > > >  quorumPeer.setInitLimit(getInitLimit());
> > > >
> > > >  quorumPeer.setSyncLimit(getSyncLimit());
> > > >
> > > >  quorumPeer.setQuorumVerifier(getQuorumVerifier());
> > > >
> > > >  quorumPeer.setCnxnFactory(cnxnFactory);
> > > >                                        quorumPeer.start();
> > > >
> > > >
> > > > The configuration values are read from the following XML document for
> > > > server 1:
> > > >
> > > > > <cluster tickTime="1000" initLimit="10" syncLimit="5" clientPort="2181" serverId="1">
> > > >                  <member id="1" host="192.168.2.6:2888:3888"/>
> > > >                  <member id="2" host="192.168.2.3:2888:3888"/>
> > > >                  <member id="3" host="192.168.2.4:2888:3888"/>
> > > > </cluster>
> > > >
> > > >
> > > > The other servers have the same configurations except their ids being
> > > > changed to 2 and 3.
> > > >
> > > > > The error occurred on server 3 when I batch loaded some messages to
> > > > > server 1.  However, this error does not always happen.  I am not sure
> > > > > exactly what triggered this error yet.
> > > > >
> > > > > I also performed the "stat" operation on one of the "Node does not
> > > > > exist" nodes and got:
> > > >
> > > > > stat /xpe/queues/3bd7851e79381ef4bfd1a5857b5e34c04e5159e5/msgs/msg0000001583
> > > > > Exception in thread "main" java.lang.NullPointerException
> > > > >        at org.apache.zookeeper.ZooKeeperMain.printStat(ZooKeeperMain.java:129)
> > > > >        at org.apache.zookeeper.ZooKeeperMain.processZKCmd(ZooKeeperMain.java:715)
> > > > >        at org.apache.zookeeper.ZooKeeperMain.processCmd(ZooKeeperMain.java:579)
> > > > >        at org.apache.zookeeper.ZooKeeperMain.executeLine(ZooKeeperMain.java:351)
> > > > >        at org.apache.zookeeper.ZooKeeperMain.run(ZooKeeperMain.java:309)
> > > > >        at org.apache.zookeeper.ZooKeeperMain.main(ZooKeeperMain.java:268)
> > > > > [xpe@t43 zookeeper-3.2.2]$ bin/zkCli.sh
> > > >
> > > >
> > > > > Those message nodes are created as CreateMode.PERSISTENT_SEQUENTIAL and
> > > > > are deleted by the last server who has read them.
> > > > >
> > > > > If I remove the troubled server's zookeeper log directory and restart
> > > > > the server, then everything is ok.
> > > >
> > > > I will try to get the nc result next time I see this problem.
> > > >
> > > >
> > > > Dr Hao He
> > > >
> > > > XPE - the truly SOA platform
> > > >
> > > > he@softtouchit.com
> > > > http://softtouchit.com
> > > > http://itunes.com/apps/Scanmobile
> > > >
> > > > On 12/08/2010, at 12:32 AM, Mahadev Konar wrote:
> > > >
> > > > > HI Dr Hao,
> > > > > >  Can you please post the configuration of all the 3 zookeeper servers?
> > > > > > I suspect it might be misconfigured clusters and they might not belong
> > > > > > to the same ensemble.
> > > > > >
> > > > > > Just to be clear:
> > > > > > /xpe/queues/3bd7851e79381ef4bfd1a5857b5e34c04e5159e5/msgs/msg0000002807
> > > > > >
> > > > > > And other such nodes exist on one of the zookeeper servers and the same
> > > > > > node does not exist on other servers?
> > > > >
> > > > > > Also, as ted pointed out, can you please post the output of
> > > > > > echo "stat" | nc localhost 2181 (on all the 3 servers) to the list?
> > > > >
> > > > > Thanks
> > > > > mahadev
> > > > >
> > > > >
> > > > >
> > > > >>>> Any idea how we should handle this type of errors and what might
> > > have
> > > > >>>> caused this problem?
> > > > >>>>
> > > > >>>> Dr Hao He
> > > > >>>>
> > > > >>>> XPE - the truly SOA platform
> > > > >>>>
> > > > >>>> he@softtouchit.com
> > > > >>>> http://softtouchit.com
> > > > >>>> http://itunes.com/apps/Scanmobile
> > > > >>>>
> > > > >>>>
> > > > >>
> > > > >>
> > > > >
> > > > >
> > > >
> > > >
> > >
> >
>

Re: How to handle "Node does not exist" error?

Posted by Ted Dunning <te...@gmail.com>.
It doesn't.

But running a ZK cluster that is incorrectly configured can cause this
problem, and configuring ZK through setters ties your code to whatever
configuration the current version needs, which is likely to change.  Thus,
your style of code is more subject to decay over time than is nice.

The rest of my comments detail *other* reasons why embedding a coordination
layer in the code being coordinated is a bad idea.
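
One way to check for that kind of misconfiguration is the four-letter "stat"
command requested earlier in this thread (echo stat | nc localhost 2181).  The
sketch below does roughly the same thing from Java.  The names
FourLetterSketch, send and parseMode are made up for illustration, the host
addresses are the ones from the XML posted in this thread, and it assumes the
stat output of the version in use contains a "Mode:" line.  In a healthy
3-server ensemble one would expect one leader and two followers; anything else
suggests the servers have not formed a single ensemble.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

class FourLetterSketch {
    /** Sends a four-letter command to a ZooKeeper client port and returns the reply. */
    static String send(String host, int port, String cmd) throws IOException {
        try (Socket sock = new Socket()) {
            sock.connect(new InetSocketAddress(host, port), 3000);
            sock.setSoTimeout(3000);
            sock.getOutputStream().write(cmd.getBytes(StandardCharsets.US_ASCII));
            sock.getOutputStream().flush();
            // Half-close like nc does, then read until the server closes the socket.
            sock.shutdownOutput();
            InputStream in = sock.getInputStream();
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buf.write(chunk, 0, n);
            }
            return new String(buf.toByteArray(), StandardCharsets.US_ASCII);
        }
    }

    /** Extracts the value of the "Mode:" line, or null if there is none (assumed format). */
    static String parseMode(String statOutput) {
        for (String line : statOutput.split("\n")) {
            String trimmed = line.trim();
            if (trimmed.startsWith("Mode:")) {
                return trimmed.substring("Mode:".length()).trim();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Addresses taken from the <cluster> XML in this thread; client port 2181.
        String[] hosts = {"192.168.2.6", "192.168.2.3", "192.168.2.4"};
        for (String host : hosts) {
            try {
                System.out.println(host + " -> " + parseMode(send(host, 2181, "stat")));
            } catch (IOException e) {
                System.out.println(host + " -> unreachable: " + e.getMessage());
            }
        }
    }
}
```

Comparing the reported mode and the Zxid line across the three servers is
usually enough to see whether one of them is running on its own.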

On Thu, Aug 12, 2010 at 6:33 AM, Vishal K <vi...@gmail.com> wrote:

> Hi Ted,
>
> Can you explain why running ZK in embedded mode can cause znode
> inconsistencies?
> Thanks.
>
> -Vishal
>
> On Thu, Aug 12, 2010 at 12:01 AM, Ted Dunning <te...@gmail.com>
> wrote:
>
> > Try running the server in non-embedded mode.
> >
> > Also, you are assuming that you know everything about how to configure
> the
> > quorumPeer.  That is going to change and your code will break at that
> time.
> >  If you use a non-embedded cluster, this won't be a problem and you will
> be
> > able to upgrade ZK version without having to restart your service.
> >
> > My own opinion is that running an embedded ZK is a serious architectural
> > error.  Since I don't know your particular situation, it might be
> > different,
> > but there is an inherent contradiction involved in running a coordination
> > layer as part of the thing being coordinated.  Whatever your software
> does,
> > it isn't what ZK does.  As such, it is better to factor out the ZK
> > functionality and make it completely stable.  That gives you a much
> simpler
> > world and will make it easier for you to trouble shoot your system.  The
> > simple fact that you can't take down your service without affecting the
> > reliability of your ZK layer makes this a very bad idea.
> >
> > The problems you are having now are only a preview of what this
> > architectural error leads to.  There will be more problems and many of
> them
> > are likely to be more subtle and lead to service interruptions and lots
> of
> > wasted time.
> >
> > On Wed, Aug 11, 2010 at 8:49 PM, Dr Hao He <he...@softtouchit.com> wrote:
> >
> > > hi, Ted and Mahadev,
> > >
> > >
> > > Here are some more details about my setup:
> > >
> > > I run zookeeper in the embedded mode with the following code:
> > >
> > >     quorumPeer = new QuorumPeer();
> > >     quorumPeer.setClientPort(getClientPort());
> > >     quorumPeer.setTxnFactory(new FileTxnSnapLog(
> > >             new File(getDataLogDir()), new File(getDataDir())));
> > >     quorumPeer.setQuorumPeers(getServers());
> > >     quorumPeer.setElectionType(getElectionAlg());
> > >     quorumPeer.setMyid(getServerId());
> > >     quorumPeer.setTickTime(getTickTime());
> > >     quorumPeer.setInitLimit(getInitLimit());
> > >     quorumPeer.setSyncLimit(getSyncLimit());
> > >     quorumPeer.setQuorumVerifier(getQuorumVerifier());
> > >     quorumPeer.setCnxnFactory(cnxnFactory);
> > >     quorumPeer.start();
> > >
> > >
> > > The configuration values are read from the following XML document for
> > > server 1:
> > >
> > > <cluster tickTime="1000" initLimit="10" syncLimit="5" clientPort="2181"
> > > serverId="1">
> > >                  <member id="1" host="192.168.2.6:2888:3888"/>
> > >                  <member id="2" host="192.168.2.3:2888:3888"/>
> > >                  <member id="3" host="192.168.2.4:2888:3888"/>
> > > </cluster>
> > >
> > >
> > > The other servers have the same configurations except their ids being
> > > changed to 2 and 3.
> > >
> > > The error occurred on server 3 when I batch loaded some messages to
> > > server 1.  However, this error does not always happen.  I am not sure
> > > exactly what triggered this error yet.
> > >
> > > I also performed the "stat" operation on one of the "Node does not exist"
> > > nodes and got:
> > >
> > > stat /xpe/queues/3bd7851e79381ef4bfd1a5857b5e34c04e5159e5/msgs/msg0000001583
> > > Exception in thread "main" java.lang.NullPointerException
> > >        at org.apache.zookeeper.ZooKeeperMain.printStat(ZooKeeperMain.java:129)
> > >        at org.apache.zookeeper.ZooKeeperMain.processZKCmd(ZooKeeperMain.java:715)
> > >        at org.apache.zookeeper.ZooKeeperMain.processCmd(ZooKeeperMain.java:579)
> > >        at org.apache.zookeeper.ZooKeeperMain.executeLine(ZooKeeperMain.java:351)
> > >        at org.apache.zookeeper.ZooKeeperMain.run(ZooKeeperMain.java:309)
> > >        at org.apache.zookeeper.ZooKeeperMain.main(ZooKeeperMain.java:268)
> > > [xpe@t43 zookeeper-3.2.2]$ bin/zkCli.sh
> > >
> > >
> > > Those message nodes are created as CreateMode.PERSISTENT_SEQUENTIAL and
> > are
> > > deleted by the last server who has read them.
> > >
> > > If I remove the troubled server's zookeeper log directory and restart
> the
> > > server, then everything is ok.
> > >
> > > I will try to get the nc result next time I see this problem.
> > >
> > >
> > > Dr Hao He
> > >
> > > XPE - the truly SOA platform
> > >
> > > he@softtouchit.com
> > > http://softtouchit.com
> > > http://itunes.com/apps/Scanmobile
> > >
> > > On 12/08/2010, at 12:32 AM, Mahadev Konar wrote:
> > >
> > > > HI Dr Hao,
> > > >  Can you please post the configuration of all the 3 zookeeper
> servers?
> > I
> > > > suspect it might be misconfigured clusters and they might not belong
> to
> > > the
> > > > same ensemble.
> > > >
> > > > Just to be clear:
> > > >
> /xpe/queues/3bd7851e79381ef4bfd1a5857b5e34c04e5159e5/msgs/msg0000002807
> > > >
> > > > And other such nodes exist on one of the zookeeper servers and the
> same
> > > node
> > > > does not exist on other servers?
> > > >
> > > > Also, as ted pointed out, can you please post the output of
> > > > echo "stat" | nc localhost 2181 (on all the 3 servers) to the list?
> > > >
> > > > Thanks
> > > > mahadev
> > > >
> > > >
> > > >
> > > > On 8/11/10 12:10 AM, "Dr Hao He" <he...@softtouchit.com> wrote:
> > > >
> > > >> hi, Ted,
> > > >>
> > > >> Thanks for the reply.  Here is what I did:
> > > >>
> > > >> [zk: localhost:2181(CONNECTED) 0] ls
> > > >>
> > /xpe/queues/3bd7851e79381ef4bfd1a5857b5e34c04e5159e5/msgs/msg0000002948
> > > >> []
> > > > >> [zk: localhost:2181(CONNECTED) 1] ls
> > > >> /xpe/queues/3bd7851e79381ef4bfd1a5857b5e34c04e5159e5/msgs
> > > >> [msg0000002807, msg0000002700, msg0000002701, msg0000002804,
> > > msg0000002704,
> > > >> msg0000002706, msg0000002601, msg0000001849, msg0000001847,
> > > msg0000002508,
> > > >> msg0000002609, msg0000001841, msg0000002607, msg0000002606,
> > > msg0000002604,
> > > >> msg0000002809, msg0000002817, msg0000001633, msg0000002812,
> > > msg0000002814,
> > > >> msg0000002711, msg0000002815, msg0000002713, msg0000002716,
> > > msg0000001772,
> > > >> msg0000002811, msg0000001635, msg0000001774, msg0000002515,
> > > msg0000002610,
> > > >> msg0000001838, msg0000002517, msg0000002612, msg0000002519,
> > > msg0000001973,
> > > >> msg0000001835, msg0000001974, msg0000002619, msg0000001831,
> > > msg0000002510,
> > > >> msg0000002512, msg0000002615, msg0000002614, msg0000002617,
> > > msg0000002104,
> > > >> msg0000002106, msg0000001769, msg0000001768, msg0000002828,
> > > msg0000002822,
> > > >> msg0000001760, msg0000002820, msg0000001963, msg0000001961,
> > > msg0000002110,
> > > >> msg0000002118, msg0000002900, msg0000002836, msg0000001757,
> > > msg0000002907,
> > > >> msg0000001753, msg0000001752, msg0000001755, msg0000001952,
> > > msg0000001958,
> > > >> msg0000001852, msg0000001956, msg0000001854, msg0000002749,
> > > msg0000001608,
> > > >> msg0000001609, msg0000002747, msg0000002882, msg0000001743,
> > > msg0000002888,
> > > >> msg0000001605, msg0000002885, msg0000001487, msg0000001746,
> > > msg0000002330,
> > > >> msg0000001749, msg0000001488, msg0000001489, msg0000001881,
> > > msg0000001491,
> > > >> msg0000002890, msg0000001889, msg0000002758, msg0000002241,
> > > msg0000002892,
> > > >> msg0000002852, msg0000002759, msg0000002898, msg0000002850,
> > > msg0000001733,
> > > >> msg0000002751, msg0000001739, msg0000002753, msg0000002756,
> > > msg0000002332,
> > > >> msg0000001872, msg0000002233, msg0000001721, msg0000001627,
> > > msg0000001720,
> > > >> msg0000001625, msg0000001628, msg0000001629, msg0000001729,
> > > msg0000002350,
> > > >> msg0000001727, msg0000002352, msg0000001622, msg0000001726,
> > > msg0000001623,
> > > >> msg0000001723, msg0000001724, msg0000001621, msg0000002736,
> > > msg0000002738,
> > > >> msg0000002363, msg0000001717, msg0000002878, msg0000002362,
> > > msg0000002361,
> > > >> msg0000001611, msg0000001894, msg0000002357, msg0000002218,
> > > msg0000002358,
> > > >> msg0000002355, msg0000001895, msg0000002356, msg0000001898,
> > > msg0000002354,
> > > >> msg0000001996, msg0000001990, msg0000002093, msg0000002880,
> > > msg0000002576,
> > > >> msg0000002579, msg0000002267, msg0000002266, msg0000002366,
> > > msg0000001901,
> > > >> msg0000002365, msg0000001903, msg0000001799, msg0000001906,
> > > msg0000002368,
> > > >> msg0000001597, msg0000002679, msg0000002166, msg0000001595,
> > > msg0000002481,
> > > >> msg0000002482, msg0000002373, msg0000002374, msg0000002371,
> > > msg0000001599,
> > > >> msg0000002773, msg0000002274, msg0000002275, msg0000002270,
> > > msg0000002583,
> > > >> msg0000002271, msg0000002580, msg0000002067, msg0000002277,
> > > msg0000002278,
> > > >> msg0000002376, msg0000002180, msg0000002467, msg0000002378,
> > > msg0000002182,
> > > >> msg0000002377, msg0000002184, msg0000002379, msg0000002187,
> > > msg0000002186,
> > > >> msg0000002665, msg0000002666, msg0000002381, msg0000002382,
> > > msg0000002661,
> > > >> msg0000002662, msg0000002663, msg0000002385, msg0000002284,
> > > msg0000002766,
> > > >> msg0000002282, msg0000002190, msg0000002599, msg0000002054,
> > > msg0000002596,
> > > >> msg0000002453, msg0000002459, msg0000002457, msg0000002456,
> > > msg0000002191,
> > > >> msg0000002652, msg0000002395, msg0000002650, msg0000002656,
> > > msg0000002655,
> > > >> msg0000002189, msg0000002047, msg0000002658, msg0000002659,
> > > msg0000002796,
> > > >> msg0000002250, msg0000002255, msg0000002589, msg0000002257,
> > > msg0000002061,
> > > >> msg0000002064, msg0000002585, msg0000002258, msg0000002587,
> > > msg0000002444,
> > > >> msg0000002446, msg0000002447, msg0000002450, msg0000002646,
> > > msg0000001501,
> > > >> msg0000002591, msg0000002592, msg0000001503, msg0000001506,
> > > msg0000002260,
> > > >> msg0000002594, msg0000002262, msg0000002263, msg0000002264,
> > > msg0000002590,
> > > >> msg0000002132, msg0000002130, msg0000002530, msg0000002931,
> > > msg0000001559,
> > > >> msg0000001808, msg0000002024, msg0000001553, msg0000002939,
> > > msg0000002937,
> > > >> msg0000001556, msg0000002935, msg0000002933, msg0000002140,
> > > msg0000001937,
> > > >> msg0000002143, msg0000002520, msg0000002522, msg0000002429,
> > > msg0000002524,
> > > >> msg0000002920, msg0000002035, msg0000001561, msg0000002134,
> > > msg0000002138,
> > > >> msg0000002925, msg0000002151, msg0000002287, msg0000002555,
> > > msg0000002010,
> > > >> msg0000002002, msg0000002290, msg0000001537, msg0000002005,
> > > msg0000002147,
> > > >> msg0000002145, msg0000002698, msg0000001592, msg0000001810,
> > > msg0000002690,
> > > >> msg0000002691, msg0000001911, msg0000001910, msg0000002693,
> > > msg0000001812,
> > > >> msg0000001817, msg0000001547, msg0000002012, msg0000002015,
> > > msg0000002941,
> > > >> msg0000001688, msg0000002018, msg0000002684, msg0000002944,
> > > msg0000001540,
> > > >> msg0000002686, msg0000001541, msg0000002946, msg0000002688,
> > > msg0000001584,
> > > >> msg0000002948]
> > > >>
> > > >> [zk: localhost:2181(CONNECTED) 7] delete
> > > >>
> > /xpe/queues/3bd7851e79381ef4bfd1a5857b5e34c04e5159e5/msgs/msg0000002948
> > > >> Node does not exist:
> > > >>
> > /xpe/queues/3bd7851e79381ef4bfd1a5857b5e34c04e5159e5/msgs/msg0000002948
> > > >>
> > > >> When I performed the same operations on another node, none of those
> > > nodes
> > > >> existed.
> > > >>
> > > >>
> > > >> Dr Hao He
> > > >>
> > > >> XPE - the truly SOA platform
> > > >>
> > > >> he@softtouchit.com
> > > >> http://softtouchit.com
> > > >> http://itunes.com/apps/Scanmobile
> > > >>
> > > >> On 11/08/2010, at 4:38 PM, Ted Dunning wrote:
> > > >>
> > > >>> Can you provide some more information?  The output of some of the
> > four
> > > >>> letter commands and a transcript of what you are doing would be
> very
> > > >>> helpful.
> > > >>>
> > > >>> Also, there is no way for znodes to exist on one node of a properly
> > > >>> operating ZK cluster and not on either of the other two.  Something
> > has
> > > to
> > > >>> be wrong and I would vote for operator error (not to cast
> aspersions,
> > > it is
> > > >>> just that humans like you and *me* make more errors than ZK does).
> > > >>>
> > > >>> On Tue, Aug 10, 2010 at 11:32 PM, Dr Hao He <he...@softtouchit.com>
> > > wrote:
> > > >>>
> > > >>>> hi, All,
> > > >>>>
> > > >>>> I have a 3-host cluster running ZooKeeper 3.2.2.  On one of the
> > hosts,
> > > >>>> there are a number of nodes that I can "get" and "ls" using
> zkCli.sh
> > .
> > > >>>> However, when I tried to "delete" any of them, I got "Node does
> not
> > > exist"
> > > >>>> error.    Those nodes do not exist on the other two hosts.
> > > >>>>
> > > >>>> Any idea how we should handle this type of errors and what might
> > have
> > > >>>> caused this problem?
> > > >>>>
> > > >>>> Dr Hao He
> > > >>>>
> > > >>>> XPE - the truly SOA platform
> > > >>>>
> > > >>>> he@softtouchit.com
> > > >>>> http://softtouchit.com
> > > >>>> http://itunes.com/apps/Scanmobile
> > > >>>>
> > > >>>>
> > > >>
> > > >>
> > > >
> > > >
> > >
> > >
> >
>

Re: How to handle "Node does not exist" error?

Posted by Vishal K <vi...@gmail.com>.
Hi Ted,

Can you explain why running ZK in embedded mode can cause znode
inconsistencies?
Thanks.

-Vishal

On Thu, Aug 12, 2010 at 12:01 AM, Ted Dunning <te...@gmail.com> wrote:

> Try running the server in non-embedded mode.
>
> Also, you are assuming that you know everything about how to configure the
> quorumPeer.  That is going to change and your code will break at that time.
>  If you use a non-embedded cluster, this won't be a problem and you will be
> able to upgrade ZK version without having to restart your service.
>
> My own opinion is that running an embedded ZK is a serious architectural
> error.  Since I don't know your particular situation, it might be
> different,
> but there is an inherent contradiction involved in running a coordination
> layer as part of the thing being coordinated.  Whatever your software does,
> it isn't what ZK does.  As such, it is better to factor out the ZK
> functionality and make it completely stable.  That gives you a much simpler
> world and will make it easier for you to trouble shoot your system.  The
> simple fact that you can't take down your service without affecting the
> reliability of your ZK layer makes this a very bad idea.
>
> The problems you are having now are only a preview of what this
> architectural error leads to.  There will be more problems and many of them
> are likely to be more subtle and lead to service interruptions and lots of
> wasted time.
>
> On Wed, Aug 11, 2010 at 8:49 PM, Dr Hao He <he...@softtouchit.com> wrote:
>
> > hi, Ted and Mahadev,
> >
> >
> > Here are some more details about my setup:
> >
> > I run zookeeper in the embedded mode with the following code:
> >
> >     quorumPeer = new QuorumPeer();
> >     quorumPeer.setClientPort(getClientPort());
> >     quorumPeer.setTxnFactory(new FileTxnSnapLog(
> >             new File(getDataLogDir()), new File(getDataDir())));
> >     quorumPeer.setQuorumPeers(getServers());
> >     quorumPeer.setElectionType(getElectionAlg());
> >     quorumPeer.setMyid(getServerId());
> >     quorumPeer.setTickTime(getTickTime());
> >     quorumPeer.setInitLimit(getInitLimit());
> >     quorumPeer.setSyncLimit(getSyncLimit());
> >     quorumPeer.setQuorumVerifier(getQuorumVerifier());
> >     quorumPeer.setCnxnFactory(cnxnFactory);
> >     quorumPeer.start();
> >
> >
> > The configuration values are read from the following XML document for
> > server 1:
> >
> > <cluster tickTime="1000" initLimit="10" syncLimit="5" clientPort="2181"
> > serverId="1">
> >                  <member id="1" host="192.168.2.6:2888:3888"/>
> >                  <member id="2" host="192.168.2.3:2888:3888"/>
> >                  <member id="3" host="192.168.2.4:2888:3888"/>
> > </cluster>
> >
> >
> > The other servers have the same configurations except their ids being
> > changed to 2 and 3.
> >
> > The error occurred on server 3 when I batch loaded some messages to server
> > 1.  However, this error does not always happen.  I am not sure exactly what
> > triggered this error yet.
> >
> > I also performed the "stat" operation on one of the "Node does not exist"
> > nodes and got:
> >
> > stat /xpe/queues/3bd7851e79381ef4bfd1a5857b5e34c04e5159e5/msgs/msg0000001583
> > Exception in thread "main" java.lang.NullPointerException
> >        at org.apache.zookeeper.ZooKeeperMain.printStat(ZooKeeperMain.java:129)
> >        at org.apache.zookeeper.ZooKeeperMain.processZKCmd(ZooKeeperMain.java:715)
> >        at org.apache.zookeeper.ZooKeeperMain.processCmd(ZooKeeperMain.java:579)
> >        at org.apache.zookeeper.ZooKeeperMain.executeLine(ZooKeeperMain.java:351)
> >        at org.apache.zookeeper.ZooKeeperMain.run(ZooKeeperMain.java:309)
> >        at org.apache.zookeeper.ZooKeeperMain.main(ZooKeeperMain.java:268)
> > [xpe@t43 zookeeper-3.2.2]$ bin/zkCli.sh
> >
> >
> > Those message nodes are created as CreateMode.PERSISTENT_SEQUENTIAL and
> are
> > deleted by the last server who has read them.
> >
> > If I remove the troubled server's zookeeper log directory and restart the
> > server, then everything is ok.
> >
> > I will try to get the nc result next time I see this problem.
> >
> >
> > Dr Hao He
> >
> > XPE - the truly SOA platform
> >
> > he@softtouchit.com
> > http://softtouchit.com
> > http://itunes.com/apps/Scanmobile
> >
> > On 12/08/2010, at 12:32 AM, Mahadev Konar wrote:
> >
> > > HI Dr Hao,
> > >  Can you please post the configuration of all the 3 zookeeper servers?
> I
> > > suspect it might be misconfigured clusters and they might not belong to
> > the
> > > same ensemble.
> > >
> > > Just to be clear:
> > > /xpe/queues/3bd7851e79381ef4bfd1a5857b5e34c04e5159e5/msgs/msg0000002807
> > >
> > > And other such nodes exist on one of the zookeeper servers and the same
> > node
> > > does not exist on other servers?
> > >
> > > Also, as ted pointed out, can you please post the output of
> > > echo "stat" | nc localhost 2181 (on all the 3 servers) to the list?
> > >
> > > Thanks
> > > mahadev
> > >
> > >
> > >
> > > On 8/11/10 12:10 AM, "Dr Hao He" <he...@softtouchit.com> wrote:
> > >
> > >> hi, Ted,
> > >>
> > >> Thanks for the reply.  Here is what I did:
> > >>
> > >> [zk: localhost:2181(CONNECTED) 0] ls
> > >>
> /xpe/queues/3bd7851e79381ef4bfd1a5857b5e34c04e5159e5/msgs/msg0000002948
> > >> []
> > >> [zk: localhost:2181(CONNECTED) 1] ls
> > >> /xpe/queues/3bd7851e79381ef4bfd1a5857b5e34c04e5159e5/msgs
> > >> [msg0000002807, msg0000002700, msg0000002701, msg0000002804,
> > msg0000002704,
> > >> msg0000002706, msg0000002601, msg0000001849, msg0000001847,
> > msg0000002508,
> > >> msg0000002609, msg0000001841, msg0000002607, msg0000002606,
> > msg0000002604,
> > >> msg0000002809, msg0000002817, msg0000001633, msg0000002812,
> > msg0000002814,
> > >> msg0000002711, msg0000002815, msg0000002713, msg0000002716,
> > msg0000001772,
> > >> msg0000002811, msg0000001635, msg0000001774, msg0000002515,
> > msg0000002610,
> > >> msg0000001838, msg0000002517, msg0000002612, msg0000002519,
> > msg0000001973,
> > >> msg0000001835, msg0000001974, msg0000002619, msg0000001831,
> > msg0000002510,
> > >> msg0000002512, msg0000002615, msg0000002614, msg0000002617,
> > msg0000002104,
> > >> msg0000002106, msg0000001769, msg0000001768, msg0000002828,
> > msg0000002822,
> > >> msg0000001760, msg0000002820, msg0000001963, msg0000001961,
> > msg0000002110,
> > >> msg0000002118, msg0000002900, msg0000002836, msg0000001757,
> > msg0000002907,
> > >> msg0000001753, msg0000001752, msg0000001755, msg0000001952,
> > msg0000001958,
> > >> msg0000001852, msg0000001956, msg0000001854, msg0000002749,
> > msg0000001608,
> > >> msg0000001609, msg0000002747, msg0000002882, msg0000001743,
> > msg0000002888,
> > >> msg0000001605, msg0000002885, msg0000001487, msg0000001746,
> > msg0000002330,
> > >> msg0000001749, msg0000001488, msg0000001489, msg0000001881,
> > msg0000001491,
> > >> msg0000002890, msg0000001889, msg0000002758, msg0000002241,
> > msg0000002892,
> > >> msg0000002852, msg0000002759, msg0000002898, msg0000002850,
> > msg0000001733,
> > >> msg0000002751, msg0000001739, msg0000002753, msg0000002756,
> > msg0000002332,
> > >> msg0000001872, msg0000002233, msg0000001721, msg0000001627,
> > msg0000001720,
> > >> msg0000001625, msg0000001628, msg0000001629, msg0000001729,
> > msg0000002350,
> > >> msg0000001727, msg0000002352, msg0000001622, msg0000001726,
> > msg0000001623,
> > >> msg0000001723, msg0000001724, msg0000001621, msg0000002736,
> > msg0000002738,
> > >> msg0000002363, msg0000001717, msg0000002878, msg0000002362,
> > msg0000002361,
> > >> msg0000001611, msg0000001894, msg0000002357, msg0000002218,
> > msg0000002358,
> > >> msg0000002355, msg0000001895, msg0000002356, msg0000001898,
> > msg0000002354,
> > >> msg0000001996, msg0000001990, msg0000002093, msg0000002880,
> > msg0000002576,
> > >> msg0000002579, msg0000002267, msg0000002266, msg0000002366,
> > msg0000001901,
> > >> msg0000002365, msg0000001903, msg0000001799, msg0000001906,
> > msg0000002368,
> > >> msg0000001597, msg0000002679, msg0000002166, msg0000001595,
> > msg0000002481,
> > >> msg0000002482, msg0000002373, msg0000002374, msg0000002371,
> > msg0000001599,
> > >> msg0000002773, msg0000002274, msg0000002275, msg0000002270,
> > msg0000002583,
> > >> msg0000002271, msg0000002580, msg0000002067, msg0000002277,
> > msg0000002278,
> > >> msg0000002376, msg0000002180, msg0000002467, msg0000002378,
> > msg0000002182,
> > >> msg0000002377, msg0000002184, msg0000002379, msg0000002187,
> > msg0000002186,
> > >> msg0000002665, msg0000002666, msg0000002381, msg0000002382,
> > msg0000002661,
> > >> msg0000002662, msg0000002663, msg0000002385, msg0000002284,
> > msg0000002766,
> > >> msg0000002282, msg0000002190, msg0000002599, msg0000002054,
> > msg0000002596,
> > >> msg0000002453, msg0000002459, msg0000002457, msg0000002456,
> > msg0000002191,
> > >> msg0000002652, msg0000002395, msg0000002650, msg0000002656,
> > msg0000002655,
> > >> msg0000002189, msg0000002047, msg0000002658, msg0000002659,
> > msg0000002796,
> > >> msg0000002250, msg0000002255, msg0000002589, msg0000002257,
> > msg0000002061,
> > >> msg0000002064, msg0000002585, msg0000002258, msg0000002587,
> > msg0000002444,
> > >> msg0000002446, msg0000002447, msg0000002450, msg0000002646,
> > msg0000001501,
> > >> msg0000002591, msg0000002592, msg0000001503, msg0000001506,
> > msg0000002260,
> > >> msg0000002594, msg0000002262, msg0000002263, msg0000002264,
> > msg0000002590,
> > >> msg0000002132, msg0000002130, msg0000002530, msg0000002931,
> > msg0000001559,
> > >> msg0000001808, msg0000002024, msg0000001553, msg0000002939,
> > msg0000002937,
> > >> msg0000001556, msg0000002935, msg0000002933, msg0000002140,
> > msg0000001937,
> > >> msg0000002143, msg0000002520, msg0000002522, msg0000002429,
> > msg0000002524,
> > >> msg0000002920, msg0000002035, msg0000001561, msg0000002134,
> > msg0000002138,
> > >> msg0000002925, msg0000002151, msg0000002287, msg0000002555,
> > msg0000002010,
> > >> msg0000002002, msg0000002290, msg0000001537, msg0000002005,
> > msg0000002147,
> > >> msg0000002145, msg0000002698, msg0000001592, msg0000001810,
> > msg0000002690,
> > >> msg0000002691, msg0000001911, msg0000001910, msg0000002693,
> > msg0000001812,
> > >> msg0000001817, msg0000001547, msg0000002012, msg0000002015,
> > msg0000002941,
> > >> msg0000001688, msg0000002018, msg0000002684, msg0000002944,
> > msg0000001540,
> > >> msg0000002686, msg0000001541, msg0000002946, msg0000002688,
> > msg0000001584,
> > >> msg0000002948]
> > >>
> > >> [zk: localhost:2181(CONNECTED) 7] delete
> > >>
> /xpe/queues/3bd7851e79381ef4bfd1a5857b5e34c04e5159e5/msgs/msg0000002948
> > >> Node does not exist:
> > >>
> /xpe/queues/3bd7851e79381ef4bfd1a5857b5e34c04e5159e5/msgs/msg0000002948
> > >>
> > >> When I performed the same operations on another node, none of those
> > nodes
> > >> existed.
> > >>
> > >>
> > >> Dr Hao He
> > >>
> > >> XPE - the truly SOA platform
> > >>
> > >> he@softtouchit.com
> > >> http://softtouchit.com
> > >> http://itunes.com/apps/Scanmobile
> > >>
> > >> On 11/08/2010, at 4:38 PM, Ted Dunning wrote:
> > >>
> > >>> Can you provide some more information?  The output of some of the
> four
> > >>> letter commands and a transcript of what you are doing would be very
> > >>> helpful.
> > >>>
> > >>> Also, there is no way for znodes to exist on one node of a properly
> > >>> operating ZK cluster and not on either of the other two.  Something
> has
> > to
> > >>> be wrong and I would vote for operator error (not to cast aspersions,
> > it is
> > >>> just that humans like you and *me* make more errors than ZK does).
> > >>>
> > >>> On Tue, Aug 10, 2010 at 11:32 PM, Dr Hao He <he...@softtouchit.com>
> > wrote:
> > >>>
> > >>>> hi, All,
> > >>>>
> > >>>> I have a 3-host cluster running ZooKeeper 3.2.2.  On one of the
> hosts,
> > >>>> there are a number of nodes that I can "get" and "ls" using zkCli.sh
> .
> > >>>> However, when I tried to "delete" any of them, I got "Node does not
> > exist"
> > >>>> error.    Those nodes do not exist on the other two hosts.
> > >>>>
> > >>>> Any idea how we should handle this type of errors and what might
> have
> > >>>> caused this problem?
> > >>>>
> > >>>> Dr Hao He
> > >>>>
> > >>>> XPE - the truly SOA platform
> > >>>>
> > >>>> he@softtouchit.com
> > >>>> http://softtouchit.com
> > >>>> http://itunes.com/apps/Scanmobile
> > >>>>
> > >>>>
> > >>
> > >>
> > >
> > >
> >
> >
>

Re: How to handle "Node does not exist" error?

Posted by Ted Dunning <te...@gmail.com>.
Try running the server in non-embedded mode.

Also, you are assuming that you know everything about how to configure the
quorumPeer.  That is going to change and your code will break at that time.
If you use a non-embedded cluster, this won't be a problem and you will be
able to upgrade the ZK version without having to restart your service.

My own opinion is that running an embedded ZK is a serious architectural
error.  Since I don't know your particular situation, it might be different,
but there is an inherent contradiction involved in running a coordination
layer as part of the thing being coordinated.  Whatever your software does,
it isn't what ZK does.  As such, it is better to factor out the ZK
functionality and make it completely stable.  That gives you a much simpler
world and will make it easier for you to troubleshoot your system.  The
simple fact that you can't take down your service without affecting the
reliability of your ZK layer makes this a very bad idea.

The problems you are having now are only a preview of what this
architectural error leads to.  There will be more problems and many of them
are likely to be more subtle and lead to service interruptions and lots of
wasted time.
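
For reference, a non-embedded server reads the same settings from a zoo.cfg
file rather than from setters.  The sketch below renders such a file from the
values quoted in this thread (tickTime, initLimit, syncLimit, clientPort and
the three members).  ZooCfgSketch and render are hypothetical names, and the
dataDir path is only a placeholder, since the thread does not say which
directories are in use.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class ZooCfgSketch {
    /** Renders zoo.cfg text for a standalone (non-embedded) ZooKeeper server. */
    static String render(int tickTime, int initLimit, int syncLimit, int clientPort,
                         String dataDir, Map<Integer, String> servers) {
        StringBuilder sb = new StringBuilder();
        sb.append("tickTime=").append(tickTime).append('\n');
        sb.append("initLimit=").append(initLimit).append('\n');
        sb.append("syncLimit=").append(syncLimit).append('\n');
        sb.append("dataDir=").append(dataDir).append('\n');
        sb.append("clientPort=").append(clientPort).append('\n');
        // One server.N=host:quorumPort:electionPort line per ensemble member.
        for (Map.Entry<Integer, String> e : servers.entrySet()) {
            sb.append("server.").append(e.getKey()).append('=')
              .append(e.getValue()).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Member addresses as posted in the <cluster> XML in this thread.
        Map<Integer, String> servers = new LinkedHashMap<>();
        servers.put(1, "192.168.2.6:2888:3888");
        servers.put(2, "192.168.2.3:2888:3888");
        servers.put(3, "192.168.2.4:2888:3888");
        // "/var/lib/zookeeper" is a placeholder path, not taken from the thread.
        System.out.print(render(1000, 10, 5, 2181, "/var/lib/zookeeper", servers));
    }
}
```

The same file can be used on all three hosts.  What differs per host is the
myid file in dataDir, which holds that server's id (1, 2 or 3); each server is
then started with bin/zkServer.sh start.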

On Wed, Aug 11, 2010 at 8:49 PM, Dr Hao He <he...@softtouchit.com> wrote:

> hi, Ted and Mahadev,
>
>
> Here are some more details about my setup:
>
> I run zookeeper in the embedded mode with the following code:
>
>     quorumPeer = new QuorumPeer();
>     quorumPeer.setClientPort(getClientPort());
>     quorumPeer.setTxnFactory(new FileTxnSnapLog(
>             new File(getDataLogDir()), new File(getDataDir())));
>     quorumPeer.setQuorumPeers(getServers());
>     quorumPeer.setElectionType(getElectionAlg());
>     quorumPeer.setMyid(getServerId());
>     quorumPeer.setTickTime(getTickTime());
>     quorumPeer.setInitLimit(getInitLimit());
>     quorumPeer.setSyncLimit(getSyncLimit());
>     quorumPeer.setQuorumVerifier(getQuorumVerifier());
>     quorumPeer.setCnxnFactory(cnxnFactory);
>     quorumPeer.start();
>
>
> The configuration values are read from the following XML document for
> server 1:
>
> <cluster tickTime="1000" initLimit="10" syncLimit="5" clientPort="2181"
> serverId="1">
>                  <member id="1" host="192.168.2.6:2888:3888"/>
>                  <member id="2" host="192.168.2.3:2888:3888"/>
>                  <member id="3" host="192.168.2.4:2888:3888"/>
> </cluster>
>
>
> The other servers have the same configurations except their ids being
> changed to 2 and 3.
>
> The error occurred on server 3 when I batch loaded some messages to server
> 1.  However, this error does not always happen.  I am not sure exactly what
> triggered this error yet.
>
> I also performed the "stat" operation on one of the "Node does not exist"
> nodes and got:
>
> stat /xpe/queues/3bd7851e79381ef4bfd1a5857b5e34c04e5159e5/msgs/msg0000001583
> Exception in thread "main" java.lang.NullPointerException
>        at org.apache.zookeeper.ZooKeeperMain.printStat(ZooKeeperMain.java:129)
>        at org.apache.zookeeper.ZooKeeperMain.processZKCmd(ZooKeeperMain.java:715)
>        at org.apache.zookeeper.ZooKeeperMain.processCmd(ZooKeeperMain.java:579)
>        at org.apache.zookeeper.ZooKeeperMain.executeLine(ZooKeeperMain.java:351)
>        at org.apache.zookeeper.ZooKeeperMain.run(ZooKeeperMain.java:309)
>        at org.apache.zookeeper.ZooKeeperMain.main(ZooKeeperMain.java:268)
> [xpe@t43 zookeeper-3.2.2]$ bin/zkCli.sh
>
>
> Those message nodes are created as CreateMode.PERSISTENT_SEQUENTIAL and are
> deleted by the last server who has read them.
>
> If I remove the troubled server's zookeeper log directory and restart the
> server, then everything is ok.
>
> I will try to get the nc result next time I see this problem.
>
>
> Dr Hao He
>
> XPE - the truly SOA platform
>
> he@softtouchit.com
> http://softtouchit.com
> http://itunes.com/apps/Scanmobile
>

Re: How to handle "Node does not exist" error?

Posted by Dr Hao He <he...@softtouchit.com>.
hi, Ted and Mahadev,


Here are some more details about my setup:

I run zookeeper in the embedded mode with the following code:

    quorumPeer = new QuorumPeer();
    quorumPeer.setClientPort(getClientPort());
    // First argument: transaction log directory; second: snapshot directory.
    quorumPeer.setTxnFactory(new FileTxnSnapLog(new File(getDataLogDir()),
            new File(getDataDir())));
    // Ensemble membership and this server's id within it.
    quorumPeer.setQuorumPeers(getServers());
    quorumPeer.setElectionType(getElectionAlg());
    quorumPeer.setMyid(getServerId());
    // tickTime is in milliseconds; initLimit and syncLimit are counted in ticks.
    quorumPeer.setTickTime(getTickTime());
    quorumPeer.setInitLimit(getInitLimit());
    quorumPeer.setSyncLimit(getSyncLimit());
    quorumPeer.setQuorumVerifier(getQuorumVerifier());
    quorumPeer.setCnxnFactory(cnxnFactory);
    quorumPeer.start();


The configuration values are read from the following XML document for server 1:

<cluster tickTime="1000" initLimit="10" syncLimit="5" clientPort="2181" serverId="1">
    <member id="1" host="192.168.2.6:2888:3888"/>
    <member id="2" host="192.168.2.3:2888:3888"/>
    <member id="3" host="192.168.2.4:2888:3888"/>
</cluster>


The other servers use the same configuration, except that serverId is changed to 2 and 3 respectively.
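
Since every server gets its own copy of this document, a mismatch between copies is easy to miss. The sketch below is one hypothetical way to sanity-check a single copy before starting the peer; it is not part of the setup described here. ClusterConfigCheck and check are made-up names, and the only assumptions are the cluster/member attributes shown above and the host:port:port address form.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical helper: checks one server's <cluster> document for obvious mistakes. */
class ClusterConfigCheck {

    /** Returns human-readable problems; an empty list means none were found. */
    static List<String> check(String xml) {
        Element cluster;
        try {
            cluster = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("not a readable XML document", e);
        }

        List<String> problems = new ArrayList<>();
        String serverId = cluster.getAttribute("serverId");
        Set<String> ids = new HashSet<>();
        Set<String> hosts = new HashSet<>();

        NodeList members = cluster.getElementsByTagName("member");
        for (int i = 0; i < members.getLength(); i++) {
            Element member = (Element) members.item(i);
            String id = member.getAttribute("id");
            String host = member.getAttribute("host");
            if (!ids.add(id)) {
                problems.add("duplicate member id " + id);
            }
            if (!hosts.add(host)) {
                problems.add("duplicate member address " + host);
            }
            // Expect host:port:port, as in the XML above.
            if (!host.matches("[^:]+:\\d+:\\d+")) {
                problems.add("malformed address " + host);
            }
        }
        // The local serverId must name one of the listed members.
        if (!ids.contains(serverId)) {
            problems.add("serverId " + serverId + " is not in the member list");
        }
        return problems;
    }

    public static void main(String[] args) {
        String xml =
                "<cluster tickTime=\"1000\" initLimit=\"10\" syncLimit=\"5\""
                + " clientPort=\"2181\" serverId=\"1\">"
                + "<member id=\"1\" host=\"192.168.2.6:2888:3888\"/>"
                + "<member id=\"2\" host=\"192.168.2.3:2888:3888\"/>"
                + "<member id=\"3\" host=\"192.168.2.4:2888:3888\"/>"
                + "</cluster>";
        System.out.println(check(xml));
    }
}
```

A check like this only looks at one copy. Whether the three copies agree with each other still needs a comparison across hosts.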

The error occurred on server 3 when I batch loaded some messages to server 1.  However, this error does not always happen.  I am not sure exactly what triggered it yet.

I also performed the "stat" operation on one of the "Node does not exist" nodes and got:

stat /xpe/queues/3bd7851e79381ef4bfd1a5857b5e34c04e5159e5/msgs/msg0000001583   
Exception in thread "main" java.lang.NullPointerException
	at org.apache.zookeeper.ZooKeeperMain.printStat(ZooKeeperMain.java:129)
	at org.apache.zookeeper.ZooKeeperMain.processZKCmd(ZooKeeperMain.java:715)
	at org.apache.zookeeper.ZooKeeperMain.processCmd(ZooKeeperMain.java:579)
	at org.apache.zookeeper.ZooKeeperMain.executeLine(ZooKeeperMain.java:351)
	at org.apache.zookeeper.ZooKeeperMain.run(ZooKeeperMain.java:309)
	at org.apache.zookeeper.ZooKeeperMain.main(ZooKeeperMain.java:268)
[xpe@t43 zookeeper-3.2.2]$ bin/zkCli.sh 


Those message nodes are created as CreateMode.PERSISTENT_SEQUENTIAL and are deleted by the last server that has read them.

If I remove the troubled server's zookeeper log directory and restart the server, then everything is ok.

I will try to get the nc result next time I see this problem.


Dr Hao He

XPE - the truly SOA platform

he@softtouchit.com
http://softtouchit.com
http://itunes.com/apps/Scanmobile

On 12/08/2010, at 12:32 AM, Mahadev Konar wrote:

> HI Dr Hao,
>  Can you please post the configuration of all the 3 zookeeper servers? I
> suspect it might be misconfigured clusters and they might not belong to the
> same ensemble.
> 
> Just to be clear:
> /xpe/queues/3bd7851e79381ef4bfd1a5857b5e34c04e5159e5/msgs/msg0000002807
> 
> And other such nodes exist on one of the zookeeper servers and the same node
> does not exist on other servers?
> 
> Also, as Ted pointed out, can you please post the output of echo "stat" | nc
> localhost 2181 (on all the 3 servers) to the list?
> 
> Thanks
> mahadev
> 
> 
> 


Re: How to handle "Node does not exist" error?

Posted by Mahadev Konar <ma...@yahoo-inc.com>.
Hi Dr Hao,
  Can you please post the configuration of all 3 ZooKeeper servers? I
suspect the servers might be misconfigured and might not belong to the
same ensemble.

Just to be clear:
/xpe/queues/3bd7851e79381ef4bfd1a5857b5e34c04e5159e5/msgs/msg0000002807

And other such znodes exist on one of the ZooKeeper servers but not on the
other servers?

Also, as Ted pointed out, can you please post the output of echo "stat" | nc
localhost 2181 (on all 3 servers) to the list?
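
If nc is not available on a host, the same four-letter command can be sent over a plain socket. This is a minimal sketch with hypothetical names (FourLetterStat, stat, field). It assumes the reply contains "Mode:" and "Zxid:" lines, which is what the stat command normally prints, and that the client port is 2181 as in the command above.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: sends the "stat" four-letter command and reads fields from the reply. */
class FourLetterStat {

    /** Sends "stat" to host:port and returns the raw reply; reads until the server closes the connection. */
    static String stat(String host, int port) throws IOException {
        try (Socket socket = new Socket(host, port)) {
            socket.getOutputStream().write("stat".getBytes(StandardCharsets.US_ASCII));
            socket.getOutputStream().flush();

            ByteArrayOutputStream reply = new ByteArrayOutputStream();
            InputStream in = socket.getInputStream();
            byte[] buf = new byte[4096];
            for (int n; (n = in.read(buf)) != -1; ) {
                reply.write(buf, 0, n);
            }
            return reply.toString("US-ASCII");
        }
    }

    /** Returns the value of a "Key: value" line such as Mode or Zxid, or null if the key is absent. */
    static String field(String statOutput, String key) {
        for (String line : statOutput.split("\\r?\\n")) {
            if (line.startsWith(key + ":")) {
                return line.substring(key.length() + 1).trim();
            }
        }
        return null;
    }

    public static void main(String[] args) throws IOException {
        // Pass the server addresses as arguments, e.g. the three ensemble hosts.
        for (String host : args) {
            String out = stat(host, 2181);
            System.out.println(host + " mode=" + field(out, "Mode") + " zxid=" + field(out, "Zxid"));
        }
    }
}
```

Servers that really belong to one working ensemble would be expected to report one leader, the rest followers, and Zxid values that are close to each other.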

Thanks
mahadev



On 8/11/10 12:10 AM, "Dr Hao He" <he...@softtouchit.com> wrote:

> hi, Ted,
> 
> Thanks for the reply.  Here is what I did:
> 
> [zk: localhost:2181(CONNECTED) 0] ls
> /xpe/queues/3bd7851e79381ef4bfd1a5857b5e34c04e5159e5/msgs/msg0000002948
> []
> [zk: localhost:2181(CONNECTED) 1] ls
> /xpe/queues/3bd7851e79381ef4bfd1a5857b5e34c04e5159e5/msgs
> [msg0000002807, msg0000002700, msg0000002701, msg0000002804, msg0000002704,
> msg0000002706, msg0000002601, msg0000001849, msg0000001847, msg0000002508,
> msg0000002609, msg0000001841, msg0000002607, msg0000002606, msg0000002604,
> msg0000002809, msg0000002817, msg0000001633, msg0000002812, msg0000002814,
> msg0000002711, msg0000002815, msg0000002713, msg0000002716, msg0000001772,
> msg0000002811, msg0000001635, msg0000001774, msg0000002515, msg0000002610,
> msg0000001838, msg0000002517, msg0000002612, msg0000002519, msg0000001973,
> msg0000001835, msg0000001974, msg0000002619, msg0000001831, msg0000002510,
> msg0000002512, msg0000002615, msg0000002614, msg0000002617, msg0000002104,
> msg0000002106, msg0000001769, msg0000001768, msg0000002828, msg0000002822,
> msg0000001760, msg0000002820, msg0000001963, msg0000001961, msg0000002110,
> msg0000002118, msg0000002900, msg0000002836, msg0000001757, msg0000002907,
> msg0000001753, msg0000001752, msg0000001755, msg0000001952, msg0000001958,
> msg0000001852, msg0000001956, msg0000001854, msg0000002749, msg0000001608,
> msg0000001609, msg0000002747, msg0000002882, msg0000001743, msg0000002888,
> msg0000001605, msg0000002885, msg0000001487, msg0000001746, msg0000002330,
> msg0000001749, msg0000001488, msg0000001489, msg0000001881, msg0000001491,
> msg0000002890, msg0000001889, msg0000002758, msg0000002241, msg0000002892,
> msg0000002852, msg0000002759, msg0000002898, msg0000002850, msg0000001733,
> msg0000002751, msg0000001739, msg0000002753, msg0000002756, msg0000002332,
> msg0000001872, msg0000002233, msg0000001721, msg0000001627, msg0000001720,
> msg0000001625, msg0000001628, msg0000001629, msg0000001729, msg0000002350,
> msg0000001727, msg0000002352, msg0000001622, msg0000001726, msg0000001623,
> msg0000001723, msg0000001724, msg0000001621, msg0000002736, msg0000002738,
> msg0000002363, msg0000001717, msg0000002878, msg0000002362, msg0000002361,
> msg0000001611, msg0000001894, msg0000002357, msg0000002218, msg0000002358,
> msg0000002355, msg0000001895, msg0000002356, msg0000001898, msg0000002354,
> msg0000001996, msg0000001990, msg0000002093, msg0000002880, msg0000002576,
> msg0000002579, msg0000002267, msg0000002266, msg0000002366, msg0000001901,
> msg0000002365, msg0000001903, msg0000001799, msg0000001906, msg0000002368,
> msg0000001597, msg0000002679, msg0000002166, msg0000001595, msg0000002481,
> msg0000002482, msg0000002373, msg0000002374, msg0000002371, msg0000001599,
> msg0000002773, msg0000002274, msg0000002275, msg0000002270, msg0000002583,
> msg0000002271, msg0000002580, msg0000002067, msg0000002277, msg0000002278,
> msg0000002376, msg0000002180, msg0000002467, msg0000002378, msg0000002182,
> msg0000002377, msg0000002184, msg0000002379, msg0000002187, msg0000002186,
> msg0000002665, msg0000002666, msg0000002381, msg0000002382, msg0000002661,
> msg0000002662, msg0000002663, msg0000002385, msg0000002284, msg0000002766,
> msg0000002282, msg0000002190, msg0000002599, msg0000002054, msg0000002596,
> msg0000002453, msg0000002459, msg0000002457, msg0000002456, msg0000002191,
> msg0000002652, msg0000002395, msg0000002650, msg0000002656, msg0000002655,
> msg0000002189, msg0000002047, msg0000002658, msg0000002659, msg0000002796,
> msg0000002250, msg0000002255, msg0000002589, msg0000002257, msg0000002061,
> msg0000002064, msg0000002585, msg0000002258, msg0000002587, msg0000002444,
> msg0000002446, msg0000002447, msg0000002450, msg0000002646, msg0000001501,
> msg0000002591, msg0000002592, msg0000001503, msg0000001506, msg0000002260,
> msg0000002594, msg0000002262, msg0000002263, msg0000002264, msg0000002590,
> msg0000002132, msg0000002130, msg0000002530, msg0000002931, msg0000001559,
> msg0000001808, msg0000002024, msg0000001553, msg0000002939, msg0000002937,
> msg0000001556, msg0000002935, msg0000002933, msg0000002140, msg0000001937,
> msg0000002143, msg0000002520, msg0000002522, msg0000002429, msg0000002524,
> msg0000002920, msg0000002035, msg0000001561, msg0000002134, msg0000002138,
> msg0000002925, msg0000002151, msg0000002287, msg0000002555, msg0000002010,
> msg0000002002, msg0000002290, msg0000001537, msg0000002005, msg0000002147,
> msg0000002145, msg0000002698, msg0000001592, msg0000001810, msg0000002690,
> msg0000002691, msg0000001911, msg0000001910, msg0000002693, msg0000001812,
> msg0000001817, msg0000001547, msg0000002012, msg0000002015, msg0000002941,
> msg0000001688, msg0000002018, msg0000002684, msg0000002944, msg0000001540,
> msg0000002686, msg0000001541, msg0000002946, msg0000002688, msg0000001584,
> msg0000002948]
> 
> [zk: localhost:2181(CONNECTED) 7] delete
> /xpe/queues/3bd7851e79381ef4bfd1a5857b5e34c04e5159e5/msgs/msg0000002948
> Node does not exist:
> /xpe/queues/3bd7851e79381ef4bfd1a5857b5e34c04e5159e5/msgs/msg0000002948
> 
> When I performed the same operations on another node, none of those nodes
> existed.
> 
> 
> Dr Hao He
> 
> XPE - the truly SOA platform
> 
> he@softtouchit.com
> http://softtouchit.com
> http://itunes.com/apps/Scanmobile
> 


Re: How to handle "Node does not exist" error?

Posted by Dr Hao He <he...@softtouchit.com>.
hi, Ted,

Thanks for the reply.  Here is what I did:

[zk: localhost:2181(CONNECTED) 0] ls /xpe/queues/3bd7851e79381ef4bfd1a5857b5e34c04e5159e5/msgs/msg0000002948
[]
[zk: localhost:2181(CONNECTED) 1] ls /xpe/queues/3bd7851e79381ef4bfd1a5857b5e34c04e5159e5/msgs
[msg0000002807, msg0000002700, msg0000002701, msg0000002804, msg0000002704, msg0000002706, msg0000002601, msg0000001849, msg0000001847, msg0000002508, msg0000002609, msg0000001841, msg0000002607, msg0000002606, msg0000002604, msg0000002809, msg0000002817, msg0000001633, msg0000002812, msg0000002814, msg0000002711, msg0000002815, msg0000002713, msg0000002716, msg0000001772, msg0000002811, msg0000001635, msg0000001774, msg0000002515, msg0000002610, msg0000001838, msg0000002517, msg0000002612, msg0000002519, msg0000001973, msg0000001835, msg0000001974, msg0000002619, msg0000001831, msg0000002510, msg0000002512, msg0000002615, msg0000002614, msg0000002617, msg0000002104, msg0000002106, msg0000001769, msg0000001768, msg0000002828, msg0000002822, msg0000001760, msg0000002820, msg0000001963, msg0000001961, msg0000002110, msg0000002118, msg0000002900, msg0000002836, msg0000001757, msg0000002907, msg0000001753, msg0000001752, msg0000001755, msg0000001952, msg0000001958, msg0000001852, msg0000001956, msg0000001854, msg0000002749, msg0000001608, msg0000001609, msg0000002747, msg0000002882, msg0000001743, msg0000002888, msg0000001605, msg0000002885, msg0000001487, msg0000001746, msg0000002330, msg0000001749, msg0000001488, msg0000001489, msg0000001881, msg0000001491, msg0000002890, msg0000001889, msg0000002758, msg0000002241, msg0000002892, msg0000002852, msg0000002759, msg0000002898, msg0000002850, msg0000001733, msg0000002751, msg0000001739, msg0000002753, msg0000002756, msg0000002332, msg0000001872, msg0000002233, msg0000001721, msg0000001627, msg0000001720, msg0000001625, msg0000001628, msg0000001629, msg0000001729, msg0000002350, msg0000001727, msg0000002352, msg0000001622, msg0000001726, msg0000001623, msg0000001723, msg0000001724, msg0000001621, msg0000002736, msg0000002738, msg0000002363, msg0000001717, msg0000002878, msg0000002362, msg0000002361, msg0000001611, msg0000001894, msg0000002357, msg0000002218, msg0000002358, msg0000002355, msg0000001895, msg0000002356, 
msg0000001898, msg0000002354, msg0000001996, msg0000001990, msg0000002093, msg0000002880, msg0000002576, msg0000002579, msg0000002267, msg0000002266, msg0000002366, msg0000001901, msg0000002365, msg0000001903, msg0000001799, msg0000001906, msg0000002368, msg0000001597, msg0000002679, msg0000002166, msg0000001595, msg0000002481, msg0000002482, msg0000002373, msg0000002374, msg0000002371, msg0000001599, msg0000002773, msg0000002274, msg0000002275, msg0000002270, msg0000002583, msg0000002271, msg0000002580, msg0000002067, msg0000002277, msg0000002278, msg0000002376, msg0000002180, msg0000002467, msg0000002378, msg0000002182, msg0000002377, msg0000002184, msg0000002379, msg0000002187, msg0000002186, msg0000002665, msg0000002666, msg0000002381, msg0000002382, msg0000002661, msg0000002662, msg0000002663, msg0000002385, msg0000002284, msg0000002766, msg0000002282, msg0000002190, msg0000002599, msg0000002054, msg0000002596, msg0000002453, msg0000002459, msg0000002457, msg0000002456, msg0000002191, msg0000002652, msg0000002395, msg0000002650, msg0000002656, msg0000002655, msg0000002189, msg0000002047, msg0000002658, msg0000002659, msg0000002796, msg0000002250, msg0000002255, msg0000002589, msg0000002257, msg0000002061, msg0000002064, msg0000002585, msg0000002258, msg0000002587, msg0000002444, msg0000002446, msg0000002447, msg0000002450, msg0000002646, msg0000001501, msg0000002591, msg0000002592, msg0000001503, msg0000001506, msg0000002260, msg0000002594, msg0000002262, msg0000002263, msg0000002264, msg0000002590, msg0000002132, msg0000002130, msg0000002530, msg0000002931, msg0000001559, msg0000001808, msg0000002024, msg0000001553, msg0000002939, msg0000002937, msg0000001556, msg0000002935, msg0000002933, msg0000002140, msg0000001937, msg0000002143, msg0000002520, msg0000002522, msg0000002429, msg0000002524, msg0000002920, msg0000002035, msg0000001561, msg0000002134, msg0000002138, msg0000002925, msg0000002151, msg0000002287, msg0000002555, msg0000002010, msg0000002002, 
msg0000002290, msg0000001537, msg0000002005, msg0000002147, msg0000002145, msg0000002698, msg0000001592, msg0000001810, msg0000002690, msg0000002691, msg0000001911, msg0000001910, msg0000002693, msg0000001812, msg0000001817, msg0000001547, msg0000002012, msg0000002015, msg0000002941, msg0000001688, msg0000002018, msg0000002684, msg0000002944, msg0000001540, msg0000002686, msg0000001541, msg0000002946, msg0000002688, msg0000001584, msg0000002948]

[zk: localhost:2181(CONNECTED) 7] delete /xpe/queues/3bd7851e79381ef4bfd1a5857b5e34c04e5159e5/msgs/msg0000002948
Node does not exist: /xpe/queues/3bd7851e79381ef4bfd1a5857b5e34c04e5159e5/msgs/msg0000002948

When I performed the same operations on another server, none of those znodes existed.
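
To see exactly which znodes one server lists and another does not, the two "ls" outputs can be compared mechanically. This is a small sketch with hypothetical names (ChildListDiff, parse, onlyInFirst); it assumes only the bracketed, comma-separated format that zkCli.sh prints above.

```java
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper: compares the child lists printed by "ls" on two servers. */
class ChildListDiff {

    /** Parses zkCli "ls" output such as "[a, b, c]" into a sorted set of child names. */
    static Set<String> parse(String lsOutput) {
        String body = lsOutput.trim();
        if (body.startsWith("[")) {
            body = body.substring(1);
        }
        if (body.endsWith("]")) {
            body = body.substring(0, body.length() - 1);
        }
        Set<String> names = new TreeSet<>();
        for (String name : body.split(",")) {
            if (!name.trim().isEmpty()) {
                names.add(name.trim());
            }
        }
        return names;
    }

    /** Names listed by the first server but not by the second. */
    static Set<String> onlyInFirst(String lsFirst, String lsSecond) {
        Set<String> diff = parse(lsFirst);
        diff.removeAll(parse(lsSecond));
        return diff;
    }

    public static void main(String[] args) {
        // First argument: output from the suspect server; second: output from a healthy one.
        System.out.println(onlyInFirst("[msg0000002948, msg0000001584]", "[]"));
        // prints [msg0000001584, msg0000002948]
    }
}
```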


Dr Hao He

XPE - the truly SOA platform

he@softtouchit.com
http://softtouchit.com
http://itunes.com/apps/Scanmobile

On 11/08/2010, at 4:38 PM, Ted Dunning wrote:

> Can you provide some more information?  The output of some of the four
> letter commands and a transcript of what you are doing would be very
> helpful.
> 
> Also, there is no way for znodes to exist on one node of a properly
> operating ZK cluster and not on either of the other two.  Something has to
> be wrong and I would vote for operator error (not to cast aspersions, it is
> just that humans like you and *me* make more errors than ZK does).
> 


Re: How to handle "Node does not exist" error?

Posted by Ted Dunning <te...@gmail.com>.
Can you provide some more information?  The output of some of the four
letter commands and a transcript of what you are doing would be very
helpful.

Also, there is no way for znodes to exist on one server of a properly
operating ZK cluster and not on either of the other two.  Something has to
be wrong and I would vote for operator error (not to cast aspersions, it is
just that humans like you and *me* make more errors than ZK does).

On Tue, Aug 10, 2010 at 11:32 PM, Dr Hao He <he...@softtouchit.com> wrote:

> hi, All,
>
> I have a 3-host cluster running ZooKeeper 3.2.2.  On one of the hosts,
> there are a number of nodes that I can "get" and "ls" using zkCli.sh .
>  However, when I tried to "delete" any of them, I got "Node does not exist"
> error.    Those nodes do not exist on the other two hosts.
>
> Any idea how we should handle this type of errors and what might have
> caused this problem?
>
> Dr Hao He
>
> XPE - the truly SOA platform
>
> he@softtouchit.com
> http://softtouchit.com
> http://itunes.com/apps/Scanmobile
>
>