Posted to user@zookeeper.apache.org by David Graf <da...@28msec.com> on 2009/07/06 13:45:16 UTC

zookeeper on ec2

Hello

I want to set up a zookeeper ensemble on Amazon's EC2 service. In my  
system, zookeeper is used to run a locking service and to generate  
unique IDs. Currently, for testing purposes, I am only running one  
instance. Now, I need to set up an ensemble to protect my system  
against crashes.
The EC2 service has some differences from a normal server farm. E.g.  
the data saved on the file system of an EC2 instance is lost if the  
instance crashes. In the documentation of zookeeper, I have read that  
zookeeper saves snapshots of the in-memory data to the file system. Is  
that needed for recovery? Logically, it would be much easier for me if  
this were not the case.
Additionally, EC2 brings the advantage that servers can be switched on  
and off dynamically depending on the load, traffic, etc. Can this  
advantage be utilized for a zookeeper ensemble? Is it possible to add  
a zookeeper server to an ensemble dynamically, e.g. depending on the  
in-memory load?

David
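[For reference, a minimal three-server ensemble is configured by giving every member the same server list; the hostnames and paths below are placeholders, not values from this thread:]

```properties
# zoo.cfg -- identical on every server in the ensemble
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/var/lib/zookeeper        # snapshots and transaction log live here
clientPort=2181
# one line per ensemble member (peer port : leader-election port);
# each server also needs a dataDir/myid file containing its own id
server.1=zk1.example.com:2888:3888
server.2=zk2.example.com:2888:3888
server.3=zk3.example.com:2888:3888
```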

Re: zookeeper on ec2

Posted by Ted Dunning <te...@gmail.com>.
There is no full stop for our system.  It will (had better!) run forever.

That said, we have no permanent information in ZK that is not persisted in,
say, SVN.

On Mon, Jul 6, 2009 at 3:06 PM, Gustavo Niemeyer <gu...@niemeyer.net>wrote:

> > Doing that would make the intricate and unlikely failure mode that Henry
> > asked about even less likely, but I don't know if it would increase or
> > decrease the probability of any kind of failure.
>
> Yeah, I guess it depends a bit on the system architecture too.  If the
> system is designed in such a way that ZK is keeping track of
> coordination data which must be resumed after a full stop of the
> system, having it stored in persistent data would prevent important
> loss of information.

Re: zookeeper on ec2

Posted by Gustavo Niemeyer <gu...@niemeyer.net>.
Hi again,

(...)
> ZK seemed pretty darned stable through all of this.

Sounds like a nice test, and it's great to hear that ZooKeeper works well there.

> The only instability that I saw was caused by excessive amounts of data in
> ZK itself.  As I neared the (small) amount of memory I had allocated for ZK
> use, I would see servers go into paroxysms of GC, but the cluster
> functionality was impaired to a very surprisingly small degree.

Cool, makes sense.

> No.  I considered it, but I wanted fewer moving parts rather than more.
>
> Doing that would make the intricate and unlikely failure mode that Henry
> asked about even less likely, but I don't know if it would increase or
> decrease the probability of any kind of failure.

Yeah, I guess it depends a bit on the system architecture too.  If the
system is designed in such a way that ZK is keeping track of
coordination data which must be resumed after a full stop of the
system, having it stored in persistent data would prevent important
loss of information.  If ZK is really just coordinating ephemeral data
(e.g. locks), then if the whole system goes down, it's ok to just
allow it to start up again in an empty state.

> The observed failure modes for ZK in EC2 were completely dominated by our
> (my) own failings (such as letting too much data accumulate).

Details always take a few iterations to get really right.

Thanks for this data Ted.

-- 
Gustavo Niemeyer
http://niemeyer.net

Re: zookeeper on ec2

Posted by Ted Dunning <te...@gmail.com>.
On Mon, Jul 6, 2009 at 12:58 PM, Gustavo Niemeyer <gu...@niemeyer.net>wrote:

> > can make the ZK servers appear a bit less connected.  You have to plan
> for
> > ConnectionLoss events.
>
> Interesting.


Note that most of these seem to be related to client issues, especially GC.
If you configure in such a way as to get long pauses, you will see
connection loss.  The default configuration for ZK is for a pretty short (5
seconds) timeout that is pretty easy to exceed with out-of-the-box GC params
on the client side.
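Concretely, the session timeout is negotiated: the client asks for one and the server clamps it to the [minSessionTimeout, maxSessionTimeout] range, whose defaults are 2x and 20x tickTime. A sketch of loosening the server side (the values are illustrative, not recommendations):

```properties
# zoo.cfg (server side)
tickTime=2000              # ms; session timeouts are expressed in ticks
minSessionTimeout=4000     # default is 2 * tickTime
maxSessionTimeout=60000    # default is 20 * tickTime; raise this if client
                           # GC pauses routinely exceed the negotiated timeout
```

On the client side, the requested timeout is the second argument to the Java ZooKeeper constructor, e.g. `new ZooKeeper("zk1:2181", 30000, watcher)`.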


> > c) for highest reliability, I switched to large instances.  On
> reflection, I
>  > think that was helpful, but less important than I thought at the time.
>
> Besides the fact that there are more resources for ZooKeeper, this
> likely helps as well because it reduces the number of systems
> competing for the real hardware.


Yes, but I think that this is less significant than I expected.  Small
instances have pretty dedicated access to their core.  Disk contention is a
bit of an issue, but not much.


> > d) increasing and decreasing cluster size is nearly painless and is
> easily
>  > scriptable.  To decrease, do a rolling update on the survivors to update
> (...)
>
> Quite interesting indeed.  I guess the work that Henry is pushing on
> these couple of JIRA tickets will greatly facilitate this.


Absolutely.  I was still very surprised at how small the pain is in the
current world.


> Do you have any kind of performance data about how much load ZK can
> take under this environment?


Only barely.  Partly with an eye toward system diagnostics, and partly to
cause ZK to have something to do, I reported a wide swath of data available
from /proc into ZK every few seconds for all of my servers.  This led to a
few dozen transactions per second and ultimately helped me discover and
understand some of the connection issues for clients.

ZK seemed pretty darned stable through all of this.

The only instability that I saw was caused by excessive amounts of data in
ZK itself.  As I neared the (small) amount of memory I had allocated for ZK
use, I would see servers go into paroxysms of GC, but the cluster
functionality was impaired to a very surprisingly small degree.

> Have you tried to put the log and snapshot files under EBS?


No.  I considered it, but I wanted fewer moving parts rather than more.

Doing that would make the intricate and unlikely failure mode that Henry
asked about even less likely, but I don't know if it would increase or
decrease the probability of any kind of failure.

The observed failure modes for ZK in EC2 were completely dominated by our
(my) own failings (such as letting too much data accumulate).

Re: zookeeper on ec2

Posted by Gustavo Niemeyer <gu...@niemeyer.net>.
Hi Ted,

> b) EC2 interconnect has a lot more going on than in a dedicated VLAN.  That
> can make the ZK servers appear a bit less connected.  You have to plan for
> ConnectionLoss events.

Interesting.

> c) for highest reliability, I switched to large instances.  On reflection, I
> think that was helpful, but less important than I thought at the time.

Besides the fact that there are more resources for ZooKeeper, this
likely helps as well because it reduces the number of systems
competing for the real hardware.

> d) increasing and decreasing cluster size is nearly painless and is easily
> scriptable.  To decrease, do a rolling update on the survivors to update
(...)

Quite interesting indeed.  I guess the work that Henry is pushing on
these couple of JIRA tickets will greatly facilitate this.
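The rolling-update recipe quoted above can be sketched as a plan generator; the hostnames are hypothetical and the bounce itself (restart with the updated server list, several seconds between bounces) is represented only as a step name:

```python
def resize_plan(current, target):
    """Return the ordered (action, host) steps for resizing an ensemble,
    following the recipe in the thread: when shrinking, reconfigure the
    survivors first and then retire the leavers; when growing, start the
    new servers first and then bounce the old members so they pick up
    the full server list."""
    removed = [h for h in current if h not in target]
    added = [h for h in target if h not in current]
    steps = []
    if removed:
        # shrinking: bounce the survivors with the new config...
        steps += [("bounce-with-new-config", h) for h in current if h in target]
        # ...then take down the instances being retired
        steps += [("shut-down", h) for h in removed]
    if added:
        # growing: start the new servers with the full config first...
        steps += [("start-with-new-config", h) for h in added]
        # ...then roll the existing members
        steps += [("bounce-with-new-config", h) for h in current]
    return steps
```

Each bounce should be spaced several seconds apart, as described above, so the ensemble never loses quorum.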

Do you mind if I ask you a couple of questions on this:

Do you have any kind of performance data about how much load ZK can
take under this environment?

Have you tried to put the log and snapshot files under EBS?

-- 
Gustavo Niemeyer
http://niemeyer.net

Re: zookeeper on ec2

Posted by Benjamin Reed <br...@yahoo-inc.com>.
these suggestions would be great to put in a faq!

thanx ted

ben

Ted Dunning wrote:
> I always used a large node for ZK to avoid sharing the machine, but the
> reason for doing that turned out to be incorrect.  In fact, my problem was
> to do with GC on the client side.
>
> I can't believe that they are seeing 50 second delays in EC2 due to I/O
> contention.  GC can do that, but only on a large heap.  Massive swapping of
> code pages can also cause this.
>
> My debug path here would be:
>
> a) verify the facts.  The key fact is that the ZK cluster is occasionally
> giving massive latency.  This must be verified to be the real problem and
> not an accidental incident.  It is possible that the problem is not where we
> think it is.
>
> b) check for the usual configuration suspects.  ZK should be alone on a
> machine.  DNS should be checked.  Connectivity should be checked between all
> hosts.
>
> c) look for swapping, look at GC logs.  Something has to give a clue as to
> how the latency is 1000x longer than usual.
>
> d) fix whatever came out of step (b) or (c).
>
> I am at a loss here other than this general advice.  I strongly suspect that
> something is being observed incorrectly or the machines are being massively
> abused.
>
> On Wed, Sep 2, 2009 at 12:37 PM, Patrick Hunt <ph...@apache.org> wrote:
>
>   
>> I suspect that given a single disk is being used (not a dedicated disk for
>> the transaction log), and also given that this host is highly virtualized
>> (ec2), it seems to me that the most likely cause is IO. Specifically when
>> the zk cluster writes data to disk (due to client write) it must sync the
>> transaction log to disk. This sync behavior can impact the latency seen by
>> the clients. What type of ec2 node are you using? Ted, do you have any
>> insight on this? Any guidelines for the type of ec2 node to use for running
>> a zk cluster?
>>
>>     
>
>
>
>   


Re: zookeeper on ec2

Posted by Patrick Hunt <ph...@apache.org>.
Ted that's great feedback. I identified a couple of additional things to 
verify after reading your comments:

1) ensure that you don't have debug level logging turned on, see this:
https://issues.apache.org/jira/browse/ZOOKEEPER-518
(fixed in 3.2.1, but in general you probably don't want to run anything 
lower than info in production except when attempting to track down some 
problem).

2) it would be a good idea to review the server/client zk logs to see if 
there's any insight there as to what might be causing the high 
latencies. For example, the other day we had an issue where client code 
was misbehaving and causing degraded performance of the server; 
reviewing the logs allowed the developer to identify the client problem 
and address it.

Patrick

Ted Dunning wrote:
> I always used a large node for ZK to avoid sharing the machine, but the
> reason for doing that turned out to be incorrect.  In fact, my problem was
> to do with GC on the client side.
> 
> I can't believe that they are seeing 50 second delays in EC2 due to I/O
> contention.  GC can do that, but only on a large heap.  Massive swapping of
> code pages can also cause this.
> 
> My debug path here would be:
> 
> a) verify the facts.  The key fact is that the ZK cluster is occasionally
> giving massive latency.  This must be verified to be the real problem and
> not an accidental incident.  It is possible that the problem is not where we
> think it is.
> 
> b) check for the usual configuration suspects.  ZK should be alone on a
> machine.  DNS should be checked.  Connectivity should be checked between all
> hosts.
> 
> c) look for swapping, look at GC logs.  Something has to give a clue as to
> how the latency is 1000x longer than usual.
> 
> d) fix whatever came out of step (b) or (c).
> 
> I am at a loss here other than this general advice.  I strongly suspect that
> something is being observed incorrectly or the machines are being massively
> abused.
> 
> On Wed, Sep 2, 2009 at 12:37 PM, Patrick Hunt <ph...@apache.org> wrote:
> 
>> I suspect that given a single disk is being used (not a dedicated disk for
>> the transaction log), and also given that this host is highly virtualized
>> (ec2), it seems to me that the most likely cause is IO. Specifically when
>> the zk cluster writes data to disk (due to client write) it must sync the
>> transaction log to disk. This sync behavior can impact the latency seen by
>> the clients. What type of ec2 node are you using? Ted, do you have any
>> insight on this? Any guidelines for the type of ec2 node to use for running
>> a zk cluster?
>>
> 
> 
> 

Re: zookeeper on ec2

Posted by Ted Dunning <te...@gmail.com>.
I always used a large node for ZK to avoid sharing the machine, but the
reason for doing that turned out to be incorrect.  In fact, my problem was
to do with GC on the client side.

I can't believe that they are seeing 50 second delays in EC2 due to I/O
contention.  GC can do that, but only on a large heap.  Massive swapping of
code pages can also cause this.

My debug path here would be:

a) verify the facts.  The key fact is that the ZK cluster is occasionally
giving massive latency.  This must be verified to be the real problem and
not an accidental incident.  It is possible that the problem is not where we
think it is.

b) check for the usual configuration suspects.  ZK should be alone on a
machine.  DNS should be checked.  Connectivity should be checked between all
hosts.

c) look for swapping, look at GC logs.  Something has to give a clue as to
how the latency is 1000x longer than usual.

d) fix whatever came out of step (b) or (c).

I am at a loss here other than this general advice.  I strongly suspect that
something is being observed incorrectly or the machines are being massively
abused.

On Wed, Sep 2, 2009 at 12:37 PM, Patrick Hunt <ph...@apache.org> wrote:

> I suspect that given a single disk is being used (not a dedicated disk for
> the transaction log), and also given that this host is highly virtualized
> (ec2), it seems to me that the most likely cause is IO. Specifically when
> the zk cluster writes data to disk (due to client write) it must sync the
> transaction log to disk. This sync behavior can impact the latency seen by
> the clients. What type of ec2 node are you using? Ted, do you have any
> insight on this? Any guidelines for the type of ec2 node to use for running
> a zk cluster?
>



-- 
Ted Dunning, CTO
DeepDyve

Re: zookeeper on ec2

Posted by Patrick Hunt <ph...@apache.org>.
How large/small are the writes?

Can you run the following, then try your test again and report the 
results for the time period where your test is running?

iostat -x -d 1

also note that ZK JMX allows you to reset the latency attributes (look 
under "operations" in jconsole). If you reset the latency during your 
test what do you see happen wrt the min/max/avg latency? Keep an eye on 
this (and perhaps resetting the stats every so often) during your test. 
Anything interesting happening that you notice?
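The same numbers are also available without JMX via the `stat` four-letter command (`echo stat | nc host 2181`), and `srst` resets them. A small sketch of parsing the latency line, assuming the `Latency min/avg/max: ...` format printed by 3.x servers:

```python
import re

def parse_latency(stat_output):
    """Extract (min, avg, max) request latency in ms from the output of
    ZooKeeper's `stat` four-letter command, which includes a line like
    'Latency min/avg/max: 0/38/55767'."""
    m = re.search(r"Latency min/avg/max: (\d+)/(\d+)/(\d+)", stat_output)
    if not m:
        raise ValueError("no latency line found in stat output")
    return tuple(int(x) for x in m.groups())

# Fetching the output over the wire would look roughly like this
# (untested sketch; host is a placeholder):
#   import socket
#   s = socket.create_connection(("zk1.example.com", 2181))
#   s.sendall(b"stat")
#   reply = s.recv(65536).decode()
#   s.close()
```

Polling this every few seconds, and issuing `srst` after each sample, gives a per-interval view of latency rather than a since-startup aggregate.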

I suspect that given a single disk is being used (not a dedicated disk 
for the transaction log), and also given that this host is highly 
virtualized (ec2), it seems to me that the most likely cause is IO. 
Specifically when the zk cluster writes data to disk (due to client 
write) it must sync the transaction log to disk. This sync behavior can 
impact the latency seen by the clients. What type of ec2 node are you 
using? Ted, do you have any insight on this? Any guidelines for the type 
of ec2 node to use for running a zk cluster?

Patrick

Satish Bhatti wrote:
> According to the jconsole, max memory usage is 30MB, 14 live threads and
> peak CPU usage about 4%, average is under 1%.  We are not really hammering
> it.  Doing about 10 read/writes every second max.
> 
> On Tue, Sep 1, 2009 at 5:20 PM, Ted Dunning <te...@gmail.com> wrote:
> 
>> This is outrageously large.  Max should be more like 50ms.
>>
>> Either you are doing this somehow, or you have an anomaly on your ZK
>> machine.
>>
>> How much data is in ZK?  How many transactions per second?
>>
>> On Tue, Sep 1, 2009 at 5:11 PM, Satish Bhatti <ct...@gmail.com> wrote:
>>
>>> MaxRequestLatency 55767
>>
>>
>>
>> --
>> Ted Dunning, CTO
>> DeepDyve
>>
> 

Re: zookeeper on ec2

Posted by Satish Bhatti <ct...@gmail.com>.
According to the jconsole, max memory usage is 30MB, 14 live threads and
peak CPU usage about 4%, average is under 1%.  We are not really hammering
it.  Doing about 10 read/writes every second max.

On Tue, Sep 1, 2009 at 5:20 PM, Ted Dunning <te...@gmail.com> wrote:

> This is outrageously large.  Max should be more like 50ms.
>
> Either you are doing this somehow, or you have an anomaly on your ZK
> machine.
>
> How much data is in ZK?  How many transactions per second?
>
> On Tue, Sep 1, 2009 at 5:11 PM, Satish Bhatti <ct...@gmail.com> wrote:
>
> > MaxRequestLatency 55767
>
>
>
>
> --
> Ted Dunning, CTO
> DeepDyve
>

Re: zookeeper on ec2

Posted by Ted Dunning <te...@gmail.com>.
This is outrageously large.  Max should be more like 50ms.

Either you are doing this somehow, or you have an anomaly on your ZK
machine.

How much data is in ZK?  How many transactions per second?

On Tue, Sep 1, 2009 at 5:11 PM, Satish Bhatti <ct...@gmail.com> wrote:

> MaxRequestLatency 55767




-- 
Ted Dunning, CTO
DeepDyve

Re: zookeeper on ec2

Posted by Satish Bhatti <ct...@gmail.com>.
I just checked the JMX console.
AvgRequestLatency 38
MaxRequestLatency 55767

I assume those units are milliseconds?

On Tue, Sep 1, 2009 at 5:05 PM, Patrick Hunt <ph...@apache.org> wrote:

> Yes. create/set/delete/... are really the issue (non-idempotent).
>
>
> Satish Bhatti wrote:
>
>> Well a bunch of the ConnectionLosses were for zookeeper.exists() calls.
>>  I'm
>> pretty sure dumb retry for those should suffice!
>>
>> On Tue, Sep 1, 2009 at 4:31 PM, Mahadev Konar <ma...@yahoo-inc.com>
>> wrote:
>>
>>  Hi Satish,
>>>
>>>  Connectionloss is a little trickier than just retrying blindly. Please
>>> read the following sections on this -
>>>
>>> http://wiki.apache.org/hadoop/ZooKeeper/ErrorHandling
>>>
>>> And the programmers guide:
>>>
>>> http://hadoop.apache.org/zookeeper/docs/r3.1.1/zookeeperProgrammers.html
>>>
>>> To learn more about how to handle CONNECTIONLOSS. The idea is that
>>> blindly retrying can create problems, since a CONNECTIONLOSS does NOT
>>> necessarily mean that the zookeeper operation you were executing failed.
>>> It is possible that the operation actually went through on the servers.
>>>
>>> Since this has been a constant source of confusion for everyone who
>>> starts using zookeeper, we are working on a fix (ZOOKEEPER-22) which
>>> will take care of this problem so that programmers will not have to
>>> worry about CONNECTIONLOSS handling.
>>>
>>> Thanks
>>> mahadev
>>>
>>>
>>>
>>>
>>> On 9/1/09 4:13 PM, "Satish Bhatti" <ct...@gmail.com> wrote:
>>>
>>>  I have recently started running on EC2 and am seeing quite a few
>>>> ConnectionLoss exceptions.  Should I just catch these and retry?  Since
>>>> I
>>>> assume that eventually, if the shit truly hits the fan, I will get a
>>>> SessionExpired?
>>>> Satish
>>>>
>>>> On Mon, Jul 6, 2009 at 11:35 AM, Ted Dunning <te...@gmail.com>
>>>>
>>> wrote:
>>>
>>>> We have used EC2 quite a bit for ZK.
>>>>>
>>>>> The basic lessons that I have learned include:
>>>>>
>>>>> a) EC2's biggest advantage after scaling and elasticity was conformity
>>>>>
>>>> of
>>>
>>>> configuration.  Since you are bringing machines up and down all the
>>>>>
>>>> time,
>>>
>>>> they begin to act more like programs and you wind up with boot scripts
>>>>>
>>>> that
>>>
>>>> give you a very predictable environment.  Nice.
>>>>>
>>>>> b) EC2 interconnect has a lot more going on than in a dedicated VLAN.
>>>>>
>>>>  That
>>>
>>>> can make the ZK servers appear a bit less connected.  You have to plan
>>>>>
>>>> for
>>>
>>>> ConnectionLoss events.
>>>>>
>>>>> c) for highest reliability, I switched to large instances.  On
>>>>>
>>>> reflection,
>>>
>>>> I
>>>>> think that was helpful, but less important than I thought at the time.
>>>>>
>>>>> d) increasing and decreasing cluster size is nearly painless and is
>>>>>
>>>> easily
>>>
>>>> scriptable.  To decrease, do a rolling update on the survivors to update
>>>>> their configuration.  Then take down the instance you want to lose.  To
>>>>> increase, do a rolling update starting with the new instances to update
>>>>>
>>>> the
>>>
>>>> configuration to include all of the machines.  The rolling update should
>>>>> bounce each ZK with several seconds between each bounce.  Rescaling the
>>>>> cluster takes less than a minute which makes it comparable to EC2
>>>>>
>>>> instance
>>>
>>>> boot time (about 30 seconds for the Alestic ubuntu instance that we used
>>>>> plus about 20 seconds for additional configuration).
>>>>>
>>>>> On Mon, Jul 6, 2009 at 4:45 AM, David Graf <da...@28msec.com>
>>>>>
>>>> wrote:
>>>
>>>> Hello
>>>>>>
> >>>>>> I want to set up a zookeeper ensemble on amazon's ec2 service. In my
>>>>>>
>>>>> system,
>>>>>
>>>>>> zookeeper is used to run a locking service and to generate unique
>>>>>> id's.
>>>>>> Currently, for testing purposes, I am only running one instance. Now,
>>>>>> I
>>>>>>
>>>>> need
>>>>>
>>>>>> to set up an ensemble to protect my system against crashes.
> >>>>>> The ec2 service has some differences from a normal server farm. E.g.
>>>>>> the
>>>>>> data saved on the file system of an ec2 instance is lost if the
>>>>>>
>>>>> instance
>>>
>>>> crashes. In the documentation of zookeeper, I have read that zookeeper
>>>>>>
>>>>> saves
>>>>>
>>>>>> snapshots of the in-memory data in the file system. Is that needed for
>>>>>> recovery? Logically, it would be much easier for me if this is not the
>>>>>>
>>>>> case.
>>>>>
> >>>>>> Additionally, ec2 brings the advantage that servers can be switched on
>>>>>> and
>>>>>>
>>>>> off
>>>>>
>>>>>> dynamically dependent on the load, traffic, etc. Can this advantage be
>>>>>> utilized for a zookeeper ensemble? Is it possible to add a zookeeper
>>>>>>
>>>>> server
>>>>>
>>>>>> dynamically to an ensemble? E.g. dependent on the in-memory load?
>>>>>>
>>>>>> David
>>>>>>
>>>>>>
>>>
>>

Re: zookeeper on ec2

Posted by Patrick Hunt <ph...@apache.org>.
Yes. create/set/delete/... are really the issue (non-idempotent).
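The distinction matters in code: an idempotent read like exists() can simply be retried, but after a ConnectionLoss a non-idempotent create must re-check state first, because the lost attempt may have gone through on the servers. A language-agnostic sketch of the pattern (the exception and callables are placeholders, not the real ZooKeeper client API):

```python
import time

class ConnectionLoss(Exception):
    """Stand-in for the ZooKeeper client's connection-loss error."""

def retry_idempotent(op, attempts=5, backoff=0.5):
    """Safe for reads (exists/getData): re-running them is harmless."""
    for i in range(attempts):
        try:
            return op()
        except ConnectionLoss:
            if i == attempts - 1:
                raise
            time.sleep(backoff * (2 ** i))  # exponential backoff

def retry_create(create, exists, attempts=5, backoff=0.5):
    """For a non-idempotent create: after a ConnectionLoss, check whether
    the node appeared before retrying -- the lost attempt may have
    actually succeeded even though the reply never arrived."""
    for i in range(attempts):
        try:
            return create()
        except ConnectionLoss:
            if exists():  # the first attempt went through after all
                return None
            if i == attempts - 1:
                raise
            time.sleep(backoff * (2 ** i))
```

set/delete need the analogous treatment (e.g. comparing versions), which is exactly why "dumb retry" is only safe for the read-only calls.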

Satish Bhatti wrote:
> Well a bunch of the ConnectionLosses were for zookeeper.exists() calls.  I'm
> pretty sure dumb retry for those should suffice!
> 
> On Tue, Sep 1, 2009 at 4:31 PM, Mahadev Konar <ma...@yahoo-inc.com> wrote:
> 
>> Hi Satish,
>>
>>  Connectionloss is a little trickier than just retrying blindly. Please
>> read the following sections on this -
>>
>> http://wiki.apache.org/hadoop/ZooKeeper/ErrorHandling
>>
>> And the programmers guide:
>>
>> http://hadoop.apache.org/zookeeper/docs/r3.1.1/zookeeperProgrammers.html
>>
>> To learn more about how to handle CONNECTIONLOSS. The idea is that
>> blindly retrying can create problems, since a CONNECTIONLOSS does NOT
>> necessarily mean that the zookeeper operation you were executing failed.
>> It is possible that the operation actually went through on the servers.
>>
>> Since this has been a constant source of confusion for everyone who starts
>> using zookeeper, we are working on a fix (ZOOKEEPER-22) which will take
>> care of this problem so that programmers will not have to worry about
>> CONNECTIONLOSS handling.
>>
>> Thanks
>> mahadev
>>
>>
>>
>>
>> On 9/1/09 4:13 PM, "Satish Bhatti" <ct...@gmail.com> wrote:
>>
>>> I have recently started running on EC2 and am seeing quite a few
>>> ConnectionLoss exceptions.  Should I just catch these and retry?  Since I
>>> assume that eventually, if the shit truly hits the fan, I will get a
>>> SessionExpired?
>>> Satish
>>>
>>> On Mon, Jul 6, 2009 at 11:35 AM, Ted Dunning <te...@gmail.com>
>> wrote:
>>>> We have used EC2 quite a bit for ZK.
>>>>
>>>> The basic lessons that I have learned include:
>>>>
>>>> a) EC2's biggest advantage after scaling and elasticity was conformity
>> of
>>>> configuration.  Since you are bringing machines up and down all the
>> time,
>>>> they begin to act more like programs and you wind up with boot scripts
>> that
>>>> give you a very predictable environment.  Nice.
>>>>
>>>> b) EC2 interconnect has a lot more going on than in a dedicated VLAN.
>>  That
>>>> can make the ZK servers appear a bit less connected.  You have to plan
>> for
>>>> ConnectionLoss events.
>>>>
>>>> c) for highest reliability, I switched to large instances.  On
>> reflection,
>>>> I
>>>> think that was helpful, but less important than I thought at the time.
>>>>
>>>> d) increasing and decreasing cluster size is nearly painless and is
>> easily
>>>> scriptable.  To decrease, do a rolling update on the survivors to update
>>>> their configuration.  Then take down the instance you want to lose.  To
>>>> increase, do a rolling update starting with the new instances to update
>> the
>>>> configuration to include all of the machines.  The rolling update should
>>>> bounce each ZK with several seconds between each bounce.  Rescaling the
>>>> cluster takes less than a minute which makes it comparable to EC2
>> instance
>>>> boot time (about 30 seconds for the Alestic ubuntu instance that we used
>>>> plus about 20 seconds for additional configuration).
>>>>
>>>> On Mon, Jul 6, 2009 at 4:45 AM, David Graf <da...@28msec.com>
>> wrote:
>>>>> Hello
>>>>>
> >>>>> I want to set up a zookeeper ensemble on amazon's ec2 service. In my
>>>> system,
>>>>> zookeeper is used to run a locking service and to generate unique id's.
>>>>> Currently, for testing purposes, I am only running one instance. Now, I
>>>> need
>>>>> to set up an ensemble to protect my system against crashes.
> >>>>> The ec2 service has some differences from a normal server farm. E.g. the
>>>>> data saved on the file system of an ec2 instance is lost if the
>> instance
>>>>> crashes. In the documentation of zookeeper, I have read that zookeeper
>>>> saves
>>>>> snapshots of the in-memory data in the file system. Is that needed for
>>>>> recovery? Logically, it would be much easier for me if this is not the
>>>> case.
> >>>>> Additionally, ec2 brings the advantage that servers can be switched on and
>>>> off
>>>>> dynamically dependent on the load, traffic, etc. Can this advantage be
>>>>> utilized for a zookeeper ensemble? Is it possible to add a zookeeper
>>>> server
>>>>> dynamically to an ensemble? E.g. dependent on the in-memory load?
>>>>>
>>>>> David
>>>>>
>>
> 

Re: zookeeper on ec2

Posted by Satish Bhatti <ct...@gmail.com>.
Well a bunch of the ConnectionLosses were for zookeeper.exists() calls.  I'm
pretty sure dumb retry for those should suffice!

On Tue, Sep 1, 2009 at 4:31 PM, Mahadev Konar <ma...@yahoo-inc.com> wrote:

> Hi Satish,
>
>  Connectionloss is a little trickier than just retrying blindly. Please
> read the following sections on this -
>
> http://wiki.apache.org/hadoop/ZooKeeper/ErrorHandling
>
> And the programmers guide:
>
> http://hadoop.apache.org/zookeeper/docs/r3.1.1/zookeeperProgrammers.html
>
> To learn more about how to handle CONNECTIONLOSS. The idea is that
> blindly retrying can create problems, since a CONNECTIONLOSS does NOT
> necessarily mean that the zookeeper operation you were executing failed.
> It is possible that the operation actually went through on the servers.
>
> Since this has been a constant source of confusion for everyone who starts
> using zookeeper, we are working on a fix (ZOOKEEPER-22) which will take
> care of this problem so that programmers will not have to worry about
> CONNECTIONLOSS handling.
>
> Thanks
> mahadev
>
>
>
>
> On 9/1/09 4:13 PM, "Satish Bhatti" <ct...@gmail.com> wrote:
>
> > I have recently started running on EC2 and am seeing quite a few
> > ConnectionLoss exceptions.  Should I just catch these and retry?  Since I
> > assume that eventually, if the shit truly hits the fan, I will get a
> > SessionExpired?
> > Satish
> >
> > On Mon, Jul 6, 2009 at 11:35 AM, Ted Dunning <te...@gmail.com>
> wrote:
> >
> >> We have used EC2 quite a bit for ZK.
> >>
> >> The basic lessons that I have learned include:
> >>
> >> a) EC2's biggest advantage after scaling and elasticity was conformity
> of
> >> configuration.  Since you are bringing machines up and down all the
> time,
> >> they begin to act more like programs and you wind up with boot scripts
> that
> >> give you a very predictable environment.  Nice.
> >>
> >> b) EC2 interconnect has a lot more going on than in a dedicated VLAN.
>  That
> >> can make the ZK servers appear a bit less connected.  You have to plan
> for
> >> ConnectionLoss events.
> >>
> >> c) for highest reliability, I switched to large instances.  On
> reflection,
> >> I
> >> think that was helpful, but less important than I thought at the time.
> >>
> >> d) increasing and decreasing cluster size is nearly painless and is
> easily
> >> scriptable.  To decrease, do a rolling update on the survivors to update
> >> their configuration.  Then take down the instance you want to lose.  To
> >> increase, do a rolling update starting with the new instances to update
> the
> >> configuration to include all of the machines.  The rolling update should
> >> bounce each ZK with several seconds between each bounce.  Rescaling the
> >> cluster takes less than a minute which makes it comparable to EC2
> instance
> >> boot time (about 30 seconds for the Alestic ubuntu instance that we used
> >> plus about 20 seconds for additional configuration).
> >>
> >> On Mon, Jul 6, 2009 at 4:45 AM, David Graf <da...@28msec.com>
> wrote:
> >>
> >>> Hello
> >>>
> >>> I want to set up a zookeeper ensemble on amazon's ec2 service. In my
> >> system,
> >>> zookeeper is used to run a locking service and to generate unique id's.
> >>> Currently, for testing purposes, I am only running one instance. Now, I
> >> need
> >>> to set up an ensemble to protect my system against crashes.
> >>> The ec2 service has some differences from a normal server farm. E.g. the
> >>> data saved on the file system of an ec2 instance is lost if the
> instance
> >>> crashes. In the documentation of zookeeper, I have read that zookeeper
> >> saves
> >>> snapshots of the in-memory data in the file system. Is that needed for
> >>> recovery? Logically, it would be much easier for me if this is not the
> >> case.
> >>> Additionally, ec2 brings the advantage that servers can be switched on and
> >> off
> >>> dynamically dependent on the load, traffic, etc. Can this advantage be
> >>> utilized for a zookeeper ensemble? Is it possible to add a zookeeper
> >> server
> >>> dynamically to an ensemble? E.g. dependent on the in-memory load?
> >>>
> >>> David
> >>>
> >>
>
>

Re: How to handle "Node does not exist" error?

Posted by Ted Dunning <te...@gmail.com>.
What do your nodes have in their logs during startup?  Are you sure  
you have them configured correctly?  Are the files ephemeral?  Could  
they have disappeared on their own?

Sent from my iPhone

On Aug 11, 2010, at 12:10 AM, Dr Hao He <he...@softtouchit.com> wrote:

> hi, Ted,
>
> Thanks for the reply.  Here is what I did:
>
> [zk: localhost:2181(CONNECTED) 0] ls /xpe/queues/3bd7851e79381ef4bfd1a5857b5e34c04e5159e5/msgs/msg0000002948
> []
> [zk: localhost:2181(CONNECTED) 1] ls /xpe/queues/3bd7851e79381ef4bfd1a5857b5e34c04e5159e5/msgs
> [msg0000002807, msg0000002700, msg0000002701, msg0000002804,  
> msg0000002704, msg0000002706, msg0000002601, msg0000001849,  
> msg0000001847, msg0000002508, msg0000002609, msg0000001841,  
> msg0000002607, msg0000002606, msg0000002604, msg0000002809,  
> msg0000002817, msg0000001633, msg0000002812, msg0000002814,  
> msg0000002711, msg0000002815, msg0000002713, msg0000002716,  
> msg0000001772, msg0000002811, msg0000001635, msg0000001774,  
> msg0000002515, msg0000002610, msg0000001838, msg0000002517,  
> msg0000002612, msg0000002519, msg0000001973, msg0000001835,  
> msg0000001974, msg0000002619, msg0000001831, msg0000002510,  
> msg0000002512, msg0000002615, msg0000002614, msg0000002617,  
> msg0000002104, msg0000002106, msg0000001769, msg0000001768,  
> msg0000002828, msg0000002822, msg0000001760, msg0000002820,  
> msg0000001963, msg0000001961, msg0000002110, msg0000002118,  
> msg0000002900, msg0000002836, msg0000001757, msg0000002907,  
> msg0000001753, msg0000001752, msg0000001755, msg0000001952,  
> msg0000001958, msg0000001852, msg0000001956, msg0000001854,  
> msg0000002749, msg0000001608, msg0000001609, msg0000002747,  
> msg0000002882, msg0000001743, msg0000002888, msg0000001605,  
> msg0000002885, msg0000001487, msg0000001746, msg0000002330,  
> msg0000001749, msg0000001488, msg0000001489, msg0000001881,  
> msg0000001491, msg0000002890, msg0000001889, msg0000002758,  
> msg0000002241, msg0000002892, msg0000002852, msg0000002759,  
> msg0000002898, msg0000002850, msg0000001733, msg0000002751,  
> msg0000001739, msg0000002753, msg0000002756, msg0000002332,  
> msg0000001872, msg0000002233, msg0000001721, msg0000001627,  
> msg0000001720, msg0000001625, msg0000001628, msg0000001629,  
> msg0000001729, msg0000002350, msg0000001727, msg0000002352,  
> msg0000001622, msg0000001726, msg0000001623, msg0000001723,  
> msg0000001724, msg0000001621, msg0000002736, msg0000002738,  
> msg0000002363, msg0000001717, msg0000002878, msg0000002362,  
> msg0000002361, msg0000001611, msg0000001894, msg0000002357,  
> msg0000002218, msg0000002358, msg0000002355, msg0000001895,  
> msg0000002356, msg0000001898, msg0000002354, msg0000001996,  
> msg0000001990, msg0000002093, msg0000002880, msg0000002576,  
> msg0000002579, msg0000002267, msg0000002266, msg0000002366,  
> msg0000001901, msg0000002365, msg0000001903, msg0000001799,  
> msg0000001906, msg0000002368, msg0000001597, msg0000002679,  
> msg0000002166, msg0000001595, msg0000002481, msg0000002482,  
> msg0000002373, msg0000002374, msg0000002371, msg0000001599,  
> msg0000002773, msg0000002274, msg0000002275, msg0000002270,  
> msg0000002583, msg0000002271, msg0000002580, msg0000002067,  
> msg0000002277, msg0000002278, msg0000002376, msg0000002180,  
> msg0000002467, msg0000002378, msg0000002182, msg0000002377,  
> msg0000002184, msg0000002379, msg0000002187, msg0000002186,  
> msg0000002665, msg0000002666, msg0000002381, msg0000002382,  
> msg0000002661, msg0000002662, msg0000002663, msg0000002385,  
> msg0000002284, msg0000002766, msg0000002282, msg0000002190,  
> msg0000002599, msg0000002054, msg0000002596, msg0000002453,  
> msg0000002459, msg0000002457, msg0000002456, msg0000002191,  
> msg0000002652, msg0000002395, msg0000002650, msg0000002656,  
> msg0000002655, msg0000002189, msg0000002047, msg0000002658,  
> msg0000002659, msg0000002796, msg0000002250, msg0000002255,  
> msg0000002589, msg0000002257, msg0000002061, msg0000002064,  
> msg0000002585, msg0000002258, msg0000002587, msg0000002444,  
> msg0000002446, msg0000002447, msg0000002450, msg0000002646,  
> msg0000001501, msg0000002591, msg0000002592, msg0000001503,  
> msg0000001506, msg0000002260, msg0000002594, msg0000002262,  
> msg0000002263, msg0000002264, msg0000002590, msg0000002132,  
> msg0000002130, msg0000002530, msg0000002931, msg0000001559,  
> msg0000001808, msg0000002024, msg0000001553, msg0000002939,  
> msg0000002937, msg0000001556, msg0000002935, msg0000002933,  
> msg0000002140, msg0000001937, msg0000002143, msg0000002520,  
> msg0000002522, msg0000002429, msg0000002524, msg0000002920,  
> msg0000002035, msg0000001561, msg0000002134, msg0000002138,  
> msg0000002925, msg0000002151, msg0000002287, msg0000002555,  
> msg0000002010, msg0000002002, msg0000002290, msg0000001537,  
> msg0000002005, msg0000002147, msg0000002145, msg0000002698,  
> msg0000001592, msg0000001810, msg0000002690, msg0000002691,  
> msg0000001911, msg0000001910, msg0000002693, msg0000001812,  
> msg0000001817, msg0000001547, msg0000002012, msg0000002015,  
> msg0000002941, msg0000001688, msg0000002018, msg0000002684,  
> msg0000002944, msg0000001540, msg0000002686, msg0000001541,  
> msg0000002946, msg0000002688, msg0000001584, msg0000002948]
>
> [zk: localhost:2181(CONNECTED) 7] delete /xpe/queues/3bd7851e79381ef4bfd1a5857b5e34c04e5159e5/msgs/msg0000002948
> Node does not exist: /xpe/queues/3bd7851e79381ef4bfd1a5857b5e34c04e5159e5/msgs/msg0000002948
>
> When I performed the same operations on another node, none of those  
> nodes existed.
>
>
> Dr Hao He
>
> XPE - the truly SOA platform
>
> he@softtouchit.com
> http://softtouchit.com
> http://itunes.com/apps/Scanmobile
>
> On 11/08/2010, at 4:38 PM, Ted Dunning wrote:
>
>> Can you provide some more information?  The output of some of the  
>> four
>> letter commands and a transcript of what you are doing would be very
>> helpful.
>>
>> Also, there is no way for znodes to exist on one node of a properly
>> operating ZK cluster and not on either of the other two.  Something  
>> has to
>> be wrong and I would vote for operator error (not to cast  
>> aspersions, it is
>> just that humans like you and *me* make more errors than ZK does).
>>
>> On Tue, Aug 10, 2010 at 11:32 PM, Dr Hao He <he...@softtouchit.com>  
>> wrote:
>>
>>> hi, All,
>>>
>>> I have a 3-host cluster running ZooKeeper 3.2.2.  On one of the  
>>> hosts,
>>> there are a number of nodes that I can "get" and "ls" using  
>>> zkCli.sh .
>>> However, when I tried to "delete" any of them, I got "Node does  
>>> not exist"
>>> error.    Those nodes do not exist on the other two hosts.
>>>
>>> Any idea how we should handle this type of errors and what might  
>>> have
>>> caused this problem?
>>>
>>> Dr Hao He
>>>
>>> XPE - the truly SOA platform
>>>
>>> he@softtouchit.com
>>> http://softtouchit.com
>>> http://itunes.com/apps/Scanmobile
>>>
>>>
>

Re: How to handle "Node does not exist" error?

Posted by Benjamin Reed <br...@yahoo-inc.com>.
i thought there was a jira about supporting embedded zookeeper. (i 
remember rejecting a patch to fix it. one of the problems is that we 
have a couple of places that do System.exit().) i can't seem to find it 
though.

one case that would be great for embedding is writing test cases, so i 
think it would be useful for that.

ben

On 08/12/2010 03:25 PM, Ted Dunning wrote:
> I am not saying that the API shouldn't support embedded ZK.
>
> I am just saying that it is almost always a bad idea.  It isn't that I am
> asking you to not do it, it is just that I am describing the experience I
> have had and that I have seen others have.  In a nutshell, embedding leads
> to problems and it isn't hard to see why.
>
> On Thu, Aug 12, 2010 at 3:02 PM, Vishal K<vi...@gmail.com>  wrote:
>
>    
>> 2. With respect to Ted's point about backward compatibility, I would
>> suggest
>> to take an approach of having an API to support embedded ZK instead of
>> asking users to not embed ZK.
>>
>>      


Re: How to handle "Node does not exist" error?

Posted by Vishal K <vi...@gmail.com>.
Hi Dr Hao,

If you think this is not a configuration issue, then it would be a good idea
to open a jira. Thanks.

On Thu, Aug 12, 2010 at 8:42 PM, Ted Dunning <te...@gmail.com> wrote:

> On Thu, Aug 12, 2010 at 4:57 PM, Dr Hao He <he...@softtouchit.com> wrote:
>
> > hi, Ted,
> >
> > I am a little bit confused here.  So, is the node inconsistency problem
> > that Vishal and I have seen here most likely caused by configurations or
> > embedding?
> >
> > If it is the former, I'd appreciate if you can point out where those
> silly
> > mistakes have been made and the correct way to embed ZK.
> >
>
> I think it is likely due to misconfiguration, but I don't know what the
> issue is exactly.  I think that another poster suggested that you ape the
> normal ZK startup process more closely.  That sounds good but it may be
> incompatible with your goals of integrating all configuration into a single
> XML file and not using the normal ZK configuration process.
>
> Your thought about forking ZK is a good one since there are calls to
> System.exit() that could wreak havoc.
>
>
>
> > Although I agree with your comments about the architectural issues that
> > embedding may lead to and we are aware of those,  I do not agree that
> > embedding will always lead to those issues.
>
>
> I agree that embedding won't always lead to those issues and your
> application is a reasonable counter-example.  As is common, I think that
> the
> exception proves the rule since your system is really just another way to
> launch an independent ZK cluster rather than an example of ZK being
> embedded
> into an application.
>

Re: How to handle "Node does not exist" error?

Posted by Vishal K <vi...@gmail.com>.
In my case, I am pretty sure that the configuration was right. I will
reproduce it and post more info later. Thanks.

On Mon, Aug 16, 2010 at 1:08 PM, Patrick Hunt <ph...@apache.org> wrote:

> Try using the logs, stat command or JMX to verify that each ZK server is
> indeed a leader/follower as expected. You should have one leader and n-1
> followers. Verify that you don't have any "standalone" servers (this is the
> most frequent error I see - misconfiguration of a server such that it thinks
> it's a standalone server; I often see where a user has 3 standalone servers
> which they think is a single quorum, all of the servers will therefore be
> "inconsistent" to each other).
>
> Patrick
>
>
> On 08/12/2010 05:42 PM, Ted Dunning wrote:
>
>> On Thu, Aug 12, 2010 at 4:57 PM, Dr Hao He<he...@softtouchit.com>  wrote:
>>
>>  hi, Ted,
>>>
>>> I am a little bit confused here.  So, is the node inconsistency problem
>>> that Vishal and I have seen here most likely caused by configurations or
>>> embedding?
>>>
>>> If it is the former, I'd appreciate if you can point out where those
>>> silly
>>> mistakes have been made and the correct way to embed ZK.
>>>
>>>
>> I think it is likely due to misconfiguration, but I don't know what the
>> issue is exactly.  I think that another poster suggested that you ape the
>> normal ZK startup process more closely.  That sounds good but it may be
>> incompatible with your goals of integrating all configuration into a
>> single
>> XML file and not using the normal ZK configuration process.
>>
>> Your thought about forking ZK is a good one since there are calls to
>> System.exit() that could wreak havoc.
>>
>>
>>
>>  Although I agree with your comments about the architectural issues that
>>> embedding may lead to and we are aware of those,  I do not agree that
>>> embedding will always lead to those issues.
>>>
>>
>>
>> I agree that embedding won't always lead to those issues and your
>> application is a reasonable counter-example.  As is common, I think that
>> the
>> exception proves the rule since your system is really just another way to
>> launch an independent ZK cluster rather than an example of ZK being
>> embedded
>> into an application.
>>
>>

Re: How to handle "Node does not exist" error?

Posted by Patrick Hunt <ph...@apache.org>.
Try using the logs, stat command or JMX to verify that each ZK server is 
indeed a leader/follower as expected. You should have one leader and n-1 
followers. Verify that you don't have any "standalone" servers (this is 
the most frequent error I see - misconfiguration of a server such that 
it thinks it's a standalone server; I often see where a user has 3 
standalone servers which they think is a single quorum, all of the 
servers will therefore be "inconsistent" to each other).

Patrick
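
The check above can be scripted against the four-letter "stat" command (echo stat | nc host 2181): each response contains a "Mode: leader|follower|standalone" line. A minimal sketch that extracts it (StatCheck and mode() are illustrative names, not ZooKeeper APIs; host and port come from the command line):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.Socket;

// Sketch: send the four-letter "stat" command to a ZooKeeper server and
// report its mode. Class and method names are illustrative only.
public class StatCheck {

    // Pull the value out of the "Mode: ..." line of a stat response;
    // returns "unknown" if no such line is present.
    static String mode(String statOutput) {
        for (String line : statOutput.split("\n")) {
            if (line.startsWith("Mode: ")) {
                return line.substring("Mode: ".length()).trim();
            }
        }
        return "unknown";
    }

    public static void main(String[] args) throws IOException {
        try (Socket s = new Socket(args[0], Integer.parseInt(args[1]));
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(s.getInputStream()))) {
            s.getOutputStream().write("stat".getBytes());
            StringBuilder sb = new StringBuilder();
            for (String line; (line = in.readLine()) != null; ) {
                sb.append(line).append('\n');
            }
            System.out.println(mode(sb.toString()));
        }
    }
}
```

Run against each server in turn: a healthy 3-node ensemble should print "leader" once and "follower" twice; any "standalone" means that server never joined the quorum.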

On 08/12/2010 05:42 PM, Ted Dunning wrote:
> On Thu, Aug 12, 2010 at 4:57 PM, Dr Hao He<he...@softtouchit.com>  wrote:
>
>> hi, Ted,
>>
>> I am a little bit confused here.  So, is the node inconsistency problem
>> that Vishal and I have seen here most likely caused by configurations or
>> embedding?
>>
>> If it is the former, I'd appreciate if you can point out where those silly
>> mistakes have been made and the correct way to embed ZK.
>>
>
> I think it is likely due to misconfiguration, but I don't know what the
> issue is exactly.  I think that another poster suggested that you ape the
> normal ZK startup process more closely.  That sounds good but it may be
> incompatible with your goals of integrating all configuration into a single
> XML file and not using the normal ZK configuration process.
>
> Your thought about forking ZK is a good one since there are calls to
> System.exit() that could wreak havoc.
>
>
>
>> Although I agree with your comments about the architectural issues that
>> embedding may lead to and we are aware of those,  I do not agree that
>> embedding will always lead to those issues.
>
>
> I agree that embedding won't always lead to those issues and your
> application is a reasonable counter-example.  As is common, I think that the
> exception proves the rule since your system is really just another way to
> launch an independent ZK cluster rather than an example of ZK being embedded
> into an application.
>

Re: How to handle "Node does not exist" error?

Posted by Ted Dunning <te...@gmail.com>.
On Thu, Aug 12, 2010 at 4:57 PM, Dr Hao He <he...@softtouchit.com> wrote:

> hi, Ted,
>
> I am a little bit confused here.  So, is the node inconsistency problem
> that Vishal and I have seen here most likely caused by configurations or
> embedding?
>
> If it is the former, I'd appreciate if you can point out where those silly
> mistakes have been made and the correct way to embed ZK.
>

I think it is likely due to misconfiguration, but I don't know what the
issue is exactly.  I think that another poster suggested that you ape the
normal ZK startup process more closely.  That sounds good but it may be
incompatible with your goals of integrating all configuration into a single
XML file and not using the normal ZK configuration process.

Your thought about forking ZK is a good one since there are calls to
System.exit() that could wreak havoc.



> Although I agree with your comments about the architectural issues that
> embedding may lead to and we are aware of those,  I do not agree that
> embedding will always lead to those issues.


I agree that embedding won't always lead to those issues and your
application is a reasonable counter-example.  As is common, I think that the
exception proves the rule since your system is really just another way to
launch an independent ZK cluster rather than an example of ZK being embedded
into an application.

Re: How to handle "Node does not exist" error?

Posted by Dr Hao He <he...@softtouchit.com>.
hi, Ted,

I am a little bit confused here.  So, is the node inconsistency problem that Vishal and I have seen here most likely caused by configurations or embedding?

If it is the former, I'd appreciate if you can point out where those silly mistakes have been made and the correct way to embed ZK.

Although I agree with your comments about the architectural issues that embedding may lead to, and we are aware of those, I do not agree that embedding will always lead to those issues.

In our case, we have a very simple orchestration framework that starts various services according to our XML configuration file, since we need to start them in the right sequence. Architecturally, this is just like writing a set of unix scripts to start ZK and other services; the only difference is that we happen to implement it in Java and XML. In short, the services coordinated by ZK do not embed ZK; the top-level orchestration layer does.

The only potential problem I see here is that we are running all those services in one JVM, but this can be easily changed. We are going to try to fork out ZK into a separate JVM and see if it makes any difference.




Dr Hao He

XPE - the truly SOA platform

he@softtouchit.com
http://softtouchit.com
http://itunes.com/apps/Scanmobile

On 13/08/2010, at 8:25 AM, Ted Dunning wrote:

> I am not saying that the API shouldn't support embedded ZK.
> 
> I am just saying that it is almost always a bad idea.  It isn't that I am
> asking you to not do it, it is just that I am describing the experience I
> have had and that I have seen others have.  In a nutshell, embedding leads
> to problems and it isn't hard to see why.
> 
> On Thu, Aug 12, 2010 at 3:02 PM, Vishal K <vi...@gmail.com> wrote:
> 
>> 2. With respect to Ted's point about backward compatibility, I would
>> suggest
>> to take an approach of having an API to support embedded ZK instead of
>> asking users to not embed ZK.
>> 


Re: How to handle "Node does not exist" error?

Posted by Ted Dunning <te...@gmail.com>.
I am not saying that the API shouldn't support embedded ZK.

I am just saying that it is almost always a bad idea.  It isn't that I am
asking you to not do it, it is just that I am describing the experience I
have had and that I have seen others have.  In a nutshell, embedding leads
to problems and it isn't hard to see why.

On Thu, Aug 12, 2010 at 3:02 PM, Vishal K <vi...@gmail.com> wrote:

> 2. With respect to Ted's point about backward compatibility, I would
> suggest
> to take an approach of having an API to support embedded ZK instead of
> asking users to not embed ZK.
>

Re: How to handle "Node does not exist" error?

Posted by Vishal K <vi...@gmail.com>.
Hi,

I don't intend to hijack Dr. Hao's email thread here, but I would like to
point out two things:

1. I use an embedded server as well, but I don't use any setters. We extend
QuorumPeerMain and call the initializeAndRun() function, so we are doing pretty
much the same thing that QuorumPeerMain is doing. However, note that I am
seeing the same problem (in ZK 3.3.0) as Dr Hao is seeing. I haven't
debugged the cause yet. I assumed that this was my implementation error (and
it could still be). Nevertheless, this could turn out to be a bug as well.

2. With respect to Ted's point about backward compatibility, I would suggest
to take an approach of having an API to support embedded ZK instead of
asking users to not embed ZK.

-Vishal

On Thu, Aug 12, 2010 at 3:18 PM, Ted Dunning <te...@gmail.com> wrote:

> It doesn't.
>
> But running a ZK cluster that is incorrectly configured can cause this
> problem and configuring ZK using setters is likely to be subject to changes
> in what configuration is needed.  Thus, your style of code is more subject
> to decay over time than is nice.
>
> The rest of my comments detail *other* reasons why embedding a coordination
> layer in the code being coordinated is a bad idea.
>
> On Thu, Aug 12, 2010 at 6:33 AM, Vishal K <vi...@gmail.com> wrote:
>
> > Hi Ted,
> >
> > Can you explain why running ZK in embedded mode can cause znode
> > inconsistencies?
> > Thanks.
> >
> > -Vishal
> >
> > On Thu, Aug 12, 2010 at 12:01 AM, Ted Dunning <te...@gmail.com>
> > wrote:
> >
> > > Try running the server in non-embedded mode.
> > >
> > > > Also, you are assuming that you know everything about how to configure
> > > > the quorumPeer.  That is going to change and your code will break at
> > > > that time.  If you use a non-embedded cluster, this won't be a problem
> > > > and you will be able to upgrade ZK version without having to restart
> > > > your service.
> > >
> > > > My own opinion is that running an embedded ZK is a serious
> > > > architectural error.  Since I don't know your particular situation, it
> > > > might be different, but there is an inherent contradiction involved in
> > > > running a coordination layer as part of the thing being coordinated.
> > > > Whatever your software does, it isn't what ZK does.  As such, it is
> > > > better to factor out the ZK functionality and make it completely
> > > > stable.  That gives you a much simpler world and will make it easier
> > > > for you to trouble shoot your system.  The simple fact that you can't
> > > > take down your service without affecting the reliability of your ZK
> > > > layer makes this a very bad idea.
> > > >
> > > > The problems you are having now are only a preview of what this
> > > > architectural error leads to.  There will be more problems and many of
> > > > them are likely to be more subtle and lead to service interruptions and
> > > > lots of wasted time.
> > >
> > > On Wed, Aug 11, 2010 at 8:49 PM, Dr Hao He <he...@softtouchit.com> wrote:
> > >
> > > > hi, Ted and Mahadev,
> > > >
> > > >
> > > > Here are some more details about my setup:
> > > >
> > > > I run zookeeper in the embedded mode with the following code:
> > > >
> > > >     quorumPeer = new QuorumPeer();
> > > >     quorumPeer.setClientPort(getClientPort());
> > > >     quorumPeer.setTxnFactory(new FileTxnSnapLog(
> > > >             new File(getDataLogDir()), new File(getDataDir())));
> > > >     quorumPeer.setQuorumPeers(getServers());
> > > >     quorumPeer.setElectionType(getElectionAlg());
> > > >     quorumPeer.setMyid(getServerId());
> > > >     quorumPeer.setTickTime(getTickTime());
> > > >     quorumPeer.setInitLimit(getInitLimit());
> > > >     quorumPeer.setSyncLimit(getSyncLimit());
> > > >     quorumPeer.setQuorumVerifier(getQuorumVerifier());
> > > >     quorumPeer.setCnxnFactory(cnxnFactory);
> > > >     quorumPeer.start();
> > > >
> > > >
> > > > The configuration values are read from the following XML document for
> > > > server 1:
> > > >
> > > > <cluster tickTime="1000" initLimit="10" syncLimit="5" clientPort="2181" serverId="1">
> > > >                  <member id="1" host="192.168.2.6:2888:3888"/>
> > > >                  <member id="2" host="192.168.2.3:2888:3888"/>
> > > >                  <member id="3" host="192.168.2.4:2888:3888"/>
> > > > </cluster>
> > > >
> > > >
> > > > The other servers have the same configurations except their ids being
> > > > changed to 2 and 3.
> > > >
> > > > The error occurred on server 3 when I batch loaded some messages to
> > > > server 1.  However, this error does not always happen.  I am not sure
> > > > exactly what triggered this error yet.
> > > >
> > > > I also performed the "stat" operation on one of the "Node does not
> > > > exist" nodes and got:
> > > >
> > > > stat /xpe/queues/3bd7851e79381ef4bfd1a5857b5e34c04e5159e5/msgs/msg0000001583
> > > > Exception in thread "main" java.lang.NullPointerException
> > > >         at org.apache.zookeeper.ZooKeeperMain.printStat(ZooKeeperMain.java:129)
> > > >         at org.apache.zookeeper.ZooKeeperMain.processZKCmd(ZooKeeperMain.java:715)
> > > >         at org.apache.zookeeper.ZooKeeperMain.processCmd(ZooKeeperMain.java:579)
> > > >         at org.apache.zookeeper.ZooKeeperMain.executeLine(ZooKeeperMain.java:351)
> > > >         at org.apache.zookeeper.ZooKeeperMain.run(ZooKeeperMain.java:309)
> > > >         at org.apache.zookeeper.ZooKeeperMain.main(ZooKeeperMain.java:268)
> > > > [xpe@t43 zookeeper-3.2.2]$ bin/zkCli.sh
> > > >
> > > >
> > > > Those message nodes are created as CreateMode.PERSISTENT_SEQUENTIAL and
> > > > are deleted by the last server who has read them.
> > > >
> > > > If I remove the troubled server's zookeeper log directory and restart
> > > > the server, then everything is ok.
> > > >
> > > > I will try to get the nc result next time I see this problem.
> > > >
> > > >
> > > > Dr Hao He
> > > >
> > > > XPE - the truly SOA platform
> > > >
> > > > he@softtouchit.com
> > > > http://softtouchit.com
> > > > http://itunes.com/apps/Scanmobile
> > > >
> > > > On 12/08/2010, at 12:32 AM, Mahadev Konar wrote:
> > > >
> > > > > HI Dr Hao,
> > > > >  Can you please post the configuration of all the 3 zookeeper
> > > > > servers? I suspect it might be misconfigured clusters and they might
> > > > > not belong to the same ensemble.
> > > > >
> > > > > Just to be clear:
> > > > >
> > > > > /xpe/queues/3bd7851e79381ef4bfd1a5857b5e34c04e5159e5/msgs/msg0000002807
> > > > >
> > > > > And other such nodes exist on one of the zookeeper servers and the
> > > > > same node does not exist on other servers?
> > > > >
> > > > > Also, as ted pointed out, can you please post the output of echo
> > > > > "stat" | nc localhost 2181 (on all the 3 servers) to the list?
> > > > >
> > > > > Thanks
> > > > > mahadev
> > > > >
> > > > >
> > > > >
> > > > > On 8/11/10 12:10 AM, "Dr Hao He" <he...@softtouchit.com> wrote:
> > > > >
> > > > >> hi, Ted,
> > > > >>
> > > > >> Thanks for the reply.  Here is what I did:
> > > > >>
> > > > >> [zk: localhost:2181(CONNECTED) 0] ls
> > > > >> /xpe/queues/3bd7851e79381ef4bfd1a5857b5e34c04e5159e5/msgs/msg0000002948
> > > > >> []
> > > > >> [zk: localhost:2181(CONNECTED) 1] ls
> > > > >> /xpe/queues/3bd7851e79381ef4bfd1a5857b5e34c04e5159e5/msgs
> > > > >> [msg0000002807, msg0000002700, msg0000002701, msg0000002804,
> > > > msg0000002704,
> > > > >> msg0000002706, msg0000002601, msg0000001849, msg0000001847,
> > > > msg0000002508,
> > > > >> msg0000002609, msg0000001841, msg0000002607, msg0000002606,
> > > > msg0000002604,
> > > > >> msg0000002809, msg0000002817, msg0000001633, msg0000002812,
> > > > msg0000002814,
> > > > >> msg0000002711, msg0000002815, msg0000002713, msg0000002716,
> > > > msg0000001772,
> > > > >> msg0000002811, msg0000001635, msg0000001774, msg0000002515,
> > > > msg0000002610,
> > > > >> msg0000001838, msg0000002517, msg0000002612, msg0000002519,
> > > > msg0000001973,
> > > > >> msg0000001835, msg0000001974, msg0000002619, msg0000001831,
> > > > msg0000002510,
> > > > >> msg0000002512, msg0000002615, msg0000002614, msg0000002617,
> > > > msg0000002104,
> > > > >> msg0000002106, msg0000001769, msg0000001768, msg0000002828,
> > > > msg0000002822,
> > > > >> msg0000001760, msg0000002820, msg0000001963, msg0000001961,
> > > > msg0000002110,
> > > > >> msg0000002118, msg0000002900, msg0000002836, msg0000001757,
> > > > msg0000002907,
> > > > >> msg0000001753, msg0000001752, msg0000001755, msg0000001952,
> > > > msg0000001958,
> > > > >> msg0000001852, msg0000001956, msg0000001854, msg0000002749,
> > > > msg0000001608,
> > > > >> msg0000001609, msg0000002747, msg0000002882, msg0000001743,
> > > > msg0000002888,
> > > > >> msg0000001605, msg0000002885, msg0000001487, msg0000001746,
> > > > msg0000002330,
> > > > >> msg0000001749, msg0000001488, msg0000001489, msg0000001881,
> > > > msg0000001491,
> > > > >> msg0000002890, msg0000001889, msg0000002758, msg0000002241,
> > > > msg0000002892,
> > > > >> msg0000002852, msg0000002759, msg0000002898, msg0000002850,
> > > > msg0000001733,
> > > > >> msg0000002751, msg0000001739, msg0000002753, msg0000002756,
> > > > msg0000002332,
> > > > >> msg0000001872, msg0000002233, msg0000001721, msg0000001627,
> > > > msg0000001720,
> > > > >> msg0000001625, msg0000001628, msg0000001629, msg0000001729,
> > > > msg0000002350,
> > > > >> msg0000001727, msg0000002352, msg0000001622, msg0000001726,
> > > > msg0000001623,
> > > > >> msg0000001723, msg0000001724, msg0000001621, msg0000002736,
> > > > msg0000002738,
> > > > >> msg0000002363, msg0000001717, msg0000002878, msg0000002362,
> > > > msg0000002361,
> > > > >> msg0000001611, msg0000001894, msg0000002357, msg0000002218,
> > > > msg0000002358,
> > > > >> msg0000002355, msg0000001895, msg0000002356, msg0000001898,
> > > > msg0000002354,
> > > > >> msg0000001996, msg0000001990, msg0000002093, msg0000002880,
> > > > msg0000002576,
> > > > >> msg0000002579, msg0000002267, msg0000002266, msg0000002366,
> > > > msg0000001901,
> > > > >> msg0000002365, msg0000001903, msg0000001799, msg0000001906,
> > > > msg0000002368,
> > > > >> msg0000001597, msg0000002679, msg0000002166, msg0000001595,
> > > > msg0000002481,
> > > > >> msg0000002482, msg0000002373, msg0000002374, msg0000002371,
> > > > msg0000001599,
> > > > >> msg0000002773, msg0000002274, msg0000002275, msg0000002270,
> > > > msg0000002583,
> > > > >> msg0000002271, msg0000002580, msg0000002067, msg0000002277,
> > > > msg0000002278,
> > > > >> msg0000002376, msg0000002180, msg0000002467, msg0000002378,
> > > > msg0000002182,
> > > > >> msg0000002377, msg0000002184, msg0000002379, msg0000002187,
> > > > msg0000002186,
> > > > >> msg0000002665, msg0000002666, msg0000002381, msg0000002382,
> > > > msg0000002661,
> > > > >> msg0000002662, msg0000002663, msg0000002385, msg0000002284,
> > > > msg0000002766,
> > > > >> msg0000002282, msg0000002190, msg0000002599, msg0000002054,
> > > > msg0000002596,
> > > > >> msg0000002453, msg0000002459, msg0000002457, msg0000002456,
> > > > msg0000002191,
> > > > >> msg0000002652, msg0000002395, msg0000002650, msg0000002656,
> > > > msg0000002655,
> > > > >> msg0000002189, msg0000002047, msg0000002658, msg0000002659,
> > > > msg0000002796,
> > > > >> msg0000002250, msg0000002255, msg0000002589, msg0000002257,
> > > > msg0000002061,
> > > > >> msg0000002064, msg0000002585, msg0000002258, msg0000002587,
> > > > msg0000002444,
> > > > >> msg0000002446, msg0000002447, msg0000002450, msg0000002646,
> > > > msg0000001501,
> > > > >> msg0000002591, msg0000002592, msg0000001503, msg0000001506,
> > > > msg0000002260,
> > > > >> msg0000002594, msg0000002262, msg0000002263, msg0000002264,
> > > > msg0000002590,
> > > > >> msg0000002132, msg0000002130, msg0000002530, msg0000002931,
> > > > msg0000001559,
> > > > >> msg0000001808, msg0000002024, msg0000001553, msg0000002939,
> > > > msg0000002937,
> > > > >> msg0000001556, msg0000002935, msg0000002933, msg0000002140,
> > > > msg0000001937,
> > > > >> msg0000002143, msg0000002520, msg0000002522, msg0000002429,
> > > > msg0000002524,
> > > > >> msg0000002920, msg0000002035, msg0000001561, msg0000002134,
> > > > msg0000002138,
> > > > >> msg0000002925, msg0000002151, msg0000002287, msg0000002555,
> > > > msg0000002010,
> > > > >> msg0000002002, msg0000002290, msg0000001537, msg0000002005,
> > > > msg0000002147,
> > > > >> msg0000002145, msg0000002698, msg0000001592, msg0000001810,
> > > > msg0000002690,
> > > > >> msg0000002691, msg0000001911, msg0000001910, msg0000002693,
> > > > msg0000001812,
> > > > >> msg0000001817, msg0000001547, msg0000002012, msg0000002015,
> > > > msg0000002941,
> > > > >> msg0000001688, msg0000002018, msg0000002684, msg0000002944,
> > > > msg0000001540,
> > > > >> msg0000002686, msg0000001541, msg0000002946, msg0000002688,
> > > > msg0000001584,
> > > > >> msg0000002948]
> > > > >>
> > > > >> [zk: localhost:2181(CONNECTED) 7] delete
> > > > >> /xpe/queues/3bd7851e79381ef4bfd1a5857b5e34c04e5159e5/msgs/msg0000002948
> > > > >> Node does not exist:
> > > > >> /xpe/queues/3bd7851e79381ef4bfd1a5857b5e34c04e5159e5/msgs/msg0000002948
> > > > >>
> > > > >> When I performed the same operations on another node, none of those
> > > > >> nodes existed.
> > > > >>
> > > > >>
> > > > >> Dr Hao He
> > > > >>
> > > > >> XPE - the truly SOA platform
> > > > >>
> > > > >> he@softtouchit.com
> > > > >> http://softtouchit.com
> > > > >> http://itunes.com/apps/Scanmobile
> > > > >>
> > > > >> On 11/08/2010, at 4:38 PM, Ted Dunning wrote:
> > > > >>
> > > > >>> Can you provide some more information?  The output of some of the
> > > > >>> four letter commands and a transcript of what you are doing would
> > > > >>> be very helpful.
> > > > >>>
> > > > >>> Also, there is no way for znodes to exist on one node of a properly
> > > > >>> operating ZK cluster and not on either of the other two.  Something
> > > > >>> has to be wrong and I would vote for operator error (not to cast
> > > > >>> aspersions, it is just that humans like you and *me* make more
> > > > >>> errors than ZK does).
> > > > >>>
> > > > >>> On Tue, Aug 10, 2010 at 11:32 PM, Dr Hao He <he...@softtouchit.com>
> > > > >>> wrote:
> > > > >>>
> > > > >>>> hi, All,
> > > > >>>>
> > > > >>>> I have a 3-host cluster running ZooKeeper 3.2.2.  On one of the
> > > > >>>> hosts, there are a number of nodes that I can "get" and "ls" using
> > > > >>>> zkCli.sh.  However, when I tried to "delete" any of them, I got
> > > > >>>> "Node does not exist" error.  Those nodes do not exist on the
> > > > >>>> other two hosts.
> > > > >>>>
> > > > >>>> Any idea how we should handle this type of errors and what might
> > > > >>>> have caused this problem?
> > > > >>>>
> > > > >>>> Dr Hao He
> > > > >>>>
> > > > >>>> XPE - the truly SOA platform
> > > > >>>>
> > > > >>>> he@softtouchit.com
> > > > >>>> http://softtouchit.com
> > > > >>>> http://itunes.com/apps/Scanmobile
> > > > >>>>
> > > > >>>>
> > > > >>
> > > > >>
> > > > >
> > > > >
> > > >
> > > >
> > >
> >
>

Re: How to handle "Node does not exist" error?

Posted by Ted Dunning <te...@gmail.com>.
It doesn't.

But running a ZK cluster that is incorrectly configured can cause this
problem, and configuring ZK through setters is likely to break as the
required configuration changes between versions.  Thus, your style of code
is more prone to decay over time than it should be.

The rest of my comments detail *other* reasons why embedding a coordination
layer in the code being coordinated is a bad idea.
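
For readers hitting the same issue: the settings that the embedded code in
this thread wires up through setters map onto a standalone zoo.cfg, which
keeps working across ZK upgrades.  A minimal sketch, assuming the addresses
and ports from the XML config quoted below (the dataDir path here is made
up for illustration):

```properties
# Values matching the thread's XML config
tickTime=1000
initLimit=10
syncLimit=5
clientPort=2181
# Assumed location; point this at a real directory
dataDir=/var/lib/zookeeper
server.1=192.168.2.6:2888:3888
server.2=192.168.2.3:2888:3888
server.3=192.168.2.4:2888:3888
```

Each server additionally needs a myid file in its dataDir containing just
its id (1, 2, or 3); the serverId attribute in the XML plays that role in
the embedded setup.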

On Thu, Aug 12, 2010 at 6:33 AM, Vishal K <vi...@gmail.com> wrote:

> Hi Ted,
>
> Can you explain why running ZK in embedded mode can cause znode
> inconsistencies?
> Thanks.
>
> -Vishal
>
> On Thu, Aug 12, 2010 at 12:01 AM, Ted Dunning <te...@gmail.com>
> wrote:
>
> > Try running the server in non-embedded mode.
> >
> > Also, you are assuming that you know everything about how to configure
> the
> > quorumPeer.  That is going to change and your code will break at that
> time.
> >  If you use a non-embedded cluster, this won't be a problem and you will
> be
> > able to upgrade ZK version without having to restart your service.
> >
> > My own opinion is that running an embedded ZK is a serious architectural
> > error.  Since I don't know your particular situation, it might be
> > different,
> > but there is an inherent contradiction involved in running a coordination
> > layer as part of the thing being coordinated.  Whatever your software
> does,
> > it isn't what ZK does.  As such, it is better to factor out the ZK
> > functionality and make it completely stable.  That gives you a much
> simpler
> > world and will make it easier for you to troubleshoot your system.  The
> > simple fact that you can't take down your service without affecting the
> > reliability of your ZK layer makes this a very bad idea.
> >
> > The problems you are having now are only a preview of what this
> > architectural error leads to.  There will be more problems and many of
> them
> > are likely to be more subtle and lead to service interruptions and lots
> of
> > wasted time.
> >
> > On Wed, Aug 11, 2010 at 8:49 PM, Dr Hao He <he...@softtouchit.com> wrote:
> >
> > > hi, Ted and Mahadev,
> > >
> > >
> > > Here are some more details about my setup:
> > >
> > > I run zookeeper in the embedded mode with the following code:
> > >
> > >                                        quorumPeer = new QuorumPeer();
> > >
> > >  quorumPeer.setClientPort(getClientPort());
> > >                                        quorumPeer.setTxnFactory(new
> > > FileTxnSnapLog(new File(getDataLogDir()), new File(getDataDir())));
> > >
> > >  quorumPeer.setQuorumPeers(getServers());
> > >
> > >  quorumPeer.setElectionType(getElectionAlg());
> > >
>  quorumPeer.setMyid(getServerId());
> > >
> > >  quorumPeer.setTickTime(getTickTime());
> > >
> > >  quorumPeer.setInitLimit(getInitLimit());
> > >
> > >  quorumPeer.setSyncLimit(getSyncLimit());
> > >
> > >  quorumPeer.setQuorumVerifier(getQuorumVerifier());
> > >
> > >  quorumPeer.setCnxnFactory(cnxnFactory);
> > >                                        quorumPeer.start();
> > >
> > >
> > > The configuration values are read from the following XML document for
> > > server 1:
> > >
> > > <cluster tickTime="1000" initLimit="10" syncLimit="5" clientPort="2181"
> > > serverId="1">
> > >                  <member id="1" host="192.168.2.6:2888:3888"/>
> > >                  <member id="2" host="192.168.2.3:2888:3888"/>
> > >                  <member id="3" host="192.168.2.4:2888:3888"/>
> > > </cluster>
> > >
> > >
> > > The other servers have the same configurations except their ids being
> > > changed to 2 and 3.
> > >
> > > The error occurred on server 3 when I batch loaded some messages to
> > server
> > > 1.  However, this error does not always happen.  I am not sure exactly
> > what
> > > triggered this error yet.
> > >
> > > I also performed the "stat" operation on one of the "Node does not exist" nodes and
> > got:
> > >
> > > stat
> > > /xpe/queues/3bd7851e79381ef4bfd1a5857b5e34c04e5159e5/msgs/msg0000001583
> > > Exception in thread "main" java.lang.NullPointerException
> > >        at
> > > org.apache.zookeeper.ZooKeeperMain.printStat(ZooKeeperMain.java:129)
> > >        at
> > > org.apache.zookeeper.ZooKeeperMain.processZKCmd(ZooKeeperMain.java:715)
> > >        at
> > > org.apache.zookeeper.ZooKeeperMain.processCmd(ZooKeeperMain.java:579)
> > >        at
> > > org.apache.zookeeper.ZooKeeperMain.executeLine(ZooKeeperMain.java:351)
> > >        at
> org.apache.zookeeper.ZooKeeperMain.run(ZooKeeperMain.java:309)
> > >        at
> org.apache.zookeeper.ZooKeeperMain.main(ZooKeeperMain.java:268)
> > > [xpe@t43 zookeeper-3.2.2]$ bin/zkCli.sh
> > >
> > >
> > > Those message nodes are created as CreateMode.PERSISTENT_SEQUENTIAL and
> > are
> > > deleted by the last server who has read them.
> > >
> > > If I remove the troubled server's zookeeper log directory and restart
> the
> > > server, then everything is ok.
> > >
> > > I will try to get the nc result next time I see this problem.
> > >
> > >
> > > Dr Hao He
> > >
> > > XPE - the truly SOA platform
> > >
> > > he@softtouchit.com
> > > http://softtouchit.com
> > > http://itunes.com/apps/Scanmobile
> > >
> > > On 12/08/2010, at 12:32 AM, Mahadev Konar wrote:
> > >
> > > > HI Dr Hao,
> > > >  Can you please post the configuration of all the 3 zookeeper
> servers?
> > I
> > > > suspect it might be misconfigured clusters and they might not belong
> to
> > > the
> > > > same ensemble.
> > > >
> > > > Just to be clear:
> > > >
> /xpe/queues/3bd7851e79381ef4bfd1a5857b5e34c04e5159e5/msgs/msg0000002807
> > > >
> > > > And other such nodes exist on one of the zookeeper servers and the
> same
> > > node
> > > > does not exist on other servers?
> > > >
> > > > Also, as ted pointed out, can you please post the output of echo "stat" |
> > > > nc localhost 2181 (on all the 3 servers) to the list?
> > > >
> > > > Thanks
> > > > mahadev
> > > >
> > > >
> > > >
> > > > On 8/11/10 12:10 AM, "Dr Hao He" <he...@softtouchit.com> wrote:
> > > >
> > > >> hi, Ted,
> > > >>
> > > >> Thanks for the reply.  Here is what I did:
> > > >>
> > > >> [zk: localhost:2181(CONNECTED) 0] ls
> > > >>
> > /xpe/queues/3bd7851e79381ef4bfd1a5857b5e34c04e5159e5/msgs/msg0000002948
> > > >> []
> > > >> zk: localhost:2181(CONNECTED) 1] ls
> > > >> /xpe/queues/3bd7851e79381ef4bfd1a5857b5e34c04e5159e5/msgs
> > > >> [msg0000002807, msg0000002700, msg0000002701, msg0000002804,
> > > msg0000002704,
> > > >> msg0000002706, msg0000002601, msg0000001849, msg0000001847,
> > > msg0000002508,
> > > >> msg0000002609, msg0000001841, msg0000002607, msg0000002606,
> > > msg0000002604,
> > > >> msg0000002809, msg0000002817, msg0000001633, msg0000002812,
> > > msg0000002814,
> > > >> msg0000002711, msg0000002815, msg0000002713, msg0000002716,
> > > msg0000001772,
> > > >> msg0000002811, msg0000001635, msg0000001774, msg0000002515,
> > > msg0000002610,
> > > >> msg0000001838, msg0000002517, msg0000002612, msg0000002519,
> > > msg0000001973,
> > > >> msg0000001835, msg0000001974, msg0000002619, msg0000001831,
> > > msg0000002510,
> > > >> msg0000002512, msg0000002615, msg0000002614, msg0000002617,
> > > msg0000002104,
> > > >> msg0000002106, msg0000001769, msg0000001768, msg0000002828,
> > > msg0000002822,
> > > >> msg0000001760, msg0000002820, msg0000001963, msg0000001961,
> > > msg0000002110,
> > > >> msg0000002118, msg0000002900, msg0000002836, msg0000001757,
> > > msg0000002907,
> > > >> msg0000001753, msg0000001752, msg0000001755, msg0000001952,
> > > msg0000001958,
> > > >> msg0000001852, msg0000001956, msg0000001854, msg0000002749,
> > > msg0000001608,
> > > >> msg0000001609, msg0000002747, msg0000002882, msg0000001743,
> > > msg0000002888,
> > > >> msg0000001605, msg0000002885, msg0000001487, msg0000001746,
> > > msg0000002330,
> > > >> msg0000001749, msg0000001488, msg0000001489, msg0000001881,
> > > msg0000001491,
> > > >> msg0000002890, msg0000001889, msg0000002758, msg0000002241,
> > > msg0000002892,
> > > >> msg0000002852, msg0000002759, msg0000002898, msg0000002850,
> > > msg0000001733,
> > > >> msg0000002751, msg0000001739, msg0000002753, msg0000002756,
> > > msg0000002332,
> > > >> msg0000001872, msg0000002233, msg0000001721, msg0000001627,
> > > msg0000001720,
> > > >> msg0000001625, msg0000001628, msg0000001629, msg0000001729,
> > > msg0000002350,
> > > >> msg0000001727, msg0000002352, msg0000001622, msg0000001726,
> > > msg0000001623,
> > > >> msg0000001723, msg0000001724, msg0000001621, msg0000002736,
> > > msg0000002738,
> > > >> msg0000002363, msg0000001717, msg0000002878, msg0000002362,
> > > msg0000002361,
> > > >> msg0000001611, msg0000001894, msg0000002357, msg0000002218,
> > > msg0000002358,
> > > >> msg0000002355, msg0000001895, msg0000002356, msg0000001898,
> > > msg0000002354,
> > > >> msg0000001996, msg0000001990, msg0000002093, msg0000002880,
> > > msg0000002576,
> > > >> msg0000002579, msg0000002267, msg0000002266, msg0000002366,
> > > msg0000001901,
> > > >> msg0000002365, msg0000001903, msg0000001799, msg0000001906,
> > > msg0000002368,
> > > >> msg0000001597, msg0000002679, msg0000002166, msg0000001595,
> > > msg0000002481,
> > > >> msg0000002482, msg0000002373, msg0000002374, msg0000002371,
> > > msg0000001599,
> > > >> msg0000002773, msg0000002274, msg0000002275, msg0000002270,
> > > msg0000002583,
> > > >> msg0000002271, msg0000002580, msg0000002067, msg0000002277,
> > > msg0000002278,
> > > >> msg0000002376, msg0000002180, msg0000002467, msg0000002378,
> > > msg0000002182,
> > > >> msg0000002377, msg0000002184, msg0000002379, msg0000002187,
> > > msg0000002186,
> > > >> msg0000002665, msg0000002666, msg0000002381, msg0000002382,
> > > msg0000002661,
> > > >> msg0000002662, msg0000002663, msg0000002385, msg0000002284,
> > > msg0000002766,
> > > >> msg0000002282, msg0000002190, msg0000002599, msg0000002054,
> > > msg0000002596,
> > > >> msg0000002453, msg0000002459, msg0000002457, msg0000002456,
> > > msg0000002191,
> > > >> msg0000002652, msg0000002395, msg0000002650, msg0000002656,
> > > msg0000002655,
> > > >> msg0000002189, msg0000002047, msg0000002658, msg0000002659,
> > > msg0000002796,
> > > >> msg0000002250, msg0000002255, msg0000002589, msg0000002257,
> > > msg0000002061,
> > > >> msg0000002064, msg0000002585, msg0000002258, msg0000002587,
> > > msg0000002444,
> > > >> msg0000002446, msg0000002447, msg0000002450, msg0000002646,
> > > msg0000001501,
> > > >> msg0000002591, msg0000002592, msg0000001503, msg0000001506,
> > > msg0000002260,
> > > >> msg0000002594, msg0000002262, msg0000002263, msg0000002264,
> > > msg0000002590,
> > > >> msg0000002132, msg0000002130, msg0000002530, msg0000002931,
> > > msg0000001559,
> > > >> msg0000001808, msg0000002024, msg0000001553, msg0000002939,
> > > msg0000002937,
> > > >> msg0000001556, msg0000002935, msg0000002933, msg0000002140,
> > > msg0000001937,
> > > >> msg0000002143, msg0000002520, msg0000002522, msg0000002429,
> > > msg0000002524,
> > > >> msg0000002920, msg0000002035, msg0000001561, msg0000002134,
> > > msg0000002138,
> > > >> msg0000002925, msg0000002151, msg0000002287, msg0000002555,
> > > msg0000002010,
> > > >> msg0000002002, msg0000002290, msg0000001537, msg0000002005,
> > > msg0000002147,
> > > >> msg0000002145, msg0000002698, msg0000001592, msg0000001810,
> > > msg0000002690,
> > > >> msg0000002691, msg0000001911, msg0000001910, msg0000002693,
> > > msg0000001812,
> > > >> msg0000001817, msg0000001547, msg0000002012, msg0000002015,
> > > msg0000002941,
> > > >> msg0000001688, msg0000002018, msg0000002684, msg0000002944,
> > > msg0000001540,
> > > >> msg0000002686, msg0000001541, msg0000002946, msg0000002688,
> > > msg0000001584,
> > > >> msg0000002948]
> > > >>
> > > >> [zk: localhost:2181(CONNECTED) 7] delete
> > > >>
> > /xpe/queues/3bd7851e79381ef4bfd1a5857b5e34c04e5159e5/msgs/msg0000002948
> > > >> Node does not exist:
> > > >>
> > /xpe/queues/3bd7851e79381ef4bfd1a5857b5e34c04e5159e5/msgs/msg0000002948
> > > >>
> > > >> When I performed the same operations on another node, none of those
> > > nodes
> > > >> existed.
> > > >>
> > > >>
> > > >> Dr Hao He
> > > >>
> > > >> XPE - the truly SOA platform
> > > >>
> > > >> he@softtouchit.com
> > > >> http://softtouchit.com
> > > >> http://itunes.com/apps/Scanmobile
> > > >>
> > > >> On 11/08/2010, at 4:38 PM, Ted Dunning wrote:
> > > >>
> > > >>> Can you provide some more information?  The output of some of the
> > four
> > > >>> letter commands and a transcript of what you are doing would be
> very
> > > >>> helpful.
> > > >>>
> > > >>> Also, there is no way for znodes to exist on one node of a properly
> > > >>> operating ZK cluster and not on either of the other two.  Something
> > has
> > > to
> > > >>> be wrong and I would vote for operator error (not to cast
> aspersions,
> > > it is
> > > >>> just that humans like you and *me* make more errors than ZK does).
> > > >>>
> > > >>> On Tue, Aug 10, 2010 at 11:32 PM, Dr Hao He <he...@softtouchit.com>
> > > wrote:
> > > >>>
> > > >>>> hi, All,
> > > >>>>
> > > >>>> I have a 3-host cluster running ZooKeeper 3.2.2.  On one of the
> > hosts,
> > > >>>> there are a number of nodes that I can "get" and "ls" using
> zkCli.sh
> > .
> > > >>>> However, when I tried to "delete" any of them, I got "Node does
> not
> > > exist"
> > > >>>> error.    Those nodes do not exist on the other two hosts.
> > > >>>>
> > > >>>> Any idea how we should handle this type of errors and what might
> > have
> > > >>>> caused this problem?
> > > >>>>
> > > >>>> Dr Hao He
> > > >>>>
> > > >>>> XPE - the truly SOA platform
> > > >>>>
> > > >>>> he@softtouchit.com
> > > >>>> http://softtouchit.com
> > > >>>> http://itunes.com/apps/Scanmobile
> > > >>>>
> > > >>>>
> > > >>
> > > >>
> > > >
> > > >
> > >
> > >
> >
>

Re: How to handle "Node does not exist" error?

Posted by Vishal K <vi...@gmail.com>.
Hi Ted,

Can you explain why running ZK in embedded mode can cause znode
inconsistencies?
Thanks.

-Vishal

On Thu, Aug 12, 2010 at 12:01 AM, Ted Dunning <te...@gmail.com> wrote:

> Try running the server in non-embedded mode.
>
> Also, you are assuming that you know everything about how to configure the
> quorumPeer.  That is going to change and your code will break at that time.
>  If you use a non-embedded cluster, this won't be a problem and you will be
> able to upgrade ZK version without having to restart your service.
>
> My own opinion is that running an embedded ZK is a serious architectural
> error.  Since I don't know your particular situation, it might be
> different,
> but there is an inherent contradiction involved in running a coordination
> layer as part of the thing being coordinated.  Whatever your software does,
> it isn't what ZK does.  As such, it is better to factor out the ZK
> functionality and make it completely stable.  That gives you a much simpler
> world and will make it easier for you to troubleshoot your system.  The
> simple fact that you can't take down your service without affecting the
> reliability of your ZK layer makes this a very bad idea.
>
> The problems you are having now are only a preview of what this
> architectural error leads to.  There will be more problems and many of them
> are likely to be more subtle and lead to service interruptions and lots of
> wasted time.
>
> On Wed, Aug 11, 2010 at 8:49 PM, Dr Hao He <he...@softtouchit.com> wrote:
>
> > hi, Ted and Mahadev,
> >
> >
> > Here are some more details about my setup:
> >
> > I run zookeeper in the embedded mode with the following code:
> >
> >                                        quorumPeer = new QuorumPeer();
> >
> >  quorumPeer.setClientPort(getClientPort());
> >                                        quorumPeer.setTxnFactory(new
> > FileTxnSnapLog(new File(getDataLogDir()), new File(getDataDir())));
> >
> >  quorumPeer.setQuorumPeers(getServers());
> >
> >  quorumPeer.setElectionType(getElectionAlg());
> >                                        quorumPeer.setMyid(getServerId());
> >
> >  quorumPeer.setTickTime(getTickTime());
> >
> >  quorumPeer.setInitLimit(getInitLimit());
> >
> >  quorumPeer.setSyncLimit(getSyncLimit());
> >
> >  quorumPeer.setQuorumVerifier(getQuorumVerifier());
> >
> >  quorumPeer.setCnxnFactory(cnxnFactory);
> >                                        quorumPeer.start();
> >
> >
> > The configuration values are read from the following XML document for
> > server 1:
> >
> > <cluster tickTime="1000" initLimit="10" syncLimit="5" clientPort="2181"
> > serverId="1">
> >                  <member id="1" host="192.168.2.6:2888:3888"/>
> >                  <member id="2" host="192.168.2.3:2888:3888"/>
> >                  <member id="3" host="192.168.2.4:2888:3888"/>
> > </cluster>
> >
> >
> > The other servers have the same configurations except their ids being
> > changed to 2 and 3.
> >
> > The error occurred on server 3 when I batch loaded some messages to
> server
> > 1.  However, this error does not always happen.  I am not sure exactly
> what
> > triggered this error yet.
> >
> > I also performed the "stat" operation on one of the "Node does not exist" nodes and
> got:
> >
> > stat
> > /xpe/queues/3bd7851e79381ef4bfd1a5857b5e34c04e5159e5/msgs/msg0000001583
> > Exception in thread "main" java.lang.NullPointerException
> >        at
> > org.apache.zookeeper.ZooKeeperMain.printStat(ZooKeeperMain.java:129)
> >        at
> > org.apache.zookeeper.ZooKeeperMain.processZKCmd(ZooKeeperMain.java:715)
> >        at
> > org.apache.zookeeper.ZooKeeperMain.processCmd(ZooKeeperMain.java:579)
> >        at
> > org.apache.zookeeper.ZooKeeperMain.executeLine(ZooKeeperMain.java:351)
> >        at org.apache.zookeeper.ZooKeeperMain.run(ZooKeeperMain.java:309)
> >        at org.apache.zookeeper.ZooKeeperMain.main(ZooKeeperMain.java:268)
> > [xpe@t43 zookeeper-3.2.2]$ bin/zkCli.sh
> >
> >
> > Those message nodes are created as CreateMode.PERSISTENT_SEQUENTIAL and
> are
> > deleted by the last server who has read them.
> >
> > If I remove the troubled server's zookeeper log directory and restart the
> > server, then everything is ok.
> >
> > I will try to get the nc result next time I see this problem.
> >
> >
> > Dr Hao He
> >
> > XPE - the truly SOA platform
> >
> > he@softtouchit.com
> > http://softtouchit.com
> > http://itunes.com/apps/Scanmobile
> >
> > On 12/08/2010, at 12:32 AM, Mahadev Konar wrote:
> >
> > > HI Dr Hao,
> > >  Can you please post the configuration of all the 3 zookeeper servers?
> I
> > > suspect it might be misconfigured clusters and they might not belong to
> > the
> > > same ensemble.
> > >
> > > Just to be clear:
> > > /xpe/queues/3bd7851e79381ef4bfd1a5857b5e34c04e5159e5/msgs/msg0000002807
> > >
> > > And other such nodes exist on one of the zookeeper servers and the same
> > node
> > > does not exist on other servers?
> > >
> > > Also, as ted pointed out, can you please post the output of echo "stat" |
> > > nc localhost 2181 (on all the 3 servers) to the list?
> > >
> > > Thanks
> > > mahadev
> > >
> > >
> > >
> > > On 8/11/10 12:10 AM, "Dr Hao He" <he...@softtouchit.com> wrote:
> > >
> > >> hi, Ted,
> > >>
> > >> Thanks for the reply.  Here is what I did:
> > >>
> > >> [zk: localhost:2181(CONNECTED) 0] ls
> > >>
> /xpe/queues/3bd7851e79381ef4bfd1a5857b5e34c04e5159e5/msgs/msg0000002948
> > >> []
> > >> zk: localhost:2181(CONNECTED) 1] ls
> > >> /xpe/queues/3bd7851e79381ef4bfd1a5857b5e34c04e5159e5/msgs
> > >> [msg0000002807, msg0000002700, msg0000002701, msg0000002804,
> > msg0000002704,
> > >> msg0000002706, msg0000002601, msg0000001849, msg0000001847,
> > msg0000002508,
> > >> msg0000002609, msg0000001841, msg0000002607, msg0000002606,
> > msg0000002604,
> > >> msg0000002809, msg0000002817, msg0000001633, msg0000002812,
> > msg0000002814,
> > >> msg0000002711, msg0000002815, msg0000002713, msg0000002716,
> > msg0000001772,
> > >> msg0000002811, msg0000001635, msg0000001774, msg0000002515,
> > msg0000002610,
> > >> msg0000001838, msg0000002517, msg0000002612, msg0000002519,
> > msg0000001973,
> > >> msg0000001835, msg0000001974, msg0000002619, msg0000001831,
> > msg0000002510,
> > >> msg0000002512, msg0000002615, msg0000002614, msg0000002617,
> > msg0000002104,
> > >> msg0000002106, msg0000001769, msg0000001768, msg0000002828,
> > msg0000002822,
> > >> msg0000001760, msg0000002820, msg0000001963, msg0000001961,
> > msg0000002110,
> > >> msg0000002118, msg0000002900, msg0000002836, msg0000001757,
> > msg0000002907,
> > >> msg0000001753, msg0000001752, msg0000001755, msg0000001952,
> > msg0000001958,
> > >> msg0000001852, msg0000001956, msg0000001854, msg0000002749,
> > msg0000001608,
> > >> msg0000001609, msg0000002747, msg0000002882, msg0000001743,
> > msg0000002888,
> > >> msg0000001605, msg0000002885, msg0000001487, msg0000001746,
> > msg0000002330,
> > >> msg0000001749, msg0000001488, msg0000001489, msg0000001881,
> > msg0000001491,
> > >> msg0000002890, msg0000001889, msg0000002758, msg0000002241,
> > msg0000002892,
> > >> msg0000002852, msg0000002759, msg0000002898, msg0000002850,
> > msg0000001733,
> > >> msg0000002751, msg0000001739, msg0000002753, msg0000002756,
> > msg0000002332,
> > >> msg0000001872, msg0000002233, msg0000001721, msg0000001627,
> > msg0000001720,
> > >> msg0000001625, msg0000001628, msg0000001629, msg0000001729,
> > msg0000002350,
> > >> msg0000001727, msg0000002352, msg0000001622, msg0000001726,
> > msg0000001623,
> > >> msg0000001723, msg0000001724, msg0000001621, msg0000002736,
> > msg0000002738,
> > >> msg0000002363, msg0000001717, msg0000002878, msg0000002362,
> > msg0000002361,
> > >> msg0000001611, msg0000001894, msg0000002357, msg0000002218,
> > msg0000002358,
> > >> msg0000002355, msg0000001895, msg0000002356, msg0000001898,
> > msg0000002354,
> > >> msg0000001996, msg0000001990, msg0000002093, msg0000002880,
> > msg0000002576,
> > >> msg0000002579, msg0000002267, msg0000002266, msg0000002366,
> > msg0000001901,
> > >> msg0000002365, msg0000001903, msg0000001799, msg0000001906,
> > msg0000002368,
> > >> msg0000001597, msg0000002679, msg0000002166, msg0000001595,
> > msg0000002481,
> > >> msg0000002482, msg0000002373, msg0000002374, msg0000002371,
> > msg0000001599,
> > >> msg0000002773, msg0000002274, msg0000002275, msg0000002270,
> > msg0000002583,
> > >> msg0000002271, msg0000002580, msg0000002067, msg0000002277,
> > msg0000002278,
> > >> msg0000002376, msg0000002180, msg0000002467, msg0000002378,
> > msg0000002182,
> > >> msg0000002377, msg0000002184, msg0000002379, msg0000002187,
> > msg0000002186,
> > >> msg0000002665, msg0000002666, msg0000002381, msg0000002382,
> > msg0000002661,
> > >> msg0000002662, msg0000002663, msg0000002385, msg0000002284,
> > msg0000002766,
> > >> msg0000002282, msg0000002190, msg0000002599, msg0000002054,
> > msg0000002596,
> > >> msg0000002453, msg0000002459, msg0000002457, msg0000002456,
> > msg0000002191,
> > >> msg0000002652, msg0000002395, msg0000002650, msg0000002656,
> > msg0000002655,
> > >> msg0000002189, msg0000002047, msg0000002658, msg0000002659,
> > msg0000002796,
> > >> msg0000002250, msg0000002255, msg0000002589, msg0000002257,
> > msg0000002061,
> > >> msg0000002064, msg0000002585, msg0000002258, msg0000002587,
> > msg0000002444,
> > >> msg0000002446, msg0000002447, msg0000002450, msg0000002646,
> > msg0000001501,
> > >> msg0000002591, msg0000002592, msg0000001503, msg0000001506,
> > msg0000002260,
> > >> msg0000002594, msg0000002262, msg0000002263, msg0000002264,
> > msg0000002590,
> > >> msg0000002132, msg0000002130, msg0000002530, msg0000002931,
> > msg0000001559,
> > >> msg0000001808, msg0000002024, msg0000001553, msg0000002939,
> > msg0000002937,
> > >> msg0000001556, msg0000002935, msg0000002933, msg0000002140,
> > msg0000001937,
> > >> msg0000002143, msg0000002520, msg0000002522, msg0000002429,
> > msg0000002524,
> > >> msg0000002920, msg0000002035, msg0000001561, msg0000002134,
> > msg0000002138,
> > >> msg0000002925, msg0000002151, msg0000002287, msg0000002555,
> > msg0000002010,
> > >> msg0000002002, msg0000002290, msg0000001537, msg0000002005,
> > msg0000002147,
> > >> msg0000002145, msg0000002698, msg0000001592, msg0000001810,
> > msg0000002690,
> > >> msg0000002691, msg0000001911, msg0000001910, msg0000002693,
> > msg0000001812,
> > >> msg0000001817, msg0000001547, msg0000002012, msg0000002015,
> > msg0000002941,
> > >> msg0000001688, msg0000002018, msg0000002684, msg0000002944,
> > msg0000001540,
> > >> msg0000002686, msg0000001541, msg0000002946, msg0000002688,
> > msg0000001584,
> > >> msg0000002948]
> > >>
> > >> [zk: localhost:2181(CONNECTED) 7] delete
> > >>
> /xpe/queues/3bd7851e79381ef4bfd1a5857b5e34c04e5159e5/msgs/msg0000002948
> > >> Node does not exist:
> > >>
> /xpe/queues/3bd7851e79381ef4bfd1a5857b5e34c04e5159e5/msgs/msg0000002948
> > >>
> > >> When I performed the same operations on another node, none of those
> > nodes
> > >> existed.
> > >>
> > >>
> > >> Dr Hao He
> > >>
> > >> XPE - the truly SOA platform
> > >>
> > >> he@softtouchit.com
> > >> http://softtouchit.com
> > >> http://itunes.com/apps/Scanmobile
> > >>
> > >> On 11/08/2010, at 4:38 PM, Ted Dunning wrote:
> > >>
> > >>> Can you provide some more information?  The output of some of the
> four
> > >>> letter commands and a transcript of what you are doing would be very
> > >>> helpful.
> > >>>
> > >>> Also, there is no way for znodes to exist on one node of a properly
> > >>> operating ZK cluster and not on either of the other two.  Something
> has
> > to
> > >>> be wrong and I would vote for operator error (not to cast aspersions,
> > it is
> > >>> just that humans like you and *me* make more errors than ZK does).
> > >>>
> > >>> On Tue, Aug 10, 2010 at 11:32 PM, Dr Hao He <he...@softtouchit.com>
> > wrote:
> > >>>
> > >>>> hi, All,
> > >>>>
> > >>>> I have a 3-host cluster running ZooKeeper 3.2.2.  On one of the
> hosts,
> > >>>> there are a number of nodes that I can "get" and "ls" using zkCli.sh
> .
> > >>>> However, when I tried to "delete" any of them, I got "Node does not
> > exist"
> > >>>> error.    Those nodes do not exist on the other two hosts.
> > >>>>
> > >>>> Any idea how we should handle this type of errors and what might
> have
> > >>>> caused this problem?
> > >>>>
> > >>>> Dr Hao He
> > >>>>
> > >>>> XPE - the truly SOA platform
> > >>>>
> > >>>> he@softtouchit.com
> > >>>> http://softtouchit.com
> > >>>> http://itunes.com/apps/Scanmobile
> > >>>>
> > >>>>
> > >>
> > >>
> > >
> > >
> >
> >
>

Re: How to handle "Node does not exist" error?

Posted by Ted Dunning <te...@gmail.com>.
Try running the server in non-embedded mode.

Also, you are assuming that you know everything about how to configure the
quorumPeer.  That is going to change and your code will break at that time.
 If you use a non-embedded cluster, this won't be a problem and you will be
able to upgrade ZK version without having to restart your service.

My own opinion is that running an embedded ZK is a serious architectural
error.  Since I don't know your particular situation, it might be different,
but there is an inherent contradiction involved in running a coordination
layer as part of the thing being coordinated.  Whatever your software does,
it isn't what ZK does.  As such, it is better to factor out the ZK
functionality and make it completely stable.  That gives you a much simpler
world and will make it easier for you to troubleshoot your system.  The
simple fact that you can't take down your service without affecting the
reliability of your ZK layer makes this a very bad idea.

The problems you are having now are only a preview of what this
architectural error leads to.  There will be more problems and many of them
are likely to be more subtle and lead to service interruptions and lots of
wasted time.
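
One footnote on the queue pattern described in the quoted message below:
names created with CreateMode.PERSISTENT_SEQUENTIAL get a zero-padded
10-digit suffix (e.g. msg0000002948), and getChildren() makes no ordering
guarantee, so any "last reader deletes" logic has to sort the children
itself.  A small illustrative sketch (the SequenceOrder class and its
method names are mine, not from the thread):

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Illustrative only: order PERSISTENT_SEQUENTIAL children by the
// zero-padded 10-digit sequence suffix ZooKeeper appends to each name.
public class SequenceOrder {

    // Parse the trailing 10-digit sequence number, e.g. "msg0000002948" -> 2948.
    static int seq(String name) {
        return Integer.parseInt(name.substring(name.length() - 10));
    }

    // Sort child names into creation order, lowest sequence number first.
    static List<String> ordered(List<String> children) {
        return children.stream()
                .sorted(Comparator.comparingInt(SequenceOrder::seq))
                .collect(Collectors.toList());
    }
}
```

Sorting numerically on the suffix (rather than lexically on the whole name)
stays correct even when differently prefixed names share one parent node.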

On Wed, Aug 11, 2010 at 8:49 PM, Dr Hao He <he...@softtouchit.com> wrote:

> hi, Ted and Mahadev,
>
>
> Here are some more details about my setup:
>
> I run zookeeper in the embedded mode with the following code:
>
>                                        quorumPeer = new QuorumPeer();
>
>  quorumPeer.setClientPort(getClientPort());
>                                        quorumPeer.setTxnFactory(new
> FileTxnSnapLog(new File(getDataLogDir()), new File(getDataDir())));
>
>  quorumPeer.setQuorumPeers(getServers());
>
>  quorumPeer.setElectionType(getElectionAlg());
>                                        quorumPeer.setMyid(getServerId());
>
>  quorumPeer.setTickTime(getTickTime());
>
>  quorumPeer.setInitLimit(getInitLimit());
>
>  quorumPeer.setSyncLimit(getSyncLimit());
>
>  quorumPeer.setQuorumVerifier(getQuorumVerifier());
>
>  quorumPeer.setCnxnFactory(cnxnFactory);
>                                        quorumPeer.start();
>
>
> The configuration values are read from the following XML document for
> server 1:
>
> <cluster tickTime="1000" initLimit="10" syncLimit="5" clientPort="2181"
> serverId="1">
>                  <member id="1" host="192.168.2.6:2888:3888"/>
>                  <member id="2" host="192.168.2.3:2888:3888"/>
>                  <member id="3" host="192.168.2.4:2888:3888"/>
> </cluster>
>
>
> The other servers have the same configurations except their ids being
> changed to 2 and 3.
>
> The error occurred on server 3 when I batch loaded some messages to server
> 1.  However, this error does not always happen.  I am not sure exactly what
> triggered this error yet.
>
> I also performed the "stat" operation on one of the "Node does not exist" nodes and got:
>
> stat
> /xpe/queues/3bd7851e79381ef4bfd1a5857b5e34c04e5159e5/msgs/msg0000001583
> Exception in thread "main" java.lang.NullPointerException
>        at
> org.apache.zookeeper.ZooKeeperMain.printStat(ZooKeeperMain.java:129)
>        at
> org.apache.zookeeper.ZooKeeperMain.processZKCmd(ZooKeeperMain.java:715)
>        at
> org.apache.zookeeper.ZooKeeperMain.processCmd(ZooKeeperMain.java:579)
>        at
> org.apache.zookeeper.ZooKeeperMain.executeLine(ZooKeeperMain.java:351)
>        at org.apache.zookeeper.ZooKeeperMain.run(ZooKeeperMain.java:309)
>        at org.apache.zookeeper.ZooKeeperMain.main(ZooKeeperMain.java:268)
> [xpe@t43 zookeeper-3.2.2]$ bin/zkCli.sh
>
>
> Those message nodes are created as CreateMode.PERSISTENT_SEQUENTIAL and are
> deleted by the last server who has read them.
>
> If I remove the troubled server's zookeeper log directory and restart the
> server, then everything is ok.
>
> I will try to get the nc result next time I see this problem.
>
>
> Dr Hao He
>
> XPE - the truly SOA platform
>
> he@softtouchit.com
> http://softtouchit.com
> http://itunes.com/apps/Scanmobile
>
> On 12/08/2010, at 12:32 AM, Mahadev Konar wrote:
>
> > HI Dr Hao,
> >  Can you please post the configuration of all the 3 zookeeper servers? I
> > suspect it might be misconfigured clusters and they might not belong to
> the
> > same ensemble.
> >
> > Just to be clear:
> > /xpe/queues/3bd7851e79381ef4bfd1a5857b5e34c04e5159e5/msgs/msg0000002807
> >
> > And other such nodes exist on one of the zookeeper servers and the same
> node
> > does not exist on other servers?
> >
> > Also, as ted pointed out, can you please post the output of echo "stat" |
> > nc localhost 2181 (on all the 3 servers) to the list?
> >
> > Thanks
> > mahadev
> >
> >
> >
> > On 8/11/10 12:10 AM, "Dr Hao He" <he...@softtouchit.com> wrote:
> >
> >> hi, Ted,
> >>
> >> Thanks for the reply.  Here is what I did:
> >>
> >> [zk: localhost:2181(CONNECTED) 0] ls
> >> /xpe/queues/3bd7851e79381ef4bfd1a5857b5e34c04e5159e5/msgs/msg0000002948
> >> []
> >> zk: localhost:2181(CONNECTED) 1] ls
> >> /xpe/queues/3bd7851e79381ef4bfd1a5857b5e34c04e5159e5/msgs
> >> [msg0000002807, msg0000002700, msg0000002701, msg0000002804,
> msg0000002704,
> >> msg0000002706, msg0000002601, msg0000001849, msg0000001847,
> msg0000002508,
> >> msg0000002609, msg0000001841, msg0000002607, msg0000002606,
> msg0000002604,
> >> msg0000002809, msg0000002817, msg0000001633, msg0000002812,
> msg0000002814,
> >> msg0000002711, msg0000002815, msg0000002713, msg0000002716,
> msg0000001772,
> >> msg0000002811, msg0000001635, msg0000001774, msg0000002515,
> msg0000002610,
> >> msg0000001838, msg0000002517, msg0000002612, msg0000002519,
> msg0000001973,
> >> msg0000001835, msg0000001974, msg0000002619, msg0000001831,
> msg0000002510,
> >> msg0000002512, msg0000002615, msg0000002614, msg0000002617,
> msg0000002104,
> >> msg0000002106, msg0000001769, msg0000001768, msg0000002828,
> msg0000002822,
> >> msg0000001760, msg0000002820, msg0000001963, msg0000001961,
> msg0000002110,
> >> msg0000002118, msg0000002900, msg0000002836, msg0000001757,
> msg0000002907,
> >> msg0000001753, msg0000001752, msg0000001755, msg0000001952,
> msg0000001958,
> >> msg0000001852, msg0000001956, msg0000001854, msg0000002749,
> msg0000001608,
> >> msg0000001609, msg0000002747, msg0000002882, msg0000001743,
> msg0000002888,
> >> msg0000001605, msg0000002885, msg0000001487, msg0000001746,
> msg0000002330,
> >> msg0000001749, msg0000001488, msg0000001489, msg0000001881,
> msg0000001491,
> >> msg0000002890, msg0000001889, msg0000002758, msg0000002241,
> msg0000002892,
> >> msg0000002852, msg0000002759, msg0000002898, msg0000002850,
> msg0000001733,
> >> msg0000002751, msg0000001739, msg0000002753, msg0000002756,
> msg0000002332,
> >> msg0000001872, msg0000002233, msg0000001721, msg0000001627,
> msg0000001720,
> >> msg0000001625, msg0000001628, msg0000001629, msg0000001729,
> msg0000002350,
> >> msg0000001727, msg0000002352, msg0000001622, msg0000001726,
> msg0000001623,
> >> msg0000001723, msg0000001724, msg0000001621, msg0000002736,
> msg0000002738,
> >> msg0000002363, msg0000001717, msg0000002878, msg0000002362,
> msg0000002361,
> >> msg0000001611, msg0000001894, msg0000002357, msg0000002218,
> msg0000002358,
> >> msg0000002355, msg0000001895, msg0000002356, msg0000001898,
> msg0000002354,
> >> msg0000001996, msg0000001990, msg0000002093, msg0000002880,
> msg0000002576,
> >> msg0000002579, msg0000002267, msg0000002266, msg0000002366,
> msg0000001901,
> >> msg0000002365, msg0000001903, msg0000001799, msg0000001906,
> msg0000002368,
> >> msg0000001597, msg0000002679, msg0000002166, msg0000001595,
> msg0000002481,
> >> msg0000002482, msg0000002373, msg0000002374, msg0000002371,
> msg0000001599,
> >> msg0000002773, msg0000002274, msg0000002275, msg0000002270,
> msg0000002583,
> >> msg0000002271, msg0000002580, msg0000002067, msg0000002277,
> msg0000002278,
> >> msg0000002376, msg0000002180, msg0000002467, msg0000002378,
> msg0000002182,
> >> msg0000002377, msg0000002184, msg0000002379, msg0000002187,
> msg0000002186,
> >> msg0000002665, msg0000002666, msg0000002381, msg0000002382,
> msg0000002661,
> >> msg0000002662, msg0000002663, msg0000002385, msg0000002284,
> msg0000002766,
> >> msg0000002282, msg0000002190, msg0000002599, msg0000002054,
> msg0000002596,
> >> msg0000002453, msg0000002459, msg0000002457, msg0000002456,
> msg0000002191,
> >> msg0000002652, msg0000002395, msg0000002650, msg0000002656,
> msg0000002655,
> >> msg0000002189, msg0000002047, msg0000002658, msg0000002659,
> msg0000002796,
> >> msg0000002250, msg0000002255, msg0000002589, msg0000002257,
> msg0000002061,
> >> msg0000002064, msg0000002585, msg0000002258, msg0000002587,
> msg0000002444,
> >> msg0000002446, msg0000002447, msg0000002450, msg0000002646,
> msg0000001501,
> >> msg0000002591, msg0000002592, msg0000001503, msg0000001506,
> msg0000002260,
> >> msg0000002594, msg0000002262, msg0000002263, msg0000002264,
> msg0000002590,
> >> msg0000002132, msg0000002130, msg0000002530, msg0000002931,
> msg0000001559,
> >> msg0000001808, msg0000002024, msg0000001553, msg0000002939,
> msg0000002937,
> >> msg0000001556, msg0000002935, msg0000002933, msg0000002140,
> msg0000001937,
> >> msg0000002143, msg0000002520, msg0000002522, msg0000002429,
> msg0000002524,
> >> msg0000002920, msg0000002035, msg0000001561, msg0000002134,
> msg0000002138,
> >> msg0000002925, msg0000002151, msg0000002287, msg0000002555,
> msg0000002010,
> >> msg0000002002, msg0000002290, msg0000001537, msg0000002005,
> msg0000002147,
> >> msg0000002145, msg0000002698, msg0000001592, msg0000001810,
> msg0000002690,
> >> msg0000002691, msg0000001911, msg0000001910, msg0000002693,
> msg0000001812,
> >> msg0000001817, msg0000001547, msg0000002012, msg0000002015,
> msg0000002941,
> >> msg0000001688, msg0000002018, msg0000002684, msg0000002944,
> msg0000001540,
> >> msg0000002686, msg0000001541, msg0000002946, msg0000002688,
> msg0000001584,
> >> msg0000002948]
> >>
> >> [zk: localhost:2181(CONNECTED) 7] delete
> >> /xpe/queues/3bd7851e79381ef4bfd1a5857b5e34c04e5159e5/msgs/msg0000002948
> >> Node does not exist:
> >> /xpe/queues/3bd7851e79381ef4bfd1a5857b5e34c04e5159e5/msgs/msg0000002948
> >>
> >> When I performed the same operations on another node, none of those nodes
> >> existed.
> >>
> >>
> >> Dr Hao He
> >>
> >> XPE - the truly SOA platform
> >>
> >> he@softtouchit.com
> >> http://softtouchit.com
> >> http://itunes.com/apps/Scanmobile
> >>
> >> On 11/08/2010, at 4:38 PM, Ted Dunning wrote:
> >>
> >>> Can you provide some more information?  The output of some of the four
> >>> letter commands and a transcript of what you are doing would be very
> >>> helpful.
> >>>
> >>> Also, there is no way for znodes to exist on one node of a properly
> >>> operating ZK cluster and not on either of the other two.  Something has to
> >>> be wrong and I would vote for operator error (not to cast aspersions, it is
> >>> just that humans like you and *me* make more errors than ZK does).
> >>>
> >>> On Tue, Aug 10, 2010 at 11:32 PM, Dr Hao He <he...@softtouchit.com> wrote:
> >>>
> >>>> hi, All,
> >>>>
> >>>> I have a 3-host cluster running ZooKeeper 3.2.2.  On one of the hosts,
> >>>> there are a number of nodes that I can "get" and "ls" using zkCli.sh .
> >>>> However, when I tried to "delete" any of them, I got "Node does not exist"
> >>>> error.  Those nodes do not exist on the other two hosts.
> >>>>
> >>>> Any idea how we should handle this type of errors and what might have
> >>>> caused this problem?
> >>>>
> >>>> Dr Hao He
> >>>>
> >>>> XPE - the truly SOA platform
> >>>>
> >>>> he@softtouchit.com
> >>>> http://softtouchit.com
> >>>> http://itunes.com/apps/Scanmobile
> >>>>
> >>>>
> >>
> >>
> >
> >
>
>

Re: How to handle "Node does not exist" error?

Posted by Dr Hao He <he...@softtouchit.com>.
hi, Ted and Mahadev,


Here are some more details about my setup:

I run zookeeper in the embedded mode with the following code:

quorumPeer = new QuorumPeer();
quorumPeer.setClientPort(getClientPort());
quorumPeer.setTxnFactory(new FileTxnSnapLog(new File(getDataLogDir()), new File(getDataDir())));
quorumPeer.setQuorumPeers(getServers());
quorumPeer.setElectionType(getElectionAlg());
quorumPeer.setMyid(getServerId());
quorumPeer.setTickTime(getTickTime());
quorumPeer.setInitLimit(getInitLimit());
quorumPeer.setSyncLimit(getSyncLimit());
quorumPeer.setQuorumVerifier(getQuorumVerifier());
quorumPeer.setCnxnFactory(cnxnFactory);
quorumPeer.start();


The configuration values are read from the following XML document for server 1:

<cluster tickTime="1000" initLimit="10" syncLimit="5" clientPort="2181" serverId="1">
  <member id="1" host="192.168.2.6:2888:3888"/>
  <member id="2" host="192.168.2.3:2888:3888"/>
  <member id="3" host="192.168.2.4:2888:3888"/>
</cluster>
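For comparison, the same settings expressed as a standalone zoo.cfg for server 1 would look roughly like this (a sketch only: the dataDir path is a placeholder, and each server also needs its id written to dataDir/myid):

```
tickTime=1000
initLimit=10
syncLimit=5
clientPort=2181
dataDir=/var/zookeeper/data
server.1=192.168.2.6:2888:3888
server.2=192.168.2.3:2888:3888
server.3=192.168.2.4:2888:3888
```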


The other servers have the same configurations except their ids being changed to 2 and 3.

The error occurred on server 3 when I batch-loaded some messages to server 1.  However, this error does not always happen, and I am not sure exactly what triggered it yet.

I also performed the "stat" operation on one of the "Node does not exist" nodes and got:

stat /xpe/queues/3bd7851e79381ef4bfd1a5857b5e34c04e5159e5/msgs/msg0000001583   
Exception in thread "main" java.lang.NullPointerException
	at org.apache.zookeeper.ZooKeeperMain.printStat(ZooKeeperMain.java:129)
	at org.apache.zookeeper.ZooKeeperMain.processZKCmd(ZooKeeperMain.java:715)
	at org.apache.zookeeper.ZooKeeperMain.processCmd(ZooKeeperMain.java:579)
	at org.apache.zookeeper.ZooKeeperMain.executeLine(ZooKeeperMain.java:351)
	at org.apache.zookeeper.ZooKeeperMain.run(ZooKeeperMain.java:309)
	at org.apache.zookeeper.ZooKeeperMain.main(ZooKeeperMain.java:268)
[xpe@t43 zookeeper-3.2.2]$ bin/zkCli.sh 


Those message nodes are created as CreateMode.PERSISTENT_SEQUENTIAL and are deleted by the last server that has read them.
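(As an aside, the 10-digit suffixes in names like msg0000002948 come from the sequence counter that ZooKeeper appends server-side on a PERSISTENT_SEQUENTIAL create. A minimal illustration of that naming scheme; seqName is a hypothetical helper mirroring the convention, not a ZooKeeper API:)

```java
// Sketch of how sequential znode names are formed: ZooKeeper appends a
// 10-digit, zero-padded counter to the name prefix passed to create().
public class SequentialName {
    // Hypothetical helper mirroring the server-side naming convention.
    public static String seqName(String prefix, int counter) {
        return prefix + String.format("%010d", counter);
    }

    public static void main(String[] args) {
        System.out.println(seqName("msg", 2948)); // msg0000002948
    }
}
```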

If I remove the troubled server's zookeeper log directory and restart the server, then everything is ok.

I will try to get the nc result next time I see this problem.


Dr Hao He

XPE - the truly SOA platform

he@softtouchit.com
http://softtouchit.com
http://itunes.com/apps/Scanmobile

On 12/08/2010, at 12:32 AM, Mahadev Konar wrote:

> Hi Dr Hao,
>  Can you please post the configuration of all the 3 zookeeper servers? I
> suspect it might be misconfigured clusters and they might not belong to the
> same ensemble.
> 
> Just to be clear:
> /xpe/queues/3bd7851e79381ef4bfd1a5857b5e34c04e5159e5/msgs/msg0000002807
> 
> And other such nodes exist on one of the zookeeper servers and the same node
> does not exist on other servers?
> 
> Also, as Ted pointed out, can you please post the output of echo "stat" | nc
> localhost 2181 (on all the 3 servers) to the list?
> 
> Thanks
> mahadev
> 
> 
> 
> On 8/11/10 12:10 AM, "Dr Hao He" <he...@softtouchit.com> wrote:
> 
>> hi, Ted,
>> 
>> Thanks for the reply.  Here is what I did:
>> 
>> [zk: localhost:2181(CONNECTED) 0] ls
>> /xpe/queues/3bd7851e79381ef4bfd1a5857b5e34c04e5159e5/msgs/msg0000002948
>> []
>> [zk: localhost:2181(CONNECTED) 1] ls
>> /xpe/queues/3bd7851e79381ef4bfd1a5857b5e34c04e5159e5/msgs
>> [msg0000002807, msg0000002700, msg0000002701, msg0000002804, msg0000002704,
>> msg0000002706, msg0000002601, msg0000001849, msg0000001847, msg0000002508,
>> msg0000002609, msg0000001841, msg0000002607, msg0000002606, msg0000002604,
>> msg0000002809, msg0000002817, msg0000001633, msg0000002812, msg0000002814,
>> msg0000002711, msg0000002815, msg0000002713, msg0000002716, msg0000001772,
>> msg0000002811, msg0000001635, msg0000001774, msg0000002515, msg0000002610,
>> msg0000001838, msg0000002517, msg0000002612, msg0000002519, msg0000001973,
>> msg0000001835, msg0000001974, msg0000002619, msg0000001831, msg0000002510,
>> msg0000002512, msg0000002615, msg0000002614, msg0000002617, msg0000002104,
>> msg0000002106, msg0000001769, msg0000001768, msg0000002828, msg0000002822,
>> msg0000001760, msg0000002820, msg0000001963, msg0000001961, msg0000002110,
>> msg0000002118, msg0000002900, msg0000002836, msg0000001757, msg0000002907,
>> msg0000001753, msg0000001752, msg0000001755, msg0000001952, msg0000001958,
>> msg0000001852, msg0000001956, msg0000001854, msg0000002749, msg0000001608,
>> msg0000001609, msg0000002747, msg0000002882, msg0000001743, msg0000002888,
>> msg0000001605, msg0000002885, msg0000001487, msg0000001746, msg0000002330,
>> msg0000001749, msg0000001488, msg0000001489, msg0000001881, msg0000001491,
>> msg0000002890, msg0000001889, msg0000002758, msg0000002241, msg0000002892,
>> msg0000002852, msg0000002759, msg0000002898, msg0000002850, msg0000001733,
>> msg0000002751, msg0000001739, msg0000002753, msg0000002756, msg0000002332,
>> msg0000001872, msg0000002233, msg0000001721, msg0000001627, msg0000001720,
>> msg0000001625, msg0000001628, msg0000001629, msg0000001729, msg0000002350,
>> msg0000001727, msg0000002352, msg0000001622, msg0000001726, msg0000001623,
>> msg0000001723, msg0000001724, msg0000001621, msg0000002736, msg0000002738,
>> msg0000002363, msg0000001717, msg0000002878, msg0000002362, msg0000002361,
>> msg0000001611, msg0000001894, msg0000002357, msg0000002218, msg0000002358,
>> msg0000002355, msg0000001895, msg0000002356, msg0000001898, msg0000002354,
>> msg0000001996, msg0000001990, msg0000002093, msg0000002880, msg0000002576,
>> msg0000002579, msg0000002267, msg0000002266, msg0000002366, msg0000001901,
>> msg0000002365, msg0000001903, msg0000001799, msg0000001906, msg0000002368,
>> msg0000001597, msg0000002679, msg0000002166, msg0000001595, msg0000002481,
>> msg0000002482, msg0000002373, msg0000002374, msg0000002371, msg0000001599,
>> msg0000002773, msg0000002274, msg0000002275, msg0000002270, msg0000002583,
>> msg0000002271, msg0000002580, msg0000002067, msg0000002277, msg0000002278,
>> msg0000002376, msg0000002180, msg0000002467, msg0000002378, msg0000002182,
>> msg0000002377, msg0000002184, msg0000002379, msg0000002187, msg0000002186,
>> msg0000002665, msg0000002666, msg0000002381, msg0000002382, msg0000002661,
>> msg0000002662, msg0000002663, msg0000002385, msg0000002284, msg0000002766,
>> msg0000002282, msg0000002190, msg0000002599, msg0000002054, msg0000002596,
>> msg0000002453, msg0000002459, msg0000002457, msg0000002456, msg0000002191,
>> msg0000002652, msg0000002395, msg0000002650, msg0000002656, msg0000002655,
>> msg0000002189, msg0000002047, msg0000002658, msg0000002659, msg0000002796,
>> msg0000002250, msg0000002255, msg0000002589, msg0000002257, msg0000002061,
>> msg0000002064, msg0000002585, msg0000002258, msg0000002587, msg0000002444,
>> msg0000002446, msg0000002447, msg0000002450, msg0000002646, msg0000001501,
>> msg0000002591, msg0000002592, msg0000001503, msg0000001506, msg0000002260,
>> msg0000002594, msg0000002262, msg0000002263, msg0000002264, msg0000002590,
>> msg0000002132, msg0000002130, msg0000002530, msg0000002931, msg0000001559,
>> msg0000001808, msg0000002024, msg0000001553, msg0000002939, msg0000002937,
>> msg0000001556, msg0000002935, msg0000002933, msg0000002140, msg0000001937,
>> msg0000002143, msg0000002520, msg0000002522, msg0000002429, msg0000002524,
>> msg0000002920, msg0000002035, msg0000001561, msg0000002134, msg0000002138,
>> msg0000002925, msg0000002151, msg0000002287, msg0000002555, msg0000002010,
>> msg0000002002, msg0000002290, msg0000001537, msg0000002005, msg0000002147,
>> msg0000002145, msg0000002698, msg0000001592, msg0000001810, msg0000002690,
>> msg0000002691, msg0000001911, msg0000001910, msg0000002693, msg0000001812,
>> msg0000001817, msg0000001547, msg0000002012, msg0000002015, msg0000002941,
>> msg0000001688, msg0000002018, msg0000002684, msg0000002944, msg0000001540,
>> msg0000002686, msg0000001541, msg0000002946, msg0000002688, msg0000001584,
>> msg0000002948]
>> 
>> [zk: localhost:2181(CONNECTED) 7] delete
>> /xpe/queues/3bd7851e79381ef4bfd1a5857b5e34c04e5159e5/msgs/msg0000002948
>> Node does not exist:
>> /xpe/queues/3bd7851e79381ef4bfd1a5857b5e34c04e5159e5/msgs/msg0000002948
>> 
>> When I performed the same operations on another node, none of those nodes
>> existed.
>> 
>> 
>> Dr Hao He
>> 
>> XPE - the truly SOA platform
>> 
>> he@softtouchit.com
>> http://softtouchit.com
>> http://itunes.com/apps/Scanmobile
>> 
>> On 11/08/2010, at 4:38 PM, Ted Dunning wrote:
>> 
>>> Can you provide some more information?  The output of some of the four
>>> letter commands and a transcript of what you are doing would be very
>>> helpful.
>>> 
>>> Also, there is no way for znodes to exist on one node of a properly
>>> operating ZK cluster and not on either of the other two.  Something has to
>>> be wrong and I would vote for operator error (not to cast aspersions, it is
>>> just that humans like you and *me* make more errors than ZK does).
>>> 
>>> On Tue, Aug 10, 2010 at 11:32 PM, Dr Hao He <he...@softtouchit.com> wrote:
>>> 
>>>> hi, All,
>>>> 
>>>> I have a 3-host cluster running ZooKeeper 3.2.2.  On one of the hosts,
>>>> there are a number of nodes that I can "get" and "ls" using zkCli.sh .
>>>> However, when I tried to "delete" any of them, I got "Node does not exist"
>>>> error.    Those nodes do not exist on the other two hosts.
>>>> 
>>>> Any idea how we should handle this type of errors and what might have
>>>> caused this problem?
>>>> 
>>>> Dr Hao He
>>>> 
>>>> XPE - the truly SOA platform
>>>> 
>>>> he@softtouchit.com
>>>> http://softtouchit.com
>>>> http://itunes.com/apps/Scanmobile
>>>> 
>>>> 
>> 
>> 
> 
> 


Re: How to handle "Node does not exist" error?

Posted by Mahadev Konar <ma...@yahoo-inc.com>.
Hi Dr Hao,
  Can you please post the configuration of all 3 zookeeper servers? I
suspect the clusters might be misconfigured and the servers might not
belong to the same ensemble.

Just to be clear:
/xpe/queues/3bd7851e79381ef4bfd1a5857b5e34c04e5159e5/msgs/msg0000002807

And other such nodes exist on one of the zookeeper servers and the same node
does not exist on other servers?

Also, as Ted pointed out, can you please post the output of echo "stat" | nc
localhost 2181 (on all the 3 servers) to the list?

Thanks
mahadev
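The "stat" output is plain line-oriented text, so the interesting fields (Mode, Zxid) can be pulled out and compared across the three servers mechanically. A hedged sketch; StatParser is illustrative only, and the sample text below is made up rather than captured from these servers:

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative parser for the line-oriented output of "echo stat | nc host 2181",
// turning "Key: value" lines into a map so Mode/Zxid can be compared per server.
public class StatParser {
    public static Map<String, String> parse(String statOutput) {
        Map<String, String> fields = new HashMap<String, String>();
        for (String line : statOutput.split("\n")) {
            int idx = line.indexOf(": ");
            if (idx > 0) {
                fields.put(line.substring(0, idx), line.substring(idx + 2).trim());
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        // Made-up sample for illustration only.
        String sample = "Mode: follower\nZxid: 0x300000042\nNode count: 270";
        Map<String, String> f = parse(sample);
        System.out.println("mode=" + f.get("Mode") + " zxid=" + f.get("Zxid"));
    }
}
```

Servers in the same healthy ensemble should report one leader, the rest followers, and closely tracking zxids.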



On 8/11/10 12:10 AM, "Dr Hao He" <he...@softtouchit.com> wrote:

> hi, Ted,
> 
> Thanks for the reply.  Here is what I did:
> 
> [zk: localhost:2181(CONNECTED) 0] ls
> /xpe/queues/3bd7851e79381ef4bfd1a5857b5e34c04e5159e5/msgs/msg0000002948
> []
> [zk: localhost:2181(CONNECTED) 1] ls
> /xpe/queues/3bd7851e79381ef4bfd1a5857b5e34c04e5159e5/msgs
> [msg0000002807, msg0000002700, msg0000002701, msg0000002804, msg0000002704,
> msg0000002706, msg0000002601, msg0000001849, msg0000001847, msg0000002508,
> msg0000002609, msg0000001841, msg0000002607, msg0000002606, msg0000002604,
> msg0000002809, msg0000002817, msg0000001633, msg0000002812, msg0000002814,
> msg0000002711, msg0000002815, msg0000002713, msg0000002716, msg0000001772,
> msg0000002811, msg0000001635, msg0000001774, msg0000002515, msg0000002610,
> msg0000001838, msg0000002517, msg0000002612, msg0000002519, msg0000001973,
> msg0000001835, msg0000001974, msg0000002619, msg0000001831, msg0000002510,
> msg0000002512, msg0000002615, msg0000002614, msg0000002617, msg0000002104,
> msg0000002106, msg0000001769, msg0000001768, msg0000002828, msg0000002822,
> msg0000001760, msg0000002820, msg0000001963, msg0000001961, msg0000002110,
> msg0000002118, msg0000002900, msg0000002836, msg0000001757, msg0000002907,
> msg0000001753, msg0000001752, msg0000001755, msg0000001952, msg0000001958,
> msg0000001852, msg0000001956, msg0000001854, msg0000002749, msg0000001608,
> msg0000001609, msg0000002747, msg0000002882, msg0000001743, msg0000002888,
> msg0000001605, msg0000002885, msg0000001487, msg0000001746, msg0000002330,
> msg0000001749, msg0000001488, msg0000001489, msg0000001881, msg0000001491,
> msg0000002890, msg0000001889, msg0000002758, msg0000002241, msg0000002892,
> msg0000002852, msg0000002759, msg0000002898, msg0000002850, msg0000001733,
> msg0000002751, msg0000001739, msg0000002753, msg0000002756, msg0000002332,
> msg0000001872, msg0000002233, msg0000001721, msg0000001627, msg0000001720,
> msg0000001625, msg0000001628, msg0000001629, msg0000001729, msg0000002350,
> msg0000001727, msg0000002352, msg0000001622, msg0000001726, msg0000001623,
> msg0000001723, msg0000001724, msg0000001621, msg0000002736, msg0000002738,
> msg0000002363, msg0000001717, msg0000002878, msg0000002362, msg0000002361,
> msg0000001611, msg0000001894, msg0000002357, msg0000002218, msg0000002358,
> msg0000002355, msg0000001895, msg0000002356, msg0000001898, msg0000002354,
> msg0000001996, msg0000001990, msg0000002093, msg0000002880, msg0000002576,
> msg0000002579, msg0000002267, msg0000002266, msg0000002366, msg0000001901,
> msg0000002365, msg0000001903, msg0000001799, msg0000001906, msg0000002368,
> msg0000001597, msg0000002679, msg0000002166, msg0000001595, msg0000002481,
> msg0000002482, msg0000002373, msg0000002374, msg0000002371, msg0000001599,
> msg0000002773, msg0000002274, msg0000002275, msg0000002270, msg0000002583,
> msg0000002271, msg0000002580, msg0000002067, msg0000002277, msg0000002278,
> msg0000002376, msg0000002180, msg0000002467, msg0000002378, msg0000002182,
> msg0000002377, msg0000002184, msg0000002379, msg0000002187, msg0000002186,
> msg0000002665, msg0000002666, msg0000002381, msg0000002382, msg0000002661,
> msg0000002662, msg0000002663, msg0000002385, msg0000002284, msg0000002766,
> msg0000002282, msg0000002190, msg0000002599, msg0000002054, msg0000002596,
> msg0000002453, msg0000002459, msg0000002457, msg0000002456, msg0000002191,
> msg0000002652, msg0000002395, msg0000002650, msg0000002656, msg0000002655,
> msg0000002189, msg0000002047, msg0000002658, msg0000002659, msg0000002796,
> msg0000002250, msg0000002255, msg0000002589, msg0000002257, msg0000002061,
> msg0000002064, msg0000002585, msg0000002258, msg0000002587, msg0000002444,
> msg0000002446, msg0000002447, msg0000002450, msg0000002646, msg0000001501,
> msg0000002591, msg0000002592, msg0000001503, msg0000001506, msg0000002260,
> msg0000002594, msg0000002262, msg0000002263, msg0000002264, msg0000002590,
> msg0000002132, msg0000002130, msg0000002530, msg0000002931, msg0000001559,
> msg0000001808, msg0000002024, msg0000001553, msg0000002939, msg0000002937,
> msg0000001556, msg0000002935, msg0000002933, msg0000002140, msg0000001937,
> msg0000002143, msg0000002520, msg0000002522, msg0000002429, msg0000002524,
> msg0000002920, msg0000002035, msg0000001561, msg0000002134, msg0000002138,
> msg0000002925, msg0000002151, msg0000002287, msg0000002555, msg0000002010,
> msg0000002002, msg0000002290, msg0000001537, msg0000002005, msg0000002147,
> msg0000002145, msg0000002698, msg0000001592, msg0000001810, msg0000002690,
> msg0000002691, msg0000001911, msg0000001910, msg0000002693, msg0000001812,
> msg0000001817, msg0000001547, msg0000002012, msg0000002015, msg0000002941,
> msg0000001688, msg0000002018, msg0000002684, msg0000002944, msg0000001540,
> msg0000002686, msg0000001541, msg0000002946, msg0000002688, msg0000001584,
> msg0000002948]
> 
> [zk: localhost:2181(CONNECTED) 7] delete
> /xpe/queues/3bd7851e79381ef4bfd1a5857b5e34c04e5159e5/msgs/msg0000002948
> Node does not exist:
> /xpe/queues/3bd7851e79381ef4bfd1a5857b5e34c04e5159e5/msgs/msg0000002948
> 
> When I performed the same operations on another node, none of those nodes
> existed.
> 
> 
> Dr Hao He
> 
> XPE - the truly SOA platform
> 
> he@softtouchit.com
> http://softtouchit.com
> http://itunes.com/apps/Scanmobile
> 
> On 11/08/2010, at 4:38 PM, Ted Dunning wrote:
> 
>> Can you provide some more information?  The output of some of the four
>> letter commands and a transcript of what you are doing would be very
>> helpful.
>> 
>> Also, there is no way for znodes to exist on one node of a properly
>> operating ZK cluster and not on either of the other two.  Something has to
>> be wrong and I would vote for operator error (not to cast aspersions, it is
>> just that humans like you and *me* make more errors than ZK does).
>> 
>> On Tue, Aug 10, 2010 at 11:32 PM, Dr Hao He <he...@softtouchit.com> wrote:
>> 
>>> hi, All,
>>> 
>>> I have a 3-host cluster running ZooKeeper 3.2.2.  On one of the hosts,
>>> there are a number of nodes that I can "get" and "ls" using zkCli.sh .
>>> However, when I tried to "delete" any of them, I got "Node does not exist"
>>> error.    Those nodes do not exist on the other two hosts.
>>> 
>>> Any idea how we should handle this type of errors and what might have
>>> caused this problem?
>>> 
>>> Dr Hao He
>>> 
>>> XPE - the truly SOA platform
>>> 
>>> he@softtouchit.com
>>> http://softtouchit.com
>>> http://itunes.com/apps/Scanmobile
>>> 
>>> 
> 
> 


Re: How to handle "Node does not exist" error?

Posted by Dr Hao He <he...@softtouchit.com>.
hi, Ted,

Thanks for the reply.  Here is what I did:

[zk: localhost:2181(CONNECTED) 0] ls /xpe/queues/3bd7851e79381ef4bfd1a5857b5e34c04e5159e5/msgs/msg0000002948
[]
[zk: localhost:2181(CONNECTED) 1] ls /xpe/queues/3bd7851e79381ef4bfd1a5857b5e34c04e5159e5/msgs              
[msg0000002807, msg0000002700, msg0000002701, msg0000002804, msg0000002704, msg0000002706, msg0000002601, msg0000001849, msg0000001847, msg0000002508, msg0000002609, msg0000001841, msg0000002607, msg0000002606, msg0000002604, msg0000002809, msg0000002817, msg0000001633, msg0000002812, msg0000002814, msg0000002711, msg0000002815, msg0000002713, msg0000002716, msg0000001772, msg0000002811, msg0000001635, msg0000001774, msg0000002515, msg0000002610, msg0000001838, msg0000002517, msg0000002612, msg0000002519, msg0000001973, msg0000001835, msg0000001974, msg0000002619, msg0000001831, msg0000002510, msg0000002512, msg0000002615, msg0000002614, msg0000002617, msg0000002104, msg0000002106, msg0000001769, msg0000001768, msg0000002828, msg0000002822, msg0000001760, msg0000002820, msg0000001963, msg0000001961, msg0000002110, msg0000002118, msg0000002900, msg0000002836, msg0000001757, msg0000002907, msg0000001753, msg0000001752, msg0000001755, msg0000001952, msg0000001958, msg0000001852, msg0000001956, msg0000001854, msg0000002749, msg0000001608, msg0000001609, msg0000002747, msg0000002882, msg0000001743, msg0000002888, msg0000001605, msg0000002885, msg0000001487, msg0000001746, msg0000002330, msg0000001749, msg0000001488, msg0000001489, msg0000001881, msg0000001491, msg0000002890, msg0000001889, msg0000002758, msg0000002241, msg0000002892, msg0000002852, msg0000002759, msg0000002898, msg0000002850, msg0000001733, msg0000002751, msg0000001739, msg0000002753, msg0000002756, msg0000002332, msg0000001872, msg0000002233, msg0000001721, msg0000001627, msg0000001720, msg0000001625, msg0000001628, msg0000001629, msg0000001729, msg0000002350, msg0000001727, msg0000002352, msg0000001622, msg0000001726, msg0000001623, msg0000001723, msg0000001724, msg0000001621, msg0000002736, msg0000002738, msg0000002363, msg0000001717, msg0000002878, msg0000002362, msg0000002361, msg0000001611, msg0000001894, msg0000002357, msg0000002218, msg0000002358, msg0000002355, msg0000001895, msg0000002356, 
msg0000001898, msg0000002354, msg0000001996, msg0000001990, msg0000002093, msg0000002880, msg0000002576, msg0000002579, msg0000002267, msg0000002266, msg0000002366, msg0000001901, msg0000002365, msg0000001903, msg0000001799, msg0000001906, msg0000002368, msg0000001597, msg0000002679, msg0000002166, msg0000001595, msg0000002481, msg0000002482, msg0000002373, msg0000002374, msg0000002371, msg0000001599, msg0000002773, msg0000002274, msg0000002275, msg0000002270, msg0000002583, msg0000002271, msg0000002580, msg0000002067, msg0000002277, msg0000002278, msg0000002376, msg0000002180, msg0000002467, msg0000002378, msg0000002182, msg0000002377, msg0000002184, msg0000002379, msg0000002187, msg0000002186, msg0000002665, msg0000002666, msg0000002381, msg0000002382, msg0000002661, msg0000002662, msg0000002663, msg0000002385, msg0000002284, msg0000002766, msg0000002282, msg0000002190, msg0000002599, msg0000002054, msg0000002596, msg0000002453, msg0000002459, msg0000002457, msg0000002456, msg0000002191, msg0000002652, msg0000002395, msg0000002650, msg0000002656, msg0000002655, msg0000002189, msg0000002047, msg0000002658, msg0000002659, msg0000002796, msg0000002250, msg0000002255, msg0000002589, msg0000002257, msg0000002061, msg0000002064, msg0000002585, msg0000002258, msg0000002587, msg0000002444, msg0000002446, msg0000002447, msg0000002450, msg0000002646, msg0000001501, msg0000002591, msg0000002592, msg0000001503, msg0000001506, msg0000002260, msg0000002594, msg0000002262, msg0000002263, msg0000002264, msg0000002590, msg0000002132, msg0000002130, msg0000002530, msg0000002931, msg0000001559, msg0000001808, msg0000002024, msg0000001553, msg0000002939, msg0000002937, msg0000001556, msg0000002935, msg0000002933, msg0000002140, msg0000001937, msg0000002143, msg0000002520, msg0000002522, msg0000002429, msg0000002524, msg0000002920, msg0000002035, msg0000001561, msg0000002134, msg0000002138, msg0000002925, msg0000002151, msg0000002287, msg0000002555, msg0000002010, msg0000002002, 
msg0000002290, msg0000001537, msg0000002005, msg0000002147, msg0000002145, msg0000002698, msg0000001592, msg0000001810, msg0000002690, msg0000002691, msg0000001911, msg0000001910, msg0000002693, msg0000001812, msg0000001817, msg0000001547, msg0000002012, msg0000002015, msg0000002941, msg0000001688, msg0000002018, msg0000002684, msg0000002944, msg0000001540, msg0000002686, msg0000001541, msg0000002946, msg0000002688, msg0000001584, msg0000002948]

[zk: localhost:2181(CONNECTED) 7] delete /xpe/queues/3bd7851e79381ef4bfd1a5857b5e34c04e5159e5/msgs/msg0000002948
Node does not exist: /xpe/queues/3bd7851e79381ef4bfd1a5857b5e34c04e5159e5/msgs/msg0000002948

When I performed the same operations on another node, none of those nodes existed. 


Dr Hao He

XPE - the truly SOA platform

he@softtouchit.com
http://softtouchit.com
http://itunes.com/apps/Scanmobile

On 11/08/2010, at 4:38 PM, Ted Dunning wrote:

> Can you provide some more information?  The output of some of the four
> letter commands and a transcript of what you are doing would be very
> helpful.
> 
> Also, there is no way for znodes to exist on one node of a properly
> operating ZK cluster and not on either of the other two.  Something has to
> be wrong and I would vote for operator error (not to cast aspersions, it is
> just that humans like you and *me* make more errors than ZK does).
> 
> On Tue, Aug 10, 2010 at 11:32 PM, Dr Hao He <he...@softtouchit.com> wrote:
> 
>> hi, All,
>> 
>> I have a 3-host cluster running ZooKeeper 3.2.2.  On one of the hosts,
>> there are a number of nodes that I can "get" and "ls" using zkCli.sh .
>> However, when I tried to "delete" any of them, I got "Node does not exist"
>> error.    Those nodes do not exist on the other two hosts.
>> 
>> Any idea how we should handle this type of errors and what might have
>> caused this problem?
>> 
>> Dr Hao He
>> 
>> XPE - the truly SOA platform
>> 
>> he@softtouchit.com
>> http://softtouchit.com
>> http://itunes.com/apps/Scanmobile
>> 
>> 


Re: How to handle "Node does not exist" error?

Posted by Ted Dunning <te...@gmail.com>.
Can you provide some more information?  The output of some of the four
letter commands and a transcript of what you are doing would be very
helpful.

Also, there is no way for znodes to exist on one node of a properly
operating ZK cluster and not on either of the other two.  Something has to
be wrong and I would vote for operator error (not to cast aspersions, it is
just that humans like you and *me* make more errors than ZK does).

On Tue, Aug 10, 2010 at 11:32 PM, Dr Hao He <he...@softtouchit.com> wrote:

> hi, All,
>
> I have a 3-host cluster running ZooKeeper 3.2.2.  On one of the hosts,
> there are a number of nodes that I can "get" and "ls" using zkCli.sh .
>  However, when I tried to "delete" any of them, I got "Node does not exist"
> error.    Those nodes do not exist on the other two hosts.
>
> Any idea how we should handle this type of errors and what might have
> caused this problem?
>
> Dr Hao He
>
> XPE - the truly SOA platform
>
> he@softtouchit.com
> http://softtouchit.com
> http://itunes.com/apps/Scanmobile
>
>

How to handle "Node does not exist" error?

Posted by Dr Hao He <he...@softtouchit.com>.
hi, All,

I have a 3-host cluster running ZooKeeper 3.2.2.  On one of the hosts, there are a number of nodes that I can "get" and "ls" using zkCli.sh .  However, when I tried to "delete" any of them, I got "Node does not exist" error.    Those nodes do not exist on the other two hosts. 

Any idea how we should handle this type of errors and what might have caused this problem?

Dr Hao He

XPE - the truly SOA platform

he@softtouchit.com
http://softtouchit.com
http://itunes.com/apps/Scanmobile


Re: zookeeper on ec2

Posted by Ted Dunning <te...@gmail.com>.
Can you enable verboseGC and look at the tenuring distribution and times for
GC?
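(For reference, the tenuring distribution and GC pause times Ted asks about can be surfaced with HotSpot flags along these lines; exact flags vary by JVM version, so treat this as a sketch for the Java 6-era collectors discussed here:)

```
-verbose:gc
-XX:+PrintGCDetails
-XX:+PrintGCTimeStamps
-XX:+PrintTenuringDistribution
```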



On Tue, Sep 1, 2009 at 5:54 PM, Satish Bhatti <ct...@gmail.com> wrote:

> Parallel/Serial.
> infact@domU-12-31-39-06-3D-D1:/opt/ir/agent/infact-installs/aaa/infact$
> iostat
> Linux 2.6.18-xenU-ec2-v1.0 (domU-12-31-39-06-3D-D1)     09/01/2009  _x86_64_
>
> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>          66.11    0.00    1.54    2.96   20.30    9.08
>
> Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
> sda2            460.83       410.02     12458.18   40499322 1230554928
> sdc               0.00         0.00         0.00         96          0
> sda1              0.53         5.01         4.89     495338     482592
>
>
>
> On Tue, Sep 1, 2009 at 5:46 PM, Mahadev Konar <ma...@yahoo-inc.com> wrote:
>
> > Hi satish,
> >  what GC are you using? Is it ConcurrentMarkSweep or Parallel/Serial?
> >
> >  Also, how is your disk usage on this machine? Can you check your iostat
> > numbers?
> >
> > Thanks
> > mahadev
> >
> >
> > On 9/1/09 5:15 PM, "Satish Bhatti" <ct...@gmail.com> wrote:
> >
> > > GC Time: 11.628 seconds on PS MarkSweep (389 collections)5 minutes on
> PS
> > > scavenge( 7,636 collections)
> > >
> > > It's been running for about 48 hours.
> > >
> > >
> > > On Tue, Sep 1, 2009 at 5:12 PM, Ted Dunning <te...@gmail.com>
> > wrote:
> > >
> > >> Do you have long GC delays?
> > >>
> > >> On Tue, Sep 1, 2009 at 4:51 PM, Satish Bhatti <ct...@gmail.com>
> > wrote:
> > >>
> > >>> Session timeout is 30 seconds.
> > >>>
> > >>> On Tue, Sep 1, 2009 at 4:26 PM, Patrick Hunt <ph...@apache.org>
> wrote:
> > >>>
> > >>>> What is your client timeout? It may be too low.
> > >>>>
> > >>>> also see this section on handling recoverable errors:
> > >>>> http://wiki.apache.org/hadoop/ZooKeeper/ErrorHandling
> > >>>>
> > >>>> connection loss in particular needs special care since:
> > >>>> "When a ZooKeeper client loses a connection to the ZooKeeper server
> > >> there
> > >>>> may be some requests in flight; we don't know where they were in
> their
> > >>>> flight at the time of the connection loss. "
> > >>>>
> > >>>> Patrick
> > >>>>
> > >>>>
> > >>>> Satish Bhatti wrote:
> > >>>>
> > >>>>> I have recently started running on EC2 and am seeing quite a few
> > >>>>> ConnectionLoss exceptions.  Should I just catch these and retry?
> > >>  Since
> > >>> I
> > >>>>> assume that eventually, if the shit truly hits the fan, I will get
> a
> > >>>>> SessionExpired?
> > >>>>> Satish
> > >>>>>
> > >>>>> On Mon, Jul 6, 2009 at 11:35 AM, Ted Dunning <
> ted.dunning@gmail.com>
> > >>>>> wrote:
> > >>>>>
> > >>>>>  We have used EC2 quite a bit for ZK.
> > >>>>>>
> > >>>>>> The basic lessons that I have learned include:
> > >>>>>>
> > >>>>>> a) EC2's biggest advantage after scaling and elasticity was
> > >> conformity
> > >>> of
> > >>>>>> configuration.  Since you are bringing machines up and down all
> the
> > >>> time,
> > >>>>>> they begin to act more like programs and you wind up with boot
> > >> scripts
> > >>>>>> that
> > >>>>>> give you a very predictable environment.  Nice.
> > >>>>>>
> > >>>>>> b) EC2 interconnect has a lot more going on than in a dedicated
> > VLAN.
> > >>>>>>  That
> > >>>>>> can make the ZK servers appear a bit less connected.  You have to
> > >> plan
> > >>>>>> for
> > >>>>>> ConnectionLoss events.
> > >>>>>>
> > >>>>>> c) for highest reliability, I switched to large instances.  On
> > >>>>>> reflection,
> > >>>>>> I
> > >>>>>> think that was helpful, but less important than I thought at the
> > >> time.
> > >>>>>>
> > >>>>>> d) increasing and decreasing cluster size is nearly painless and
> is
> > >>>>>> easily
> > >>>>>> scriptable.  To decrease, do a rolling update on the survivors to
> > >>> update
> > >>>>>> their configuration.  Then take down the instance you want to
> lose.
> > >>  To
> > >>>>>> increase, do a rolling update starting with the new instances to
> > >> update
> > >>>>>> the
> > >>>>>> configuration to include all of the machines.  The rolling update
> > >>> should
> > >>>>>> bounce each ZK with several seconds between each bounce.
>  Rescaling
> > >> the
> > >>>>>> cluster takes less than a minute which makes it comparable to EC2
> > >>>>>> instance
> > >>>>>> boot time (about 30 seconds for the Alestic ubuntu instance that
> we
> > >>> used
> > >>>>>> plus about 20 seconds for additional configuration).
> > >>>>>>
> > >>>>>> On Mon, Jul 6, 2009 at 4:45 AM, David Graf <david.graf@28msec.com
> >
> > >>>>>> wrote:
> > >>>>>>
> > >>>>>>  Hello
> > >>>>>>>
> > >>>>>>> I wanna set up a zookeeper ensemble on amazon's ec2 service. In
> my
> > >>>>>>>
> > >>>>>> system,
> > >>>>>>
> > >>>>>>> zookeeper is used to run a locking service and to generate unique
> > >>> id's.
> > >>>>>>> Currently, for testing purposes, I am only running one instance.
> > >> Now,
> > >>> I
> > >>>>>>>
> > >>>>>> need
> > >>>>>>
> > >>>>>>> to set up an ensemble to protect my system against crashes.
> > >>>>>>> The ec2 services has some differences to a normal server farm.
> E.g.
> > >>> the
> > >>>>>>> data saved on the file system of an ec2 instance is lost if the
> > >>> instance
> > >>>>>>> crashes. In the documentation of zookeeper, I have read that
> > >> zookeeper
> > >>>>>>>
> > >>>>>> saves
> > >>>>>>
> > >>>>>>> snapshots of the in-memory data in the file system. Is that
> needed
> > >> for
> > >>>>>>> recovery? Logically, it would be much easier for me if this is
> not
> > >> the
> > >>>>>>>
> > >>>>>> case.
> > >>>>>>
> > >>>>>>> Additionally, ec2 brings the advantage that serves can be switch
> on
> > >>> and
> > >>>>>>>
> > >>>>>> off
> > >>>>>>
> > >>>>>>> dynamically dependent on the load, traffic, etc. Can this
> advantage
> > >> be
> > >>>>>>> utilized for a zookeeper ensemble? Is it possible to add a
> > zookeeper
> > >>>>>>>
> > >>>>>> server
> > >>>>>>
> > >>>>>>> dynamically to an ensemble? E.g. dependent on the in-memory load?
> > >>>>>>>
> > >>>>>>> David
> > >>>>>>>
> > >>>>>>>
> > >>>>>
> > >>>
> > >>
> > >>
> > >>
> > >> --
> > >> Ted Dunning, CTO
> > >> DeepDyve
> > >>
> >
> >
>



-- 
Ted Dunning, CTO
DeepDyve

Re: zookeeper on ec2

Posted by Satish Bhatti <ct...@gmail.com>.
Parallel/Serial.
infact@domU-12-31-39-06-3D-D1:/opt/ir/agent/infact-installs/aaa/infact$
iostat
Linux 2.6.18-xenU-ec2-v1.0 (domU-12-31-39-06-3D-D1)     09/01/2009
 _x86_64_

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          66.11    0.00    1.54    2.96   20.30    9.08

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda2            460.83       410.02     12458.18   40499322 1230554928
sdc               0.00         0.00         0.00         96          0
sda1              0.53         5.01         4.89     495338     482592



On Tue, Sep 1, 2009 at 5:46 PM, Mahadev Konar <ma...@yahoo-inc.com> wrote:

> Hi satish,
>  what GC are you using? Is it ConcurrentMarkSweep or Parallel/Serial?
>
>  Also, how is your disk usage on this machine? Can you check your iostat
> numbers?
>
> Thanks
> mahadev

Re: zookeeper on ec2

Posted by Mahadev Konar <ma...@yahoo-inc.com>.
Hi Satish,
 what GC are you using? Is it ConcurrentMarkSweep or Parallel/Serial?

  Also, how is your disk usage on this machine? Can you check your iostat
numbers? 

Thanks
mahadev


On 9/1/09 5:15 PM, "Satish Bhatti" <ct...@gmail.com> wrote:

> GC Time: 11.628 seconds on PS MarkSweep (389 collections), 5 minutes on PS
> Scavenge (7,636 collections)
> 
> It's been running for about 48 hours.
> 


Re: zookeeper on ec2

Posted by Satish Bhatti <ct...@gmail.com>.
GC Time: 11.628 seconds on PS MarkSweep (389 collections), 5 minutes on PS
Scavenge (7,636 collections)

It's been running for about 48 hours.


On Tue, Sep 1, 2009 at 5:12 PM, Ted Dunning <te...@gmail.com> wrote:

> Do you have long GC delays?
>

Re: zookeeper on ec2

Posted by Ted Dunning <te...@gmail.com>.
Do you have long GC delays?

On Tue, Sep 1, 2009 at 4:51 PM, Satish Bhatti <ct...@gmail.com> wrote:

> Session timeout is 30 seconds.
>



-- 
Ted Dunning, CTO
DeepDyve

Re: zookeeper on ec2

Posted by Patrick Hunt <ph...@apache.org>.
It depends on what your tests are. Are they pretty simple/light? Then it's 
probably a network issue. Heavy load testing? Then it might be the 
server/client, or it might be the network.

The easiest thing is to run a ping test while running your ZK test and see 
if pings are getting through (and their latency). You should also review your 
client/server logs for any information during the ConnectionLoss.

Ted Dunning would be a good resource - he runs ZK inside EC2 and has 
a lot of experience with it.

Patrick
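
A rough sketch of that ping test, to run alongside the ZK workload. The host names and client port below are placeholders for your actual ensemble members, not values from the thread:

```shell
#!/bin/sh
# Probe network reachability and latency to each ZK server while the
# test workload runs. Hosts below are placeholders.
for host in zk1.example.com zk2.example.com zk3.example.com; do
    echo "== $host =="
    # 10 pings; look for packet loss and high or highly variable rtt
    ping -c 10 "$host" | tail -n 2
    # ZooKeeper's built-in "ruok" four-letter command; a live server replies "imok"
    echo ruok | nc "$host" 2181
    echo
done
```

If pings drop or latency spikes while the client reports ConnectionLoss, the problem is the interconnect rather than the server or client.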

Satish Bhatti wrote:
> For my initial testing I am running with a single ZooKeeper server, i.e. the
> ensemble only has one server.  Not sure if this is exacerbating the problem?
>  I will check out the trouble shooting link you sent me.
> 

Re: zookeeper on ec2

Posted by Satish Bhatti <ct...@gmail.com>.
For my initial testing I am running with a single ZooKeeper server, i.e. the
ensemble only has one server. Not sure if this is exacerbating the problem?
I will check out the troubleshooting link you sent me.

On Tue, Sep 1, 2009 at 5:01 PM, Patrick Hunt <ph...@apache.org> wrote:

> I'm not very familiar with ec2 environment, are you doing any monitoring?
> In particular network connectivity btw nodes? Sounds like networking issues
> btw nodes (I'm assuming you've also looked at stuff like this
> http://wiki.apache.org/hadoop/ZooKeeper/Troubleshooting and verified that
> you are not swapping (see gc pressure), etc...)
>
> Patrick

Re: zookeeper on ec2

Posted by Patrick Hunt <ph...@apache.org>.
I'm not very familiar with the ec2 environment. Are you doing any 
monitoring, in particular of network connectivity between nodes? It 
sounds like a networking issue between nodes. (I'm assuming you've also 
looked at stuff like http://wiki.apache.org/hadoop/ZooKeeper/Troubleshooting 
and verified that you are not swapping, not under gc pressure, etc.)

Patrick

Satish Bhatti wrote:
> Session timeout is 30 seconds.
> 
> On Tue, Sep 1, 2009 at 4:26 PM, Patrick Hunt <ph...@apache.org> wrote:
> 
>> What is your client timeout? It may be too low.
>>
>> also see this section on handling recoverable errors:
>> http://wiki.apache.org/hadoop/ZooKeeper/ErrorHandling
>>
>> connection loss in particular needs special care since:
>> "When a ZooKeeper client loses a connection to the ZooKeeper server there
>> may be some requests in flight; we don't know where they were in their
>> flight at the time of the connection loss. "
>>
>> Patrick
>>
>>
>> Satish Bhatti wrote:
>>
>>> I have recently started running on EC2 and am seeing quite a few
>>> ConnectionLoss exceptions.  Should I just catch these and retry?  Since I
>>> assume that eventually, if the shit truly hits the fan, I will get a
>>> SessionExpired?
>>> Satish
>>>
>>> On Mon, Jul 6, 2009 at 11:35 AM, Ted Dunning <te...@gmail.com>
>>> wrote:
>>>
>>>  We have used EC2 quite a bit for ZK.
>>>> The basic lessons that I have learned include:
>>>>
>>>> a) EC2's biggest advantage after scaling and elasticity was conformity of
>>>> configuration.  Since you are bringing machines up and down all the time,
>>>> they begin to act more like programs and you wind up with boot scripts
>>>> that
>>>> give you a very predictable environment.  Nice.
>>>>
>>>> b) EC2 interconnect has a lot more going on than in a dedicated VLAN.
>>>>  That
>>>> can make the ZK servers appear a bit less connected.  You have to plan
>>>> for
>>>> ConnectionLoss events.
>>>>
>>>> c) for highest reliability, I switched to large instances.  On
>>>> reflection,
>>>> I
>>>> think that was helpful, but less important than I thought at the time.
>>>>
>>>> d) increasing and decreasing cluster size is nearly painless and is
>>>> easily
>>>> scriptable.  To decrease, do a rolling update on the survivors to update
>>>> their configuration.  Then take down the instance you want to lose.  To
>>>> increase, do a rolling update starting with the new instances to update
>>>> the
>>>> configuration to include all of the machines.  The rolling update should
>>>> bounce each ZK with several seconds between each bounce.  Rescaling the
>>>> cluster takes less than a minute which makes it comparable to EC2
>>>> instance
>>>> boot time (about 30 seconds for the Alestic ubuntu instance that we used
>>>> plus about 20 seconds for additional configuration).
>>>>
>>>> On Mon, Jul 6, 2009 at 4:45 AM, David Graf <da...@28msec.com>
>>>> wrote:
>>>>
>>>>  Hello
>>>>> I wanna set up a zookeeper ensemble on amazon's ec2 service. In my
>>>>>
>>>> system,
>>>>
>>>>> zookeeper is used to run a locking service and to generate unique id's.
>>>>> Currently, for testing purposes, I am only running one instance. Now, I
>>>>>
>>>> need
>>>>
>>>>> to set up an ensemble to protect my system against crashes.
>>>>> The ec2 services has some differences to a normal server farm. E.g. the
>>>>> data saved on the file system of an ec2 instance is lost if the instance
>>>>> crashes. In the documentation of zookeeper, I have read that zookeeper
>>>>>
>>>> saves
>>>>
>>>>> snapshots of the in-memory data in the file system. Is that needed for
>>>>> recovery? Logically, it would be much easier for me if this is not the
>>>>>
>>>> case.
>>>>
>>>>> Additionally, ec2 brings the advantage that serves can be switch on and
>>>>>
>>>> off
>>>>
>>>>> dynamically dependent on the load, traffic, etc. Can this advantage be
>>>>> utilized for a zookeeper ensemble? Is it possible to add a zookeeper
>>>>>
>>>> server
>>>>
>>>>> dynamically to an ensemble? E.g. dependent on the in-memory load?
>>>>>
>>>>> David
>>>>>
>>>>>
> 

Re: zookeeper on ec2

Posted by Satish Bhatti <ct...@gmail.com>.
Session timeout is 30 seconds.

On Tue, Sep 1, 2009 at 4:26 PM, Patrick Hunt <ph...@apache.org> wrote:

> What is your client timeout? It may be too low.
>
> also see this section on handling recoverable errors:
> http://wiki.apache.org/hadoop/ZooKeeper/ErrorHandling
>
> connection loss in particular needs special care since:
> "When a ZooKeeper client loses a connection to the ZooKeeper server there
> may be some requests in flight; we don't know where they were in their
> flight at the time of the connection loss. "
>
> Patrick
>
>
> Satish Bhatti wrote:
>
>> I have recently started running on EC2 and am seeing quite a few
>> ConnectionLoss exceptions.  Should I just catch these and retry?  Since I
>> assume that eventually, if the shit truly hits the fan, I will get a
>> SessionExpired?
>> Satish
>>
>> On Mon, Jul 6, 2009 at 11:35 AM, Ted Dunning <te...@gmail.com>
>> wrote:
>>
>>  We have used EC2 quite a bit for ZK.
>>>
>>> The basic lessons that I have learned include:
>>>
>>> a) EC2's biggest advantage after scaling and elasticity was conformity of
>>> configuration.  Since you are bringing machines up and down all the time,
>>> they begin to act more like programs and you wind up with boot scripts
>>> that
>>> give you a very predictable environment.  Nice.
>>>
>>> b) EC2 interconnect has a lot more going on than in a dedicated VLAN.
>>>  That
>>> can make the ZK servers appear a bit less connected.  You have to plan
>>> for
>>> ConnectionLoss events.
>>>
>>> c) for highest reliability, I switched to large instances.  On
>>> reflection,
>>> I
>>> think that was helpful, but less important than I thought at the time.
>>>
>>> d) increasing and decreasing cluster size is nearly painless and is
>>> easily
>>> scriptable.  To decrease, do a rolling update on the survivors to update
>>> their configuration.  Then take down the instance you want to lose.  To
>>> increase, do a rolling update starting with the new instances to update
>>> the
>>> configuration to include all of the machines.  The rolling update should
>>> bounce each ZK with several seconds between each bounce.  Rescaling the
>>> cluster takes less than a minute which makes it comparable to EC2
>>> instance
>>> boot time (about 30 seconds for the Alestic ubuntu instance that we used
>>> plus about 20 seconds for additional configuration).
>>>
>>> On Mon, Jul 6, 2009 at 4:45 AM, David Graf <da...@28msec.com>
>>> wrote:
>>>
>>>  Hello
>>>>
>>>> I wanna set up a zookeeper ensemble on amazon's ec2 service. In my
>>>>
>>> system,
>>>
>>>> zookeeper is used to run a locking service and to generate unique id's.
>>>> Currently, for testing purposes, I am only running one instance. Now, I
>>>>
>>> need
>>>
>>>> to set up an ensemble to protect my system against crashes.
>>>> The ec2 services has some differences to a normal server farm. E.g. the
>>>> data saved on the file system of an ec2 instance is lost if the instance
>>>> crashes. In the documentation of zookeeper, I have read that zookeeper
>>>>
>>> saves
>>>
>>>> snapshots of the in-memory data in the file system. Is that needed for
>>>> recovery? Logically, it would be much easier for me if this is not the
>>>>
>>> case.
>>>
>>>> Additionally, ec2 brings the advantage that serves can be switch on and
>>>>
>>> off
>>>
>>>> dynamically dependent on the load, traffic, etc. Can this advantage be
>>>> utilized for a zookeeper ensemble? Is it possible to add a zookeeper
>>>>
>>> server
>>>
>>>> dynamically to an ensemble? E.g. dependent on the in-memory load?
>>>>
>>>> David
>>>>
>>>>
>>

Re: zookeeper on ec2

Posted by Patrick Hunt <ph...@apache.org>.
What is your client timeout? It may be too low.

also see this section on handling recoverable errors:
http://wiki.apache.org/hadoop/ZooKeeper/ErrorHandling

connection loss in particular needs special care since:
"When a ZooKeeper client loses a connection to the ZooKeeper server 
there may be some requests in flight; we don't know where they were in 
their flight at the time of the connection loss. "

Patrick
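For context on "too low": the server does not simply accept whatever session timeout the client requests; it clamps the request into a window derived from tickTime (2x to 20x by default, adjustable via minSessionTimeout/maxSessionTimeout). A minimal Python sketch of that negotiation (illustrative only, not a real client API):

```python
def negotiated_session_timeout(requested_ms, tick_time_ms,
                               min_timeout_ms=None, max_timeout_ms=None):
    """Clamp a requested session timeout the way a ZooKeeper server does.

    By default the server only grants timeouts between 2*tickTime and
    20*tickTime (overridable with minSessionTimeout/maxSessionTimeout).
    """
    lo = min_timeout_ms if min_timeout_ms is not None else 2 * tick_time_ms
    hi = max_timeout_ms if max_timeout_ms is not None else 20 * tick_time_ms
    return max(lo, min(requested_ms, hi))

# With the default tickTime of 2000 ms the allowed window is [4 s, 40 s],
# so a requested 30 s timeout is granted as-is, while a 1 s request is
# silently raised to 4 s.
print(negotiated_session_timeout(30000, 2000))  # -> 30000
print(negotiated_session_timeout(1000, 2000))   # -> 4000
```

So a "30 second" timeout only holds if the server's tickTime permits it; checking the negotiated value on connect avoids surprises.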

Satish Bhatti wrote:
> I have recently started running on EC2 and am seeing quite a few
> ConnectionLoss exceptions.  Should I just catch these and retry?  Since I
> assume that eventually, if the shit truly hits the fan, I will get a
> SessionExpired?
> Satish
> 
> On Mon, Jul 6, 2009 at 11:35 AM, Ted Dunning <te...@gmail.com> wrote:
> 
>> We have used EC2 quite a bit for ZK.
>>
>> The basic lessons that I have learned include:
>>
>> a) EC2's biggest advantage after scaling and elasticity was conformity of
>> configuration.  Since you are bringing machines up and down all the time,
>> they begin to act more like programs and you wind up with boot scripts that
>> give you a very predictable environment.  Nice.
>>
>> b) EC2 interconnect has a lot more going on than in a dedicated VLAN.  That
>> can make the ZK servers appear a bit less connected.  You have to plan for
>> ConnectionLoss events.
>>
>> c) for highest reliability, I switched to large instances.  On reflection,
>> I
>> think that was helpful, but less important than I thought at the time.
>>
>> d) increasing and decreasing cluster size is nearly painless and is easily
>> scriptable.  To decrease, do a rolling update on the survivors to update
>> their configuration.  Then take down the instance you want to lose.  To
>> increase, do a rolling update starting with the new instances to update the
>> configuration to include all of the machines.  The rolling update should
>> bounce each ZK with several seconds between each bounce.  Rescaling the
>> cluster takes less than a minute which makes it comparable to EC2 instance
>> boot time (about 30 seconds for the Alestic ubuntu instance that we used
>> plus about 20 seconds for additional configuration).
>>
>> On Mon, Jul 6, 2009 at 4:45 AM, David Graf <da...@28msec.com> wrote:
>>
>>> Hello
>>>
>>> I wanna set up a zookeeper ensemble on amazon's ec2 service. In my
>> system,
>>> zookeeper is used to run a locking service and to generate unique id's.
>>> Currently, for testing purposes, I am only running one instance. Now, I
>> need
>>> to set up an ensemble to protect my system against crashes.
>>> The ec2 services has some differences to a normal server farm. E.g. the
>>> data saved on the file system of an ec2 instance is lost if the instance
>>> crashes. In the documentation of zookeeper, I have read that zookeeper
>> saves
>>> snapshots of the in-memory data in the file system. Is that needed for
>>> recovery? Logically, it would be much easier for me if this is not the
>> case.
>>> Additionally, ec2 brings the advantage that serves can be switch on and
>> off
>>> dynamically dependent on the load, traffic, etc. Can this advantage be
>>> utilized for a zookeeper ensemble? Is it possible to add a zookeeper
>> server
>>> dynamically to an ensemble? E.g. dependent on the in-memory load?
>>>
>>> David
>>>
> 

Re: zookeeper on ec2

Posted by Mahadev Konar <ma...@yahoo-inc.com>.
Hi Satish,

  Connectionloss is a little trickier than just retrying blindly. Please
read the following sections on this -

http://wiki.apache.org/hadoop/ZooKeeper/ErrorHandling

And the programmers guide:

http://hadoop.apache.org/zookeeper/docs/r3.1.1/zookeeperProgrammers.html

To learn more about how to handle CONNECTIONLOSS. The idea is that blindly
retrying would create problems, since a CONNECTIONLOSS does NOT necessarily
mean that the zookeeper operation you were executing failed to execute. It
is possible that the operation went through on the servers.

Since this has been a constant source of confusion for everyone who starts
using zookeeper, we are working on a fix (ZOOKEEPER-22) that will take care
of this problem so that programmers will not have to worry about
CONNECTIONLOSS handling.

Thanks
mahadev
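Mahadev's caution can be sketched in a small self-contained simulation (no real ZooKeeper client involved; ConnectionLoss here is a stand-in exception, and the helper is deliberately restricted to idempotent operations):

```python
import time

class ConnectionLoss(Exception):
    """Stand-in for the client library's connection-loss error."""

def retry_idempotent(op, retries=5, backoff_s=0.0):
    """Retry an *idempotent* ZooKeeper operation across connection loss.

    Safe for reads and for writes like setting a fixed path to a fixed
    value.  NOT safe for non-idempotent calls (e.g. sequential create):
    the lost request may already have reached the server, so those need
    a check-then-act recovery step instead of a blind retry.
    """
    for attempt in range(retries):
        try:
            return op()
        except ConnectionLoss:
            if attempt == retries - 1:
                raise          # out of attempts: surface the error
            time.sleep(backoff_s)

# Simulated operation that loses the connection twice, then succeeds.
calls = {"n": 0}
def flaky_read():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionLoss()
    return b"node-data"

print(retry_idempotent(flaky_read))  # -> b'node-data'
```

The interesting design point is the split: the retry loop itself is trivial, and all the subtlety lives in deciding which operations are allowed into it.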




On 9/1/09 4:13 PM, "Satish Bhatti" <ct...@gmail.com> wrote:

> I have recently started running on EC2 and am seeing quite a few
> ConnectionLoss exceptions.  Should I just catch these and retry?  Since I
> assume that eventually, if the shit truly hits the fan, I will get a
> SessionExpired?
> Satish
> 
> On Mon, Jul 6, 2009 at 11:35 AM, Ted Dunning <te...@gmail.com> wrote:
> 
>> We have used EC2 quite a bit for ZK.
>> 
>> The basic lessons that I have learned include:
>> 
>> a) EC2's biggest advantage after scaling and elasticity was conformity of
>> configuration.  Since you are bringing machines up and down all the time,
>> they begin to act more like programs and you wind up with boot scripts that
>> give you a very predictable environment.  Nice.
>> 
>> b) EC2 interconnect has a lot more going on than in a dedicated VLAN.  That
>> can make the ZK servers appear a bit less connected.  You have to plan for
>> ConnectionLoss events.
>> 
>> c) for highest reliability, I switched to large instances.  On reflection,
>> I
>> think that was helpful, but less important than I thought at the time.
>> 
>> d) increasing and decreasing cluster size is nearly painless and is easily
>> scriptable.  To decrease, do a rolling update on the survivors to update
>> their configuration.  Then take down the instance you want to lose.  To
>> increase, do a rolling update starting with the new instances to update the
>> configuration to include all of the machines.  The rolling update should
>> bounce each ZK with several seconds between each bounce.  Rescaling the
>> cluster takes less than a minute which makes it comparable to EC2 instance
>> boot time (about 30 seconds for the Alestic ubuntu instance that we used
>> plus about 20 seconds for additional configuration).
>> 
>> On Mon, Jul 6, 2009 at 4:45 AM, David Graf <da...@28msec.com> wrote:
>> 
>>> Hello
>>> 
>>> I wanna set up a zookeeper ensemble on amazon's ec2 service. In my
>> system,
>>> zookeeper is used to run a locking service and to generate unique id's.
>>> Currently, for testing purposes, I am only running one instance. Now, I
>> need
>>> to set up an ensemble to protect my system against crashes.
>>> The ec2 services has some differences to a normal server farm. E.g. the
>>> data saved on the file system of an ec2 instance is lost if the instance
>>> crashes. In the documentation of zookeeper, I have read that zookeeper
>> saves
>>> snapshots of the in-memory data in the file system. Is that needed for
>>> recovery? Logically, it would be much easier for me if this is not the
>> case.
>>> Additionally, ec2 brings the advantage that serves can be switch on and
>> off
>>> dynamically dependent on the load, traffic, etc. Can this advantage be
>>> utilized for a zookeeper ensemble? Is it possible to add a zookeeper
>> server
>>> dynamically to an ensemble? E.g. dependent on the in-memory load?
>>> 
>>> David
>>> 
>> 


Re: zookeeper on ec2

Posted by Satish Bhatti <ct...@gmail.com>.
I have recently started running on EC2 and am seeing quite a few
ConnectionLoss exceptions.  Should I just catch these and retry?  Since I
assume that eventually, if the shit truly hits the fan, I will get a
SessionExpired?
Satish

On Mon, Jul 6, 2009 at 11:35 AM, Ted Dunning <te...@gmail.com> wrote:

> We have used EC2 quite a bit for ZK.
>
> The basic lessons that I have learned include:
>
> a) EC2's biggest advantage after scaling and elasticity was conformity of
> configuration.  Since you are bringing machines up and down all the time,
> they begin to act more like programs and you wind up with boot scripts that
> give you a very predictable environment.  Nice.
>
> b) EC2 interconnect has a lot more going on than in a dedicated VLAN.  That
> can make the ZK servers appear a bit less connected.  You have to plan for
> ConnectionLoss events.
>
> c) for highest reliability, I switched to large instances.  On reflection,
> I
> think that was helpful, but less important than I thought at the time.
>
> d) increasing and decreasing cluster size is nearly painless and is easily
> scriptable.  To decrease, do a rolling update on the survivors to update
> their configuration.  Then take down the instance you want to lose.  To
> increase, do a rolling update starting with the new instances to update the
> configuration to include all of the machines.  The rolling update should
> bounce each ZK with several seconds between each bounce.  Rescaling the
> cluster takes less than a minute which makes it comparable to EC2 instance
> boot time (about 30 seconds for the Alestic ubuntu instance that we used
> plus about 20 seconds for additional configuration).
>
> On Mon, Jul 6, 2009 at 4:45 AM, David Graf <da...@28msec.com> wrote:
>
> > Hello
> >
> > I wanna set up a zookeeper ensemble on amazon's ec2 service. In my
> system,
> > zookeeper is used to run a locking service and to generate unique id's.
> > Currently, for testing purposes, I am only running one instance. Now, I
> need
> > to set up an ensemble to protect my system against crashes.
> > The ec2 services has some differences to a normal server farm. E.g. the
> > data saved on the file system of an ec2 instance is lost if the instance
> > crashes. In the documentation of zookeeper, I have read that zookeeper
> saves
> > snapshots of the in-memory data in the file system. Is that needed for
> > recovery? Logically, it would be much easier for me if this is not the
> case.
> > Additionally, ec2 brings the advantage that serves can be switch on and
> off
> > dynamically dependent on the load, traffic, etc. Can this advantage be
> > utilized for a zookeeper ensemble? Is it possible to add a zookeeper
> server
> > dynamically to an ensemble? E.g. dependent on the in-memory load?
> >
> > David
> >
>

Re: zookeeper on ec2

Posted by Ted Dunning <te...@gmail.com>.
We have used EC2 quite a bit for ZK.

The basic lessons that I have learned include:

a) EC2's biggest advantage after scaling and elasticity was conformity of
configuration.  Since you are bringing machines up and down all the time,
they begin to act more like programs and you wind up with boot scripts that
give you a very predictable environment.  Nice.

b) EC2 interconnect has a lot more going on than in a dedicated VLAN.  That
can make the ZK servers appear a bit less connected.  You have to plan for
ConnectionLoss events.

c) for highest reliability, I switched to large instances.  On reflection, I
think that was helpful, but less important than I thought at the time.

d) increasing and decreasing cluster size is nearly painless and is easily
scriptable.  To decrease, do a rolling update on the survivors to update
their configuration.  Then take down the instance you want to lose.  To
increase, do a rolling update starting with the new instances to update the
configuration to include all of the machines.  The rolling update should
bounce each ZK with several seconds between each bounce.  Rescaling the
cluster takes less than a minute which makes it comparable to EC2 instance
boot time (about 30 seconds for the Alestic ubuntu instance that we used
plus about 20 seconds for additional configuration).
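Point (d) above has a simple ordering rule behind it: when growing, bounce the new servers first; when shrinking, update the survivors first and only then retire the rest. A hedged Python sketch of that ordering logic (server names are hypothetical; a real script would wrap this around config rewrites and restarts):

```python
def rolling_restart_order(current, target):
    """Order in which to touch servers when resizing a ZK ensemble.

    Growing: bring the new servers up first (with the full member list),
    then bounce the existing ones so everyone agrees on the new config.
    Shrinking: bounce the survivors with the reduced config first, then
    take down the servers that are leaving (they are stopped, not bounced).
    """
    survivors = [s for s in current if s in target]
    leaving = [s for s in current if s not in target]
    joining = [s for s in target if s not in current]
    if joining:                        # growing: new instances first
        return joining + survivors
    return survivors + leaving         # shrinking: survivors first

print(rolling_restart_order(["zk1", "zk2", "zk3"],
                            ["zk1", "zk2", "zk3", "zk4", "zk5"]))
# -> ['zk4', 'zk5', 'zk1', 'zk2', 'zk3']
```

As Ted notes, the script driving this should pause several seconds between bounces so a quorum is live at every step.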

On Mon, Jul 6, 2009 at 4:45 AM, David Graf <da...@28msec.com> wrote:

> Hello
>
> I wanna set up a zookeeper ensemble on amazon's ec2 service. In my system,
> zookeeper is used to run a locking service and to generate unique id's.
> Currently, for testing purposes, I am only running one instance. Now, I need
> to set up an ensemble to protect my system against crashes.
> The ec2 services has some differences to a normal server farm. E.g. the
> data saved on the file system of an ec2 instance is lost if the instance
> crashes. In the documentation of zookeeper, I have read that zookeeper saves
> snapshots of the in-memory data in the file system. Is that needed for
> recovery? Logically, it would be much easier for me if this is not the case.
> Additionally, ec2 brings the advantage that serves can be switch on and off
> dynamically dependent on the load, traffic, etc. Can this advantage be
> utilized for a zookeeper ensemble? Is it possible to add a zookeeper server
> dynamically to an ensemble? E.g. dependent on the in-memory load?
>
> David
>

Re: zookeeper on ec2

Posted by Evan Jones <ev...@MIT.EDU>.
On Jul 6, 2009, at 15:40 , Henry Robinson wrote:
> This is an interesting way of doing things. It seems like there is a
> correctness issue: if a majority of servers fail, with the remaining
> minority lagging the leader for some reason, won't the ensemble's  
> current
> state be forever lost? This is akin to a majority of servers failing  
> and
> never recovering. ZK relies on the eventual liveness of a majority  
> of its
> servers; with EC2 it seems possible that that property might not be
> satisfied.

I think you are absolutely correct. However, my understanding of EC2  
failure modes is that even though there is no guarantee that a  
particular instance's disk will survive a failure, it is very possible  
to observe EC2 nodes that "fail" temporarily (such as rebooting). In  
these cases, the instance's disk typically does survive, and when it  
comes back it will have the same contents. It is only "permanent" EC2  
failures where the disk is gone (eg. hardware failure, or Amazon  
decides to pull it for some other reason).

Thus, this looks a lot like running your own machines in your own data  
center to me. Soft failures will recover, hardware failures won't. The  
only difference is that if you were running the machines yourself, and  
you ran into some weird issue where you had hardware failures across a  
majority of your Zookeeper ensemble, you could physically move the  
disks to recover the state. If this happens in EC2, you will have to  
do some sort of "manual" repair where you forcibly restart Zookeeper  
using the state of one of the surviving members. Some Zookeeper  
operations may be lost in this case.

However, we are talking about a situation that seems exceedingly rare.  
No matter what kind of system you are running, serious non-recoverable  
failures will happen, so I don't see this as an impediment to  
running Zookeeper or other quorum systems in EC2.

That said, I haven't run enough EC2 instances for a long enough period  
of time to observe any serious failures or recoveries. If anyone has  
more detailed information, I would love to hear about it.

Evan Jones

--
Evan Jones
http://evanjones.ca/


Re: zookeeper on ec2

Posted by Patrick Hunt <ph...@apache.org>.
Henry Robinson wrote:
> Effectively, EC2 does not introduce any new failure modes but potentially
> exacerbates some existing ones. If a majority of EC2 nodes fail (in the
> sense that their hard drive images cannot be recovered), there is no way to
> restart the cluster, and persistence is lost. As you say, this is highly
> unlikely. If, for some reason, the quorums are set such that only a single
> node failure could bring down the quorum (bad design, but plausible), this
> failure is more likely.

This is not strictly true. The cluster cannot recover _automatically_ if 
failures > n, where the ensemble size is 2n+1. However, you can recover 
manually as long as at least one snapshot and its trailing logs can be 
recovered. We can even recover if the latest snapshots are corrupted, as 
long as we can recover a snap from some previous time t and all logs 
subsequent to t.


> 
> EC2 just ups the stakes - crash failures are now potentially more dangerous
> (bugs, packet corruption, rack local hardware failures etc all could cause
> crash failures). It is common to assume that, notwithstanding a significant
> physical event that wipes a number of hard drives, writes that are written
> stay written. This assumption is sometimes false given certain choices of
> filesystem. EC2 just gives us a few more ways for that not to be true.
> 
> I think it's more possible than one might expect to have a lagging minority
> left behind - say they are partitioned from the majority by a malfunctioning
> switch. They might all be lagging already as a result. Care must be taken
> not to bring up another follower on the minority side to make it a majority,
> else there are split-brain issues as well as the possibility of lost
> transactions. Again, not *too* likely to happen in the wild, but these
> permanently running services have a nasty habit of exploring the edge
> cases...
> 
> 
>> To be explicit, you can cause any ZK cluster to back-track in time by doing
>> the following:
>>
> ...
> 
>> f) add new members of the cluster
> 
> 
> Which is why care needs to be taken that the ensemble can't be expanded with
> a current quorum. Dynamic membership doesn't save us when a majority fails -
> the existence of a quorum is a liveness condition for ZK. To help with the
> liveness issue we can sacrifice a little safety (see, e.g. vector clock
> ordered timestamps in Dynamo), but I think that ZK is aimed at safety first,
> liveness second. Not that you were advocating changing that, I'm just
> articulating why correctness is extremely important from my perspective.
> 
> Henry
> 
> 
>>
>> At this point, you will have lost the transactions from (b), but I really,
>> really am not going to worry about this happening either by plan or by
>> accident.  Without steps (e) and (f), the cluster will tell you that it
>> knows something is wrong and that it cannot elect a leader.  If you don't
>> have *exact* coincidence of the survivor set and the set of laggards, then
>> you won't have any data loss at all.
>>
>> You have to decide if this is too much risk for you.  My feeling is that it
>> is OK level of correctness for conventional weapon fire control, but not
>> for
>> nuclear weapons safeguards.  Since my apps are considerably less sensitive
>> than either of those, I am not much worried.
> 
> 
>> On Mon, Jul 6, 2009 at 12:40 PM, Henry Robinson <he...@cloudera.com>
>> wrote:
>>
>>> It seems like there is a
>>> correctness issue: if a majority of servers fail, with the remaining
>>> minority lagging the leader for some reason, won't the ensemble's current
>>> state be forever lost?
>>>
> 

Re: zookeeper on ec2

Posted by Henry Robinson <he...@cloudera.com>.
On Mon, Jul 6, 2009 at 10:16 PM, Ted Dunning <te...@gmail.com> wrote:

> No.  This should not cause data loss.


> As soon as ZK cannot replicate changes to a majority of machines, it
> refuses
> to take any more changes.  This is old ground and is required for
> correctness in the face of network partition.  It is conceivable (barely)
> that *exactly* the minority that were behind were the survivors, but this
> is
> almost equivalent to a complete failure of the cluster choreographed in
> such
> a way that a few nodes come back from the dead just afterwards.  That could
> cause the state to not include some "completed" transactions to disappear,
> but at this level of massive failure, we have the same issues with any
> cluster.
>

Effectively, EC2 does not introduce any new failure modes but potentially
exacerbates some existing ones. If a majority of EC2 nodes fail (in the
sense that their hard drive images cannot be recovered), there is no way to
restart the cluster, and persistence is lost. As you say, this is highly
unlikely. If, for some reason, the quorums are set such that only a single
node failure could bring down the quorum (bad design, but plausible), this
failure is more likely.

EC2 just ups the stakes - crash failures are now potentially more dangerous
(bugs, packet corruption, rack local hardware failures etc all could cause
crash failures). It is common to assume that, notwithstanding a significant
physical event that wipes a number of hard drives, writes that are written
stay written. This assumption is sometimes false given certain choices of
filesystem. EC2 just gives us a few more ways for that not to be true.

I think it's more possible than one might expect to have a lagging minority
left behind - say they are partitioned from the majority by a malfunctioning
switch. They might all be lagging already as a result. Care must be taken
not to bring up another follower on the minority side to make it a majority,
else there are split-brain issues as well as the possibility of lost
transactions. Again, not *too* likely to happen in the wild, but these
permanently running services have a nasty habit of exploring the edge
cases...


>
> To be explicit, you can cause any ZK cluster to back-track in time by doing
> the following:
>
...

>
> f) add new members of the cluster


Which is why care needs to be taken that the ensemble can't be expanded with
a current quorum. Dynamic membership doesn't save us when a majority fails -
the existence of a quorum is a liveness condition for ZK. To help with the
liveness issue we can sacrifice a little safety (see, e.g. vector clock
ordered timestamps in Dynamo), but I think that ZK is aimed at safety first,
liveness second. Not that you were advocating changing that, I'm just
articulating why correctness is extremely important from my perspective.

Henry


>
>
> At this point, you will have lost the transactions from (b), but I really,
> really am not going to worry about this happening either by plan or by
> accident.  Without steps (e) and (f), the cluster will tell you that it
> knows something is wrong and that it cannot elect a leader.  If you don't
> have *exact* coincidence of the survivor set and the set of laggards, then
> you won't have any data loss at all.
>
> You have to decide if this is too much risk for you.  My feeling is that it
> is OK level of correctness for conventional weapon fire control, but not
> for
> nuclear weapons safeguards.  Since my apps are considerably less sensitive
> than either of those, I am not much worried.


>
> On Mon, Jul 6, 2009 at 12:40 PM, Henry Robinson <he...@cloudera.com>
> wrote:
>
> > It seems like there is a
> > correctness issue: if a majority of servers fail, with the remaining
> > minority lagging the leader for some reason, won't the ensemble's current
> > state be forever lost?
> >
>

Re: zookeeper on ec2

Posted by Ted Dunning <te...@gmail.com>.
No.  This should not cause data loss.

As soon as ZK cannot replicate changes to a majority of machines, it refuses
to take any more changes.  This is old ground and is required for
correctness in the face of network partition.  It is conceivable (barely)
that *exactly* the minority that were behind were the survivors, but this is
almost equivalent to a complete failure of the cluster choreographed in such
a way that a few nodes come back from the dead just afterwards.  That could
cause some "completed" transactions to disappear from the surviving state,
but at this level of massive failure, we have the same issues with any
cluster.

To be explicit, you can cause any ZK cluster to back-track in time by doing
the following:

a) take down a minority of machines

b) do some updates

c) take down the rest of the cluster

d) bring back the minority

e) reconfigure to tell the minority that they are everything

f) add new members of the cluster

At this point, you will have lost the transactions from (b), but I really,
really am not going to worry about this happening either by plan or by
accident.  Without steps (e) and (f), the cluster will tell you that it
knows something is wrong and that it cannot elect a leader.  If you don't
have *exact* coincidence of the survivor set and the set of laggards, then
you won't have any data loss at all.
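
The a) through f) sequence can be made concrete with a toy model of the
majority-commit rule (pure illustration; the server names and structure are
invented and none of this is ZooKeeper's actual code):

```python
# Toy model of the majority-commit rule behind Ted's scenario.

def commit(replicas, up, txn):
    """Apply txn to every live replica; succeed only if a majority is up."""
    if len(up) * 2 <= len(replicas):
        raise RuntimeError("no quorum: write refused")
    for r in up:
        replicas[r].append(txn)

# Five servers, all starting with transaction t1 replicated everywhere.
replicas = {s: ["t1"] for s in "ABCDE"}

# (a) take down a minority (A, B), then (b) commit t2 on the majority.
commit(replicas, up={"C", "D", "E"}, txn="t2")

# (c) the majority dies too; (d) only the lagging minority comes back.
survivors = {"A", "B"}

# (e)/(f) if we force the minority to be "the whole cluster", t2 is gone:
assert all("t2" not in replicas[s] for s in survivors)

# Without steps (e) and (f), no quorum of the *original* five-server
# ensemble exists, so the survivors refuse writes rather than silently
# dropping t2:
try:
    commit(replicas, up=survivors, txn="t3")
except RuntimeError as e:
    print(e)  # no quorum: write refused
```

Note that data loss requires the survivor set to be exactly the lagging
minority; if even one member of the old majority survives, t2 is recoverable.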

You have to decide if this is too much risk for you.  My feeling is that it
is an OK level of correctness for conventional weapon fire control, but not for
nuclear weapons safeguards.  Since my apps are considerably less sensitive
than either of those, I am not much worried.

On Mon, Jul 6, 2009 at 12:40 PM, Henry Robinson <he...@cloudera.com> wrote:

> It seems like there is a
> correctness issue: if a majority of servers fail, with the remaining
> minority lagging the leader for some reason, won't the ensemble's current
> state be forever lost?
>

Re: zookeeper on ec2

Posted by Henry Robinson <he...@cloudera.com>.
On Mon, Jul 6, 2009 at 7:38 PM, Ted Dunning <te...@gmail.com> wrote:

>
> I think that the misunderstanding is that this on-disk image is critical to
> cluster function.  It is not critical because it is replicated to all
> cluster members.  This means that any member can disappear and a new
> instance can replace it with no big cost other than the temporary load of
> copying the current snapshot from some cluster member.
>

This is an interesting way of doing things. It seems like there is a
correctness issue: if a majority of servers fail, with the remaining
minority lagging the leader for some reason, won't the ensemble's current
state be forever lost? This is akin to a majority of servers failing and
never recovering. ZK relies on the eventual liveness of a majority of its
servers; with EC2 it seems possible that that property might not be
satisfied.

(For majority, you can read 'quorum' under the flexible quorums scheme;
perhaps there is a way to devise a quorum scheme suitable for elastic
computing...)
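
For reference, flexible quorums are configured in zoo.cfg via hierarchical
groups; a sketch (the server ids and grouping below are made up):

```
# zoo.cfg fragment: nine servers in three groups.  A quorum is a majority
# of groups, each represented by a majority of its own members.
group.1=1:2:3
group.2=4:5:6
group.3=7:8:9
# Equal weights within each group.
weight.1=1
weight.2=1
weight.3=1
weight.4=1
weight.5=1
weight.6=1
weight.7=1
weight.8=1
weight.9=1
```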

Henry



>
> On Mon, Jul 6, 2009 at 11:33 AM, Mahadev Konar <mahadev@yahoo-inc.com
> >wrote:
>
> >  In the documentation of zookeeper, I have read that
> > > zookeeper saves snapshots of the in-memory data in the file system. Is
> > > that needed for recovery? Logically, it would be much easier for me if
> > > this is not the case.
> > Yes, zookeeper keeps persistent state on disk. This is used for recovery
> > and
> > correctness of zookeeper.
>

Re: zookeeper on ec2

Posted by Patrick Hunt <ph...@apache.org>.
Ted thanks for the info.

I've created a wiki page
http://wiki.apache.org/hadoop/ZooKeeper/ZooKeeperOnEC2
to capture details of running ZK on EC2. If you or anyone else would 
like to update it with information please do so.

Regards,

Patrick

Ted Dunning wrote:
> I disagree with the original post that this is a problem, even in EC2.
> Having the persistent copy on disk is exactly what makes the rolling restart
> work so well.
> 
> I think that the misunderstanding is that this on-disk image is critical to
> cluster function.  It is not critical because it is replicated to all
> cluster members.  This means that any member can disappear and a new
> instance can replace it with no big cost other than the temporary load of
> copying the current snapshot from some cluster member.
> 
> On Mon, Jul 6, 2009 at 11:33 AM, Mahadev Konar <ma...@yahoo-inc.com>wrote:
> 
>>  In the documentation of zookeeper, I have read that
>>> zookeeper saves snapshots of the in-memory data in the file system. Is
>>> that needed for recovery? Logically, it would be much easier for me if
>>> this is not the case.
>> Yes, zookeeper keeps persistent state on disk. This is used for recovery
>> and
>> correctness of zookeeper.
> 

Re: zookeeper on ec2

Posted by Ted Dunning <te...@gmail.com>.
I disagree with the original post that this is a problem, even in EC2.
Having the persistent copy on disk is exactly what makes the rolling restart
work so well.

I think that the misunderstanding is that this on-disk image is critical to
cluster function.  It is not critical because it is replicated to all
cluster members.  This means that any member can disappear and a new
instance can replace it with no big cost other than the temporary load of
copying the current snapshot from some cluster member.
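
For concreteness, a minimal three-server zoo.cfg sketch (hostnames and the
EBS mount point are made-up placeholders; each server also needs a `myid`
file in its dataDir):

```
tickTime=2000
initLimit=10
syncLimit=5
clientPort=2181
# On EC2, pointing dataDir at an attached EBS volume keeps the snapshot
# and transaction logs across instance loss; plain instance storage is
# ephemeral.
dataDir=/mnt/ebs/zookeeper
server.1=zk1.example.com:2888:3888
server.2=zk2.example.com:2888:3888
server.3=zk3.example.com:2888:3888
```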

On Mon, Jul 6, 2009 at 11:33 AM, Mahadev Konar <ma...@yahoo-inc.com>wrote:

>  In the documentation of zookeeper, I have read that
> > zookeeper saves snapshots of the in-memory data in the file system. Is
> > that needed for recovery? Logically, it would be much easier for me if
> > this is not the case.
> Yes, zookeeper keeps persistent state on disk. This is used for recovery
> and
> correctness of zookeeper.

Re: zookeeper on ec2

Posted by Mahadev Konar <ma...@yahoo-inc.com>.
Hi David,
 Answers in line:


On 7/6/09 4:45 AM, "David Graf" <da...@28msec.com> wrote:

> Hello
> 
> I want to set up a zookeeper ensemble on amazon's ec2 service. In my
> system, zookeeper is used to run a locking service and to generate
> unique id's. Currently, for testing purposes, I am only running one
> instance. Now, I need to set up an ensemble to protect my system
> against crashes.

> The ec2 service has some differences from a normal server farm. E.g.
> the data saved on the file system of an ec2 instance is lost if the
> instance crashes. In the documentation of zookeeper, I have read that
> zookeeper saves snapshots of the in-memory data in the file system. Is
> that needed for recovery? Logically, it would be much easier for me if
> this is not the case.
Yes, zookeeper keeps persistent state on disk. This is used for recovery and
correctness of zookeeper.

> Additionally, ec2 brings the advantage that servers can be switched on
> and off dynamically depending on the load, traffic, etc. Can this
> advantage be utilized for a zookeeper ensemble? Is it possible to add
> a zookeeper server dynamically to an ensemble? E.g. depending on the
> in-memory load?
It is not yet possible to add servers dynamically. There is work going on to
do that in http://issues.apache.org/jira/browse/ZOOKEEPER-107. This should
get into the next release (I am hoping). For now, you will have to do a
rolling restart if you do not want the service to go down, or else restart
all the machines at the same time (the zookeeper clients should be able to
handle a minor downtime of the zookeeper service).
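
As a rough illustration, the rolling restart mentioned above amounts to
cycling one server at a time, followers first and leader last, so that a
quorum stays up throughout. A sketch that only computes the plan (the
hostnames are invented, and the actual stop/start mechanism is left to your
deployment tooling):

```python
# Compute a rolling-restart plan for a three-server ensemble.
# Restart followers first and the leader last, one server at a time,
# waiting for each to rejoin before touching the next.

followers = ["zk2.example.com", "zk3.example.com"]
leader = "zk1.example.com"

plan = []
for host in followers + [leader]:
    plan.append(f"stop {host}")
    plan.append(f"start {host}")
    # "ruok"/"stat" are ZooKeeper's four-letter admin commands.
    plan.append(f"wait for {host} to rejoin (check with ruok/stat)")

for step in plan:
    print(step)
```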

Thanks
mahadev
> 
> David