
Posted to user@whirr.apache.org by Fuad Efendi <fu...@efendi.ca> on 2011/08/24 22:16:25 UTC

EC2 Experience

Any luck? I am having exceptionally bad (ugly) problems with x1.large
instances (7.5 GB), Ubuntu, backed by EBS, running HBase + Hadoop (3 nodes),
initially created by Whirr 0.6 (trunk, 2 months ago).


I suspect it is "virtualization": servers unpredictably stop on 5-10-15
minutes. I even increased ZooKeeper timeouts to 20 minutes, but it doesn't
help.


Virtualization related?


The GC log shows user time ZERO, system time ZERO, and real time 600 seconds
(just as a sample); in most cases (99.99%) all timings are zero, so GC itself
is tuned as well as it can be.
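
A pause with a long real time but almost no user or system time means the JVM was not doing GC work during that interval. A minimal Python sketch for spotting such entries, assuming the log carries HotSpot's `[Times: user=... sys=..., real=... secs]` suffix (printed with -XX:+PrintGCDetails); the function name and both thresholds are illustrative assumptions, not taken from this cluster:

```python
import re

# Timing suffix HotSpot appends to GC log entries, for example:
# "[Times: user=0.00 sys=0.00, real=600.00 secs]"
TIMES_RE = re.compile(
    r"\[Times: user=(?P<user>\d+\.\d+) "
    r"sys=(?P<sys>\d+\.\d+), "
    r"real=(?P<real>\d+\.\d+) secs\]"
)


def find_stalled_pauses(log_lines, min_real_secs=1.0, max_cpu_ratio=0.1):
    """Return (line_number, user, sys, real) for long pauses with little CPU use.

    min_real_secs and max_cpu_ratio are hypothetical thresholds: a pause is
    reported when it lasted at least min_real_secs of wall-clock time while
    user + sys CPU time stayed at or below max_cpu_ratio of that wall-clock time.
    """
    stalled = []
    for number, line in enumerate(log_lines, start=1):
        match = TIMES_RE.search(line)
        if match is None:
            continue  # not a GC timing line
        user = float(match.group("user"))
        sys_time = float(match.group("sys"))
        real = float(match.group("real"))
        if real >= min_real_secs and (user + sys_time) <= max_cpu_ratio * real:
            stalled.append((number, user, sys_time, real))
    return stalled
```

Entries reported by this check point away from GC tuning and toward something outside the JVM (the hypervisor, swapping, or blocked I/O) keeping the process from running.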




-- 
Fuad Efendi
416-993-2060
http://www.tokenizer.ca

Re: EC2 Experience

Posted by Andrew Purtell <ap...@apache.org>.

> Any luck? I am having exceptionally bad (ugly) problems with x1.large
> instances (7.5 GB), Ubuntu, backed by EBS, running HBase + Hadoop (3 nodes),
> initially created by Whirr 0.6 (trunk, 2 months ago).

Let me start by saying that I use EC2 to host Hadoop+HBase clusters often, but for testing purposes. My clusters typically don't live longer than 72 hours.

Do you mean m1.large? If so, you should not use it: you can only burst CPU a little before the hypervisor begins to steal back roughly 70% of your CPU time. I recommend c1.xlarge, m2.2xlarge, or m2.4xlarge, depending on how much RAM you need.
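
Steal of this kind is visible from inside the guest. A minimal Python sketch, assuming a Linux guest that exposes the standard /proc/stat counters, computes the share of CPU time stolen between two samples of the aggregate `cpu` line. The function names are illustrative and do not come from any tool mentioned in this thread:

```python
def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line of /proc/stat into integer tick counters."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line of /proc/stat")
    return [int(value) for value in fields[1:]]


def steal_percent(before_line, after_line):
    """Percentage of CPU time reported as steal between two samples."""
    before = parse_cpu_line(before_line)
    after = parse_cpu_line(after_line)
    deltas = [b - a for a, b in zip(before, after)]
    # Column order: user nice system idle iowait irq softirq steal [guest ...]
    # Guest time is already counted inside user, so only the first 8 are summed.
    if len(deltas) < 8:
        return 0.0  # kernel too old to report steal
    total = sum(deltas[:8])
    if total <= 0:
        return 0.0
    return 100.0 * deltas[7] / total
```

To use it, read the first line of /proc/stat twice, a few seconds apart, and pass both strings to `steal_percent`. A persistently high value under load would be consistent with the hypervisor throttling described above.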

Don't use EBS. Hadoop can take advantage of all of the attached instance-store volumes: configure the DataNodes to stripe block storage over all of them. For Hadoop clusters running on EC2, EBS is an unwarranted expense, a performance problem*, and a point of failure. (You should also consider whether running Hadoop on EC2 makes sense at all. Economically it does not: it is far more economical to lease servers from e.g. SoftLayer. See Rod Cope's slides from Hadoop World or Berlin Buzzwords about Hadoop/HBase for a cost comparison.) Sure, instance-store is ephemeral, so if you lose an instance you lose its volumes, but HDFS replication is designed exactly to preserve data availability among a collection of unreliable servers.
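
In practice, spreading DataNode storage over the instance-store volumes means listing one directory per volume in `dfs.data.dir` (called `dfs.datanode.data.dir` in later Hadoop releases) in hdfs-site.xml. A minimal Python sketch that builds that property; the mount points and the subdirectory name are assumptions for illustration and depend on how the instance's volumes are actually mounted:

```python
def data_dir_property(mount_points, subdir="hadoop/dfs/data"):
    """Build an hdfs-site.xml <property> listing one data directory per volume.

    mount_points: paths where the instance-store volumes are mounted
                  (hypothetical examples: "/mnt", "/mnt2").
    subdir:       directory created under each mount point (an assumed name).
    """
    dirs = [f"{mount.rstrip('/')}/{subdir}" for mount in mount_points]
    # The DataNode accepts a comma-separated list and spreads new blocks
    # across every directory in it.
    value = ",".join(dirs)
    return (
        "<property>\n"
        "  <name>dfs.data.dir</name>\n"
        f"  <value>{value}</value>\n"
        "</property>"
    )


if __name__ == "__main__":
    print(data_dir_property(["/mnt", "/mnt2"]))
```

The printed fragment goes inside the `<configuration>` element of hdfs-site.xml on each DataNode.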

* -- I see blog posts about needing to put ~10 EBS volumes into a RAID array to get decent performance with Postgres or MySQL, reports of high variability in performance, and post-mortems of extended EBS outages in availability zones that lasted many hours if not days. I don't use EBS and I do not recommend its use, especially for Hadoop clusters, which do not need it.

Best regards,

       - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)


Re: EC2 Experience

Posted by Fuad Efendi <fu...@efendi.ca>.

It stops unpredictably, a few times a week.

Re: EC2 Experience

Posted by Andrei Savu <sa...@gmail.com>.

I'm not sure if I understand. So you are starting an HBase cluster with
Whirr and after 5-15 minutes the region servers stop?

As far as I know we haven't done any testing with clusters running
more than a few minutes.

Is there anything relevant in the HBase / Hadoop / ZooKeeper log files?

-- Andrei Savu / andreisavu.ro

