Posted to user@hbase.apache.org by Mark Kerzner <ma...@gmail.com> on 2011/12/30 05:20:46 UTC

Could an EC2 machine be 4 times slower than local dev workstation?

Hi,

I am running a small program to load about 1 million rows into HBase. It
takes 200 seconds on my dev machine, and 800 seconds on a c1.medium EC2
machine. Both are running the same version of Ubuntu and the same version
of HBase. Everything is local on one machine in both cases.

What could the difference between the two environments be? I did notice
that my local machine has higher CPU loads:

hbase 64%
java (my app) 38%
hdfs 20%

whereas the EC2 machine shows:
hbase 47%
java (my app) 23%
hdfs 14%


Sincerely,
Mark
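
To put the figures above on a common scale, here is a minimal Python sketch that turns them into rows per second and a rough CPU-seconds estimate per process. The row count is the approximate "about 1 million" from the message, the helper names are hypothetical, and multiplying a sampled CPU percentage by wall-clock time is only a crude approximation that assumes the sample was representative of the whole run.

```python
# Figures quoted from the message above; "about 1 million rows" is approximate.
ROWS = 1_000_000
RUNS = {
    "local": {
        "seconds": 200,
        "cpu_pct": {"hbase": 64, "java (my app)": 38, "hdfs": 20},
    },
    "ec2 c1.medium": {
        "seconds": 800,
        "cpu_pct": {"hbase": 47, "java (my app)": 23, "hdfs": 14},
    },
}


def rows_per_second(rows, seconds):
    """Average load throughput over the whole run."""
    return rows / seconds


def cpu_seconds(cpu_pct, seconds):
    """Rough CPU time per process: sampled CPU percent times wall-clock time.

    Assumption: the sampled percentage held for the whole run.
    """
    return {name: pct * seconds / 100 for name, pct in cpu_pct.items()}


def summarize(runs, rows=ROWS):
    """Build throughput and CPU-time estimates for each environment."""
    summary = {}
    for label, run in runs.items():
        per_process = cpu_seconds(run["cpu_pct"], run["seconds"])
        summary[label] = {
            "rows_per_second": rows_per_second(rows, run["seconds"]),
            "cpu_seconds": per_process,
            "total_cpu_seconds": sum(per_process.values()),
        }
    return summary


if __name__ == "__main__":
    for label, stats in summarize(RUNS).items():
        print(label, stats)
```

Under those assumptions the local run averages 5,000 rows per second against 1,250 on EC2, and the estimated CPU time is about 244 CPU-seconds locally against about 672 on EC2. So although the EC2 percentages look lower at any instant, the same load may have cost more total CPU time there while also taking four times as long.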

Re: Could an EC2 machine be 4 times slower than local dev workstation?

Posted by Doug Meil <do...@explorysmedical.com>.
For the record, what Andrew/Li said is pretty much the standard disclaimer
in the Performance chapter for EC2.  It's a separate class of performance
problem.

http://hbase.apache.org/book.html#perf.ec2

On 5/15/12 8:04 PM, "Andrew Purtell" <ap...@apache.org> wrote:

>It's not just a matter of having neighbors, and anyway > 0 neighbors
>is a performance problem. You'll note in the numbers below that the
>local machine had higher CPU use. I expect this was because it was
>getting more work done given the lower latency and higher throughput
>of non-virtualized IO.
>
>On Tue, May 15, 2012 at 4:45 PM, S Ahmed <sa...@gmail.com> wrote:
>> any ideas how many c1.mediums might be on a given physical server?
>> (rough ideas...)
>>
>> On Fri, Dec 30, 2011 at 12:36 AM, Li Pi <li...@idle.li> wrote:
>>
>>> Yup. Virtualized IO pretty much explains it.
>>>
>>> On Thu, Dec 29, 2011 at 8:20 PM, Mark Kerzner <ma...@gmail.com>
>>> wrote:
>>> > Hi,
>>> >
>>> > I am running a small program to load about 1 million rows into HBase. It
>>> > takes 200 seconds on my dev machine, and 800 seconds on a c1.medium EC2
>>> > machine. Both are running the same version of Ubuntu and the same version
>>> > of HBase. Everything is local on one machine in both cases.
>>> >
>>> > What could the difference between the two environments be? I did notice
>>> > that my local machine has higher CPU loads:
>>> >
>>> > hbase 64%
>>> > java (my app) 38%
>>> > hdfs 20%
>>> >
>>> > whereas the EC2 machine
>>> > hbase 47%
>>> > java (my app) 23%
>>> > hdfs 14%
>>> >
>>> >
>>> > Sincerely,
>>> > Mark
>>>
>



Re: Could an EC2 machine be 4 times slower than local dev workstation?

Posted by Andrew Purtell <ap...@apache.org>.
It's not just a matter of having neighbors, and anyway > 0 neighbors
is a performance problem. You'll note in the numbers below that the
local machine had higher CPU use. I expect this was because it was
getting more work done given the lower latency and higher throughput
of non-virtualized IO.
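
One rough way to check the IO explanation on both machines is to time small synced writes on the disk that holds the data. The following is a minimal Python sketch under stated assumptions: the function name, block size, and write count are illustrative, and forcing an fsync after every write is a deliberately harsh proxy for storage latency, not a model of how HBase or HDFS actually writes.

```python
import os
import tempfile
import time


def time_synced_writes(directory, count=200, size=4096):
    """Write `count` blocks of `size` bytes, forcing each one to disk.

    Returns the mean seconds per synced write. The temporary file is
    removed afterwards.
    """
    payload = b"x" * size
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        for _ in range(count):
            os.write(fd, payload)
            os.fsync(fd)  # block until the data reaches the device
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
        os.remove(path)
    return elapsed / count


if __name__ == "__main__":
    # Assumption: replace this with a directory on the disk that holds
    # the HBase/HDFS data; the system temp directory is only a default.
    target = tempfile.gettempdir()
    mean = time_synced_writes(target)
    print(f"mean synced write: {mean * 1000:.3f} ms")
```

Running the same script on the workstation and on the EC2 instance gives two latency figures to compare. A large gap would be consistent with the virtualized IO explanation, though it does not rule out CPU or configuration differences.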


Re: Could an EC2 machine be 4 times slower than local dev workstation?

Posted by S Ahmed <sa...@gmail.com>.
any ideas how many c1.mediums might be on a given physical server? (rough
ideas...)


Re: Could an EC2 machine be 4 times slower than local dev workstation?

Posted by Li Pi <li...@idle.li>.
Yup. Virtualized IO pretty much explains it.


Re: Could an EC2 machine be 4 times slower than local dev workstation?

Posted by Michel Segel <mi...@hotmail.com>.
Hi,
Yes, the performance hit is normal.
Looks like you're seeing network latency on disk I/O.
Could also be a tuning issue (differences in configurations...).

Not sure how much the CPU difference will impact performance, but disk I/O will really kill you.


Mike Segel

On Dec 30, 2011, at 11:33 AM, Mark Kerzner <ma...@shmsoft.com> wrote:

> Thank you, Bryan,
> 
> that is very important and clear some cloudiness in my mind.
> 
> Sincerely,
> Mark
> 
> On Fri, Dec 30, 2011 at 10:54 AM, Bryan Beaudreault <
> bbeaudreault@hubspot.com> wrote:
> 
>> We have also seen this in our testing, though we focused mainly on MR more
>> than HBase.
>> 
>> Keep in mind that EC2 Compute Units are defined as follows:
>> 
>>> The amount of CPU that is allocated to a particular instance is expressed
>>> in terms of these EC2 Compute Units. We use several benchmarks and tests to
>>> manage the consistency and predictability of the performance of an EC2
>>> Compute Unit. One EC2 Compute Unit provides the equivalent CPU capacity of
>>> a 1.0-1.2 GHz 2007 Opteron or 2007 Xeon processor.
>> 
>> 
>> This does not even account for CPU contention that Amandeep mentioned,
>> which we have noticed at times as well.  Also, c1.mediums have a I/O
>> Performance rating of "Moderate."  I think this mainly refers to ethernet
>> speed, but it could refer to disk speed as well.
>> 
>> If your local workstation is a reasonably modern system, it is very
>> possible for you to see much better performance locally.  The difference
>> between 2.5 1.0 GHz 2007 processors (2.5 compute units) and a modern i5,
>> i7, or equivalent is huge not just in speed and number of cores, but
>> architecture, cache, etc.  In terms of HBase write speed, if you are
>> running on an SSD this could cause a substantial gap as well.
>> 
>> On Fri, Dec 30, 2011 at 12:38 AM, Amandeep Khurana <am...@gmail.com>
>> wrote:
>> 
>>> Is your client program running on the same node? Given that c1.mediums are
>>> on shared hosts, your neighbor might be overloading his VM, causing yours
>>> to starve.
>>> 
>>> On Fri, Dec 30, 2011 at 9:50 AM, Mark Kerzner <ma...@gmail.com>
>>> wrote:
>>> 
>>>> Hi,
>>>> 
>>>> I am running a small program to load about 1 million rows into HBase. It
>>>> takes 200 seconds on my dev machine, and 800 seconds on a c1.medium EC2
>>>> machine. Both are running the same version of Ubuntu and the same version
>>>> of HBase. Everything is local on one machine in both cases.
>>>> 
>>>> What could the difference between the two environments be? I did notice
>>>> that my local machine has higher CPU loads:
>>>> 
>>>> hbase 64%
>>>> java (my app) 38%
>>>> hdfs 20%
>>>> 
>>>> whereas the EC2 machine
>>>> hbase 47%
>>>> java (my app) 23%
>>>> hdfs 14%
>>>> 
>>>> 
>>>> Sincerely,
>>>> Mark
>>>> 
>>> 
>> 

Re: Could an EC2 machine be 4 times slower than local dev workstation?

Posted by Mark Kerzner <ma...@shmsoft.com>.
Thank you, Bryan,

that is very important and clears up some cloudiness in my mind.

Sincerely,
Mark


Re: Could an EC2 machine be 4 times slower than local dev workstation?

Posted by Bryan Beaudreault <bb...@hubspot.com>.
We have also seen this in our testing, though we focused more on MapReduce
(MR) than on HBase.

Keep in mind that EC2 Compute Units are defined as follows:

> The amount of CPU that is allocated to a particular instance is expressed
> in terms of these EC2 Compute Units. We use several benchmarks and tests to
> manage the consistency and predictability of the performance of an EC2
> Compute Unit. One EC2 Compute Unit provides the equivalent CPU capacity of
> a 1.0-1.2 GHz 2007 Opteron or 2007 Xeon processor.


This does not even account for the CPU contention that Amandeep mentioned,
which we have noticed at times as well.  Also, c1.mediums have an I/O
Performance rating of "Moderate."  I think this mainly refers to ethernet
speed, but it could refer to disk speed as well.

If your local workstation is a reasonably modern system, it is very
possible for you to see much better performance locally.  The difference
between the equivalent of 2.5 1.0 GHz 2007 processors (2.5 compute units
per virtual core on a c1.medium) and a modern i5, i7, or equivalent is huge,
not just in speed and number of cores but in architecture, cache, etc.  In
terms of HBase write speed, if your workstation is running on an SSD, this
could cause a substantial gap as well.
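
The compute-unit arithmetic can be sketched in a few lines of Python. The 1.0-1.2 GHz range comes from the Amazon definition quoted above. The helper name is hypothetical, and the layout of a c1.medium as 2 virtual cores of 2.5 units each is an assumption taken from Amazon's published instance description of the time, not from this thread. Clock-equivalent figures also say nothing about architecture or cache, which is the larger point here.

```python
# Range from Amazon's definition quoted above: one EC2 Compute Unit is
# roughly a 1.0-1.2 GHz 2007 Opteron or 2007 Xeon.
GHZ_PER_UNIT = (1.0, 1.2)


def ghz_equivalent(compute_units):
    """Return the (low, high) 2007-era GHz equivalent for a unit count."""
    low, high = GHZ_PER_UNIT
    return (compute_units * low, compute_units * high)


if __name__ == "__main__":
    # Assumption: a c1.medium is 2 virtual cores of 2.5 units each.
    cases = (
        ("one c1.medium virtual core", 2.5),
        ("whole c1.medium (2 virtual cores)", 5.0),
    )
    for label, units in cases:
        low, high = ghz_equivalent(units)
        print(f"{label}: {low:.1f}-{high:.1f} GHz of 2007-era CPU")
```

Since the load in question is spread across HBase, HDFS, and the client on one instance, the per-core figure is probably the more relevant one for any single busy thread.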


Re: Could an EC2 machine be 4 times slower than local dev workstation?

Posted by Amandeep Khurana <am...@gmail.com>.
Is your client program running on the same node? Given that c1.mediums are
on shared hosts, your neighbor might be overloading his VM, causing yours
to starve.
