Posted to user@hbase.apache.org by Tim Robertson <ti...@gmail.com> on 2012/02/01 13:51:39 UTC

PerformanceEvaluation results

Hi all,

We have a 3 node cluster (CD3u2) with the following hardware:

RegionServers (+DN + TT)
  CPU: 2x Intel(R) Xeon(R) CPU E5630 @ 2.53GHz (quad)
  Disks: 6x250G SATA 5.4K
  Memory: 24GB

Master (+ZK, JT, NN)
  CPU: Intel(R) Xeon(R) CPU X3363 @ 2.83GHz, 2x6MB (quad)
  Disks: 2x500G SATA 7.2K
  Memory: 8GB

Memory wise, we have:
Master:
  NN: 1GB
  JT: 1GB
  HBase master: 6GB
  ZK: 1GB
RegionServers:
  RegionServer: 6GB
  TaskTracker: 1GB
  11 Mappers @ 1GB each
  7 Reducers @ 1GB each

HDFS was empty, and I ran randomWrite and scan, both with the number
of clients set to 50 (it seemed to spawn 500 mappers, though...)

randomWrite:
12/02/01 13:27:47 INFO mapred.JobClient:     ROWS=52428500
12/02/01 13:27:47 INFO mapred.JobClient:     ELAPSED_TIME=84504886

scan:
12/02/01 13:42:52 INFO mapred.JobClient:     ROWS=52428500
12/02/01 13:42:52 INFO mapred.JobClient:     ELAPSED_TIME=8158664
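To put those counters in context, here is roughly what was run and what the numbers reduce to (a sketch: the invocations are the stock PerformanceEvaluation ones, ELAPSED_TIME is the sum of task time across all maps in milliseconds, and this PE version splits each client into 10 map tasks, which would explain the 500 mappers):

```shell
# Roughly how the two runs are launched:
#   hbase org.apache.hadoop.hbase.PerformanceEvaluation randomWrite 50
#   hbase org.apache.hadoop.hbase.PerformanceEvaluation scan 50

# Back-of-envelope aggregate throughput from the job counters above.
rows=52428500
write_ms=84504886   # ELAPSED_TIME for randomWrite (summed task ms)
scan_ms=8158664     # ELAPSED_TIME for scan (summed task ms)
echo "randomWrite: $(( rows * 1000 / write_ms )) rows/s of map-task time"
echo "scan:        $(( rows * 1000 / scan_ms )) rows/s of map-task time"
```

That works out to roughly 620 rows/s for writes and ~6,400 rows/s for scans per second of map-task time.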

Would I be correct in thinking that this is way below what is to be
expected of this hardware?
We're setting up ganglia now to start debugging, but any suggestions
on how to diagnose this would be greatly appreciated.

Thanks!
Tim

Re: PerformanceEvaluation results

Posted by Tim Robertson <ti...@gmail.com>.
Hey Stack,

Because we run a couple of clusters now, we're using templating for the
*-site.xml files etc.

You'll find them in:
  http://code.google.com/p/gbif-common-resources/source/browse/cluster-puppet/modules/hadoop/templates/

The values for the HBase 3 node cluster come from:
  http://code.google.com/p/gbif-common-resources/source/browse/cluster-puppet/manifests/cluster2.pp

Thanks for looking - really appreciate some expert eyes on this!

Tim




On Tue, Feb 7, 2012 at 11:39 PM, Stack <st...@duboce.net> wrote:
> On Tue, Feb 7, 2012 at 3:27 AM, Lars Francke <la...@gmail.com> wrote:
>> [1] <http://code.google.com/p/gbif-common-resources/source/browse/#svn%2Fcluster-puppet>
>
> I don't see your hbase-site.xml up here Lars.  Am I looking in the wrong place?
>
> Good on you,
>
> St.Ack

Re: PerformanceEvaluation results

Posted by Stack <st...@duboce.net>.
On Tue, Feb 7, 2012 at 3:27 AM, Lars Francke <la...@gmail.com> wrote:
> [1] <http://code.google.com/p/gbif-common-resources/source/browse/#svn%2Fcluster-puppet>

I don't see your hbase-site.xml up here Lars.  Am I looking in the wrong place?

Good on you,

St.Ack

Re: PerformanceEvaluation results

Posted by Lars Francke <la...@gmail.com>.
Hi Stack, Hi everyone,

>> I do feel the HBase project would benefit from some example metrics
>> for various operations and hardware or else it will remain a difficult
>> technology for some people to get into with confidence.  We'll blog
>> our findings, and hopefully it might be of benefit to other
>> leprechauns.  If we can prove the concept, we're more likely to be
>> able to get $ to grow.
>
> Agree (except for the bit where you look like a leprechaun).  Would be
> cool if folks published what stats they see doing various operations
> in hbase on specific hardware.  Previously I'd have thought the
> deploys, configs, etc., too various, but I suppose you have to start
> somewhere.

I too agree.

From my experience there are a lot of small companies[4] which can't
afford, or don't need, large clusters and don't have the knowledge and
resources to fully optimize one. We're certainly one of those
organizations. It's already a challenge for us to follow the rapid
development in the projects we're using (Hadoop, HBase, Oozie, Hive,
etc.). We're still putting Hadoop and HBase to good use and they're
tremendously helpful.

As all our work is Open Source we're in the very fortunate position
of being able to point to all our configs[1], workflows[2], metrics
(Ganglia now up and public)[3] etc. and ask for recommendations based
on them, but a lot of other companies don't enjoy that privilege. We're
more than willing to provide information and even test out different
configurations on our (admittedly small and aging) cluster, and we
hope that this will prove helpful for others as well.

It is worth noting that we do plan to buy new and better hardware, but
need to understand the technologies and capabilities to make some
informed choices before spending our total yearly hardware budget.
Therefore, understanding the behavior even on lesser quality hardware
is still important for us.

Thanks for all the past and (hopefully) future help and it's great to
finally be able to work with HBase again.

Cheers,
Lars

PS: Tim and I work at the same organization

[1] <http://code.google.com/p/gbif-common-resources/source/browse/#svn%2Fcluster-puppet>
[2] <http://code.google.com/p/gbif-occurrencestore/source/browse/#svn%2Ftrunk%2Foozie-apps%2Frollover>
[3] <http://dev.gbif.org/ganglia/>
[4] See also the cluster sizes on <http://wiki.apache.org/hadoop/PoweredBy>

Re: PerformanceEvaluation results

Posted by Stack <st...@duboce.net>.
On Thu, Feb 2, 2012 at 8:00 AM, Tim Robertson <ti...@gmail.com> wrote:
> I do feel the HBase project would benefit from some example metrics
> for various operations and hardware or else it will remain a difficult
> technology for some people to get into with confidence.  We'll blog
> our findings, and hopefully it might be of benefit to other
> leprechauns.  If we can prove the concept, we're more likely to be
> able to get $ to grow.

Agree (except for the bit where you look like a leprechaun).  Would be
cool if folks published what stats they see doing various operations
in hbase on specific hardware.  Previously I'd have thought the
deploys, configs, etc., too various, but I suppose you have to start
somewhere.

Go easy Tim,
St.Ack

Re: PerformanceEvaluation results

Posted by Tim Robertson <ti...@gmail.com>.
Thanks all for the comments.  Ganglia setup is in progress.  We'll
keep plugging away.

I should mention that this is our first real dev cluster for
evaluation, and production would likely be more like a 6-7+ node
cluster of better machines, but for sure we are the small fry
leprechauns Ted Dunning refers to in his presentations - we're trying
to understand the potential and do some cost calculations before
buying hardware.

I do feel the HBase project would benefit from some example metrics
for various operations and hardware or else it will remain a difficult
technology for some people to get into with confidence.  We'll blog
our findings, and hopefully it might be of benefit to other
leprechauns.  If we can prove the concept, we're more likely to be
able to get $ to grow.




On Thu, Feb 2, 2012 at 5:24 AM, Michel Segel <mi...@hotmail.com> wrote:
> Tim,
>
> Here's the problem in a nutshell.
> With respect to hardware, you have 5.4K RPM drives? 6 drives and 8 cores?
> Small, slow drives, and still a ratio of less than one when you compare spindles to cores.
>
> I appreciate that you want to maximize performance, but when it comes to tuning, you have to start before you get your hardware.
>
> You are asking a question about tuning, but how can we say whether the numbers are OK?
> Have you looked at your GCs and enabled MSLABs? We don't know. What about your network configuration?
>
> I mean that there's a lot missing, and fine-tuning a cluster is something you have to do on your own. I guess I could say your numbers look fine to me for that config... But honestly, it would be a SWAG.
>
>
> Sent from a remote device. Please excuse any typos...
>
> Mike Segel
>

Re: PerformanceEvaluation results

Posted by Michel Segel <mi...@hotmail.com>.
Tim,

Here's the problem in a nutshell.
With respect to hardware, you have 5.4K RPM drives? 6 drives and 8 cores?
Small, slow drives, and still a ratio of less than one when you compare spindles to cores.

I appreciate that you want to maximize performance, but when it comes to tuning, you have to start before you get your hardware.

You are asking a question about tuning, but how can we say whether the numbers are OK?
Have you looked at your GCs and enabled MSLABs? We don't know. What about your network configuration?

I mean that there's a lot missing, and fine-tuning a cluster is something you have to do on your own. I guess I could say your numbers look fine to me for that config... But honestly, it would be a SWAG.
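For what it's worth, enabling MSLAB is a one-property change in hbase-site.xml (a sketch, assuming an HBase of the 0.90 vintage that CDH3u2 ships, where the property exists but is off by default):

```xml
<!-- hbase-site.xml: enable MemStore-Local Allocation Buffers to reduce
     old-gen heap fragmentation and long CMS pauses (off by default in 0.90.x). -->
<property>
  <name>hbase.hregion.memstore.mslab.enabled</name>
  <value>true</value>
</property>
```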


Sent from a remote device. Please excuse any typos...

Mike Segel

On Feb 1, 2012, at 7:09 AM, Tim Robertson <ti...@gmail.com> wrote:

> Thanks Michael,
> 
> It's a small cluster, but is the hardware so bad?  We are particularly
> interested in relatively low load for random read/write (2,000
> transactions per second on <1k rows) but a decent full table scan
> speed, as we aim to mount Hive tables on HBase-backed tables.
> 
> Regarding tuning... not exactly sure what you would be interested in
> seeing.  The config is all here:
> http://code.google.com/p/gbif-common-resources/source/browse/#svn%2Fcluster-puppet%2Fmodules%2Fhadoop%2Ftemplates
> 
> Cheers,
> Tim

Re: PerformanceEvaluation results

Posted by Tim Robertson <ti...@gmail.com>.
Thanks Michael,

It's a small cluster, but is the hardware so bad?  We are particularly
interested in relatively low load for random read/write (2,000
transactions per second on <1k rows) but a decent full table scan
speed, as we aim to mount Hive tables on HBase-backed tables.

Regarding tuning... not exactly sure what you would be interested in
seeing.  The config is all here:
http://code.google.com/p/gbif-common-resources/source/browse/#svn%2Fcluster-puppet%2Fmodules%2Fhadoop%2Ftemplates

Cheers,
Tim



On Wed, Feb 1, 2012 at 1:56 PM, Michael Segel <mi...@hotmail.com> wrote:
> No.
> What tuning did you do?
> Why such a small cluster?
>
> Sorry, but when you start off with a bad hardware configuration, you can get Hadoop/HBase to work, but performance will always be sub-optimal.
>
>
>
> Sent from my iPhone
>

Re: PerformanceEvaluation results

Posted by Michael Segel <mi...@hotmail.com>.
No.
What tuning did you do?
Why such a small cluster?

Sorry, but when you start off with a bad hardware configuration, you can get Hadoop/HBase to work, but performance will always be sub-optimal.



Sent from my iPhone


Re: PerformanceEvaluation results

Posted by Stack <st...@duboce.net>.
On Wed, Feb 1, 2012 at 4:51 AM, Tim Robertson <ti...@gmail.com> wrote:
> We're setting up ganglia now to start debugging, but any suggestions
> on how to diagnose this would be greatly appreciated.
>

Get Ganglia set up Tim and then let's chat.  You've checked out the
perf section in the reference manual?  What numbers do you need?

St.Ack

Re: PerformanceEvaluation results

Posted by Doug Meil <do...@explorysmedical.com>.
Hi there-

These perf tests on small clusters are a fairly common question on the
dist-list, but it needs to be stressed that HBase (and HDFS) doesn't begin
to stretch its legs until about 5 nodes.

http://hbase.apache.org/book.html#arch.overview





