Posted to user@hbase.apache.org by Amandeep Khurana <am...@gmail.com> on 2009/03/28 06:07:23 UTC

Typical hardware configurations

What are the typical hardware configs for a node that people are using for
Hadoop and HBase? I am setting up a new 10 node cluster which will have
HBase running as well, feeding my front end directly. Currently I have a 3
node cluster with 2 GB of RAM on the slaves and 4 GB of RAM on the master.
This didn't work very well because the RAM was a little low.

I got some config details from the powered by page on the Hadoop wiki, but
nothing like that for HBase.


Amandeep Khurana
Computer Science Graduate Student
University of California, Santa Cruz

Re: Typical hardware configurations

Posted by Steve Loughran <st...@apache.org>.
Scott Carey wrote:
> On 3/30/09 4:41 AM, "Steve Loughran" <st...@apache.org> wrote:
> 
>> Ryan Rawson wrote:
>>
>>> You should also be getting 64-bit systems and running a 64 bit distro on it
>>> and a jvm that has -d64 available.
>> For the namenode yes. For the others, you will take a fairly big memory
>> hit (1.5X object size) due to the longer pointers. JRockit has special
>> compressed pointers, so will JDK 7, apparently.
>>
> 
> Sun Java 6 update 14 has "Ordinary Object Pointer" compression as well.
> -XX:+UseCompressedOops.  I've been testing out the pre-release of that with
> great success.

Nice. Have you tried Hadoop with it yet?

> 
> JRockit has virtually no 64 bit overhead up to 4GB, Sun Java 6u14 has small
> overhead up to 32GB with the new compression scheme.  IBM's VM also has some
> sort of pointer compression but I don't have experience with it myself.

I use the JRockit JVM as it is what our customers use and we need to
test on the same JVM. It is interesting in that recursive calls never
seem to run out of stack; the way it manages memory doesn't keep
separate spaces for stack, permanent generation heap and the like.


That doesn't mean apps are light: a freshly started IDE consumes more
physical memory than a VMware image running XP and Outlook. But it is
fairly responsive, which is good for a UI:
2295m 650m  22m S    2 10.9   0:43.80 java
855m 543m 530m S   11  9.1   4:40.40 vmware-vmx


> 
> http://wikis.sun.com/display/HotSpotInternals/CompressedOops
> http://blog.juma.me.uk/tag/compressed-oops/
>  
> With pointer compression, there may be gains to be had with running 64 bit
> JVMs smaller than 4GB on x86 since then the runtime has access to native 64
> bit integer operations and registers (as well as 2x the register count).  It
> will be highly use-case dependent.

That would certainly benefit atomic operations on longs; for floating
point math it would be less useful, as JVMs have long made use of the SSE
register set for FP work. 64 bit registers would make it easier to move
data in and out of those registers.

I will try to set up a Hudson server with this update and see how well
it behaves.

Re: Typical hardware configurations

Posted by Scott Carey <sc...@richrelevance.com>.
On 3/30/09 4:41 AM, "Steve Loughran" <st...@apache.org> wrote:

> Ryan Rawson wrote:
> 
>> You should also be getting 64-bit systems and running a 64 bit distro on it
>> and a jvm that has -d64 available.
> 
> For the namenode yes. For the others, you will take a fairly big memory
> hit (1.5X object size) due to the longer pointers. JRockit has special
> compressed pointers, so will JDK 7, apparently.
> 

Sun Java 6 update 14 has "Ordinary Object Pointer" compression as well.
-XX:+UseCompressedOops.  I've been testing out the pre-release of that with
great success.
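
To try it with the Hadoop and HBase daemons, something like this in the
stock env scripts should work (a sketch; 6u14 or later only, since older
JVMs refuse to start on an unrecognized -XX option):

  # hadoop-env.sh: pass the flag to every Hadoop daemon
  export HADOOP_OPTS="$HADOOP_OPTS -XX:+UseCompressedOops"

  # hbase-env.sh: same flag for the HBase daemons
  export HBASE_OPTS="$HBASE_OPTS -XX:+UseCompressedOops"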

JRockit has virtually no 64 bit overhead up to 4GB, Sun Java 6u14 has small
overhead up to 32GB with the new compression scheme.  IBM's VM also has some
sort of pointer compression but I don't have experience with it myself.

http://wikis.sun.com/display/HotSpotInternals/CompressedOops
http://blog.juma.me.uk/tag/compressed-oops/
 
With pointer compression, there may be gains to be had with running 64 bit
JVMs smaller than 4GB on x86 since then the runtime has access to native 64
bit integer operations and registers (as well as 2x the register count).  It
will be highly use-case dependent.
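
A crude way to see the reference overhead for yourself: allocate a pile of
tiny objects and compare heap growth with and without the flag (a rough
sketch; System.gc() is only a hint, so treat the number as a ballpark):

  // OopOverhead.java
  //   javac OopOverhead.java
  //   java -Xmx512m OopOverhead
  //   java -Xmx512m -XX:+UseCompressedOops OopOverhead
  public class OopOverhead {
      // Strong references so the GC cannot reclaim the objects.
      static final Object[] refs = new Object[1000000];

      public static void main(String[] args) {
          Runtime rt = Runtime.getRuntime();
          System.gc();
          long before = rt.totalMemory() - rt.freeMemory();
          for (int i = 0; i < refs.length; i++) {
              refs[i] = new Object();
          }
          System.gc();
          long after = rt.totalMemory() - rt.freeMemory();
          System.out.println((after - before) / refs.length
                  + " bytes per object, roughly");
      }
  }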


Re: Typical hardware configurations

Posted by Steve Loughran <st...@apache.org>.
Ryan Rawson wrote:

> You should also be getting 64-bit systems and running a 64 bit distro on it
> and a jvm that has -d64 available.

For the namenode yes. For the others, you will take a fairly big memory 
hit (1.5X object size) due to the longer pointers. JRockit has special 
compressed pointers, so will JDK 7, apparently.



Re: Typical hardware configurations

Posted by Ryan Rawson <ry...@gmail.com>.
Even though hbase runs on 'commodity' hardware, it's important to remember
that to achieve scale you need to do a bit better than 1 cpu 1 gb ram type
things.

I tend to think in per-core specs, that way you don't have to worry about 2
core vs 4 core vs 8 core - you buy whatever is most economical at the time.

I'd match 1 core with 2-4gb ram.  You'll want to dedicate 4 gb of ram to
hbase; it'll make life easier.

You should also be getting 64-bit systems and running a 64 bit distro on it
and a jvm that has -d64 available.
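
A quick way to check that the 64-bit VM is actually present (it should
print a 64-Bit Server VM banner, and fail outright on a 32-bit-only
install):

  java -d64 -version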

A word about the master... For hbase, the master is (a) important and (b)
very lightweight, meaning it doesn't use much ram. For hadoop, things are
different: the HDFS master is relatively light weight CPU-wise, but needs
lots of ram (every file takes up memory space). On my cluster, the master
is the same node-type as the rest.

I've heard recommendations to buy better hardware for your master - if you
lose a disk, your whole cluster goes down. I can't say I disagree with that
sentiment.

Good luck!

On Fri, Mar 27, 2009 at 10:43 PM, Yabo-Arber Xu <ar...@gmail.com> wrote:

> Hi Amandeep,
>
> I just did the same investigation not long ago, and I was recommended to
> get Amazon EC2 X-Large equivalent nodes
> <http://aws.amazon.com/ec2/#pricing>: 8 EC2 Compute Units (4 virtual
> cores with 2 EC2 Compute Units each), 15 GB memory, 1690 GB of instance
> storage, 64-bit platform. One EC2 Compute Unit (ECU) is equivalent to the
> CPU capacity of a 1.0-1.2 GHz 2007 Opteron or 2007 Xeon processor.
>
> For more details, you may want to refer to Daniel Leffel's experience
> setting up HBase:
> <http://mail-archives.apache.org/mod_mbox/hadoop-hbase-user/200805.mbox/%3C25e5a0c00805072129w3b54599r286940f134c6f235@mail.gmail.com%3E>
>
> Hope it helps.
>
> Best,
> Arber
>
> On Fri, Mar 27, 2009 at 10:07 PM, Amandeep Khurana <am...@gmail.com>
> wrote:
>
> > What are the typical hardware configs for a node that people are using
> > for Hadoop and HBase? I am setting up a new 10 node cluster which will
> > have HBase running as well, feeding my front end directly. Currently I
> > have a 3 node cluster with 2 GB of RAM on the slaves and 4 GB of RAM on
> > the master. This didn't work very well because the RAM was a little low.
> >
> > I got some config details from the powered by page on the Hadoop wiki,
> > but nothing like that for HBase.
> >
> >
> > Amandeep Khurana
> > Computer Science Graduate Student
> > University of California, Santa Cruz
> >
>

Re: Typical hardware configurations

Posted by Yabo-Arber Xu <ar...@gmail.com>.
Hi Amandeep,

I just did the same investigation not long ago, and I was recommended to get
Amazon EC2 X-Large equivalent nodes <http://aws.amazon.com/ec2/#pricing>: 8
EC2 Compute Units (4 virtual cores with 2 EC2 Compute Units each), 15 GB
memory, 1690 GB of instance storage, 64-bit platform. One EC2 Compute Unit
(ECU) is equivalent to the CPU capacity of a 1.0-1.2 GHz 2007 Opteron or 2007
Xeon processor.

For more details, you may want to refer to Daniel Leffel's experience setting
up HBase:
<http://mail-archives.apache.org/mod_mbox/hadoop-hbase-user/200805.mbox/%3C25e5a0c00805072129w3b54599r286940f134c6f235@mail.gmail.com%3E>

Hope it helps.

Best,
Arber

On Fri, Mar 27, 2009 at 10:07 PM, Amandeep Khurana <am...@gmail.com> wrote:

> What are the typical hardware configs for a node that people are using
> for Hadoop and HBase? I am setting up a new 10 node cluster which will
> have HBase running as well, feeding my front end directly. Currently I
> have a 3 node cluster with 2 GB of RAM on the slaves and 4 GB of RAM on
> the master. This didn't work very well because the RAM was a little low.
>
> I got some config details from the powered by page on the Hadoop wiki,
> but nothing like that for HBase.
>
>
> Amandeep Khurana
> Computer Science Graduate Student
> University of California, Santa Cruz
>

Re: Typical hardware configurations

Posted by Billy Pearson <sa...@pearsonwholesale.com>.
I run a 10 node cluster with 2 cores at 2.4GHz, 4GB RAM and dual 250GB
drives per node.
I run on used 32 bit servers, so I can only give hbase 2GB, but I still have
memory left for the tasktracker and datanode.
More files in hadoop = more memory used on the namenode. The hbase master is
lightly loaded, so I run mine on the same node as the namenode.

My personal opinion is that a large memory 64bit machine cannot be fully
loaded with hbase at this time, but it will give you better performance.
Maybe if you have lots of MR jobs or need the better response then it would
be worth it.
I think there are still some open issues on too many open file handles etc.
that can keep larger servers from being used to their full capacity (see
the sketch below).
Think in terms of Google: they stick with low (cheap to replace) hard drive
sizes, medium memory and cost/performance cpus, but have lots of them.
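
For the file handle issue, the usual workaround is to raise the per-user
open-file cap for whichever account runs the daemons (a sketch for a stock
Linux box; the user name and numbers are only examples):

  # /etc/security/limits.conf
  hadoop  soft  nofile  32768
  hadoop  hard  nofile  32768

  # verify after logging in again as that user
  ulimit -n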

Billy

"Amandeep Khurana" <am...@gmail.com> wrote in 
message news:35a22e220903272207s30f26310y3ecbec723b83e229@mail.gmail.com...
> What are the typical hardware configs for a node that people are using
> for Hadoop and HBase? I am setting up a new 10 node cluster which will
> have HBase running as well, feeding my front end directly. Currently I
> have a 3 node cluster with 2 GB of RAM on the slaves and 4 GB of RAM on
> the master. This didn't work very well because the RAM was a little low.
>
> I got some config details from the powered by page on the Hadoop wiki,
> but nothing like that for HBase.
>
>
> Amandeep Khurana
> Computer Science Graduate Student
> University of California, Santa Cruz
> 



Re: Typical hardware configurations

Posted by Andrew Purtell <ap...@apache.org>.
Hello Amandeep,

A basic rule of thumb is 1 core and 1 GB RAM per JVM. The Hadoop
and HBase daemons will all need such an allocation. You can
extend this to the mapreduce subsystem when considering how many
mappers and/or reducers can concurrently execute on each node
alongside the rest of what you are running. Or, you can choose to
partition your hardware to support separate HDFS and HBase from
the mapreduce task runners, as some do, which changes the
situation.

Lots of people try to run all-in-one clusters, where all 
functions are more or less co-located on every node. Strictly
speaking, how much heap a TaskTracker map or reduce task child
will require depends on the user application. But, it still
loads the CPU so I still use the 1 CPU/1 GB rule of thumb even
for these. Overload your CPU resources and the JVM scheduler
will starve threads, introducing spurious heartbeat misses, 
timeouts, and recovery behaviors in system daemons that will
unnecessarily degrade performance and operation. One thing I
have considered but not tried is using Linux CPU affinity masks
to put system functions in one partition and all user mapreduce
tasks in the other. Another option as I mentioned is to split
hardware resources among the functions. 
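
For the affinity idea, the mechanics would presumably be something like
taskset (a sketch only, untried as noted, and the core split below is
made up; child processes inherit the mask, so the daemon JVMs end up
pinned too):

  # Pin the system daemons to cores 0-1, leaving 2-7 for task children.
  taskset -c 0,1 bin/hadoop-daemon.sh start datanode
  taskset -c 0,1 bin/hbase-daemon.sh start regionserver

  # Or re-pin an already-running JVM by pid.
  taskset -p -c 2-7 <pid>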

Here is what I have used in the past in a successful all-in-
one deployment. In parentheses next to the Java process' name
is the heap allocation reserved with -Xmx.

  1: NameNode (2000) and DataNode (1000)

  1: HMaster (1000), JobTracker (1000), and DataNode (1000)

  23: DataNode (1000), HRegionServer (2000), TaskTracker (1000),
      and the concurrency limit for mappers and reducers set
      to 4 and 4, respectively.
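
In env-script terms those allocations would look roughly like this (a
sketch against the 0.19-era scripts; the daemon-specific -Xmx comes after
the global HADOOP_HEAPSIZE default on the command line, so it wins):

  # hadoop-env.sh
  export HADOOP_NAMENODE_OPTS="$HADOOP_NAMENODE_OPTS -Xmx2000m"
  export HADOOP_DATANODE_OPTS="$HADOOP_DATANODE_OPTS -Xmx1000m"
  export HADOOP_JOBTRACKER_OPTS="$HADOOP_JOBTRACKER_OPTS -Xmx1000m"
  export HADOOP_TASKTRACKER_OPTS="$HADOOP_TASKTRACKER_OPTS -Xmx1000m"

  # hbase-env.sh (value in MB; versions of that vintage have a single
  # heap knob for all HBase daemons, so set the region server value)
  export HBASE_HEAPSIZE=2000

and the 4/4 concurrency limit is the pair of tasktracker properties:

  <!-- hadoop-site.xml -->
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>4</value>
  </property>
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>4</value>
  </property>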

We picked a midpoint between cheap hardware and big iron. Our per
node specs were dual quad core, 4/8 GB RAM, 6x 1TB disk. 2x1TB
hosted the system volume in RAID-1 configuration. The remaining
4x1TB drives were attached as JBOD and used as DataNode data
volumes. The rationale for using so much disk per node was
maximization of cluster/rack density.

As the size of your HDFS volume increases, you'll need to grow
the heap allocation of your NameNode accordingly. In all my time
running HBase I never needed more than 2GB allocated to the
NameNode, but I hear that Facebook runs a NameNode with a 20GB heap.

A word of warning however: Currently HBase is a very challenging
user of HDFS. In 0.20 there are some changes (HFile) which
lessen somewhat the number of open files and should also lower
the total number of DataNode xceivers necessary to support
operations. However on my 25 node cluster running Hadoop/HBase
0.19, I found it necessary to increase the DataNode xceiver limit
to 4096 (from its default of 512!) to successfully bootstrap an
HBase cluster with > 7000 regions. Therefore it may not be the
per-node spec that is the determining factor for the stability of
your cluster, but rather the number of DataNodes employed to
sufficiently spread the load. 
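
For anyone hitting the same wall, the bump is a single property on the
DataNodes (note the transposed spelling, which matches the code; shown
for the 0.19-era hadoop-site.xml, later releases move it to
hdfs-site.xml):

  <property>
    <name>dfs.datanode.max.xcievers</name>
    <value>4096</value>
  </property>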

Hope that helps,

  - Andy


> From: Amandeep Khurana <am...@gmail.com>
> Subject: Typical hardware configurations
> To: core-user@hadoop.apache.org, hbase-user@hadoop.apache.org
> Date: Friday, March 27, 2009, 10:07 PM
>
> What are the typical hardware configs for a node that people
> are using for Hadoop and HBase? I am setting up a new 10 node
> cluster which will have HBase running as well, feeding my
> front end directly. Currently I have a 3 node cluster with
> 2 GB of RAM on the slaves and 4 GB of RAM on the master. This
> didn't work very well because the RAM was a little low.
> 
> I got some config details from the powered by page on the
> Hadoop wiki, but nothing like that for HBase.
> 
> Amandeep Khurana
> Computer Science Graduate Student
> University of California, Santa Cruz