You are viewing a plain text version of this content. The canonical link for it is here.

Posted to user@hbase.apache.org by Daniel Leffel <da...@gmail.com> on 2008/05/08 06:29:40 UTC

Advice for smaller clusters in write-heavy environments

I thought I might share back with the users my experience in getting HBase
running on a small, 4 node cluster. I ran into a lot of trouble in getting
started, some because of bugs and some specific to my use case. My learnings
I think will hopefully be valuable to new users.

First of all, let me compliment the amazing group of folks developing HBase.
Also, I'd like to say that we owe a lot to the amazing strategy Powerset has
taken as a company to propel the development of their product, both
leveraging and contributing to open source - what you guys are doing is
nothing short than amazing!

My basic use case is to persist a large (and growing) sparce dataset and
enable constant incremental re-computation. In order to test performance for
this use case, it was important to load a test initial dataset - roughly 220
million rows and 6 columns (for now, I'll say columns generically - I'll get
it to strategy of column families).

Some of my learnings

   - "Commodity Hardware" is relative. When I first heard the term, I (and
   many others I know) considered this to be on the order of desktop-grade
   machines - the machines I'd purchased were Dual Core 2+ Ghz Dell Desktops
   (purchased on eBay for $350 a piece). Well, you can definitely do certain
   tasks within the framework with these types of machines, but an ideal
   configuration consists of something much stronger - server-grade, quad core,
   8Gigs of RAM, etc. HBase (particularly if you are going to do a lot of
   writes), needs really good Machine IO . If you are going to try to use
   machines with slow drives and controllers, it might be possible if you have
   a ton of datanodes, but not as advisable on smaller clusters.
   - Ideally, you should always insure that there is one processor available
   for the region server daemon and at least 2 processors for tasktracker (or 1
   if you limit the map and reduce tasks to one each), if you are going to run
   heavy map/reduce jobs. The trouble with not doing so is that until 0.2, when
   there will be better load balancing on regionservers, it's always possible
   that a single region server can be called on to shoulder the full load of
   all tasktrackers. If you have a large write operations happening, you could
   otherwise cause splits and/or compaction to take too long (expensive
   operations) and cause your job to crawl to near halt if your lucky, or die
   completely. This means, if you're only using dual core machines, I'd suggest
   that at least during heavy data-writing periods, you consider running either
   regionservers or tasktracker, but probably not both on the same machine.
   - All machines should run the datanode - this helps the regionservers to
   distribute the IO load better.  That way, when an expensive operation like
   compaction starts, it's spread over more machines. Also, Hadoop can localize
   frequently used files, to some degree.
   - Running bin/stop-hbase.sh can sometimes take a long time. Sometimes,
   regionservers are waiting for a lease to expire. There are a few times when
   there are dead processes (especially if you didn't take the earlier
   suggestions) so check the logs (.out), but often you just need to wait
   longer and it's worth it.
   - If you are writing from a MR job, it's most beneficial to find the
   right balance of number of tasks. Too many tasks means too many splits,
   startup and commits. Too few, and your region servers don't get the benefit
   of a break (the time it takes to commit and initialize a new task) - not too
   mention less to repeat after a failure.
   - Use the new release candidate 0.1.2 #1. It has a number of fixes in it
   that help for issues related to small clusters - I don't regard prior
   releases as usable for those of us being cheap about hardware.
   - Don't be afraid to adjust which daemons you run on which machines. For
   example, for my first large (initial) load, I shutdown all but a couple of
   tasktrackers and started up more region servers, whereas in normal
   operation, that ratio will probably be flipped.
   - Watch the number of regions you have on any particular regionserver.
   I'm in the process at the moment of testing how far you can push this, but
   the big concern is OOME - and unless you're running the latest release
   candidate, you're going to have big problems after an OOME.


Hope this is helpful, St^ack, please feel free to point out where I'm wrong.
:-)

Danny


PS. Thanks again to St^ack. He went over and above the call of duty to help
me and it's bought a ton of confidence I now have in this project.

RE: Advice for smaller clusters in write-heavy environments

Posted by Jim Kellerman <ji...@powerset.com>.

Ditto, useful information! Thanks Daniel!

W.R.T. commodity hardware: That really should be commodity servers. The 'gold standard' for Google when they wrote the Bigtable paper was two dual core Opteron processors with 8GB memory which is what we use on our 4 node test cluster.

---
Jim Kellerman, Senior Engineer; Powerset


> -----Original Message-----
> From: Bryan Duxbury [mailto:bryan@rapleaf.com]
> Sent: Wednesday, May 07, 2008 9:42 PM
> To: hbase-user@hadoop.apache.org
> Subject: Re: Advice for smaller clusters in write-heavy environments
>
> This is super useful information! Thanks for taking the time
> to write it up. Maybe this should be transplanted onto the wiki?
>
> -Bryan
>
> On May 7, 2008, at 9:29 PM, Daniel Leffel wrote:
>
> > I thought I might share back with the users my experience
> in getting
> > HBase running on a small, 4 node cluster. I ran into a lot
> of trouble
> > in getting started, some because of bugs and some specific
> to my use
> > case. My learnings I think will hopefully be valuable to new users.
> >
> > First of all, let me compliment the amazing group of folks
> developing
> > HBase.
> > Also, I'd like to say that we owe a lot to the amazing strategy
> > Powerset has taken as a company to propel the development of their
> > product, both leveraging and contributing to open source - what you
> > guys are doing is nothing short than amazing!
> >
> > My basic use case is to persist a large (and growing)
> sparce dataset
> > and enable constant incremental re-computation. In order to test
> > performance for this use case, it was important to load a
> test initial
> > dataset - roughly 220 million rows and 6 columns (for now, I'll say
> > columns generically - I'll get it to strategy of column families).
> >
> > Some of my learnings
> >
> >    - "Commodity Hardware" is relative. When I first heard
> the term, I
> > (and
> >    many others I know) considered this to be on the order of
> > desktop-grade
> >    machines - the machines I'd purchased were Dual Core 2+ Ghz Dell
> > Desktops
> >    (purchased on eBay for $350 a piece). Well, you can
> definitely do
> > certain
> >    tasks within the framework with these types of machines, but an
> > ideal
> >    configuration consists of something much stronger -
> server- grade,
> > quad core,
> >    8Gigs of RAM, etc. HBase (particularly if you are going
> to do a lot
> > of
> >    writes), needs really good Machine IO . If you are going
> to try to
> > use
> >    machines with slow drives and controllers, it might be
> possible if
> > you have
> >    a ton of datanodes, but not as advisable on smaller clusters.
> >    - Ideally, you should always insure that there is one processor
> > available
> >    for the region server daemon and at least 2 processors for
> > tasktracker (or 1
> >    if you limit the map and reduce tasks to one each), if you are
> > going to run
> >    heavy map/reduce jobs. The trouble with not doing so is
> that until
> > 0.2, when
> >    there will be better load balancing on regionservers,
> it's always
> > possible
> >    that a single region server can be called on to shoulder
> the full
> > load of
> >    all tasktrackers. If you have a large write operations
> happening,
> > you could
> >    otherwise cause splits and/or compaction to take too long
> > (expensive
> >    operations) and cause your job to crawl to near halt if
> your lucky,
> > or die
> >    completely. This means, if you're only using dual core machines,
> > I'd suggest
> >    that at least during heavy data-writing periods, you consider
> > running either
> >    regionservers or tasktracker, but probably not both on the same
> > machine.
> >    - All machines should run the datanode - this helps the
> > regionservers to
> >    distribute the IO load better.  That way, when an expensive
> > operation like
> >    compaction starts, it's spread over more machines. Also,
> Hadoop can
> > localize
> >    frequently used files, to some degree.
> >    - Running bin/stop-hbase.sh can sometimes take a long time.
> > Sometimes,
> >    regionservers are waiting for a lease to expire. There are a few
> > times when
> >    there are dead processes (especially if you didn't take
> the earlier
> >    suggestions) so check the logs (.out), but often you
> just need to
> > wait
> >    longer and it's worth it.
> >    - If you are writing from a MR job, it's most beneficial to find
> > the
> >    right balance of number of tasks. Too many tasks means too many
> > splits,
> >    startup and commits. Too few, and your region servers
> don't get the
> > benefit
> >    of a break (the time it takes to commit and initialize a new
> > task) - not too
> >    mention less to repeat after a failure.
> >    - Use the new release candidate 0.1.2 #1. It has a
> number of fixes
> > in it
> >    that help for issues related to small clusters - I don't regard
> > prior
> >    releases as usable for those of us being cheap about hardware.
> >    - Don't be afraid to adjust which daemons you run on which
> > machines. For
> >    example, for my first large (initial) load, I shutdown all but a
> > couple of
> >    tasktrackers and started up more region servers, whereas
> in normal
> >    operation, that ratio will probably be flipped.
> >    - Watch the number of regions you have on any particular
> > regionserver.
> >    I'm in the process at the moment of testing how far you can push
> > this, but
> >    the big concern is OOME - and unless you're running the latest
> > release
> >    candidate, you're going to have big problems after an OOME.
> >
> >
> > Hope this is helpful, St^ack, please feel free to point out
> where I'm
> > wrong.
> > :-)
> >
> > Danny
> >
> >
> > PS. Thanks again to St^ack. He went over and above the call
> of duty to
> > help me and it's bought a ton of confidence I now have in this
> > project.
>
>
> No virus found in this incoming message.
> Checked by AVG.
> Version: 7.5.524 / Virus Database: 269.23.9/1419 - Release
> Date: 5/7/2008 7:46 AM
>
>

No virus found in this outgoing message.
Checked by AVG.
Version: 7.5.524 / Virus Database: 269.23.10/1421 - Release Date: 5/7/2008 5:23 PM

Re: Advice for smaller clusters in write-heavy environments

Posted by Bryan Duxbury <br...@rapleaf.com>.

This is super useful information! Thanks for taking the time to write  
it up. Maybe this should be transplanted onto the wiki?

-Bryan

On May 7, 2008, at 9:29 PM, Daniel Leffel wrote:

> I thought I might share back with the users my experience in  
> getting HBase
> running on a small, 4 node cluster. I ran into a lot of trouble in  
> getting
> started, some because of bugs and some specific to my use case. My  
> learnings
> I think will hopefully be valuable to new users.
>
> First of all, let me compliment the amazing group of folks  
> developing HBase.
> Also, I'd like to say that we owe a lot to the amazing strategy  
> Powerset has
> taken as a company to propel the development of their product, both
> leveraging and contributing to open source - what you guys are  
> doing is
> nothing short than amazing!
>
> My basic use case is to persist a large (and growing) sparce  
> dataset and
> enable constant incremental re-computation. In order to test  
> performance for
> this use case, it was important to load a test initial dataset -  
> roughly 220
> million rows and 6 columns (for now, I'll say columns generically -  
> I'll get
> it to strategy of column families).
>
> Some of my learnings
>
>    - "Commodity Hardware" is relative. When I first heard the term,  
> I (and
>    many others I know) considered this to be on the order of  
> desktop-grade
>    machines - the machines I'd purchased were Dual Core 2+ Ghz Dell  
> Desktops
>    (purchased on eBay for $350 a piece). Well, you can definitely  
> do certain
>    tasks within the framework with these types of machines, but an  
> ideal
>    configuration consists of something much stronger - server- 
> grade, quad core,
>    8Gigs of RAM, etc. HBase (particularly if you are going to do a  
> lot of
>    writes), needs really good Machine IO . If you are going to try  
> to use
>    machines with slow drives and controllers, it might be possible  
> if you have
>    a ton of datanodes, but not as advisable on smaller clusters.
>    - Ideally, you should always insure that there is one processor  
> available
>    for the region server daemon and at least 2 processors for  
> tasktracker (or 1
>    if you limit the map and reduce tasks to one each), if you are  
> going to run
>    heavy map/reduce jobs. The trouble with not doing so is that  
> until 0.2, when
>    there will be better load balancing on regionservers, it's  
> always possible
>    that a single region server can be called on to shoulder the  
> full load of
>    all tasktrackers. If you have a large write operations  
> happening, you could
>    otherwise cause splits and/or compaction to take too long  
> (expensive
>    operations) and cause your job to crawl to near halt if your  
> lucky, or die
>    completely. This means, if you're only using dual core machines,  
> I'd suggest
>    that at least during heavy data-writing periods, you consider  
> running either
>    regionservers or tasktracker, but probably not both on the same  
> machine.
>    - All machines should run the datanode - this helps the  
> regionservers to
>    distribute the IO load better.  That way, when an expensive  
> operation like
>    compaction starts, it's spread over more machines. Also, Hadoop  
> can localize
>    frequently used files, to some degree.
>    - Running bin/stop-hbase.sh can sometimes take a long time.  
> Sometimes,
>    regionservers are waiting for a lease to expire. There are a few  
> times when
>    there are dead processes (especially if you didn't take the earlier
>    suggestions) so check the logs (.out), but often you just need  
> to wait
>    longer and it's worth it.
>    - If you are writing from a MR job, it's most beneficial to find  
> the
>    right balance of number of tasks. Too many tasks means too many  
> splits,
>    startup and commits. Too few, and your region servers don't get  
> the benefit
>    of a break (the time it takes to commit and initialize a new  
> task) - not too
>    mention less to repeat after a failure.
>    - Use the new release candidate 0.1.2 #1. It has a number of  
> fixes in it
>    that help for issues related to small clusters - I don't regard  
> prior
>    releases as usable for those of us being cheap about hardware.
>    - Don't be afraid to adjust which daemons you run on which  
> machines. For
>    example, for my first large (initial) load, I shutdown all but a  
> couple of
>    tasktrackers and started up more region servers, whereas in normal
>    operation, that ratio will probably be flipped.
>    - Watch the number of regions you have on any particular  
> regionserver.
>    I'm in the process at the moment of testing how far you can push  
> this, but
>    the big concern is OOME - and unless you're running the latest  
> release
>    candidate, you're going to have big problems after an OOME.
>
>
> Hope this is helpful, St^ack, please feel free to point out where  
> I'm wrong.
> :-)
>
> Danny
>
>
> PS. Thanks again to St^ack. He went over and above the call of duty  
> to help
> me and it's bought a ton of confidence I now have in this project.

Re: Advice for smaller clusters in write-heavy environments

Posted by stack <st...@duboce.net>.

Thanks for the helpful note Danny.

Here's a few other things to add to your list.

+ Danny had a map that parsed Text input and then was doing the inserts 
into hbase using TableReduce.  He was using TR probably because we 
suggested he use it but thinking on it, this is probably not the best MR 
setup for filling hbase.  A MR job is going to sort and shuffle the map 
outputs.  This intermediate shuffle/sort step is expensive -- and hbase 
'sorts' on insert anyways.  Danny changed his job so hbase inserts were 
done in the map task.  The map made no emissions and his job had no reduce.
+ On the loading TaskTracker to RegionServers imbalance on job start, 
one tactic we could have tried was run a single TT at job start, then 
after split, add the second one (mid-job).
+ Danny tried hbase and ran into problems.   Some of his issues were 
hbase bugs.  Others were matters of network setup and hardware sizing.   
Rather than give up, he stuck with it and together we figured them out.

St.Ack