Posted to user@hbase.apache.org by Otis Gospodnetic <ot...@yahoo.com> on 2010/01/14 06:01:07 UTC

HBase: minimal number of boxes?

Hello,

I was wondering what a minimal setup in terms of # of servers might be for HBase.  Here is what I think is needed:


1 or 2 HBase master servers   -- 1 or 2 dedicated boxes?

1 or more RegionServers        -- 1 or more dedicated boxes?

1 or more Zookeepers            -- 1 or more dedicated boxes?


If running on HDFS, add:
1 or 2 NameNodes                 -- can this run on same box(es) as HBase master?

1 or more DataNodes             -- can DNs be on same box(es) as RegionServers?


If you want to run MR jobs on data in HBase, add:
1 or more JobTrackers           -- can this run on the same box as HBase master and NN?

1 or more TaskTrackers         -- can this run on the same box as RegionServer + DN?

So, my main questions are:

* Is it OK for HBase Master and NameNode (+JobTracker) to run on the same server? NN needs memory.  What does HBase Master need the most?

* Is it OK for RegionServer and DataNode (+TaskTracker) to run on the same server? (I think this is actually advised, so data is local?)  I believe RegionServer is a memory hungry (b/c of Memcache) process?  I believe DNs need the CPU to run the MR jobs, and disk I/O, of course.

* Finally, is the following correct?


Non-HA system, with local disk:
1 HB master/NN/JT + 1 RegionServer/TT/DN + 1 ZK   =  3 boxes

HA HBase cluster with HDFS:
2 HB masters/NNs/JTs + 2 RegionServers/TTs/DNs + 2 ZKs  =  6 boxes

Thanks,
Otis

Re: HBase: minimal number of boxes?

Posted by Andrew Purtell <ap...@apache.org>.
Hi Otis,

> * Is it OK for HBase Master and NameNode (+JobTracker) to run on
> the same server? NN needs memory.  What does HBase Master need
> the most?

The HBase Master is normally not very busy. It just needs to be
available when region servers check in, and to maintain timely
ZooKeeper heartbeats. As long as there is sufficient RAM on the
combined NameNode+Master (+JobTracker) box such that the system never
swaps, this is ok. 

You can consider running multiple HBase masters to remove one SPOF
from the deployment, but the Hadoop side still has issues -- NameNode,
JobTracker. But, yes, for a non-HA deployment it makes sense to load
all of these up on one server. 

> * Is it OK for RegionServer and DataNode (+TaskTracker) to run on
> the same server? (I think this is actually advised, so data is
> local?)

Yes, this is advised for that reason. Eventually, through background
compaction, the data in HDFS which backs the region stores is brought
local. MapReduce jobs that run against HBase after this happens get
data locality, as each split corresponds to a region and the task will
be scheduled on the corresponding region server. 
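
To make the locality point concrete, here is a toy sketch in Python. It
is not Hadoop's scheduler; the function name, region names and box names
are all made up for illustration. It only shows that a task can be
placed next to its region when a TaskTracker runs on the same box as the
RegionServer, and cannot otherwise.

```python
def place_tasks(region_hosts, task_trackers):
    """Return {region: (host, is_local)} for a toy cluster.

    region_hosts  -- hypothetical map of region name -> box hosting it
    task_trackers -- set of boxes that run a TaskTracker
    """
    if not task_trackers:
        raise ValueError("need at least one TaskTracker")
    fallback = sorted(task_trackers)
    placement = {}
    for i, (region, host) in enumerate(sorted(region_hosts.items())):
        if host in task_trackers:
            # One split per region: run the task where the region lives.
            placement[region] = (host, True)
        else:
            # No TaskTracker on that box: the task reads over the network.
            placement[region] = (fallback[i % len(fallback)], False)
    return placement


# RegionServer and TaskTracker co-located on every box.
co_located = place_tasks(
    {"region-a": "box1", "region-b": "box2", "region-c": "box3"},
    {"box1", "box2", "box3"},
)
# TaskTracker on a different box than the region.
separate = place_tasks({"region-a": "box1"}, {"box9"})

print(co_located)
print(separate)
```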

> I believe RegionServer is a memory hungry (b/c of Memcache)
> process?

Yes. The more RAM you can give to the region servers, the better for
performance:

  - Read caching (block cache) to avoid needing to hit the
    filesystem to serve frequently accessed data

  - Write caching (MemStore) to ride over flushes and compactions
    without blocking clients
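
As a rough illustration of why more heap helps, the sketch below splits
a region server heap between the two caches. The fractions (0.2 and 0.4)
and the 4 GB heap are assumptions picked for the example, not values
taken from this thread; check the configuration of your HBase version
for the real settings.

```python
def heap_budget_mb(heap_mb, block_cache_fraction, memstore_fraction):
    """Split a heap (in MB) into block cache, MemStore, and the rest."""
    if not 0 < block_cache_fraction + memstore_fraction < 1:
        raise ValueError("cache fractions must leave room for everything else")
    block_cache = int(heap_mb * block_cache_fraction)
    memstore = int(heap_mb * memstore_fraction)
    return {
        "block_cache_mb": block_cache,   # read caching
        "memstore_mb": memstore,         # write caching
        "other_mb": heap_mb - block_cache - memstore,
    }


# Hypothetical 4 GB heap with illustrative fractions.
budget = heap_budget_mb(4096, 0.2, 0.4)
print(budget)
```

With fixed fractions, doubling the heap doubles both caches.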

> 1 or more Zookeepers            -- 1 or more dedicated boxes?

I would advise running a dedicated ZK ensemble, yes. An ensemble of
2N+1 servers tolerates N failures, so deploy 3 servers if you can stand
to lose only one, or 5 if you want to be able to lose up to 2, etc.
IIRC, there are diminishing returns after 7 or 9. Though this may seem
like a lot of overhead just to run HBase, ZK has a lot of merit on its
own terms: it provides synchronization primitives for your service or
application, can host dynamic config (with watchers to get notice of
changes), handles presence and group membership, etc. 
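
The 2N+1 arithmetic can be sketched in a few lines of Python. The
function names are made up for this example; the only assumption is that
the ensemble stays available while a strict majority of its servers is
up.

```python
def tolerated_failures(ensemble_size):
    """Servers that can fail while a strict majority is still up."""
    if ensemble_size < 1:
        raise ValueError("ensemble needs at least one server")
    return (ensemble_size - 1) // 2


def ensemble_size_for(failures):
    """Smallest ensemble that survives the given number of failures."""
    if failures < 0:
        raise ValueError("failures cannot be negative")
    return 2 * failures + 1


for size in (1, 2, 3, 4, 5, 7):
    print(size, "servers ->", tolerated_failures(size), "failure(s) tolerated")
```

Note that 2 servers tolerate no more failures than 1, and 4 no more than
3, which is why odd sizes are the usual choice.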

> Non-HA system, with local disk:
> 1 HB master/NN/JT + 1 RegionServer/TT/DN + 1 ZK   =  3 boxes

Too small. It is my experience you need 3 RegionServer/TT/DN for
something minimally useful. Also remember to tune HDFS for such a
small cluster -- set minimum replication to 1 or 2. 

> HA HBase cluster with HDFS:
> 2 HB masters/NNs/JTs + 2 RegionServers/TTs/DNs + 2 ZKs  =  6 boxes

Too small, likewise. 
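
For what it's worth, here is the box arithmetic as a small Python
sketch. The revised numbers are one possible reading of the advice
above (3 RegionServer/TT/DN boxes, and 3 ZK servers when you want to
survive losing one), not totals I am prescribing; the helper name is
made up.

```python
def total_boxes(masters, workers, zookeepers):
    """Add up boxes for master/NN/JT, RegionServer/TT/DN, and ZK roles."""
    for role, count in (("masters", masters),
                        ("workers", workers),
                        ("zookeepers", zookeepers)):
        if count < 1:
            raise ValueError(f"need at least one box for {role}")
    return masters + workers + zookeepers


# The non-HA layout from the question: 1 + 1 + 1.
original_non_ha = total_boxes(1, 1, 1)
# Assumption: same layout, but with 3 RegionServer/TT/DN boxes.
revised_non_ha = total_boxes(1, 3, 1)
# Assumption: 2 masters, 3 workers, and a 3-server ZK ensemble.
revised_ha = total_boxes(2, 3, 3)

print(original_non_ha, revised_non_ha, revised_ha)
```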

Hope this helps, 

  - Andy