Posted to user@hbase.apache.org by kfarmer <kf...@camstar.com> on 2012/01/20 20:36:13 UTC

HBase map/scan performance

I'm doing a POC on HBase and wanted to see if someone could verify that my
map/scan performance is reasonable.  I have one 170 million row table.  

My cluster setup is 1 master node and 4 slave nodes, each with 8GB RAM,
one 500GB SATA disk, and one quad-core hyperthreaded CPU.

I'm running a MapReduce job over the whole table with only a map phase, no
reduce.  The scan for the map job is set to read only 2 columns of just a
few bytes each.  Each of the 4 slave nodes runs 4 map tasks simultaneously,
so 16 map tasks run at once.
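
For reference, the job setup is roughly the sketch below (table, family,
and qualifier names are placeholders standing in for my real schema):

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Job;

public class ScanPoc {
  // Map-only task: reads each row, does the per-row work, emits nothing.
  static class PocMapper extends TableMapper<NullWritable, NullWritable> {
    @Override
    protected void map(ImmutableBytesWritable row, Result value, Context ctx)
        throws IOException, InterruptedException {
      // per-row work goes here
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = new Job(conf, "hbase-scan-poc");
    job.setJarByClass(ScanPoc.class);

    Scan scan = new Scan();
    // Read only the two small columns so just those bytes come back.
    scan.addColumn(Bytes.toBytes("f"), Bytes.toBytes("c1"));
    scan.addColumn(Bytes.toBytes("f"), Bytes.toBytes("c2"));

    TableMapReduceUtil.initTableMapperJob(
        "mytable", scan, PocMapper.class,
        NullWritable.class, NullWritable.class, job);
    job.setNumReduceTasks(0); // map-only: no reduce phase
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}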

This job completes in about 8 minutes.  That's roughly 354K rows/second for
the cluster (170M rows / 480 seconds), 88K rows/second per node, and 22K
rows/second (or 22 rows/millisecond) per map task.

Is this performance reasonable for this hardware, or does it sound like I
need more tuning?  I've tried increasing the number of simultaneous map
tasks, but I hit both memory and disk I/O bottlenecks.


Re: HBase map/scan performance

Posted by Stack <st...@duboce.net>.
On Fri, Jan 20, 2012 at 11:36 AM, kfarmer <kf...@camstar.com> wrote:

> This job completes in about 8 minutes.  That's 354K rows/second for the
> cluster, 88K rows/second for the node, and 22K rows/second (or 22
> rows/millisecond) for each map task.
>
>
It's not too bad?  What do you need?

> Is this performance reasonable for this hardware or does it sound like I
> need more tuning?  I've tried increasing the simultaneous map tasks, but I
> hit both memory and disk I/O bottlenecks.


You've seen the perf section in the manual?  Have you exhausted the
suggestions there?
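
One of the first suggestions there, if I recall right, is scanner caching:
by default a scan fetches one row per RPC, which is brutal for a full-table
MapReduce scan.  Something along these lines (the value 500 is just
illustrative):

Scan scan = new Scan();
scan.setCaching(500);        // ship rows in batches rather than one per RPC
scan.setCacheBlocks(false);  // don't churn the block cache on a one-shot full scan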

From afar, I'd guess disk will be your bottleneck since you have only the
one.  If you think you can squeeze more out of your cluster, you could try
threading inside your maps to put on more load; that adds load without
adding map-task JVMs, so memory isn't your bounding factor.
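
If you do try threading, stock Hadoop has a wrapper for exactly this; a
rough sketch (the thread count is illustrative, PocMapper stands in for
your mapper class, and your map code must be thread-safe):

import org.apache.hadoop.mapreduce.lib.map.MultithreadedMapper;

// Several map threads share one task's input split.
job.setMapperClass(MultithreadedMapper.class);
MultithreadedMapper.setMapperClass(job, PocMapper.class); // your per-row logic
MultithreadedMapper.setNumberOfThreads(job, 4);           // threads per map task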

St.Ack