Posted to issues@hbase.apache.org by Dmitry Kangin <cp...@narod.ru> on 2011/09/01 15:24:57 UTC

HBase troubleshooting

Hello everyone!
We have started using HBase (on Hadoop) and have run into some performance issues. We are running HBase in pseudo-distributed mode on a single node.
We used the Cloudera distribution of Hadoop on CentOS 6 with the default configuration, following https://ccp.cloudera.com/display/CDHDOC/HBase+Installation.

We then started testing random reads.
The test data consists of one table. Each row is about 10 KB long.
The average random read rate over one Thrift/Java API connection is 30 rows per second; the write rate is 250 rows per second.
With 4 connections, the read rate increases to 120 rows per second on each connection, or 480 rows per second in total.
We also noticed that overall node resource usage (I/O, CPU) never exceeds 3%. We have enough memory (8 GB, of which 2 GB are free).
The total data size is 400 000 rows (about 3.19 GB).
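
For reference, a measurement of this kind can be sketched roughly as follows. This is only a minimal illustration, not the actual test harness: measure_read_rate and fake_read_row are hypothetical names, and fake_read_row is a stand-in that sleeps for about 5 ms instead of calling HBase. In a real test it would be replaced by a single-row get through the Thrift or Java API.

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor


def measure_read_rate(read_row, row_keys, connections, reads_per_connection):
    """Return (rows/s per connection, total rows/s) for random single-row reads."""

    def worker(seed):
        # Each worker models one client connection issuing reads back to back.
        rng = random.Random(seed)
        for _ in range(reads_per_connection):
            read_row(rng.choice(row_keys))

    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=connections) as pool:
        # list() forces completion and re-raises any worker exception.
        list(pool.map(worker, range(connections)))
    elapsed = time.perf_counter() - start

    total = connections * reads_per_connection / elapsed
    return total / connections, total


def fake_read_row(key):
    # Hypothetical stand-in for a real HBase get: sleeps ~5 ms to mimic
    # per-request latency and returns a dummy 10 KB payload.
    time.sleep(0.005)
    return b"x" * (10 * 1024)


if __name__ == "__main__":
    keys = ["row-%06d" % i for i in range(400_000)]
    for n in (1, 4):
        per_conn, total = measure_read_rate(fake_read_row, keys, n, 50)
        print(f"{n} connection(s): {per_conn:.0f} rows/s each, {total:.0f} rows/s total")
```

Each worker issues its reads strictly one after another, so the per-connection rate in this sketch is bounded by the time a single request takes, while the total rate can grow as connections are added.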


Is there a rational explanation for this behavior?

Best regards, Dmitry Kangin.

Fwd: HBase troubleshooting

Posted by Stack <st...@duboce.net>.

This message was sent to issues.  I'm forwarding it to the appropriate list.
St.Ack

---------- Forwarded message ----------
From: Dmitry Kangin <cp...@narod.ru>
Date: 2011/9/1
Subject: HBase troubleshooting
To: issues@hbase.apache.org

Hello everyone!
We have started using HBase (on Hadoop) and have run into some
performance issues. We are running HBase in pseudo-distributed mode on
a single node.
We used the Cloudera distribution of Hadoop on CentOS 6 with the default
configuration, following
https://ccp.cloudera.com/display/CDHDOC/HBase+Installation.

We then started testing random reads.
The test data consists of one table. Each row is about 10 KB long.
The average random read rate over one Thrift/Java API connection is 30
rows per second; the write rate is 250 rows per second.
With 4 connections, the read rate increases to 120 rows per second
on each connection, or 480 rows per second in total.
We also noticed that overall node resource usage (I/O, CPU) never
exceeds 3%. We have enough memory (8 GB, of which 2 GB are free).
The total data size is 400 000 rows (about 3.19 GB).

Is there a rational explanation for this behavior?

Best regards, Dmitry Kangin.