Posted to user@hbase.apache.org by "Assarsson, Emil" <Em...@sonyericsson.com> on 2012/02/06 16:15:23 UTC

randomWrite tests gives random results

Hi,

I'm trying to optimize an HBase cluster (on HDFS) using the randomWrite test. I have 7 nodes: 1 running ZooKeeper, the NameNode, the HBase master and the JobTracker, and 6 running region servers, DataNodes and TaskTrackers. Each has 1 disk, 16 GB of memory and 2 x 4 cores. I know that I really should have more disks, but for the time being I'm trying to do the best with what I have.

I have configured the TaskTrackers to run 1 map and 1 reduce task on each host.

The problem I'm seeing is that the results vary widely, from 16 sec per 100000 inserts to 240 sec per 100000 inserts.
Currently I'm using a 10 GB heap for HBase and a 3 GB heap for HDFS.

How do I find out what makes the results this random? I think I should be able to get around 22 sec per 100000 inserts.
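
For comparing runs, it can help to convert the timings above into insert rates. The following is a minimal Python sketch that uses only the three figures quoted in this message; the function name is made up for illustration.

```python
def inserts_per_second(seconds, inserts=100_000):
    """Convert a 'seconds per batch of inserts' timing into inserts per second."""
    return inserts / seconds

best = inserts_per_second(16)     # fastest observed run
worst = inserts_per_second(240)   # slowest observed run
target = inserts_per_second(22)   # rate the poster hopes to reach

print(f"best:   {best:.0f} inserts/s")
print(f"worst:  {worst:.0f} inserts/s")
print(f"target: {target:.0f} inserts/s")
print(f"spread: {best / worst:.0f}x between fastest and slowest run")
```

Logging the rate per run alongside a timestamp makes it easier to line slow runs up against other events on the cluster.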


Best regards

Emil Assarsson
Sony Ericsson Mobile Communications AB




Re: randomWrite tests gives random results

Posted by Ben West <bw...@yahoo.com>.
You can try turning on verbose garbage collection logs and see if the slow times correspond to a GC pause. Cloudera has a series of blog posts regarding GC pauses in HBase and how to avoid them: http://www.cloudera.com/blog/2011/02/avoiding-full-gcs-in-hbase-with-memstore-local-allocation-buffers-part-1/
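
One way to check for that correlation is to scan the GC log for long pauses and compare their timestamps with the slow test runs. The sketch below is a rough illustration in Python, not a complete parser. It assumes a HotSpot JVM started with flags such as -verbose:gc, -XX:+PrintGCDetails, -XX:+PrintGCTimeStamps and -Xloggc:<file>, and it relies only on the uptime prefix and the "real=... secs" field. The exact log layout varies by JVM version and collector, so the patterns, the threshold and the sample lines are all assumptions to adapt.

```python
import re

# Wall-clock pause as printed with -XX:+PrintGCDetails,
# e.g. "[Times: user=0.05 sys=0.00, real=0.01 secs]".
REAL_PAUSE = re.compile(r"real=(\d+(?:\.\d+)?) secs")
# JVM-uptime prefix added by -XX:+PrintGCTimeStamps, e.g. "12.345: [GC".
UPTIME = re.compile(r"^(\d+(?:\.\d+)?): \[")

def long_pauses(lines, threshold_secs=1.0):
    """Return (uptime_secs, pause_secs) for each GC event at or above the threshold."""
    hits = []
    for line in lines:
        pause = REAL_PAUSE.search(line)
        uptime = UPTIME.match(line)
        if pause and uptime and float(pause.group(1)) >= threshold_secs:
            hits.append((float(uptime.group(1)), float(pause.group(1))))
    return hits

# Illustrative lines only, not taken from a real cluster.
sample = [
    "10.500: [GC 10.500: [ParNew: 1K->1K(2K), 0.0100 secs] 3K->3K(4K), 0.0101 secs] "
    "[Times: user=0.02 sys=0.00, real=0.01 secs]",
    "95.250: [Full GC 95.250: [CMS: 5K->4K(8K), 12.3000 secs] 6K->4K(9K), 12.3001 secs] "
    "[Times: user=12.10 sys=0.05, real=12.30 secs]",
]
for uptime, pause in long_pauses(sample):
    print(f"GC pause of {pause:.2f}s at JVM uptime {uptime:.2f}s")
```

If the long pauses on the region servers fall inside the slow runs, GC is a likely contributor; if they do not, the cause is probably elsewhere.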


Re: randomWrite tests gives random results

Posted by Jean-Daniel Cryans <jd...@apache.org>.
If you didn't configure anything more than the heap, PE
(PerformanceEvaluation) will by default create a table with 1 region
and a low (albeit default) memstore size. This means it's spending its
time waiting on splits, and it's recompacting your data all the time,
which wastes a lot of IOPS.
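
As a rough illustration of why the memstore size matters: each flush writes a new store file, and more store files mean more compaction work. The Python sketch below estimates the number of flushes for a given write volume. Every number in it is an assumption made for illustration: the row count, the 1000-byte value size and the 64 MB "low" threshold are hypothetical, and key and metadata overhead is ignored. The 256 MB figure is the one suggested later in this reply.

```python
import math

MB = 1024 * 1024

def flush_count(total_bytes, flush_size_bytes):
    """Rough number of memstore flushes needed to absorb total_bytes of writes."""
    return math.ceil(total_bytes / flush_size_bytes)

rows = 1_000_000      # hypothetical number of rows written
value_bytes = 1000    # assumed value size per row; check your PE version
total = rows * value_bytes

# 64 MB is an assumed "low" threshold; 256 MB is the value suggested in this thread.
for flush_mb in (64, 256):
    print(f"{flush_mb} MB flush size: about {flush_count(total, flush_mb * MB)} flushes")
```

With a single region, all of those flushes and the resulting compactions land on one region server, which is consistent with the uneven timings described in the original post.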

You didn't tell us which version you're using, so here are two ways
to fix the former:

 0.90: run the import a few times so that the regions can split, then
run a major compaction.
 0.92: use https://issues.apache.org/jira/browse/HBASE-4440, it's
pretty easy to backport.

To fix the latter, set MEMSTORE_SIZE to something better like 256MB,
and once the table is pre-split, also change the MAX_FILESIZE to
>1GB.
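
For pre-splitting, the split points can be computed ahead of time. The Python sketch below generates evenly spaced split keys. It assumes the row keys are zero-padded 10-digit decimal numbers and uses a hypothetical key space of 1,000,000 rows with one region per region server (6 in the original post). Check the key format and row count against the PE version actually in use before relying on it.

```python
def split_keys(total_rows, num_regions, width=10):
    """Return num_regions - 1 evenly spaced, zero-padded decimal split keys."""
    if num_regions < 2:
        return []
    # Integer arithmetic avoids floating-point rounding at the boundaries.
    return [str(total_rows * i // num_regions).zfill(width)
            for i in range(1, num_regions)]

# Hypothetical example: 1,000,000 rows spread over 6 regions.
for key in split_keys(1_000_000, 6):
    print(key)
```

The resulting keys can be supplied as split points when the table is created, so that writes are spread across all region servers from the start instead of waiting for splits.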

J-D
