Posted to user@hbase.apache.org by surfer <su...@crs4.it> on 2012/10/25 07:30:39 UTC

repetita iuvant? (Latin: "repeated things help")

Hi
I tried running the same scan twice over my table data. I expected the
second run to be faster, but that was not the case.
What am I doing wrong? I set "scan.setCacheBlocks(true);" before the
first scanning job, hoping to get at least some of the blocks into memory.
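
For concreteness, the relevant part of the setup looks roughly like
this; a minimal fragment, with "cf" as a placeholder column family:

import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

// Ask each RegionServer to keep the HFile blocks this scan reads in its
// LRU block cache, so a repeat scan could in principle hit cache.
Scan scan = new Scan();
scan.setCacheBlocks(true);            // retain scanned blocks in the block cache
scan.setCaching(500);                 // rows fetched per RPC round trip
scan.addFamily(Bytes.toBytes("cf")); // placeholder family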

thank you
surfer

Re: repetita iuvant?

Posted by surfer <su...@crs4.it>.
On 10/25/2012 07:44 AM, Anoop Sam John wrote:
> Hi
> Can you tell us more details? How much data is your scan going to retrieve?
it's a full scan of 1.7TB of data on 62 RegionServers plus master and ZK
quorum machines. I hoped that block caching might at least slightly
improve read performance. HBase version 0.92.1; the scan runs under
Hadoop 1.0.3 through TableInputFormat.
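
For reference, wiring a full scan through TableInputFormat looks roughly
like this; a minimal sketch against the 0.92 / Hadoop 1.x APIs, with the
driver class and table name made up:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Job;

public class FullScan {                      // hypothetical driver class
    // TableInputFormat hands each map task one region's worth of rows.
    static class ScanMapper extends TableMapper<ImmutableBytesWritable, NullWritable> {
        @Override
        protected void map(ImmutableBytesWritable rowKey, Result row, Context ctx) {
            // per-row work goes here
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        Job job = new Job(conf, "full-scan");      // Job(conf, name): Hadoop 1.x API
        job.setJarByClass(FullScan.class);
        Scan scan = new Scan();
        scan.setCacheBlocks(true);                 // as described above
        scan.setCaching(500);
        TableMapReduceUtil.initTableMapperJob(
                "mytable",                         // placeholder table name
                scan, ScanMapper.class,
                ImmutableBytesWritable.class, NullWritable.class, job);
        job.setNumReduceTasks(0);                  // map-only scan
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}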


>   What is the time taken on each attempt?
about 1h20'
 
> Can you observe the cache hit ratio?
0%
while the blockCacheSizeMB=1649.8
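
A rough capacity check, assuming blockCacheSizeMB is the per-RegionServer
figure: 62 RS x ~1.6 GB is only about 100 GB of aggregate block cache
against 1.7 TB of data, so at best around 6% of the blocks could ever be
resident, and a full scan streams over the LRU cache evicting older
blocks as it goes. That would explain the 0% hit ratio on the second pass.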

>  What is the memory available in the RS?
maxHeapMB=8179
( in hbase-env.sh: export HBASE_REGIONSERVER_OPTS="-Xmx8g -Xms8g
-Xmn128m -XX:+UseParNewGC -XX:+UseConcMarkSweepGC
-XX:CMSInitiatingOccupancyFraction=70" )
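
(That blockCacheSizeMB is consistent with a block-cache fraction of 0.2:
0.2 x 8179 MB is roughly 1636 MB per RegionServer. hfile.block.cache.size
would be the knob to raise if more of the heap should go to the cache.)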


> ... Also the cluster details and regions.
>
>
1525 regions
Are the regions too big? I created a pre-split table before bulk
importing, and I don't understand why the number of regions didn't
increase afterwards. hbase.hregion.max.filesize is at the default 256MB
and the regions are roughly 1GB. How come HBase hasn't split them? But
that's another question....
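
For context, pre-splitting at creation time typically looks like this
with the 0.92 admin API; a minimal sketch, with table name, family, and
split keys all made up:

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.util.Bytes;

public class PreSplitTable {                 // hypothetical helper
    public static void main(String[] args) throws Exception {
        HBaseAdmin admin = new HBaseAdmin(HBaseConfiguration.create());
        HTableDescriptor desc = new HTableDescriptor("mytable");  // placeholder table
        desc.addFamily(new HColumnDescriptor("cf"));              // placeholder family
        // Split keys spread the subsequent bulk load over many regions up front.
        byte[][] splitKeys = {
                Bytes.toBytes("40000000"),
                Bytes.toBytes("80000000"),
                Bytes.toBytes("c0000000"),
        };
        admin.createTable(desc, splitKeys);
    }
}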


RE: repetita iuvant?

Posted by Anoop Sam John <an...@huawei.com>.
Hi
Can you tell us more details? How much data is your scan going to retrieve? What is the time taken on each attempt?
Can you observe the cache hit ratio? What is the memory available in the RS? ... Also the cluster details and regions.
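
(In 0.92 these numbers are visible on each RegionServer's status web
page, on port 60030 by default, which reports blockCacheSizeMB, the hit
ratio, and the heap figures.)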

-Anoop-
________________________________________
From: surfer [surfer@crs4.it]
Sent: Thursday, October 25, 2012 11:00 AM
To: user@hbase.apache.org
Subject: repetita iuvant?

Hi
I tried running the same scan twice over my table data. I expected the
second run to be faster, but that was not the case.
What am I doing wrong? I set "scan.setCacheBlocks(true);" before the
first scanning job, hoping to get at least some of the blocks into memory.

thank you
surfer