Posted to user@hbase.apache.org by Damien Hardy <dh...@figarocms.fr> on 2011/11/10 12:11:35 UTC

Row get very slow

Hello there.


When I want to get a row by rowid, the answer is very slow (sometimes even 15 seconds).
What is wrong with my HTable?
Here are some examples to illustrate my problem:

hbase(main):030:0> get 'logs', 
'_f:squid_t:20111110111259_b:squid_s:304-ts3aa/WDoKqrSSqNcToHdA==', { 
COLUMN => 'body:body', VERSIONS => 1 }
COLUMN                                               CELL
  body:body                                           
timestamp=1320919979701, value=Nov 10 11:05:41 haproxy[15469]: ... 
[haproxy logs] ...

1 row(s) in 6.0310 seconds

hbase(main):031:0> scan 'logs', { STARTROW 
=>'_f:squid_t:20111110111259_b:squid_s:304-ts3aa/WDoKqrSSqNcToHdA==', 
LIMIT => 1 }
ROW                                                  COLUMN+CELL
  _f:squid_t:20111110111259_b:squid_s:304-ts3aa/WDoKqrSSqNcToHdA==
  column=body:body, timestamp=1320919979701, value=Nov 10 11:05:41 haproxy[15469]: ... [haproxy logs] ...

1 row(s) in 2.7160 seconds

hbase(main):032:0> get 'logs', 
'_f:squid_t:20111110111259_b:squid_s:304-ts3aa/WDoKqrSSqNcToHdA=='
COLUMN                                               CELL
  body:body                                           
timestamp=1320919979701, value=Nov 10 11:05:41 haproxy[15469]: ... 
[haproxy logs] ...
1 row(s) in 5.0640 seconds

hbase(main):033:0> describe 'logs'
DESCRIPTION                                                                                                               ENABLED
  {NAME => 'logs', FAMILIES => [{NAME => 'body', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', COMPRESSION => 'SNAPPY',  true
   VERSIONS => '1', TTL => '2147483647', BLOCKSIZE => '536870912', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}
1 row(s) in 0.0660 seconds

hbase(main):025:0> get 'logs', 
'_f:squid_t:20111110111259_b:squid_s:304-ts3aa/WDoKqrSSqNcToHdA==', { 
COLUMN => 'body:body', TIMERANGE => [1320919900000,1320920000000] }
COLUMN                                               CELL
  body:body                                           
timestamp=1320919979701, value=Nov 10 11:05:41 haproxy[15469]: ... 
[haproxy logs] ...

1 row(s) in 0.0630 seconds


A scan is always faster than a get, which I find strange.

I get a normal response time when I specify the TS.

The table has about 200 regions distributed across 2 nodes (with the full stack 
on each: HDFS / HBase master+regionserver / ZooKeeper).
Region size is 2GB now.

Recently I increased the region size from the default (128MB if I remember 
correctly) to 2GB to get a smaller number of regions (I had 3500 regions).

I changed hbase.hregion.max.filesize to 2147483648, restarted my whole 
cluster, created a new table, and copied the data via Pig from the old table 
to the new one => fewer regions => I'm happy \o/
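
The hbase-site.xml change was just this (a sketch; 2147483648 bytes = 2GB):

<property>
  <!-- maximum store file size before a region is split -->
  <name>hbase.hregion.max.filesize</name>
  <value>2147483648</value>
</property>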
But on my older table a plain get was very fast, like the get with the TS 
specified on the new table.

Does the region size affect HBase response times that much?

A get on another table that was not rebuilt after the config change (regions 
not merged) is still fast.

Thank you,

-- 
Damien






Re: Row get very slow

Posted by Damien Hardy <dh...@figarocms.fr>.
Hi,

It definitely sped things up :)

hbase(main):002:0> get 'logs', 
'_f:squid_t:20111114110759_b:squid_s:204-taDiFMcQaPzN13dDOZ99PA=='
COLUMN                                                CELL
  body:body                                            
timestamp=1321265279234, value=Nov 14 11:00:24 haproxy[15470]: ... 
[haproxy syslogs] ...

1 row(s) in 0.0170 seconds

Thank you again for the help and explanations.

Regards,

-- 
Damien


On 14/11/2011 20:24, lars hofhansl wrote:
> Did it speed up your queries? As you can see from the followup discussions here, there is some general confusion around this.
>
> Generally there are 2 sizes involved:
> 1. HBase Filesize
> 2. HBase Blocksize
>
> #1 sets the maximum size of a region before it is split. The default used to be 512MB; it's now 1GB (but usually it should be even larger)
>
> #2 is the size of the blocks inside the HFiles. Smaller blocks mean better random access, but larger block indexes. I would only increase that if you have large cells.
>
> -- Lars
> ________________________________
>
> From: Damien Hardy<dh...@figarocms.fr>
> To: user@hbase.apache.org
> Sent: Monday, November 14, 2011 12:51 AM
> Subject: Re: Row get very slow
>
> On 13/11/2011 16:13, Arvind Jayaprakash wrote:
>> A common confusion is b/w MAX_FILESIZE and BLOCKSIZE. Given that
>> MAX_FILESIZE is not listed on :60010/master.jsp, one tends to assume
>> BLOCKSIZE represents that value.
>>
>> On Nov 10, lars hofhansl wrote:
>>> "BLOCKSIZE =>   '536870912'"
>>>
>>>
>>> You set your blocksize to 512MB? The default is 64KB (65536); try setting it to something lower.
>
> Hello,
>
> Thank you for the answer. I have just altered my table and launched a major_compact to make it effective.
>
> I thought that increasing the FILESIZE of HBase somehow implied changes to the BLOCKSIZE of my tables, and to prevent unbalanced parameters I increased it too ... #FAIL.
>
> The question is: in what applications should BLOCKSIZE be changed (increased or decreased)?
>
> Thank you.
>
> -- Damien


Re: Row get very slow

Posted by Stack <st...@duboce.net>.
On Mon, Nov 14, 2011 at 11:37 AM, Sam Seigal <se...@yahoo.com> wrote:
> If you are not too concerned with random access time, but want more
> efficient scans, is increasing the block size then a good idea?
>

I'd say leave things as they are unless you have a problem.

For your case, where random read latency is not so important and you
are only scanning, upping the block size should not change your scan
latencies and it will make the hfile indices smaller (if you double
the blocksize to 128k, your indices should be halved -- you can see
index sizes in your regionserver UI).
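
Rough arithmetic (a sketch, assuming row keys of about 50 bytes): a 2GB
store file with 64KB blocks holds 2147483648 / 65536 = 32768 blocks, so
roughly 32768 block index entries (~1.6MB of keys); at 128KB blocks that
drops to 16384 entries (~0.8MB).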

St.Ack

Re: Row get very slow

Posted by Sam Seigal <se...@yahoo.com>.
If you are not too concerned with random access time, but want more
efficient scans, is increasing the block size then a good idea?

On Mon, Nov 14, 2011 at 11:24 AM, lars hofhansl <lh...@yahoo.com> wrote:
> Did it speed up your queries? As you can see from the followup discussions here, there is some general confusion around this.
>
> Generally there are 2 sizes involved:
> 1. HBase Filesize
> 2. HBase Blocksize
>
> #1 sets the maximum size of a region before it is split. The default used to be 512MB; it's now 1GB (but usually it should be even larger)
>
> #2 is the size of the blocks inside the HFiles. Smaller blocks mean better random access, but larger block indexes. I would only increase that if you have large cells.
>
> -- Lars
> ________________________________
>
> From: Damien Hardy <dh...@figarocms.fr>
> To: user@hbase.apache.org
> Sent: Monday, November 14, 2011 12:51 AM
> Subject: Re: Row get very slow
>
> On 13/11/2011 16:13, Arvind Jayaprakash wrote:
>> A common confusion is b/w MAX_FILESIZE and BLOCKSIZE. Given that
>> MAX_FILESIZE is not listed on :60010/master.jsp, one tends to assume
>> BLOCKSIZE represents that value.
>>
>> On Nov 10, lars hofhansl wrote:
>>> "BLOCKSIZE =>  '536870912'"
>>>
>>>
>>> You set your blocksize to 512MB? The default is 64KB (65536); try setting it to something lower.
>
>
> Hello,
>
> Thank you for the answer. I have just altered my table and launched a major_compact to make it effective.
>
> I thought that increasing the FILESIZE of HBase somehow implied changes to the BLOCKSIZE of my tables, and to prevent unbalanced parameters I increased it too ... #FAIL.
>
> The question is: in what applications should BLOCKSIZE be changed (increased or decreased)?
>
> Thank you.
>
> -- Damien
>

Re: Row get very slow

Posted by lars hofhansl <lh...@yahoo.com>.
Did it speed up your queries? As you can see from the followup discussions here, there is some general confusion around this.

Generally there are 2 sizes involved:
1. HBase Filesize
2. HBase Blocksize

#1 sets the maximum size of a region before it is split. The default used to be 512MB; it's now 1GB (but usually it should be even larger)

#2 is the size of the blocks inside the HFiles. Smaller blocks mean better random access, but larger block indexes. I would only increase that if you have large cells.
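
For reference, both can be set from the shell; a sketch with illustrative values (on versions without online schema change the table must be disabled before the alter):

# 1. region split threshold, a table attribute:
alter 'logs', METHOD => 'table_att', MAX_FILESIZE => '2147483648'
# 2. HFile block size, a column family attribute:
alter 'logs', { NAME => 'body', BLOCKSIZE => '65536' }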

-- Lars
________________________________

From: Damien Hardy <dh...@figarocms.fr>
To: user@hbase.apache.org
Sent: Monday, November 14, 2011 12:51 AM
Subject: Re: Row get very slow

On 13/11/2011 16:13, Arvind Jayaprakash wrote:
> A common confusion is b/w MAX_FILESIZE and BLOCKSIZE. Given that
> MAX_FILESIZE is not listed on :60010/master.jsp, one tends to assume
> BLOCKSIZE represents that value.
> 
> On Nov 10, lars hofhansl wrote:
>> "BLOCKSIZE =>  '536870912'"
>> 
>> 
>> You set your blocksize to 512MB? The default is 64KB (65536); try setting it to something lower.


Hello,

Thank you for the answer. I have just altered my table and launched a major_compact to make it effective.

I thought that increasing the FILESIZE of HBase somehow implied changes to the BLOCKSIZE of my tables, and to prevent unbalanced parameters I increased it too ... #FAIL.

The question is: in what applications should BLOCKSIZE be changed (increased or decreased)?

Thank you.

-- Damien

Re: Row get very slow

Posted by Doug Meil <do...@explorysmedical.com>.
Hi there-

re:  "The question is : in what application BLOCKSIZE should be changed
(increased or decreased) ?"


See..  http://hbase.apache.org/book.html#schema.creation

and...

http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HColumnDescriptor.html







On 11/14/11 3:51 AM, "Damien Hardy" <dh...@figarocms.fr> wrote:

>On 13/11/2011 16:13, Arvind Jayaprakash wrote:
>> A common confusion is b/w MAX_FILESIZE and BLOCKSIZE. Given that
>> MAX_FILESIZE is not listed on :60010/master.jsp, one tends to assume
>> BLOCKSIZE represents that value.
>>
>> On Nov 10, lars hofhansl wrote:
>>> "BLOCKSIZE =>  '536870912'"
>>>
>>>
>>> You set your blocksize to 512MB? The default is 64KB (65536); try setting
>>>it to something lower.
>
>
>Hello,
>
>Thank you for the answer. I have just altered my table and launched a
>major_compact to make it effective.
>
>I thought that increasing the FILESIZE of HBase somehow implied changes to
>the BLOCKSIZE of my tables, and to prevent unbalanced parameters I
>increased it too ... #FAIL.
>
>The question is: in what applications should BLOCKSIZE be changed
>(increased or decreased)?
>
>Thank you.
>
>-- 
>Damien
>
>
>



Re: Row get very slow

Posted by Damien Hardy <dh...@figarocms.fr>.
On 13/11/2011 16:13, Arvind Jayaprakash wrote:
> A common confusion is b/w MAX_FILESIZE and BLOCKSIZE. Given that
> MAX_FILESIZE is not listed on :60010/master.jsp, one tends to assume
> BLOCKSIZE represents that value.
>
> On Nov 10, lars hofhansl wrote:
>> "BLOCKSIZE =>  '536870912'"
>>
>>
>> You set your blocksize to 512MB? The default is 64KB (65536); try setting it to something lower.


Hello,

Thank you for the answer. I have just altered my table and launched a 
major_compact to make it effective.
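
Something like this (a sketch with illustrative values; on this HBase version the table must be disabled before the alter):

disable 'logs'
alter 'logs', { NAME => 'body', BLOCKSIZE => '65536' }
enable 'logs'
major_compact 'logs'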

I thought that increasing the FILESIZE of HBase somehow implied changes to 
the BLOCKSIZE of my tables, and to prevent unbalanced parameters I 
increased it too ... #FAIL.

The question is: in what applications should BLOCKSIZE be changed 
(increased or decreased)?

Thank you.

-- 
Damien



Re: Row get very slow

Posted by Arvind Jayaprakash <wo...@anomalizer.net>.
On Nov 13, Stack wrote:
>On Sun, Nov 13, 2011 at 7:13 AM, Arvind Jayaprakash <wo...@anomalizer.net> wrote:
>> A common confusion is b/w MAX_FILESIZE and BLOCKSIZE. Given that
>> MAX_FILESIZE is not listed on :60010/master.jsp, one tends to assume
>> BLOCKSIZE represents that value.

>We should fix that. What would you like to see, Arvind?

Looks like Santa is ahead of schedule this year ...

(1) I've always found it hard to find all configurable "per-table"
properties listed in the documentation. So that would be a good thing to
have.

(2) Also, having all of the per-table properties listed on the HBase
master page would create more awareness of at least the terms, if not how
to twiddle around with them.


The problem with the specific parameter in question has to do with how
the mind runs wild. A lot of HBase design-related documents/discussions
mention the term "region size". It is very hard to imagine that
MAX_FILESIZE (which is hardly mentioned anywhere) is what really refers
to region size, and it is easy to miss that BLOCKSIZE, which appears so
prominently on the master page (or in the output of scanning the .META.
table for the nerdier folks), is an entirely different beast.

Once we address #1 & #2, it becomes easier to yell "Didn't you RTFM" at
anyone who gets confused :-)

Re: Row get very slow

Posted by Stack <st...@duboce.net>.
On Sun, Nov 13, 2011 at 7:13 AM, Arvind Jayaprakash <wo...@anomalizer.net> wrote:
> A common confusion is b/w MAX_FILESIZE and BLOCKSIZE. Given that
> MAX_FILESIZE is not listed on :60010/master.jsp, one tends to assume
> BLOCKSIZE represents that value.
>

We should fix that. What would you like to see, Arvind?
St.Ack

Re: Row get very slow

Posted by Arvind Jayaprakash <wo...@anomalizer.net>.
A common confusion is b/w MAX_FILESIZE and BLOCKSIZE. Given that
MAX_FILESIZE is not listed on :60010/master.jsp, one tends to assume
BLOCKSIZE represents that value.

On Nov 10, lars hofhansl wrote:
>"BLOCKSIZE => '536870912'"
>
>
>You set your blocksize to 512MB? The default is 64KB (65536); try setting it to something lower.

Re: Row get very slow

Posted by lars hofhansl <lh...@yahoo.com>.
"BLOCKSIZE => '536870912'"


You set your blocksize to 512MB? The default is 64KB (65536); try setting it to something lower.
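
(For scale: 536870912 bytes = 512 * 1024 * 1024 = 512MB per block, so a single get may have to read and decompress up to 512MB of data to return one row, which would explain the multi-second gets. The 65536 default is 64KB.)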

-- Lars
________________________________

From: Damien Hardy <dh...@figarocms.fr>
To: "user@hbase.apache.org" <us...@hbase.apache.org>
Sent: Thursday, November 10, 2011 3:11 AM
Subject: Row get very slow

Hello there.


When I want to get a row by rowid, the answer is very slow (sometimes even 15 seconds).
What is wrong with my HTable?
Here are some examples to illustrate my problem:

hbase(main):030:0> get 'logs', '_f:squid_t:20111110111259_b:squid_s:304-ts3aa/WDoKqrSSqNcToHdA==', { COLUMN => 'body:body', VERSIONS => 1 }
COLUMN                                               CELL
body:body                                           timestamp=1320919979701, value=Nov 10 11:05:41 haproxy[15469]: ... [haproxy logs] ...

1 row(s) in 6.0310 seconds

hbase(main):031:0> scan 'logs', { STARTROW =>'_f:squid_t:20111110111259_b:squid_s:304-ts3aa/WDoKqrSSqNcToHdA==', LIMIT => 1 }
ROW                                                  COLUMN+CELL
_f:squid_t:20111110111259_b:squid_s:304-ts3aa/WDoKqrSSqNcToHdA==  column=body:body, timestamp=1320919979701, value=Nov 10 11:05:41 haproxy[15469]: ... [haproxy logs] ...

1 row(s) in 2.7160 seconds

hbase(main):032:0> get 'logs', '_f:squid_t:20111110111259_b:squid_s:304-ts3aa/WDoKqrSSqNcToHdA=='
COLUMN                                               CELL
body:body                                           timestamp=1320919979701, value=Nov 10 11:05:41 haproxy[15469]: ... [haproxy logs] ...
1 row(s) in 5.0640 seconds

hbase(main):033:0> describe 'logs'
DESCRIPTION                                                                                                                           ENABLED
{NAME => 'logs', FAMILIES => [{NAME => 'body', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', COMPRESSION => 'SNAPPY',              true
  VERSIONS => '1', TTL => '2147483647', BLOCKSIZE => '536870912', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}
1 row(s) in 0.0660 seconds

hbase(main):025:0> get 'logs', '_f:squid_t:20111110111259_b:squid_s:304-ts3aa/WDoKqrSSqNcToHdA==', { COLUMN => 'body:body', TIMERANGE => [1320919900000,1320920000000] }
COLUMN                                               CELL
body:body                                           timestamp=1320919979701, value=Nov 10 11:05:41 haproxy[15469]: ... [haproxy logs] ...

1 row(s) in 0.0630 seconds


A scan is always faster than a get, which I find strange.

I get a normal response time when I specify the TS.

The table has about 200 regions distributed across 2 nodes (with the full stack on each: HDFS / HBase master+regionserver / ZooKeeper).
Region size is 2GB now.

Recently I increased the region size from the default (128MB if I remember correctly) to 2GB to get a smaller number of regions (I had 3500 regions).

I changed hbase.hregion.max.filesize to 2147483648, restarted my whole cluster, created a new table, and copied the data via Pig from the old table to the new one => fewer regions => I'm happy \o/
But on my older table a plain get was very fast, like the get with the TS specified on the new table.

Does the region size affect HBase response times that much?

A get on another table that was not rebuilt after the config change (regions not merged) is still fast.

Thank you,

-- Damien