Posted to user@hbase.apache.org by Skchaudhary <sc...@ivp.in> on 2012/04/24 21:41:29 UTC

HBase Quality of Service: large standard deviation in insert time while inserting same type of rows in HBase

I have an HBase cluster set up with 3 Region Servers. There is a table
which has 27 Regions equally distributed among the 3 Region Servers, 9
regions per Region Server.

Region Server 1 has regions 1-9
Region Server 2 has regions 10-18
Region Server 3 has regions 19-27

Now when I start a program which inserts rows into region 1 and region 5
(both under Region Server 1) alternately and continuously, I see that the
insert time for each row is not consistent: the standard deviation of the
insert time is quite large. Sometimes it takes 2 ms to insert a row,
sometimes 3 ms, sometimes 1000 ms, and sometimes even more than 3000 ms,
even though the data size of every row is equal.
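
One way to put numbers on the variance described above is to summarize the
per-insert timings as a mean, a standard deviation, and tail percentiles. The
following is only a minimal Python sketch: it assumes the latencies were
already collected on the client in milliseconds, the function name
summarize_latencies is hypothetical, and the sample values are made up to
resemble the figures quoted in this post.

```python
import math
import statistics


def summarize_latencies(latencies_ms):
    """Summarize insert latencies (in ms): mean, spread, and tail percentiles."""
    ordered = sorted(latencies_ms)

    def percentile(p):
        # Nearest-rank percentile: the smallest sample such that at least
        # p percent of all samples are less than or equal to it.
        rank = max(1, math.ceil(p / 100 * len(ordered)))
        return ordered[rank - 1]

    return {
        "count": len(ordered),
        "mean": statistics.mean(ordered),
        "stdev": statistics.pstdev(ordered),  # population standard deviation
        "p50": percentile(50),
        "p99": percentile(99),
        "max": ordered[-1],
    }


# Made-up sample: mostly 2-3 ms inserts with two large outliers.
samples = [2, 3, 2, 3, 2, 1000, 3, 2, 3, 3000]
summary = summarize_latencies(samples)
print(summary)
```

With data shaped like this, the median stays at a few milliseconds while the
mean and standard deviation are dominated by the rare outliers, so the tail
percentiles and the maximum are usually the more useful figures when asking
for a bound.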

I understand that writes are blocked during flushing and compaction of
Regions, but they should not be blocked for such a long span of time, and
the blockage time should be consistent for every flush/compaction (minor
compaction).

All in all, every flush and every compaction should take nearly the same
time.

For our application we need a consistent quality of service, and if not a
perfect one, at least a clearly visible bound: for example, each row insert
should take between 0 and 10 ms and never more than 10 ms (just an example),
even when a minor compaction or flush occurs.

Is there any setting/configuration which I should try?

Any ideas on how to achieve this in HBase?

Any help would be really appreciated.

Thanks in advance!!

-- 
View this message in context: http://old.nabble.com/Hbase-Quality-Of-Service%3A-large-standarad-deviation-in-insert-time-while-inserting-same-type-of-rows-in-Hbase-tp33740438p33740438.html
Sent from the HBase User mailing list archive at Nabble.com.


Re: HBase Quality of Service: large standard deviation in insert time while inserting same type of rows in HBase

Posted by Michel Segel <mi...@hotmail.com>.
I guess Sesame Street isn't global... ;-) Oh, and of course I f'd the joke by saying Grover and not Oscar, so it's my bad. :-( [Google Oscar the Grouch, and you'll understand the joke that I botched.]

It's most likely GC and a mistuned cluster.
The OP doesn't really go into detail, except to say that his cluster is tiny. Yes, size does matter, regardless of those rumors to the contrary... 3 DN is kinda small. If he's splitting that often, then his region size is too small. Hot spotting and other things can impact performance, however not in the way he described.

Also, when you look at performance, look at reads, not writes. You can cache both, and writes are less important than reads. (Think about it.)

Since this type of conversation keeps popping up, it would be a good topic for Strata in NY. (Not too subtle a hint to those who are picking topics...) Good cluster design is important, more important than people think.


Sent from a remote device. Please excuse any typos...

Mike Segel

On Apr 25, 2012, at 12:08 AM, Mikael Sitruk <mi...@gmail.com> wrote:

> 1. Writes are not blocked during compaction.
> 2. Compaction cannot take a constant time, since the files/regions keep
> getting bigger.
> 3. Besides GC pauses (which seem to be the best candidate here) on
> either the client or the RS (what are your settings, BTW, and the data size
> per insert?), did you pre-split your regions, or is a split occurring
> during the execution?
> 4. Did you look at the logs? Is there any operation that is taking too long
> there? (In 0.92 you can configure it to print any operation that takes a
> long time.)
> 
> 
> Regards
> Mikael.S
> 
> On Wed, Apr 25, 2012 at 4:58 AM, Michael Segel <mi...@hotmail.com> wrote:
> 
>> Have you thought about Garbage Collection?
>> 
>> -Grover
>> 
>> Sent from my iPhone

Re: HBase Quality of Service: large standard deviation in insert time while inserting same type of rows in HBase

Posted by Mikael Sitruk <mi...@gmail.com>.
1. Writes are not blocked during compaction.
2. Compaction cannot take a constant time, since the files/regions keep
getting bigger.
3. Besides GC pauses (which seem to be the best candidate here) on
either the client or the RS (what are your settings, BTW, and the data size
per insert?), did you pre-split your regions, or is a split occurring
during the execution?
4. Did you look at the logs? Is there any operation that is taking too long
there? (In 0.92 you can configure it to print any operation that takes a
long time.)
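
To check the GC hypothesis in point 3, one option is to line up the slow
inserts against the GC pauses seen on the client and on the RS. The following
is only a minimal Python sketch under several assumptions: the pause intervals
have already been extracted from the JVM GC logs into (start, duration) pairs
in seconds, the client and server clocks are comparable, and the function
name, the 10 ms threshold, and the sample data are all hypothetical.

```python
def split_slow_inserts(inserts, pauses, slow_ms=10.0):
    """Split slow inserts into those that overlap a GC pause and those that do not.

    inserts: list of (start_seconds, latency_ms) measured on the client.
    pauses:  list of (start_seconds, duration_seconds) taken from GC logs.
    Returns (explained, unexplained), two lists of (start_seconds, latency_ms).
    """
    explained, unexplained = [], []
    for start_s, latency_ms in inserts:
        if latency_ms <= slow_ms:
            continue  # fast inserts are not of interest here
        end_s = start_s + latency_ms / 1000.0
        # Two intervals overlap when each one starts before the other ends.
        overlaps = any(
            pause_start < end_s and pause_start + pause_len > start_s
            for pause_start, pause_len in pauses
        )
        (explained if overlaps else unexplained).append((start_s, latency_ms))
    return explained, unexplained


# Made-up data: two long inserts coincide with pauses, one slow insert does not.
inserts = [(0.000, 2), (0.010, 3), (0.020, 1000), (1.500, 2), (2.000, 3000), (6.000, 40)]
pauses = [(0.050, 0.9), (2.100, 2.8)]
explained, unexplained = split_slow_inserts(inserts, pauses)
print("overlap a GC pause:", explained)
print("no GC pause found: ", unexplained)
```

Slow inserts that end up in the second list would point at something other
than GC, such as a region split or the long-running operations mentioned in
point 4.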


Regards
Mikael.S

On Wed, Apr 25, 2012 at 4:58 AM, Michael Segel <mi...@hotmail.com> wrote:

> Have you thought about Garbage Collection?
>
> -Grover
>
> Sent from my iPhone

Re: HBase Quality of Service: large standard deviation in insert time while inserting same type of rows in HBase

Posted by Doug Meil <do...@explorysmedical.com>.
Hi there-

In addition to what was said about GC, you might want to double-check
this...

http://hbase.apache.org/book.html#performance

... as well as this case-study for performance troubleshooting

http://hbase.apache.org/book.html#casestudies.perftroub




On 4/24/12 9:58 PM, "Michael Segel" <mi...@hotmail.com> wrote:

>Have you thought about Garbage Collection?
>
>-Grover
>
>Sent from my iPhone



Re: HBase Quality of Service: large standard deviation in insert time while inserting same type of rows in HBase

Posted by Michael Segel <mi...@hotmail.com>.
Have you thought about Garbage Collection?

-Grover

Sent from my iPhone
