Posted to user@hbase.apache.org by Alexander Ilyin <al...@weborama.com> on 2017/05/05 12:47:26 UTC

Compaction monitoring

Hi,

While tuning HBase performance I've found a lot of settings which affect
the compaction process (off-peak hours, time between compactions, compaction
ratio, region sizes, etc.). They all seem useful, and the docs recommend
values for them. But I found no way
to assess how they actually affect my cluster performance, i.e. how many
resources compaction takes and when. I would like to figure out which
settings work best for my dataset and my specific workload, but with only
general recommendations in hand that seems difficult to do.

For example, I have difficulty answering the following questions:
* can I shorten my off-peak hours range?
* can I afford to do compactions more often? or more aggressively?
* how much does my performance degrade if region size becomes too large?

The HBase version I'm using is 1.1.2.
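
For reference, the off-peak setting mentioned above can be pictured as a
switch between two compaction ratios. The following is only a minimal
sketch of that idea, not HBase's actual code: the property names and
default values are as I recall them from hbase-default.xml and should be
checked against the docs for your version, and the helper functions
is_off_peak and effective_ratio are hypothetical names made up for
illustration.

```python
# Property names and defaults as recalled from hbase-default.xml;
# treat them as assumptions and verify against your HBase version.
DEFAULTS = {
    "hbase.offpeak.start.hour": -1,  # -1 means no off-peak window
    "hbase.offpeak.end.hour": -1,
    "hbase.hstore.compaction.ratio": 1.2,
    "hbase.hstore.compaction.ratio.offpeak": 5.0,
}


def is_off_peak(hour, start, end):
    """Return True if hour (0-23) falls inside the window [start, end)."""
    if start == -1 or end == -1 or start == end:
        return False  # window not configured (assumed behaviour)
    if start <= end:
        return start <= hour < end
    # The window wraps past midnight, e.g. 22 -> 6.
    return hour >= start or hour < end


def effective_ratio(hour, conf):
    """Pick the compaction ratio that would apply at the given hour."""
    off = is_off_peak(
        hour,
        conf["hbase.offpeak.start.hour"],
        conf["hbase.offpeak.end.hour"],
    )
    if off:
        return conf["hbase.hstore.compaction.ratio.offpeak"]
    return conf["hbase.hstore.compaction.ratio"]


if __name__ == "__main__":
    # Hypothetical cluster with an off-peak window from 22:00 to 06:00.
    conf = dict(DEFAULTS)
    conf["hbase.offpeak.start.hour"] = 22
    conf["hbase.offpeak.end.hour"] = 6
    for hour in (3, 12, 23):
        print(hour, effective_ratio(hour, conf))
```

A larger ratio lets bigger files take part in a minor compaction, so
shortening the window reduces the hours during which the more aggressive
selection is allowed.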


Alexander

Re: Compaction monitoring

Posted by Vladimir Rodionov <vl...@gmail.com>.
The major issue with HBase compactions is not excessive CPU or IO usage but
excessive creation of temporary (garbage) objects,
which results in more frequent GC failures and, in some cases, RS
shutdowns due to long GC pauses.

That is why it is so important to keep compactions under control:

disable automatic major compactions and splits, and perform those manually
during off-peak hours, for example.
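
As an illustration of the advice above, here is a minimal sketch that
renders an hbase-site.xml fragment and the matching HBase shell commands.
It rests on assumptions to verify for your version: that setting
hbase.hregion.majorcompaction to 0 turns off time-based major compactions,
and that DisabledRegionSplitPolicy is available as a split policy. The
function names and the table names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed settings; check both against the docs for your HBase version.
PROPERTIES = {
    # 0 disables time-based major compactions; user-requested ones still run.
    "hbase.hregion.majorcompaction": "0",
    # Assumed split policy class that never splits automatically.
    "hbase.regionserver.region.split.policy":
        "org.apache.hadoop.hbase.regionserver.DisabledRegionSplitPolicy",
}


def render_site_fragment(props):
    """Render properties as a <configuration> element for hbase-site.xml."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


def major_compact_commands(tables):
    """Build HBase shell commands to run by hand or from cron off-peak."""
    return ["major_compact '{}'".format(table) for table in tables]


if __name__ == "__main__":
    print(render_site_fragment(PROPERTIES))
    # 'events' and 'profiles' are made-up table names.
    for command in major_compact_commands(["events", "profiles"]):
        print(command)
```

The printed commands can be piped into the HBase shell from a scheduled
job that runs inside the off-peak window.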

-Vlad


Re: Compaction monitoring

Posted by Alexander Ilyin <al...@weborama.com>.
Kevin,

Thanks for your answer. We're using Ambari to manage our cluster.

I see an increase in CPU usage and IO, but it's not a big one. This
increase tends to come at the beginning of the off-peak window, although it's
difficult to tell for sure since our workload comes in bursts and the
picture is not clear. That's why I was asking whether there are any metrics
related specifically to compaction. But I can probably shorten the window.

As for region sizes, I will experiment, as you suggest.
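
On the question of compaction-specific metrics, one possible approach is
to read the RegionServer's JMX JSON output and keep only the attributes
whose names mention compaction. The sketch below assumes the RegionServer
info server exposes a /jmx endpoint that returns {"beans": [...]} (port
16030 is the usual 1.x default, but check your setup). The sample payload,
its attribute names and its values are made up for illustration, and the
function names are hypothetical.

```python
import json
import urllib.request


def fetch_jmx(url):
    """Fetch and parse a JMX JSON document, e.g. http://<host>:16030/jmx."""
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def compaction_metrics(jmx_payload):
    """Collect, per bean, every attribute whose name mentions compaction."""
    found = {}
    for bean in jmx_payload.get("beans", []):
        hits = {k: v for k, v in bean.items() if "compact" in k.lower()}
        if hits:
            found[bean.get("name", "<unnamed>")] = hits
    return found


# Made-up sample; real bean and attribute names may differ by version.
SAMPLE = json.loads("""
{"beans": [
  {"name": "Hadoop:service=HBase,name=RegionServer,sub=Server",
   "compactionQueueLength": 3,
   "majorCompactedCellsSize": 1048576,
   "readRequestCount": 42},
  {"name": "java.lang:type=Memory", "HeapMemoryUsage": 123}
]}
""")

if __name__ == "__main__":
    for bean, metrics in compaction_metrics(SAMPLE).items():
        print(bean)
        for key, value in sorted(metrics.items()):
            print("  {} = {}".format(key, value))
```

Sampling these values at a fixed interval and plotting them next to CPU
and IO graphs should show whether compaction activity is concentrated at
the start of the off-peak window.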



Re: Compaction monitoring

Posted by Kevin O'Dell <ke...@rocana.com>.
Alexander,

  That is a great series of questions. What are you using for
instrumentation of your HBase cluster?  Cloudera Manager, Ambari, Ganglia,
Cacti, etc.?  You are really asking a lot of performance-based metric
questions. I don't think you will be able to answer your questions without
first being able to answer these questions:

Do you see the Major Compaction I/O/CPU/Memory spikes throughout the whole
"off-peak" window?

If so, do you have the host resource headroom to add additional compaction
threads to shorten it?

What do your response times look like during your "off-peak hours"? Are you
still within your SLAs?

Answering these questions should quickly allow you to answer your first two
questions. Your last question is very interesting:

* how much does my performance degrade if region size becomes too large? <--
This 100% depends on your environment, I/O usage, SLAs, etc.
I am not sure if anyone has documented compaction times based on
Region sizes.  You may have to do some trial and error here.
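
The SLA question above can be checked with a few lines once response
times are being collected. This is only a sketch under assumptions: the
samples are (hour of day, latency in ms) pairs taken from whatever
monitoring is in place, the nearest-rank percentile is one of several
possible definitions, and the function names, sample values and the
500 ms SLA are made up for illustration.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile; pct is an integer in 1..100."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


def within_sla(samples, window_hours, sla_ms, pct=99):
    """Check that the pct-th percentile latency inside the window meets the SLA.

    samples is an iterable of (hour_of_day, latency_ms) pairs.
    """
    in_window = [ms for hour, ms in samples if hour in window_hours]
    return percentile(in_window, pct) <= sla_ms


if __name__ == "__main__":
    # Made-up samples; the off-peak window here is 00:00 to 05:59.
    samples = [(2, 10), (2, 20), (3, 400), (14, 5000)]
    off_peak = set(range(0, 6))
    print(within_sla(samples, off_peak, sla_ms=500))
```

Repeating the same check after each change to the region size or the
compaction settings gives a simple before-and-after comparison for the
trial-and-error approach.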

I hope this helps!






-- 
Kevin O'Dell
Field Engineer
850-496-1298 | Kevin@rocana.com
@kevinrodell
<http://www.rocana.com>