Posted to dev@hbase.apache.org by Stack <st...@duboce.net> on 2013/06/20 20:09:20 UTC

[COMPACTIONS] Anyone seen hbase.hstore.compaction.min.size in trunk/0.95?

I was reading an old thriller, "HBASE-3149 Make flush decisions per column
family", and I got to the good bit where our NicolasS argues that per-CF
flush is likely not needed because small files are actually fine as long as
these small files are hoovered up quickly.  He mentioned
the hbase.hstore.compaction.min.size config, which we'd set to be equal to
flush size, and he argued that our default should be much lower -- 1/16th
of that -- so we always get rid of the small files first.

The config. was removed here:

Author: Zhihong Yu <te...@apache.org>  2012-10-30 13:14:01
Committer: Zhihong Yu <te...@apache.org>  2012-10-30 13:14:01
Parent: 2c0261b4e6571d627fb017338aeaf10089b75dab (HBASE-7060 Region load
balancing by table does not handle the case where a table's region count is
lower than the number of the RS in the cluster (Ted Yu and Tianying))
Child:  7380036d88ed6c6ddfad4f4fc2ef617ab419d610 (HBASE-7055 port
HBASE-6371 tier-based compaction from 0.89-fb to trunk - revert for further
discussion)
Branches: many (31)
Follows:
Precedes:

    HBASE-7055 port HBASE-6371 tier-based compaction from 0.89-fb to trunk
(Sergey)

I was wondering whether, w/ our new compaction algos, we are making use of
NicolasS's advice (informed by experience) or not?

Thanks,
St.Ack

Re: [COMPACTIONS] Anyone seen hbase.hstore.compaction.min.size in trunk/0.95?

Posted by Sergey Shelukhin <se...@hortonworks.com>.
+1 on experimenting with 0.


Re: [COMPACTIONS] Anyone seen hbase.hstore.compaction.min.size in trunk/0.95?

Posted by lars hofhansl <la...@apache.org>.
I think it depends on how large we expect the initially flushed HFile to be (just to state the obvious).
The current default matches the memstore flushsize, so if we mostly flush because of that limit the current default should be good.


If we have many column families, where one dominates, we want to decrease this to make sure that the smallest files - those created because we need to flush all CFs - are compacted first.
Not sure what a good default would be, or how much we could auto-configure this.


On the other hand maybe setting this to a very small amount might be a good default after all. The larger files will eventually be collected by the ratio based selection, and having this small will immediately pick abnormally tiny HFiles for compaction.

A good test might be to set this to 0 (so it's never used for file selection) and then see how this affects selection in common workloads.


We'll probably not find defaults that are right for every workload.
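
To make the trade-off concrete, here is a minimal sketch of a ratio-based selection loop with a min-size floor. It is an illustration under stated assumptions, not the HBase implementation: the class name RatioSelectionSketch, the select method, the ratio of 1.2, the minimum of 3 files, and the file sizes are all made up for the example. Files are ordered oldest to newest, and an old file is skipped only when it is larger than both the min size and ratio times the total size of the newer files.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; names and numbers are assumptions, not HBase code.
class RatioSelectionSketch {

    /**
     * Returns the sizes of the files picked for compaction.
     * fileSizes must be ordered oldest to newest.
     */
    public static List<Long> select(long[] fileSizes, long minCompactSize,
                                    double ratio, int minFiles) {
        int n = fileSizes.length;
        // suffixSum[i] = total size of files i..n-1 (suffixSum[n] = 0).
        long[] suffixSum = new long[n + 1];
        for (int i = n - 1; i >= 0; i--) {
            suffixSum[i] = fileSizes[i] + suffixSum[i + 1];
        }
        int start = 0;
        // Skip the oldest remaining file while it exceeds BOTH the min-size
        // floor and ratio * (sum of all newer files).
        while (n - start >= minFiles
                && fileSizes[start]
                   > Math.max(minCompactSize, (long) (suffixSum[start + 1] * ratio))) {
            start++;
        }
        List<Long> selected = new ArrayList<>();
        // Only compact if enough files are left after skipping.
        if (n - start >= minFiles) {
            for (int i = start; i < n; i++) {
                selected.add(fileSizes[i]);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // One large old file, one full flush, three tiny flushes (made-up sizes).
        long[] sizes = {1000 * mb, 64 * mb, 3 * mb, 1 * mb, 1 * mb};
        System.out.println("min.size = 64MB: " + select(sizes, 64 * mb, 1.2, 3).size() + " files");
        System.out.println("min.size = 4MB:  " + select(sizes, 4 * mb, 1.2, 3).size() + " files");
        System.out.println("min.size = 0:    " + select(sizes, 0L, 1.2, 3).size() + " files");
    }
}
```

With these made-up sizes, a threshold equal to a 64MB flush keeps the 64MB file in the selection, a 4MB threshold selects only the three tiny files, and 0 selects nothing because the 3MB file fails the ratio test on its own and too few files remain.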


-- Lars


Re: [COMPACTIONS] Anyone seen hbase.hstore.compaction.min.size in trunk/0.95?

Posted by Sergey Shelukhin <se...@hortonworks.com>.
My impression was that the purpose of this setting is to auto-include
flushes into large compactions that already fit in the ratio. Frankly
speaking, without running the cluster it's hard to tell what the setting
should be. A low value will make sure it rarely includes any files, right?
Except for the effects of the periodic flusher and smaller CFs flushing with
the big CF.
It might be a good idea to set it low, something just below HDFS block size,
to avoid partial-block files, for example. Just how low I'm not certain. We
can set it to 32-64MB and see if any complaints materialize ;)


Re: [COMPACTIONS] Anyone seen hbase.hstore.compaction.min.size in trunk/0.95?

Posted by Stack <st...@duboce.net>.
On Thu, Jun 20, 2013 at 3:43 PM, Stack <st...@duboce.net> wrote:

> On Thu, Jun 20, 2013 at 2:41 PM, Sergey Shelukhin <se...@hortonworks.com>wrote:
>
>> Part of HBASE-7055 patch that we picked includes CompactionConfiguration
>> class, which uses a prefix for config values.
>> See ::getMinCompactSize on that class, it's still used in compaction.
>>
>>
> Thanks Sergey.  Found it.
>
> Now, should we do Nicolas's suggestion as a default; i.e. any file < 4MB
> is always added to compaction set (where currently, IIUC, any file <
> flushsize is  added to the compaction set)?
>
>
Ping on above question.  Any compactors have an opinion?
Thanks,
St.Ack

Re: [COMPACTIONS] Anyone seen hbase.hstore.compaction.min.size in trunk/0.95?

Posted by Stack <st...@duboce.net>.
On Thu, Jun 20, 2013 at 2:41 PM, Sergey Shelukhin <se...@hortonworks.com>wrote:

> Part of HBASE-7055 patch that we picked includes CompactionConfiguration
> class, which uses a prefix for config values.
> See ::getMinCompactSize on that class, it's still used in compaction.
>
>
Thanks Sergey.  Found it.

Now, should we do Nicolas's suggestion as a default; i.e. any file < 4MB is
always added to compaction set (where currently, IIUC, any file < flushsize
is  added to the compaction set)?

Thanks,
St.Ack

Re: [COMPACTIONS] Anyone seen hbase.hstore.compaction.min.size in trunk/0.95?

Posted by Sergey Shelukhin <se...@hortonworks.com>.
Part of HBASE-7055 patch that we picked includes CompactionConfiguration
class, which uses a prefix for config values.
See ::getMinCompactSize on that class, it's still used in compaction.
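
A minimal sketch of why a grep for the full key can come up empty: if the key is assembled from a shared prefix plus a suffix, the literal string never appears in the source. Everything here is an assumption for illustration, not copied from HBase: the class name MinCompactSizeLookup, the CONFIG_PREFIX constant, the plain Map standing in for the configuration object, and the fallback to the memstore flush size (taken from the description elsewhere in this thread that the current default matches the flush size).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a prefix-based config lookup; not HBase code.
class MinCompactSizeLookup {
    // Assumed shared prefix for the compaction settings named in this thread.
    static final String CONFIG_PREFIX = "hbase.hstore.compaction.";

    /** Reads <prefix>min.size; falls back to the memstore flush size when unset. */
    public static long getMinCompactSize(Map<String, String> conf, long memstoreFlushSize) {
        String raw = conf.get(CONFIG_PREFIX + "min.size");
        return raw == null ? memstoreFlushSize : Long.parseLong(raw.trim());
    }

    public static void main(String[] args) {
        long flushSize = 64L * 1024 * 1024; // illustrative flush size
        Map<String, String> conf = new HashMap<>();
        // Unset: the threshold follows the flush size.
        System.out.println(getMinCompactSize(conf, flushSize));
        // Explicit override, e.g. 1/16th of the flush size.
        conf.put("hbase.hstore.compaction.min.size", String.valueOf(flushSize / 16));
        System.out.println(getMinCompactSize(conf, flushSize));
    }
}
```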


Re: [COMPACTIONS] Anyone seen hbase.hstore.compaction.min.size in trunk/0.95?

Posted by Andrew Purtell <ap...@apache.org>.
The history of this change on trunk appears to be the commit of a removal
as pointed out by Stack, then a revert of that commit. Later the patch or a
similar one is applied - and reverted and applied again - as pointed out by
Ted. A bit confusing and incidental to the discussion of its possible
usefulness. Can we get back to that?







-- 
Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein
(via Tom White)

Re: [COMPACTIONS] Anyone seen hbase.hstore.compaction.min.size in trunk/0.95?

Posted by Ted Yu <yu...@gmail.com>.
I looked at the diff for the following four commits:

r1414308 | stack | 2012-11-28 02:33:28 +0800 (Wed, 28 Nov 2012) | 1 line

HBASE-7110 refactor the compaction selection and config code similarly to
0.89-fb changes; REAPPLY v9
------------------------------------------------------------------------
r1414000 | stack | 2012-11-27 13:23:14 +0800 (Tue, 27 Nov 2012) | 1 line

HBASE-7110 refactor the compaction selection and config code similarly to
0.89-fb changes; REVERT of original patch and ADDENDUM because applied old
patch originally, v8
------------------------------------------------------------------------
r1413995 | stack | 2012-11-27 12:48:38 +0800 (Tue, 27 Nov 2012) | 1 line

HBASE-7110 refactor the compaction selection and config code similarly to
0.89-fb changes; ADDENDUM to fix broke TestHeapSize
------------------------------------------------------------------------
r1413912 | stack | 2012-11-27 06:51:37 +0800 (Tue, 27 Nov 2012) | 1 line

HBASE-7110 refactor the compaction selection and config code similarly to
0.89-fb changes
------------------------------------------------------------------------
r1407725 | larsh | 2012-11-10 12:28:07 +0800 (Sat, 10 Nov 2012) | 1 line

HBASE-4583 Integrate RWCC with Append and Increment operations

Here is what I found:

$ svn diff -r 1407725:1414308 \
    hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HStore.java \
    | grep "hbase.hstore.compaction.min"
-      conf.getInt("hbase.hstore.compaction.min",
-    LOG.info("hbase.hstore.compaction.min = " + this.minFilesToCompact);
-    this.minCompactSize = conf.getLong("hbase.hstore.compaction.min.size",
-   *  "hbase.hstore.compaction.min.size"
-   *  "hbase.hstore.compaction.min"

Cheers


Re: [COMPACTIONS] Anyone seen hbase.hstore.compaction.min.size in trunk/0.95?

Posted by lars hofhansl <la...@apache.org>.
Weird, that jira is marked as "later" and the commit message seems to indicate a revert.
Did the revert inadvertently remove that config from 0.95/trunk?

That option is indeed gone from trunk. I was under the impression that striped/tiered compactions are not quite ready yet.
Or did Elliott's compaction selection rework make this option unneeded?


-- Lars
