Posted to user@hbase.apache.org by "wangyongqiang0617@163.com" <wa...@163.com> on 2019/06/18 15:33:05 UTC

question on hfile size upper limit

We set an upper size limit for HFiles, but not for regions,
so regions end up with different actual sizes, which means some analysis tasks get different input sizes.

Can we set a size limit on regions?




wangyongqiang0617@163.com

Re: Re: question on hfile size upper limit

Posted by Anoop John <an...@gmail.com>.
Based on what you pasted as the config:
"<property>
    <name>hbase.hregion.max.filesize</name>
    <value>10737418240</value>
    <description>
    Maximum HStoreFile size. If any one of a column families' HStoreFiles has
    grown to exceed this value, the hosting HRegion is split in two.</description>
  </property>"

I can say the issue is the version of HBase.

Older HBase versions behaved the way you describe: when a single file under a
region's CF grew above the max limit, the region was split. The check was
written that way because we try to major compact the files under a CF into
one large file anyway, so checking the largest file was OK.

This was changed later, and we now check the sum of all files under a
region:cf. I am not sure which version introduced this. It became
necessary when we added support for features like Date Tiered Compaction and
Stripe Compaction.

So to get the behavior you need, try upgrading to a newer version.
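
A minimal sketch of the two checks described above, in Python. This is not
HBase code: the function names and the example file sizes are made up for
illustration, and it models only the size comparison (a real split policy
may take other factors into account).

```python
# Illustrative model only; these helpers are hypothetical, not HBase APIs.
MAX_FILESIZE = 10737418240  # hbase.hregion.max.filesize from the config above


def should_split_old(stores, max_filesize=MAX_FILESIZE):
    """Older behavior as described above: split once any single store file
    in any column family exceeds the limit."""
    return any(size > max_filesize for files in stores.values() for size in files)


def should_split_new(stores, max_filesize=MAX_FILESIZE):
    """Newer behavior as described above: split once the total size of the
    store files of any one column family exceeds the limit."""
    return any(sum(files) > max_filesize for files in stores.values())


GIB = 1024 ** 3
# One column family holding three 4 GiB files: no single file is over 10 GiB,
# but together they add up to 12 GiB.
region = {"cf1": [4 * GIB, 4 * GIB, 4 * GIB]}

print(should_split_old(region))  # False
print(should_split_new(region))  # True
```

With the older check, a region like this one keeps growing until a
compaction produces a single file above the limit, which is one way regions
can end up with uneven sizes.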

Anoop


On Thu, Jun 20, 2019 at 9:55 PM Jean-Marc Spaggiari <je...@spaggiari.org>
wrote:

> Hi,
>
> Just updating what I said (thanks Anoop for the warning). I had assumed
> that you have a single CF... The maxfilesize is per CF, not per
> region. If you have a single CF, then it becomes the same as per region, but
> a region will split whenever one of the CFs reaches the limit.
>
> HBase will not split a single row. So if you have a single row that grows
> bigger than the maxfilesize, the region will keep growing. You need to
> assess this risk when you do your table design and avoid it. It will not
> split even if there are millions of column qualifiers. A region is defined
> by a start row and a stop row. Therefore a single row can belong only to a
> single region.
>
> JMS

Re: Re: question on hfile size upper limit

Posted by Jean-Marc Spaggiari <je...@spaggiari.org>.
Hi,

Just updating what I said (thanks Anoop for the warning). I had assumed
that you have a single CF... The maxfilesize is per CF, not per
region. If you have a single CF, then it becomes the same as per region, but
a region will split whenever one of the CFs reaches the limit.

HBase will not split a single row. So if you have a single row that grows
bigger than the maxfilesize, the region will keep growing. You need to
assess this risk when you do your table design and avoid it. It will not
split even if there are millions of column qualifiers. A region is defined
by a start row and a stop row. Therefore a single row can belong only to a
single region.
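
A rough sketch of why a lone row cannot be split, in Python. The helper
find_split_row and its input format are hypothetical, and the way it picks a
split row is a simplification rather than HBase's actual selection logic. It
only illustrates that a split needs a row boundary.

```python
# Illustrative model only; find_split_row is a hypothetical helper, not an HBase API.

def find_split_row(row_sizes):
    """Pick a row key at which to split a region.

    row_sizes: list of (row_key, size_in_bytes) sorted by row_key, one entry
    per row. Returns the first row key of the would-be second daughter
    region, or None when there is no row boundary to split on.
    """
    if len(row_sizes) < 2:
        # A region is a [start row, stop row) range, so a lone row cannot be
        # divided, no matter how large it is or how many qualifiers it has.
        return None
    half = sum(size for _, size in row_sizes) / 2
    running = 0
    # Walk adjacent row pairs so the first daughter always keeps at least one row.
    for (_, size), (next_key, _) in zip(row_sizes, row_sizes[1:]):
        running += size
        if running >= half:
            return next_key
    return row_sizes[-1][0]


print(find_split_row([("row-1", 20 * 1024 ** 3)]))               # None
print(find_split_row([("a", 1), ("b", 1), ("c", 1), ("d", 1)]))  # c
```

The first call models a 20 GiB row sitting alone in its region: it is over
the limit, but there is nowhere to cut.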

JMS

On Thu, 20 Jun 2019 at 05:00, Roshan <jl...@gmail.com> wrote:

> Hi,
>
> If a single rowkey in the table exceeds the size defined by
> hbase.hregion.max.filesize, will the region split or not? In this
> case, what performance issues would we face in the cluster?
>
> If the rowkey (belonging to a single column family) also has different column
> qualifiers, will the HFile still not be split?

Re: Re: question on hfile size upper limit

Posted by Roshan <jl...@gmail.com>.
Hi,

If a single rowkey in the table exceeds the size defined by
hbase.hregion.max.filesize, will the region split or not? In this
case, what performance issues would we face in the cluster?

If the rowkey (belonging to a single column family) also has different column
qualifiers, will the HFile still not be split?



On Thu, 20 Jun 2019 at 11:38, wangyongqiang0617@163.com <
wangyongqiang0617@163.com> wrote:

> this conf:
>   <property>
>     <name>hbase.hregion.max.filesize</name>
>     <value>10737418240</value>
>     <description>
>     Maximum HStoreFile size. If any one of a column families' HStoreFiles has
>     grown to exceed this value, the hosting HRegion is split in two.</description>
>   </property>
>
>
>
>
>
> wangyongqiang0617@163.com

Re: Re: question on hfile size upper limit

Posted by "wangyongqiang0617@163.com" <wa...@163.com>.
this conf: 
  <property>
    <name>hbase.hregion.max.filesize</name>
    <value>10737418240</value>
    <description>
    Maximum HStoreFile size. If any one of a column families' HStoreFiles has
    grown to exceed this value, the hosting HRegion is split in two.</description>
  </property>





wangyongqiang0617@163.com
 
From: Jean-Marc Spaggiari
Date: 2019-06-19 06:52
To: user
Subject: Re: question on hfile size upper limit
Hi,
 
Can you please confirm which parameter you are talking about? The default
HBase setting limits the size per region (10 GB by default), not per
HFile. This can be configured at the HBase level or at the table level.
 
HTH,
 
JMS
 

Re: question on hfile size upper limit

Posted by Jean-Marc Spaggiari <je...@spaggiari.org>.
Hi,

Can you please confirm which parameter you are talking about? The default
HBase setting limits the size per region (10 GB by default), not per
HFile. This can be configured at the HBase level or at the table level.
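
A small sketch of how the two configuration levels could interact, in
Python. The helper effective_max_filesize is hypothetical, and the
precedence it encodes (a table-level MAX_FILESIZE overriding the
cluster-wide hbase.hregion.max.filesize) is an assumption that should be
checked against the documentation for your HBase version.

```python
# Illustrative model only; effective_max_filesize is a hypothetical helper.
DEFAULT_MAX_FILESIZE = 10 * 1024 ** 3  # 10 GiB, i.e. 10737418240 bytes


def effective_max_filesize(site_conf, table_attrs):
    """Resolve the size limit that applies to one table.

    Assumption: a positive table-level MAX_FILESIZE wins over the
    cluster-wide hbase.hregion.max.filesize, which in turn wins over
    the built-in default.
    """
    table_value = int(table_attrs.get("MAX_FILESIZE", -1))
    if table_value > 0:
        return table_value
    return int(site_conf.get("hbase.hregion.max.filesize", DEFAULT_MAX_FILESIZE))


site = {"hbase.hregion.max.filesize": "10737418240"}
print(effective_max_filesize(site, {}))                              # 10737418240
print(effective_max_filesize(site, {"MAX_FILESIZE": "5368709120"}))  # 5368709120
```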

HTH,

JMS

On Tue, 18 Jun 2019 at 11:32, wangyongqiang0617@163.com <
wangyongqiang0617@163.com> wrote:

> We set an upper size limit for HFiles, but not for regions,
> so regions end up with different actual sizes, which means some analysis
> tasks get different input sizes.
>
> Can we set a size limit on regions?
>
>
>
>
> wangyongqiang0617@163.com
>