Posted to user@hbase.apache.org by jing wang <ha...@gmail.com> on 2012/09/05 12:17:46 UTC

reduce influence of auto-splitting region

Hi there,

  We use HBase as real-time storage (24x7). How can we reduce the impact of
automatic region splitting?
  Any advice will be appreciated!


Thanks,
Jing

Re: reduce influence of auto-splitting region

Posted by jing wang <ha...@gmail.com>.
Hi JM,

  Thanks for your reply.
  One more question: what do you mean by 'the way'? The row key ranges, as
described in the HBase book? http://hbase.apache.org/book/perf.writing.html



Thanks,
Jing Wang

2012/9/5 Jean-Marc Spaggiari <je...@spaggiari.org>

> Hi Jing,
>
> If you pre-split your regions a lot, you will reduce the number and
> the influence of the auto-splits. But for that you need to know very
> well the way the data is going to come into your database to make sure
> you split your regions evenly.
>
> JM

Re: reduce influence of auto-splitting region

Posted by Jean-Marc Spaggiari <je...@spaggiari.org>.
Hi Jing,

If you pre-split your regions a lot, you will reduce the number and
the influence of the auto-splits. But for that you need to know very
well the way the data is going to come into your database to make sure
you split your regions evenly.

JM
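
To illustrate the pre-splitting idea above, here is a minimal Python sketch
that computes evenly spaced split points. It assumes (hypothetically) that row
keys are fixed-width lowercase hex strings spread uniformly over the keyspace,
for example because they begin with a hash. The function name and the
8-character width are illustrative choices, not part of HBase. If the real key
distribution is skewed, evenly spaced points like these would not give evenly
loaded regions.

```python
def even_split_keys(num_regions, width=8):
    """Return num_regions - 1 split keys that divide a hex keyspace evenly.

    Assumption: row keys are `width`-character lowercase hex strings that are
    uniformly distributed (hypothetical key design, not taken from the thread).
    """
    if num_regions < 2:
        return []  # a single region needs no split keys
    keyspace = 16 ** width
    return [
        format(keyspace * i // num_regions, "0{}x".format(width))
        for i in range(1, num_regions)
    ]


split_keys = even_split_keys(4)
print(split_keys)
```

The resulting keys would then be supplied as the split keys when the table is
created, so that the table starts out with that many regions.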

2012/9/5, jing wang <ha...@gmail.com>:
> Hi there,
>
>   We use HBase as real-time storage (24x7). How can we reduce the impact of
> automatic region splitting?
>   Any advice will be appreciated!
>
>
> Thanks,
> Jing
>

RE: reduce influence of auto-splitting region

Posted by "Ramkrishna.S.Vasudevan" <ra...@huawei.com>.
You can use the hbase.hregion.max.filesize property.  Set it to a higher
value so that automatic splits are rarely triggered, and control the splits
through your application.

Regards
Ram
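
As a rough illustration of the suggestion above, the following Python sketch
builds the property entry that would go into hbase-site.xml. The property name
comes from the message; the value is given in bytes, and the 100 GiB figure is
purely an illustrative assumption, not a recommendation. The helper name is
hypothetical.

```python
import xml.etree.ElementTree as ET


def max_filesize_property(size_bytes):
    """Build the <property> element for hbase.hregion.max.filesize."""
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = "hbase.hregion.max.filesize"
    ET.SubElement(prop, "value").text = str(size_bytes)
    return prop


# 100 GiB is an illustrative value only; choose one that fits your data.
HUNDRED_GIB = 100 * 1024 ** 3
snippet = ET.tostring(max_filesize_property(HUNDRED_GIB), encoding="unicode")
print(snippet)
```

With a threshold this high, regions are unlikely to reach the size that
triggers an automatic split, which leaves the application (or an operator)
to decide when to split.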

> -----Original Message-----
> From: jing wang [mailto:happygodwithwang@gmail.com]
> Sent: Wednesday, September 05, 2012 3:48 PM
> To: user@hbase.apache.org
> Subject: reduce influence of auto-splitting region
> 
> Hi there,
> 
>   We use HBase as real-time storage (24x7). How can we reduce the impact of
> automatic region splitting?
>   Any advice will be appreciated!
> 
> 
> Thanks,
> Jing


RE: reduce influence of auto-splitting region

Posted by "Ramkrishna.S.Vasudevan" <ra...@huawei.com>.
Hi JingWang

A region split does not necessarily cause GC problems.  Depending on your
use case, you may need to tune the heap space of the region servers (RS).
Coming back to region splits, pre-splitting the tables when you create them
is a good option.
Suppose I know that data arrives in HBase on an hourly basis.  One option is
then to pre-split the table by hour and assign the regions to the region
servers in round-robin fashion.
This ensures that the data for any particular hour goes only into the region
set aside for that hour, so once that hour is over, the writes move on to
another region server.
Each hour can in turn be divided into several regions, say 5 or 10, spread
evenly across the region servers.
These are some possible approaches; choose among them according to the data
your cluster will be handling.

Regards
Ram
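
The hourly scheme described above could be sketched as follows. This is only
an illustration under stated assumptions: row keys are assumed to start with a
two-digit hour followed by a one-digit bucket number ("<HH><bucket>..."), which
is a hypothetical layout, and the server names are made up. The round-robin
mapping only shows the intended placement; it does not call any HBase API.

```python
def hourly_split_keys(buckets_per_hour=5):
    """Split keys for row keys shaped '<HH><bucket>...' (hypothetical layout)."""
    if not 1 <= buckets_per_hour <= 10:
        raise ValueError("bucket must fit in one digit (1..10 buckets)")
    keys = [
        "{:02d}{}".format(hour, bucket)
        for hour in range(24)
        for bucket in range(buckets_per_hour)
    ]
    # The first region starts at the empty key, so "000" is not a split point.
    return keys[1:]


def round_robin(region_start_keys, servers):
    """Map each region (identified by its start key) to a server in turn."""
    return {
        key: servers[i % len(servers)]
        for i, key in enumerate(region_start_keys)
    }


splits = hourly_split_keys(5)
regions = [""] + splits  # start key of every region, in key order
placement = round_robin(regions, ["rs1", "rs2", "rs3"])
print(len(regions), placement["130"])
```

With 5 buckets per hour this yields 120 regions, and the regions belonging to
a single hour land on different servers, so the writes for the current hour
are not concentrated on one machine.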

> -----Original Message-----
> From: jing wang [mailto:happygodwithwang@gmail.com]
> Sent: Wednesday, September 05, 2012 6:42 PM
> To: user@hbase.apache.org
> Subject: Re: reduce influence of auto-splitting region
> 
> Hi Ram,
> 
> Thanks for your advice. We did consider what you said.
> We use HBase as real-time storage, just like MySQL/Oracle. When a region
> splits, HBase may cause a 'stop the world' GC pause or a long full GC.
> Our application can't accept this.
> 
> Thanks,
> Jing Wang


RE: reduce influence of auto-splitting region

Posted by "Ramkrishna.S.Vasudevan" <ra...@huawei.com>.
Yes.  Each generated row key should fall between the start key and end key
of one of the regions, so HBase can internally route it to the region server
hosting that region.
As mentioned in http://hbase.apache.org/book/perf.writing.html, we also need
to take care not to turn any one region into a hot region.

Suppose the data for a 30-minute span is collected and then passed on to
HBase.  The client can then be written so that the puts are distributed
evenly across the regions that hold that 30 minutes of data.

Hope this helps.

Regards
Ram
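
A small Python sketch of the two points above: how a row key maps to the
region whose key range contains it, and how a client might spread puts evenly
over the regions of one time window. The key layout ("<HH><bucket>-<id>"), the
function names, and the cycling bucket choice are all hypothetical. The lookup
imitates the range rule (start key inclusive, end key exclusive) and is not an
HBase call. Plain string comparison stands in for byte-wise comparison, which
is equivalent for ASCII keys.

```python
import bisect
from collections import Counter


def region_index(split_keys, row_key):
    """Index of the region whose [start, end) key range contains row_key."""
    return bisect.bisect_right(split_keys, row_key)


def salted_key(hour, seq, event_id, buckets=5):
    """Hypothetical layout '<HH><bucket>-<event_id>'; bucket cycles with seq."""
    return "{:02d}{}-{}".format(hour, seq % buckets, event_id)


# Split keys dividing hour 13 into 5 regions (same hypothetical layout).
split_keys = ["131", "132", "133", "134"]
keys = [salted_key(13, n, "evt{}".format(n)) for n in range(10)]
load = Counter(region_index(split_keys, k) for k in keys)
print(sorted(load.items()))
```

Cycling the bucket per put gives each of the five regions the same share of
writes. The trade-off is that reading back one hour then means scanning all of
that hour's buckets.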

> -----Original Message-----
> From: jing wang [mailto:happygodwithwang@gmail.com]
> Sent: Wednesday, September 05, 2012 8:00 PM
> To: user@hbase.apache.org
> Subject: Re: reduce influence of auto-splitting region
> 
> Hi Ram,
> 
>   How do we direct the data to the specific hourly region? With code like
> that in http://hbase.apache.org/book/perf.writing.html?
> 
> 
> Thanks,
> Jing Wang


Re: reduce influence of auto-splitting region

Posted by jing wang <ha...@gmail.com>.
Hi Ram,

  How do we direct the data to the specific hourly region? With code like
that in http://hbase.apache.org/book/perf.writing.html?


Thanks,
Jing Wang

2012/9/5 Ramkrishna.S.Vasudevan <ra...@huawei.com>

> Hi JingWang
>
> A region split does not necessarily cause GC problems.  Depending on your
> use case, you may need to tune the heap space of the region servers (RS).
> Coming back to region splits, pre-splitting the tables when you create them
> is a good option.
> Suppose I know that data arrives in HBase on an hourly basis.  One option is
> then to pre-split the table by hour and assign the regions to the region
> servers in round-robin fashion.
> This ensures that the data for any particular hour goes only into the region
> set aside for that hour, so once that hour is over, the writes move on to
> another region server.
> Each hour can in turn be divided into several regions, say 5 or 10, spread
> evenly across the region servers.
> These are some possible approaches; choose among them according to the data
> your cluster will be handling.
>
> Regards
> Ram

Re: reduce influence of auto-splitting region

Posted by jing wang <ha...@gmail.com>.
Hi Ram,

Thanks for your advice. We did consider what you said.
We use HBase as real-time storage, just like MySQL/Oracle. When a region
splits, HBase may cause a 'stop the world' GC pause or a long full GC.
Our application can't accept this.

Thanks,
Jing Wang

2012/9/5 Ramkrishna.S.Vasudevan <ra...@huawei.com>

> You can use the hbase.hregion.max.filesize property.  Set it to a higher
> value so that automatic splits are rarely triggered, and control the splits
> through your application.
>
> Regards
> Ram