Posted to user@hbase.apache.org by Vladimir Rodionov <vl...@gmail.com> on 2015/08/24 19:23:55 UTC

Re: optimal size for Hbase.hregion.memstore.flush.size and its impact

1. How many regions per RS?
2. What is your dfs.block.size?
3. What is your hbase.regionserver.maxlogs?

Flush can be requested when:

1. Region size exceeds hbase.hregion.memstore.flush.size
2. Region's memstore is too old (the periodic memstore flusher checks the age
of the memstore, default is 1 hour). Controlled by
    hbase.regionserver.optionalcacheflushinterval (in ms)
3. There are too many unflushed changes in a region. Controlled by
hbase.regionserver.flush.per.changes, default is 30,000,000
4. WAL is rolling prematurely, controlled by   hbase.regionserver.maxlogs
and  dfs.block.size.
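
To see what these are actually set to on your cluster, here is a minimal
sketch using the standard HBaseConfiguration API (assuming hbase-site.xml is
on the classpath; the fallback defaults passed below are only illustrative
and vary by version):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;

    public class ShowFlushSettings {
      public static void main(String[] args) {
        // Loads hbase-default.xml and hbase-site.xml from the classpath
        Configuration conf = HBaseConfiguration.create();
        System.out.println("flush.size                 = "
            + conf.getLong("hbase.hregion.memstore.flush.size", 134217728L));
        System.out.println("optionalcacheflushinterval = "
            + conf.getLong("hbase.regionserver.optionalcacheflushinterval", 3600000L));
        System.out.println("flush.per.changes          = "
            + conf.getLong("hbase.regionserver.flush.per.changes", 30000000L));
        System.out.println("maxlogs                    = "
            + conf.getInt("hbase.regionserver.maxlogs", 32));
        System.out.println("dfs.block.size             = "
            + conf.getLong("dfs.block.size", 134217728L));
      }
    }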

You can calculate the optimal setting as: hbase.regionserver.maxlogs * dfs.block.size * 0.95 >
hbase.regionserver.global.memstore.upperLimit * HBASE_HEAPSIZE
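
To make that concrete, here is a back-of-the-envelope check using your 20 GB
heap and upperLimit = 0.55 (a sketch only; the maxlogs and block size values
below are assumed defaults, not taken from your cluster):

    public class WalVsMemstoreCheck {
      public static void main(String[] args) {
        long dfsBlockSize = 128L * 1024 * 1024;    // assumed dfs.block.size = 128 MB
        int maxLogs = 32;                          // assumed hbase.regionserver.maxlogs
        double upperLimit = 0.55;                  // global.memstore.upperLimit
        long heapBytes = 20L * 1024 * 1024 * 1024; // 20 GB region server heap

        double walCapacity = maxLogs * dfsBlockSize * 0.95; // WAL space before rolling forces flushes
        double memstoreCapacity = upperLimit * heapBytes;   // global memstore limit

        System.out.printf("WAL capacity      : %.1f GB%n", walCapacity / (1L << 30));
        System.out.printf("Memstore capacity : %.1f GB%n", memstoreCapacity / (1L << 30));
        System.out.println(walCapacity > memstoreCapacity
            ? "OK: WAL capacity exceeds the global memstore limit"
            : "WAL rolls before memstores can fill -> WAL-triggered flushes likely");
      }
    }

With those assumed defaults the WAL side comes out around 3.8 GB against an
11 GB global memstore limit, i.e. the inequality fails and WAL rolling will
force flushes long before the memstores fill.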

I recommend enabling DEBUG logging and analyzing the MemStoreFlusher,
PeriodicMemstoreFlusher and HRegion flush-related log messages to get an idea
of why a flush was requested on a region (or regions) and what the region
size was at that time.
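
The usual way is to set log4j.logger.org.apache.hadoop.hbase.regionserver=DEBUG
in the region server's log4j.properties and restart it. Purely as an
illustration, the programmatic equivalent with the bundled log4j 1.x API
would look roughly like this:

    import org.apache.log4j.Level;
    import org.apache.log4j.Logger;

    public class EnableFlushDebugLogging {
      public static void main(String[] args) {
        // Covers MemStoreFlusher, the PeriodicMemstoreFlusher chore and HRegion,
        // which all live under this package
        Logger.getLogger("org.apache.hadoop.hbase.regionserver").setLevel(Level.DEBUG);
      }
    }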

I think in your case it is either premature WAL rolling or too many changes
in a memstore.

-Vlad


On Wed, May 27, 2015 at 1:53 PM, Gautam Borah <ga...@gmail.com>
wrote:

> Hi all,
>
> The default value of hbase.hregion.memstore.flush.size is defined as 128 MB.
> Could anyone kindly explain what the impact would be if we increase this to
> a higher value such as 512 MB or 800 MB, or higher?
>
> We have a very write-heavy cluster. We also run periodic endpoint
> coprocessor based jobs, every 10 minutes, that operate on the data written
> in the last 10-15 mins. We are trying to manage the memstore flush
> operations such that the hot data remains in the memstore for at least
> 30-40 mins or longer, so that the job hits disk only every 3rd or 4th time
> it tries to operate on the hot data (it does a scan).
>
> We have a region server heap size of 20 GB and have set:
>
> hbase.regionserver.global.memstore.lowerLimit = .45
> hbase.regionserver.global.memstore.upperLimit = .55
>
> We observed that if we set hbase.hregion.memstore.flush.size=128MB,
> only 10% of the heap is utilized by the memstores before they flush.
>
> At hbase.hregion.memstore.flush.size=512MB, we are able to increase the
> heap utilization by the memstores to 35%.
>
> It would be very helpful for us to understand the implications of a higher
> hbase.hregion.memstore.flush.size for a long-running cluster.
>
> Thanks,
> Gautam

Re: optimal size for Hbase.hregion.memstore.flush.size and its impact

Posted by Vladimir Rodionov <vl...@gmail.com>.
Correction:

> 4. WAL is rolling prematurely, controlled by   hbase.regionserver.maxlogs
and  dfs.block.size.

Should read:
4. WAL is rolling, controlled by   hbase.regionserver.maxlogs and
 dfs.block.size.

-Vlad


Re: optimal size for Hbase.hregion.memstore.flush.size and its impact

Posted by Ted Yu <yu...@gmail.com>.
Related: please see HBASE-13408 (HBase In-Memory Memstore Compaction).

FYI


Re: optimal size for Hbase.hregion.memstore.flush.size and its impact

Posted by Jean-Marc Spaggiari <je...@spaggiari.org>.
The split policy also uses the flush size to decide when to split tables...
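
For reference, in HBase 1.x the default IncreasingToUpperBoundRegionSplitPolicy
grows the split threshold from the flush size roughly as in this sketch (an
approximation from memory, so check the source for your version; R is the
number of regions of the table on that region server, and earlier versions
use a slightly different formula):

    public class SplitSizeSketch {
      // Approximate 1.x behaviour: split when a region's store size reaches
      // min(hbase.hregion.max.filesize, 2 * flushSize * R^3)
      static long splitSize(long flushSize, long maxFileSize, int regionsOfTableOnRs) {
        long r = regionsOfTableOnRs;
        return Math.min(maxFileSize, 2 * flushSize * r * r * r);
      }

      public static void main(String[] args) {
        long mb = 1024L * 1024;
        long maxFileSize = 10L * 1024 * mb; // hbase.hregion.max.filesize = 10 GB
        for (int r = 1; r <= 3; r++) {
          System.out.printf("flush=128MB, R=%d -> split at %5d MB%n",
              r, splitSize(128 * mb, maxFileSize, r) / mb);
          System.out.printf("flush=512MB, R=%d -> split at %5d MB%n",
              r, splitSize(512 * mb, maxFileSize, r) / mb);
        }
      }
    }

So a bigger flush size also means the first few regions of a table grow quite
a bit larger before they split.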

It's sometimes fine to bump this number up a bit, say to 256 MB. But 512 MB
is pretty high... and 800 MB even more so.

Big memstores take more time to flush and can block writes if the flushes
are not fast enough. If yours are fast enough, then you might be able to
stay with 512 MB. I don't think 800 MB is a good idea...

JM
