Posted to user@hbase.apache.org by 茅旭峰 <m9...@gmail.com> on 2011/03/17 07:57:48 UTC

How to control the size of WAL logs

Hi,

In our tests we've accumulated lots of WAL logs under .logs, which leads
to quite a long pause, or even an OOME, when restarting either the master
or a region server. We're doing a sort of bulk import, but we are not
using the usual bulk-import tricks, like turning off the WAL. We don't
know in advance how our application will really use HBase, so it's
possible that users will keep doing batch imports until we run out of
space. I wonder if there is any property we can set to control the size
of the WAL; would setting a smaller 'hbase.regionserver.logroll.period'
help?

On the other hand, since we have lots of regions, the master easily runs
into an OOME due to the memory held by the Assignment.regions instance.
When we were trying to restart the master, it always died with an OOME.
From the hprof file, I think it is because the HLogSplitter$OutputSink
instance holds too many HLogSplitter$WriterAndPath entries in logWriters,
and those in turn hold the buffers of wal.SequenceFileLogWriter.
Is there any trick to avoid this kind of scenario?

Thanks and regards,

Mao Xu-Feng
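
For context, the WAL-skipping trick mentioned above looks roughly like this
with the 0.90-era Java client API. This is only a sketch: the table name, row
key, column family and value are made up, and any edit written this way is
lost if the region server dies before its memstore is flushed.
====
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class WalSkippingPut {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "table1");       // illustrative table name
    Put put = new Put(Bytes.toBytes("row-0001"));    // illustrative row key
    put.add(Bytes.toBytes("cf1"), Bytes.toBytes("q1"), Bytes.toBytes("value"));
    put.setWriteToWAL(false);   // do not write this edit to the WAL
    table.put(put);
    table.flushCommits();       // push any client-side buffered edits
    table.close();
  }
}
====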

Re: How to control the size of WAL logs

Posted by 茅旭峰 <m9...@gmail.com>.
Thanks Cryans, I'll try them!

On Fri, Mar 18, 2011 at 12:20 AM, Jean-Daniel Cryans <jd...@apache.org>wrote:

> You can limit the number of WALs and their size on the region server by
> tuning:
>
> hbase.regionserver.maxlogs the default is 32
> hbase.regionserver.hlog.blocksize the default is whatever your HDFS
> blocksize times 0.95
>
> You can limit the number of parallel threads in the master by tuning:
>
> hbase.regionserver.hlog.splitlog.writer.threads the default is 3
> hbase.regionserver.hlog.splitlog.buffersize the default is 1024*1024*128
>
> J-D
>
> On Wed, Mar 16, 2011 at 11:57 PM, 茅旭峰 <m9...@gmail.com> wrote:
> > Hi,
> >
> > In our tests, we've accumulated lots of WAL logs, in .logs, which leads
> to
> > quite long time pause or even
> > OOME when restarting either master or region server. We're doing sort of
> > bulk import and have not using
> > bulk import tricks, like turning off WAL feature. We think it's unknown
> how
> > our application really use the
> > hbase, so it is possible that users doing batch import unless we're
> running
> > out of space. I wonder if there
> > is any property to set to control the size of WAL, would setting smaller
> > 'hbase.regionserver.logroll.period'
> > help?
> >
> > On the other hand, since we have lots of regions, the master is easy to
> run
> > into OOME, due to the occupied
> > memory by the instance of Assignment.regions. When we were trying to
> restart
> > the master, it always died
> > with OOME. I think, from the hprof file,  it is because the instance of
> > HLogSplitter$OutputSink holds too many
> > HLogSplitter$WriterAndPaths in logWriters, which even hold the buffer of
> > wal.SequenceFileLogWriter.
> > Is there any trick to avoid such kind of scenario?
> >
> > Thanks and regards,
> >
> > Mao Xu-Feng
> >
>

Re: How to control the size of WAL logs

Posted by 茅旭峰 <m9...@gmail.com>.
Thanks J-D, currently our MAX_FILESIZE is 1GB.

What about the thousands of threads in the master during startup?

===
On the other hand, if there are too many files under /hbase/.logs, then when I
try to restart the master there are over a thousand threads of the DataStreamer
and ResponseProcessor classes, all trying to handle the hlogs. The master then
quickly runs into an OOME; is there any way to control this situation?
===

Can I control the number of DataStreamer and ResponseProcessor threads?


On Tue, Mar 22, 2011 at 12:23 PM, Jean-Daniel Cryans <jd...@apache.org>wrote:

> There's not really anything in hbase preventing you from having that
> many regions, but usually for various reasons we try to keep it under
> a few hundreds. Especially in the bulk uploading case, it has a huge
> impact because of all the memstores a RS has to manage.
>
> You can set the size for splitting by setting MAX_FILESIZE on your
> table to at least 1GB (if you can give your region server a big heap
> like 8-10GB, then you can set those regions even bigger).
>
> J-D
>
> On Mon, Mar 21, 2011 at 7:59 PM, 茅旭峰 <m9...@gmail.com> wrote:
> > Thanks, J-D.
> >
> > No, we are not using any compressor.
> >
> > We have limited node for regionservers, so each of them holds thousands
> of
> > regions, any guideline on this point?
> >
> > On Tue, Mar 22, 2011 at 10:30 AM, Jean-Daniel Cryans <
> jdcryans@apache.org>wrote:
> >
> >> HBase doesn't put a hard block on the number of hlogs like it does for
> >> memstore size or store files to compact, so it seems you are able to
> >> insert more data than you are flushing.
> >>
> >> Are you using GZ compression? This could be a cause for slow flushes.
> >>
> >> How many regions do you have per region server? Your log seems to
> >> indicate that you have a ton of them.
> >>
> >> J-D
> >>
> >> On Mon, Mar 21, 2011 at 7:23 PM, 茅旭峰 <m9...@gmail.com> wrote:
> >> > Regarding hbase.regionserver.maxlogs,
> >> >
> >> > I've set it to 2, but it turns out the number of files under
> /hbase/.logs
> >> > stills keep increasing.
> >> > I see lots of logs like
> >> > ====
> >> > 2011-03-22 00:00:07,156 DEBUG
> >> > org.apache.hadoop.hbase.regionserver.CompactSplitThread: Compaction
> >> > requested for
> >> >
> >>
> table1,sZD5CTBLUdV55xWWkmkI5rb1mJM=,1300587568567.8a84acf58dd3d684ccaa47d4fb4fd53a.
> >> > because regionserver60020.cacheFlusher; priority=-8, compaction queue
> >> > size=1755
> >> > 2011-03-22 00:00:07,183 INFO
> >> > org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush thread
> woke
> >> up
> >> > with memory above low water.
> >> > 2011-03-22 00:00:07,186 INFO
> >> > org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Under global
> heap
> >> > pressure: Region
> >> >
> >>
> table1,KGBJhl9RON29fT0hhak5-tBc-zs=,1300521656641.56e3a141164b546ae84d57e46a513922.
> >> > has too many store files, but is 6.2m vs best flushable region's 2.1m.
> >> > Choosing the bigger.
> >> > 2011-03-22 00:00:07,186 INFO
> >> > org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush of region
> >> >
> >>
> table1,KGBJhl9RON29fT0hhak5-tBc-zs=,1300521656641.56e3a141164b546ae84d57e46a513922.
> >> > due to global heap pressure
> >> > 2011-03-22 00:00:07,186 DEBUG
> >> org.apache.hadoop.hbase.regionserver.HRegion:
> >> > Started memstore flush for
> >> >
> >>
> table1,KGBJhl9RON29fT0hhak5-tBc-zs=,1300521656641.56e3a141164b546ae84d57e46a513922.,
> >> > current region memstore size 6.2m
> >> > 2011-03-22 00:00:07,201 INFO
> >> > org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter: Using
> >> syncFs
> >> > -- HDFS-200
> >> > 2011-03-22 00:00:07,241 INFO
> >> org.apache.hadoop.hbase.regionserver.wal.HLog:
> >> > Roll
> >> >
> /hbase/.logs/cloud138,60020,1300712706331/cloud138%3A60020.1300723196796,
> >> > entries=119, filesize=67903254. New hlog
> >> >
> /hbase/.logs/cloud138,60020,1300712706331/cloud138%3A60020.1300723207156
> >> > 2011-03-22 00:00:07,241 INFO
> >> org.apache.hadoop.hbase.regionserver.wal.HLog:
> >> > Too many hlogs: logs=398, maxlogs=2; forcing flush of 1 regions(s):
> >> > 334c81997502eb3c66c2bb9b47a87bcc
> >> > 2011-03-22 00:00:07,242 DEBUG
> >> org.apache.hadoop.hbase.regionserver.HRegion:
> >> > Finished snapshotting, commencing flushing stores
> >> > 2011-03-22 00:00:07,577 INFO
> org.apache.hadoop.hbase.regionserver.Store:
> >> > Renaming flushed file at
> >> >
> >>
> hdfs://cloud137:9000/hbase/table1/56e3a141164b546ae84d57e46a513922/.tmp/907665384208923152
> >> > to
> >> >
> >>
> hdfs://cloud137:9000/hbase/table1/56e3a141164b546ae84d57e46a513922/cfEStore/2298819588481793315
> >> > 2011-03-22 00:00:07,589 INFO
> org.apache.hadoop.hbase.regionserver.Store:
> >> > Added
> >> >
> >>
> hdfs://cloud137:9000/hbase/table1/56e3a141164b546ae84d57e46a513922/cfEStore/2298819588481793315,
> >> > entries=6, sequenceid=2229486, memsize=6.2m, filesize=6.2m
> >> > 2011-03-22 00:00:07,591 INFO
> >> org.apache.hadoop.hbase.regionserver.HRegion:
> >> > Finished memstore flush of ~6.2m for region
> >> >
> >>
> table1,KGBJhl9RON29fT0hhak5-tBc-zs=,1300521656641.56e3a141164b546ae84d57e46a513922.
> >> > in 405ms, sequenceid=2229486, compaction requested=true
> >> > ====
> >> >
> >> > Does this mean we have too many request for the regionsever to catch
> up
> >> with
> >> > the hlogs' increasement?
> >> >
> >> > On the other hand, if there are too many files under /hbase/.logs,
> when I
> >> > was trying to restart the master, there are
> >> > over thousands of threads of class DataStreamer and ResponseProcessor,
> >> which
> >> > are trying to handle the hlogs.
> >> > Then quickly, the master turns to OOME, any way to control this
> >> situation?
> >> >
> >> > On Fri, Mar 18, 2011 at 12:20 AM, Jean-Daniel Cryans <
> >> jdcryans@apache.org>wrote:
> >> >
> >> >> You can limit the number of WALs and their size on the region server
> by
> >> >> tuning:
> >> >>
> >> >> hbase.regionserver.maxlogs the default is 32
> >> >> hbase.regionserver.hlog.blocksize the default is whatever your HDFS
> >> >> blocksize times 0.95
> >> >>
> >> >> You can limit the number of parallel threads in the master by tuning:
> >> >>
> >> >> hbase.regionserver.hlog.splitlog.writer.threads the default is 3
> >> >> hbase.regionserver.hlog.splitlog.buffersize the default is
> 1024*1024*128
> >> >>
> >> >> J-D
> >> >>
> >> >> On Wed, Mar 16, 2011 at 11:57 PM, 茅旭峰 <m9...@gmail.com> wrote:
> >> >> > Hi,
> >> >> >
> >> >> > In our tests, we've accumulated lots of WAL logs, in .logs, which
> >> leads
> >> >> to
> >> >> > quite long time pause or even
> >> >> > OOME when restarting either master or region server. We're doing
> sort
> >> of
> >> >> > bulk import and have not using
> >> >> > bulk import tricks, like turning off WAL feature. We think it's
> >> unknown
> >> >> how
> >> >> > our application really use the
> >> >> > hbase, so it is possible that users doing batch import unless we're
> >> >> running
> >> >> > out of space. I wonder if there
> >> >> > is any property to set to control the size of WAL, would setting
> >> smaller
> >> >> > 'hbase.regionserver.logroll.period'
> >> >> > help?
> >> >> >
> >> >> > On the other hand, since we have lots of regions, the master is
> easy
> >> to
> >> >> run
> >> >> > into OOME, due to the occupied
> >> >> > memory by the instance of Assignment.regions. When we were trying
> to
> >> >> restart
> >> >> > the master, it always died
> >> >> > with OOME. I think, from the hprof file,  it is because the
> instance
> >> of
> >> >> > HLogSplitter$OutputSink holds too many
> >> >> > HLogSplitter$WriterAndPaths in logWriters, which even hold the
> buffer
> >> of
> >> >> > wal.SequenceFileLogWriter.
> >> >> > Is there any trick to avoid such kind of scenario?
> >> >> >
> >> >> > Thanks and regards,
> >> >> >
> >> >> > Mao Xu-Feng
> >> >> >
> >> >>
> >> >
> >>
> >
>

Re: How to control the size of WAL logs

Posted by Jean-Daniel Cryans <jd...@apache.org>.
There's not really anything in hbase preventing you from having that
many regions, but usually for various reasons we try to keep it under
a few hundred. Especially in the bulk uploading case, it has a huge
impact because of all the memstores a RS has to manage.

You can set the size for splitting by setting MAX_FILESIZE on your
table to at least 1GB (if you can give your region server a big heap
like 8-10GB, then you can set those regions even bigger).

J-D
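
For reference, raising MAX_FILESIZE on an existing table can be done from the
HBase shell's alter command or through the Java admin API. The sketch below
uses the 0.90-era client and assumes the table name seen in the logs in this
thread and a 1 GB target; in this version the table has to be disabled while
its schema is changed.
====
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.util.Bytes;

public class RaiseMaxFileSize {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);

    byte[] tableName = Bytes.toBytes("table1");   // table name from the logs above
    HTableDescriptor desc = admin.getTableDescriptor(tableName);
    desc.setMaxFileSize(1024L * 1024 * 1024);     // split regions at ~1 GB

    // The table must be offline while its schema is modified in this version.
    admin.disableTable(tableName);
    admin.modifyTable(tableName, desc);
    admin.enableTable(tableName);
  }
}
====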

On Mon, Mar 21, 2011 at 7:59 PM, 茅旭峰 <m9...@gmail.com> wrote:
> Thanks, J-D.
>
> No, we are not using any compressor.
>
> We have limited node for regionservers, so each of them holds thousands of
> regions, any guideline on this point?
>
> On Tue, Mar 22, 2011 at 10:30 AM, Jean-Daniel Cryans <jd...@apache.org>wrote:
>
>> HBase doesn't put a hard block on the number of hlogs like it does for
>> memstore size or store files to compact, so it seems you are able to
>> insert more data than you are flushing.
>>
>> Are you using GZ compression? This could be a cause for slow flushes.
>>
>> How many regions do you have per region server? Your log seems to
>> indicate that you have a ton of them.
>>
>> J-D
>>
>> On Mon, Mar 21, 2011 at 7:23 PM, 茅旭峰 <m9...@gmail.com> wrote:
>> > Regarding hbase.regionserver.maxlogs,
>> >
>> > I've set it to 2, but it turns out the number of files under /hbase/.logs
>> > stills keep increasing.
>> > I see lots of logs like
>> > ====
>> > 2011-03-22 00:00:07,156 DEBUG
>> > org.apache.hadoop.hbase.regionserver.CompactSplitThread: Compaction
>> > requested for
>> >
>> table1,sZD5CTBLUdV55xWWkmkI5rb1mJM=,1300587568567.8a84acf58dd3d684ccaa47d4fb4fd53a.
>> > because regionserver60020.cacheFlusher; priority=-8, compaction queue
>> > size=1755
>> > 2011-03-22 00:00:07,183 INFO
>> > org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush thread woke
>> up
>> > with memory above low water.
>> > 2011-03-22 00:00:07,186 INFO
>> > org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Under global heap
>> > pressure: Region
>> >
>> table1,KGBJhl9RON29fT0hhak5-tBc-zs=,1300521656641.56e3a141164b546ae84d57e46a513922.
>> > has too many store files, but is 6.2m vs best flushable region's 2.1m.
>> > Choosing the bigger.
>> > 2011-03-22 00:00:07,186 INFO
>> > org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush of region
>> >
>> table1,KGBJhl9RON29fT0hhak5-tBc-zs=,1300521656641.56e3a141164b546ae84d57e46a513922.
>> > due to global heap pressure
>> > 2011-03-22 00:00:07,186 DEBUG
>> org.apache.hadoop.hbase.regionserver.HRegion:
>> > Started memstore flush for
>> >
>> table1,KGBJhl9RON29fT0hhak5-tBc-zs=,1300521656641.56e3a141164b546ae84d57e46a513922.,
>> > current region memstore size 6.2m
>> > 2011-03-22 00:00:07,201 INFO
>> > org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter: Using
>> syncFs
>> > -- HDFS-200
>> > 2011-03-22 00:00:07,241 INFO
>> org.apache.hadoop.hbase.regionserver.wal.HLog:
>> > Roll
>> > /hbase/.logs/cloud138,60020,1300712706331/cloud138%3A60020.1300723196796,
>> > entries=119, filesize=67903254. New hlog
>> > /hbase/.logs/cloud138,60020,1300712706331/cloud138%3A60020.1300723207156
>> > 2011-03-22 00:00:07,241 INFO
>> org.apache.hadoop.hbase.regionserver.wal.HLog:
>> > Too many hlogs: logs=398, maxlogs=2; forcing flush of 1 regions(s):
>> > 334c81997502eb3c66c2bb9b47a87bcc
>> > 2011-03-22 00:00:07,242 DEBUG
>> org.apache.hadoop.hbase.regionserver.HRegion:
>> > Finished snapshotting, commencing flushing stores
>> > 2011-03-22 00:00:07,577 INFO org.apache.hadoop.hbase.regionserver.Store:
>> > Renaming flushed file at
>> >
>> hdfs://cloud137:9000/hbase/table1/56e3a141164b546ae84d57e46a513922/.tmp/907665384208923152
>> > to
>> >
>> hdfs://cloud137:9000/hbase/table1/56e3a141164b546ae84d57e46a513922/cfEStore/2298819588481793315
>> > 2011-03-22 00:00:07,589 INFO org.apache.hadoop.hbase.regionserver.Store:
>> > Added
>> >
>> hdfs://cloud137:9000/hbase/table1/56e3a141164b546ae84d57e46a513922/cfEStore/2298819588481793315,
>> > entries=6, sequenceid=2229486, memsize=6.2m, filesize=6.2m
>> > 2011-03-22 00:00:07,591 INFO
>> org.apache.hadoop.hbase.regionserver.HRegion:
>> > Finished memstore flush of ~6.2m for region
>> >
>> table1,KGBJhl9RON29fT0hhak5-tBc-zs=,1300521656641.56e3a141164b546ae84d57e46a513922.
>> > in 405ms, sequenceid=2229486, compaction requested=true
>> > ====
>> >
>> > Does this mean we have too many request for the regionsever to catch up
>> with
>> > the hlogs' increasement?
>> >
>> > On the other hand, if there are too many files under /hbase/.logs, when I
>> > was trying to restart the master, there are
>> > over thousands of threads of class DataStreamer and ResponseProcessor,
>> which
>> > are trying to handle the hlogs.
>> > Then quickly, the master turns to OOME, any way to control this
>> situation?
>> >
>> > On Fri, Mar 18, 2011 at 12:20 AM, Jean-Daniel Cryans <
>> jdcryans@apache.org>wrote:
>> >
>> >> You can limit the number of WALs and their size on the region server by
>> >> tuning:
>> >>
>> >> hbase.regionserver.maxlogs the default is 32
>> >> hbase.regionserver.hlog.blocksize the default is whatever your HDFS
>> >> blocksize times 0.95
>> >>
>> >> You can limit the number of parallel threads in the master by tuning:
>> >>
>> >> hbase.regionserver.hlog.splitlog.writer.threads the default is 3
>> >> hbase.regionserver.hlog.splitlog.buffersize the default is 1024*1024*128
>> >>
>> >> J-D
>> >>
>> >> On Wed, Mar 16, 2011 at 11:57 PM, 茅旭峰 <m9...@gmail.com> wrote:
>> >> > Hi,
>> >> >
>> >> > In our tests, we've accumulated lots of WAL logs, in .logs, which
>> leads
>> >> to
>> >> > quite long time pause or even
>> >> > OOME when restarting either master or region server. We're doing sort
>> of
>> >> > bulk import and have not using
>> >> > bulk import tricks, like turning off WAL feature. We think it's
>> unknown
>> >> how
>> >> > our application really use the
>> >> > hbase, so it is possible that users doing batch import unless we're
>> >> running
>> >> > out of space. I wonder if there
>> >> > is any property to set to control the size of WAL, would setting
>> smaller
>> >> > 'hbase.regionserver.logroll.period'
>> >> > help?
>> >> >
>> >> > On the other hand, since we have lots of regions, the master is easy
>> to
>> >> run
>> >> > into OOME, due to the occupied
>> >> > memory by the instance of Assignment.regions. When we were trying to
>> >> restart
>> >> > the master, it always died
>> >> > with OOME. I think, from the hprof file,  it is because the instance
>> of
>> >> > HLogSplitter$OutputSink holds too many
>> >> > HLogSplitter$WriterAndPaths in logWriters, which even hold the buffer
>> of
>> >> > wal.SequenceFileLogWriter.
>> >> > Is there any trick to avoid such kind of scenario?
>> >> >
>> >> > Thanks and regards,
>> >> >
>> >> > Mao Xu-Feng
>> >> >
>> >>
>> >
>>
>

Re: How to control the size of WAL logs

Posted by 茅旭峰 <m9...@gmail.com>.
Thanks, J-D.

No, we are not using any compressor.

We have a limited number of nodes for region servers, so each of them holds
thousands of regions; is there any guideline on this point?

On Tue, Mar 22, 2011 at 10:30 AM, Jean-Daniel Cryans <jd...@apache.org>wrote:

> HBase doesn't put a hard block on the number of hlogs like it does for
> memstore size or store files to compact, so it seems you are able to
> insert more data than you are flushing.
>
> Are you using GZ compression? This could be a cause for slow flushes.
>
> How many regions do you have per region server? Your log seems to
> indicate that you have a ton of them.
>
> J-D
>
> On Mon, Mar 21, 2011 at 7:23 PM, 茅旭峰 <m9...@gmail.com> wrote:
> > Regarding hbase.regionserver.maxlogs,
> >
> > I've set it to 2, but it turns out the number of files under /hbase/.logs
> > stills keep increasing.
> > I see lots of logs like
> > ====
> > 2011-03-22 00:00:07,156 DEBUG
> > org.apache.hadoop.hbase.regionserver.CompactSplitThread: Compaction
> > requested for
> >
> table1,sZD5CTBLUdV55xWWkmkI5rb1mJM=,1300587568567.8a84acf58dd3d684ccaa47d4fb4fd53a.
> > because regionserver60020.cacheFlusher; priority=-8, compaction queue
> > size=1755
> > 2011-03-22 00:00:07,183 INFO
> > org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush thread woke
> up
> > with memory above low water.
> > 2011-03-22 00:00:07,186 INFO
> > org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Under global heap
> > pressure: Region
> >
> table1,KGBJhl9RON29fT0hhak5-tBc-zs=,1300521656641.56e3a141164b546ae84d57e46a513922.
> > has too many store files, but is 6.2m vs best flushable region's 2.1m.
> > Choosing the bigger.
> > 2011-03-22 00:00:07,186 INFO
> > org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush of region
> >
> table1,KGBJhl9RON29fT0hhak5-tBc-zs=,1300521656641.56e3a141164b546ae84d57e46a513922.
> > due to global heap pressure
> > 2011-03-22 00:00:07,186 DEBUG
> org.apache.hadoop.hbase.regionserver.HRegion:
> > Started memstore flush for
> >
> table1,KGBJhl9RON29fT0hhak5-tBc-zs=,1300521656641.56e3a141164b546ae84d57e46a513922.,
> > current region memstore size 6.2m
> > 2011-03-22 00:00:07,201 INFO
> > org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter: Using
> syncFs
> > -- HDFS-200
> > 2011-03-22 00:00:07,241 INFO
> org.apache.hadoop.hbase.regionserver.wal.HLog:
> > Roll
> > /hbase/.logs/cloud138,60020,1300712706331/cloud138%3A60020.1300723196796,
> > entries=119, filesize=67903254. New hlog
> > /hbase/.logs/cloud138,60020,1300712706331/cloud138%3A60020.1300723207156
> > 2011-03-22 00:00:07,241 INFO
> org.apache.hadoop.hbase.regionserver.wal.HLog:
> > Too many hlogs: logs=398, maxlogs=2; forcing flush of 1 regions(s):
> > 334c81997502eb3c66c2bb9b47a87bcc
> > 2011-03-22 00:00:07,242 DEBUG
> org.apache.hadoop.hbase.regionserver.HRegion:
> > Finished snapshotting, commencing flushing stores
> > 2011-03-22 00:00:07,577 INFO org.apache.hadoop.hbase.regionserver.Store:
> > Renaming flushed file at
> >
> hdfs://cloud137:9000/hbase/table1/56e3a141164b546ae84d57e46a513922/.tmp/907665384208923152
> > to
> >
> hdfs://cloud137:9000/hbase/table1/56e3a141164b546ae84d57e46a513922/cfEStore/2298819588481793315
> > 2011-03-22 00:00:07,589 INFO org.apache.hadoop.hbase.regionserver.Store:
> > Added
> >
> hdfs://cloud137:9000/hbase/table1/56e3a141164b546ae84d57e46a513922/cfEStore/2298819588481793315,
> > entries=6, sequenceid=2229486, memsize=6.2m, filesize=6.2m
> > 2011-03-22 00:00:07,591 INFO
> org.apache.hadoop.hbase.regionserver.HRegion:
> > Finished memstore flush of ~6.2m for region
> >
> table1,KGBJhl9RON29fT0hhak5-tBc-zs=,1300521656641.56e3a141164b546ae84d57e46a513922.
> > in 405ms, sequenceid=2229486, compaction requested=true
> > ====
> >
> > Does this mean we have too many request for the regionsever to catch up
> with
> > the hlogs' increasement?
> >
> > On the other hand, if there are too many files under /hbase/.logs, when I
> > was trying to restart the master, there are
> > over thousands of threads of class DataStreamer and ResponseProcessor,
> which
> > are trying to handle the hlogs.
> > Then quickly, the master turns to OOME, any way to control this
> situation?
> >
> > On Fri, Mar 18, 2011 at 12:20 AM, Jean-Daniel Cryans <
> jdcryans@apache.org>wrote:
> >
> >> You can limit the number of WALs and their size on the region server by
> >> tuning:
> >>
> >> hbase.regionserver.maxlogs the default is 32
> >> hbase.regionserver.hlog.blocksize the default is whatever your HDFS
> >> blocksize times 0.95
> >>
> >> You can limit the number of parallel threads in the master by tuning:
> >>
> >> hbase.regionserver.hlog.splitlog.writer.threads the default is 3
> >> hbase.regionserver.hlog.splitlog.buffersize the default is 1024*1024*128
> >>
> >> J-D
> >>
> >> On Wed, Mar 16, 2011 at 11:57 PM, 茅旭峰 <m9...@gmail.com> wrote:
> >> > Hi,
> >> >
> >> > In our tests, we've accumulated lots of WAL logs, in .logs, which
> leads
> >> to
> >> > quite long time pause or even
> >> > OOME when restarting either master or region server. We're doing sort
> of
> >> > bulk import and have not using
> >> > bulk import tricks, like turning off WAL feature. We think it's
> unknown
> >> how
> >> > our application really use the
> >> > hbase, so it is possible that users doing batch import unless we're
> >> running
> >> > out of space. I wonder if there
> >> > is any property to set to control the size of WAL, would setting
> smaller
> >> > 'hbase.regionserver.logroll.period'
> >> > help?
> >> >
> >> > On the other hand, since we have lots of regions, the master is easy
> to
> >> run
> >> > into OOME, due to the occupied
> >> > memory by the instance of Assignment.regions. When we were trying to
> >> restart
> >> > the master, it always died
> >> > with OOME. I think, from the hprof file,  it is because the instance
> of
> >> > HLogSplitter$OutputSink holds too many
> >> > HLogSplitter$WriterAndPaths in logWriters, which even hold the buffer
> of
> >> > wal.SequenceFileLogWriter.
> >> > Is there any trick to avoid such kind of scenario?
> >> >
> >> > Thanks and regards,
> >> >
> >> > Mao Xu-Feng
> >> >
> >>
> >
>

Re: How to control the size of WAL logs

Posted by Jean-Daniel Cryans <jd...@apache.org>.
HBase doesn't put a hard block on the number of hlogs like it does for
memstore size or store files to compact, so it seems you are able to
insert more data than you are flushing.

Are you using GZ compression? This could be a cause for slow flushes.

How many regions do you have per region server? Your log seems to
indicate that you have a ton of them.

J-D
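
If it's not obvious whether GZ is enabled, one way to check is to read the
column family descriptors back through the admin API. A minimal sketch with
the 0.90-era client, assuming the table name seen in the logs in this thread:
====
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.util.Bytes;

public class CompressionCheck {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);
    HTableDescriptor desc = admin.getTableDescriptor(Bytes.toBytes("table1"));
    // Print the compression codec configured for each column family.
    for (HColumnDescriptor family : desc.getFamilies()) {
      System.out.println(family.getNameAsString()
          + ": compression=" + family.getCompression());
    }
  }
}
====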

On Mon, Mar 21, 2011 at 7:23 PM, 茅旭峰 <m9...@gmail.com> wrote:
> Regarding hbase.regionserver.maxlogs,
>
> I've set it to 2, but it turns out the number of files under /hbase/.logs
> stills keep increasing.
> I see lots of logs like
> ====
> 2011-03-22 00:00:07,156 DEBUG
> org.apache.hadoop.hbase.regionserver.CompactSplitThread: Compaction
> requested for
> table1,sZD5CTBLUdV55xWWkmkI5rb1mJM=,1300587568567.8a84acf58dd3d684ccaa47d4fb4fd53a.
> because regionserver60020.cacheFlusher; priority=-8, compaction queue
> size=1755
> 2011-03-22 00:00:07,183 INFO
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush thread woke up
> with memory above low water.
> 2011-03-22 00:00:07,186 INFO
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Under global heap
> pressure: Region
> table1,KGBJhl9RON29fT0hhak5-tBc-zs=,1300521656641.56e3a141164b546ae84d57e46a513922.
> has too many store files, but is 6.2m vs best flushable region's 2.1m.
> Choosing the bigger.
> 2011-03-22 00:00:07,186 INFO
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush of region
> table1,KGBJhl9RON29fT0hhak5-tBc-zs=,1300521656641.56e3a141164b546ae84d57e46a513922.
> due to global heap pressure
> 2011-03-22 00:00:07,186 DEBUG org.apache.hadoop.hbase.regionserver.HRegion:
> Started memstore flush for
> table1,KGBJhl9RON29fT0hhak5-tBc-zs=,1300521656641.56e3a141164b546ae84d57e46a513922.,
> current region memstore size 6.2m
> 2011-03-22 00:00:07,201 INFO
> org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter: Using syncFs
> -- HDFS-200
> 2011-03-22 00:00:07,241 INFO org.apache.hadoop.hbase.regionserver.wal.HLog:
> Roll
> /hbase/.logs/cloud138,60020,1300712706331/cloud138%3A60020.1300723196796,
> entries=119, filesize=67903254. New hlog
> /hbase/.logs/cloud138,60020,1300712706331/cloud138%3A60020.1300723207156
> 2011-03-22 00:00:07,241 INFO org.apache.hadoop.hbase.regionserver.wal.HLog:
> Too many hlogs: logs=398, maxlogs=2; forcing flush of 1 regions(s):
> 334c81997502eb3c66c2bb9b47a87bcc
> 2011-03-22 00:00:07,242 DEBUG org.apache.hadoop.hbase.regionserver.HRegion:
> Finished snapshotting, commencing flushing stores
> 2011-03-22 00:00:07,577 INFO org.apache.hadoop.hbase.regionserver.Store:
> Renaming flushed file at
> hdfs://cloud137:9000/hbase/table1/56e3a141164b546ae84d57e46a513922/.tmp/907665384208923152
> to
> hdfs://cloud137:9000/hbase/table1/56e3a141164b546ae84d57e46a513922/cfEStore/2298819588481793315
> 2011-03-22 00:00:07,589 INFO org.apache.hadoop.hbase.regionserver.Store:
> Added
> hdfs://cloud137:9000/hbase/table1/56e3a141164b546ae84d57e46a513922/cfEStore/2298819588481793315,
> entries=6, sequenceid=2229486, memsize=6.2m, filesize=6.2m
> 2011-03-22 00:00:07,591 INFO org.apache.hadoop.hbase.regionserver.HRegion:
> Finished memstore flush of ~6.2m for region
> table1,KGBJhl9RON29fT0hhak5-tBc-zs=,1300521656641.56e3a141164b546ae84d57e46a513922.
> in 405ms, sequenceid=2229486, compaction requested=true
> ====
>
> Does this mean we have too many request for the regionsever to catch up with
> the hlogs' increasement?
>
> On the other hand, if there are too many files under /hbase/.logs, when I
> was trying to restart the master, there are
> over thousands of threads of class DataStreamer and ResponseProcessor, which
> are trying to handle the hlogs.
> Then quickly, the master turns to OOME, any way to control this situation?
>
> On Fri, Mar 18, 2011 at 12:20 AM, Jean-Daniel Cryans <jd...@apache.org>wrote:
>
>> You can limit the number of WALs and their size on the region server by
>> tuning:
>>
>> hbase.regionserver.maxlogs the default is 32
>> hbase.regionserver.hlog.blocksize the default is whatever your HDFS
>> blocksize times 0.95
>>
>> You can limit the number of parallel threads in the master by tuning:
>>
>> hbase.regionserver.hlog.splitlog.writer.threads the default is 3
>> hbase.regionserver.hlog.splitlog.buffersize the default is 1024*1024*128
>>
>> J-D
>>
>> On Wed, Mar 16, 2011 at 11:57 PM, 茅旭峰 <m9...@gmail.com> wrote:
>> > Hi,
>> >
>> > In our tests, we've accumulated lots of WAL logs, in .logs, which leads
>> to
>> > quite long time pause or even
>> > OOME when restarting either master or region server. We're doing sort of
>> > bulk import and have not using
>> > bulk import tricks, like turning off WAL feature. We think it's unknown
>> how
>> > our application really use the
>> > hbase, so it is possible that users doing batch import unless we're
>> running
>> > out of space. I wonder if there
>> > is any property to set to control the size of WAL, would setting smaller
>> > 'hbase.regionserver.logroll.period'
>> > help?
>> >
>> > On the other hand, since we have lots of regions, the master is easy to
>> run
>> > into OOME, due to the occupied
>> > memory by the instance of Assignment.regions. When we were trying to
>> restart
>> > the master, it always died
>> > with OOME. I think, from the hprof file,  it is because the instance of
>> > HLogSplitter$OutputSink holds too many
>> > HLogSplitter$WriterAndPaths in logWriters, which even hold the buffer of
>> > wal.SequenceFileLogWriter.
>> > Is there any trick to avoid such kind of scenario?
>> >
>> > Thanks and regards,
>> >
>> > Mao Xu-Feng
>> >
>>
>

Re: How to control the size of WAL logs

Posted by 茅旭峰 <m9...@gmail.com>.
Regarding hbase.regionserver.maxlogs,

I've set it to 2, but it turns out the number of files under /hbase/.logs
still keeps increasing.
I see lots of logs like
====
2011-03-22 00:00:07,156 DEBUG
org.apache.hadoop.hbase.regionserver.CompactSplitThread: Compaction
requested for
table1,sZD5CTBLUdV55xWWkmkI5rb1mJM=,1300587568567.8a84acf58dd3d684ccaa47d4fb4fd53a.
because regionserver60020.cacheFlusher; priority=-8, compaction queue
size=1755
2011-03-22 00:00:07,183 INFO
org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush thread woke up
with memory above low water.
2011-03-22 00:00:07,186 INFO
org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Under global heap
pressure: Region
table1,KGBJhl9RON29fT0hhak5-tBc-zs=,1300521656641.56e3a141164b546ae84d57e46a513922.
has too many store files, but is 6.2m vs best flushable region's 2.1m.
Choosing the bigger.
2011-03-22 00:00:07,186 INFO
org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush of region
table1,KGBJhl9RON29fT0hhak5-tBc-zs=,1300521656641.56e3a141164b546ae84d57e46a513922.
due to global heap pressure
2011-03-22 00:00:07,186 DEBUG org.apache.hadoop.hbase.regionserver.HRegion:
Started memstore flush for
table1,KGBJhl9RON29fT0hhak5-tBc-zs=,1300521656641.56e3a141164b546ae84d57e46a513922.,
current region memstore size 6.2m
2011-03-22 00:00:07,201 INFO
org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter: Using syncFs
-- HDFS-200
2011-03-22 00:00:07,241 INFO org.apache.hadoop.hbase.regionserver.wal.HLog:
Roll
/hbase/.logs/cloud138,60020,1300712706331/cloud138%3A60020.1300723196796,
entries=119, filesize=67903254. New hlog
/hbase/.logs/cloud138,60020,1300712706331/cloud138%3A60020.1300723207156
2011-03-22 00:00:07,241 INFO org.apache.hadoop.hbase.regionserver.wal.HLog:
Too many hlogs: logs=398, maxlogs=2; forcing flush of 1 regions(s):
334c81997502eb3c66c2bb9b47a87bcc
2011-03-22 00:00:07,242 DEBUG org.apache.hadoop.hbase.regionserver.HRegion:
Finished snapshotting, commencing flushing stores
2011-03-22 00:00:07,577 INFO org.apache.hadoop.hbase.regionserver.Store:
Renaming flushed file at
hdfs://cloud137:9000/hbase/table1/56e3a141164b546ae84d57e46a513922/.tmp/907665384208923152
to
hdfs://cloud137:9000/hbase/table1/56e3a141164b546ae84d57e46a513922/cfEStore/2298819588481793315
2011-03-22 00:00:07,589 INFO org.apache.hadoop.hbase.regionserver.Store:
Added
hdfs://cloud137:9000/hbase/table1/56e3a141164b546ae84d57e46a513922/cfEStore/2298819588481793315,
entries=6, sequenceid=2229486, memsize=6.2m, filesize=6.2m
2011-03-22 00:00:07,591 INFO org.apache.hadoop.hbase.regionserver.HRegion:
Finished memstore flush of ~6.2m for region
table1,KGBJhl9RON29fT0hhak5-tBc-zs=,1300521656641.56e3a141164b546ae84d57e46a513922.
in 405ms, sequenceid=2229486, compaction requested=true
====

Does this mean we have too many write requests for the region server to keep
up with the growth of the hlogs?

On the other hand, if there are too many files under /hbase/.logs, then when I
try to restart the master there are over a thousand threads of the DataStreamer
and ResponseProcessor classes, all trying to handle the hlogs. The master then
quickly runs into an OOME; is there any way to control this situation?

On Fri, Mar 18, 2011 at 12:20 AM, Jean-Daniel Cryans <jd...@apache.org>wrote:

> You can limit the number of WALs and their size on the region server by
> tuning:
>
> hbase.regionserver.maxlogs the default is 32
> hbase.regionserver.hlog.blocksize the default is whatever your HDFS
> blocksize times 0.95
>
> You can limit the number of parallel threads in the master by tuning:
>
> hbase.regionserver.hlog.splitlog.writer.threads the default is 3
> hbase.regionserver.hlog.splitlog.buffersize the default is 1024*1024*128
>
> J-D
>
> On Wed, Mar 16, 2011 at 11:57 PM, 茅旭峰 <m9...@gmail.com> wrote:
> > Hi,
> >
> > In our tests, we've accumulated lots of WAL logs, in .logs, which leads
> to
> > quite long time pause or even
> > OOME when restarting either master or region server. We're doing sort of
> > bulk import and have not using
> > bulk import tricks, like turning off WAL feature. We think it's unknown
> how
> > our application really use the
> > hbase, so it is possible that users doing batch import unless we're
> running
> > out of space. I wonder if there
> > is any property to set to control the size of WAL, would setting smaller
> > 'hbase.regionserver.logroll.period'
> > help?
> >
> > On the other hand, since we have lots of regions, the master is easy to
> run
> > into OOME, due to the occupied
> > memory by the instance of Assignment.regions. When we were trying to
> restart
> > the master, it always died
> > with OOME. I think, from the hprof file,  it is because the instance of
> > HLogSplitter$OutputSink holds too many
> > HLogSplitter$WriterAndPaths in logWriters, which even hold the buffer of
> > wal.SequenceFileLogWriter.
> > Is there any trick to avoid such kind of scenario?
> >
> > Thanks and regards,
> >
> > Mao Xu-Feng
> >
>

Re: How to control the size of WAL logs

Posted by Jean-Daniel Cryans <jd...@apache.org>.
You can limit the number of WALs and their size on the region server by tuning:

hbase.regionserver.maxlogs the default is 32
hbase.regionserver.hlog.blocksize the default is whatever your HDFS
blocksize times 0.95

You can limit the number of parallel threads in the master by tuning:

hbase.regionserver.hlog.splitlog.writer.threads the default is 3
hbase.regionserver.hlog.splitlog.buffersize the default is 1024*1024*128

J-D
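
As a rough sanity check, with the stock 64 MB HDFS block size (an assumption;
yours may differ) those defaults cap the WALs at about 32 * 0.95 * 64 MB, or
roughly 1.9 GB per region server, before forced flushes kick in. The sketch
below only reads back what a client-side Configuration resolves these
properties to; the values that actually matter are the ones in each region
server's hbase-site.xml, and the fallback defaults in the getters are
assumptions based on the numbers quoted above.
====
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class WalSettingsCheck {
  public static void main(String[] args) {
    // Loads hbase-default.xml / hbase-site.xml from the classpath.
    Configuration conf = HBaseConfiguration.create();

    int maxLogs = conf.getInt("hbase.regionserver.maxlogs", 32);
    long blockSize = conf.getLong("hbase.regionserver.hlog.blocksize",
        64L * 1024 * 1024);   // assumed fallback: stock HDFS block size
    int splitThreads = conf.getInt("hbase.regionserver.hlog.splitlog.writer.threads", 3);
    long splitBuffer = conf.getLong("hbase.regionserver.hlog.splitlog.buffersize",
        128L * 1024 * 1024);

    // Rough upper bound on WAL bytes kept per region server before forced flushes.
    long walCap = (long) (maxLogs * 0.95 * blockSize);
    System.out.println("maxlogs=" + maxLogs + ", hlog blocksize=" + blockSize);
    System.out.println("approx WAL cap per region server: " + walCap + " bytes");
    System.out.println("split writer threads=" + splitThreads
        + ", split buffer=" + splitBuffer + " bytes");
  }
}
====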

On Wed, Mar 16, 2011 at 11:57 PM, 茅旭峰 <m9...@gmail.com> wrote:
> Hi,
>
> In our tests, we've accumulated lots of WAL logs, in .logs, which leads to
> quite long time pause or even
> OOME when restarting either master or region server. We're doing sort of
> bulk import and have not using
> bulk import tricks, like turning off WAL feature. We think it's unknown how
> our application really use the
> hbase, so it is possible that users doing batch import unless we're running
> out of space. I wonder if there
> is any property to set to control the size of WAL, would setting smaller
> 'hbase.regionserver.logroll.period'
> help?
>
> On the other hand, since we have lots of regions, the master is easy to run
> into OOME, due to the occupied
> memory by the instance of Assignment.regions. When we were trying to restart
> the master, it always died
> with OOME. I think, from the hprof file,  it is because the instance of
> HLogSplitter$OutputSink holds too many
> HLogSplitter$WriterAndPaths in logWriters, which even hold the buffer of
> wal.SequenceFileLogWriter.
> Is there any trick to avoid such kind of scenario?
>
> Thanks and regards,
>
> Mao Xu-Feng
>