Posted to user@hbase.apache.org by David chen <c7...@163.com> on 2015/03/10 02:55:34 UTC
Why can the capacity of a table with TTL grow continuously?
The TTL property of the table was set to six months. The table reached the TTL condition last October; at that time its capacity was about 700G, as reported by the command "hdfs dfs -du -h /hbase/data/defaulta". But every few days its capacity grew, and it has now reached 939G.
Under our application scenario, the data volume should stay approximately constant once past six months. I also know that even after the TTL condition is reached, the expired data cannot be deleted immediately. But I wonder why the capacity has grown continuously over the past five months (from last October until now).
Re: Re: Re: Re: Re: Why can the capacity of a table with TTL grow continuously?
Posted by David chen <c7...@163.com>.
Okay, I will increase hbase.hregion.max.filesize and merge adjacent regions, then report back with the results.
Re: Why can the capacity of a table with TTL grow continuously?
Posted by David chen <c7...@163.com>.
The region size was increased from 4G to 10G and adjacent regions were merged, i.e. the region count was roughly halved. The table capacity then dropped from 969.4G to 945.2G. The RegionServer statistics are as follows:
ServerName               Num. Stores  Num. Storefiles  Storefile Size Uncompressed  Storefile Size  Index Size  Bloom Size
Before processing
rs1,60020,1426240567402  131          254              556333m                      155989mb        499463k     552615k
rs2,60020,1426240567533  125          234              543717m                      151136mb        498532k     542311k
rs3,60020,1426240567677  131          255              555199m                      155714mb        504966k     630839k
rs4,60020,1426240567632  123          235              540522m                      154388mb        499349k     660950k
rs5,60020,1426240567340  128          243              537569m                      151889mb        488991k     612977k
rs6,60020,1426240565314  126          238              567384m                      158269mb        521661k     576256k
rs7,60020,1426240565519  118          242              569310m                      155403mb        528435k     643266k
After processing
rs1,60020,1426240567402  62           53               446499m                      125661mb        408080k     462457k
rs2,60020,1426240567533  68           53               516360m                      144451mb        470769k     499128k
rs3,60020,1426240567677  71           66               534077m                      150870mb        481473k     587248k
rs4,60020,1426240567632  75           68               587874m                      166252mb        546647k     703632k
rs5,60020,1426240567340  71           59               527609m                      149605mb        478360k     587054k
rs6,60020,1426240565314  63           48               501960m                      140437mb        466324k     512596k
rs7,60020,1426240565519  76           63               662230m                      180782mb        608837k     706609k
At 2015-03-12 10:34:10, "Alex Baranau" <al...@gmail.com> wrote:
Re: Re: Re: Re: Why can the capacity of a table with TTL grow continuously?
Posted by Alex Baranau <al...@gmail.com>.
I'd try that. Please report back with the results. Also, if possible, it would be useful (at least for the important JIRA mentioned by Nick) if you could share the stats on the regions (size, store file count) before and after the procedure. Note (as Lars said): "careful, this ... can put some load on the net/disks".
One more note: increasing hbase.hregion.max.filesize alone may not be enough to completely avoid the same situation in the future. It's hard to tell, but if the cause is what I suspect, the data distribution pattern has a greater effect, though it will be mitigated to some extent by upping the region size.
Alex Baranau
--
http://cdap.io - open source framework to build and run data applications on
Hadoop & HBase
On Wed, Mar 11, 2015 at 7:00 PM, David chen <c7...@163.com> wrote:
Re: Re: Re: Re: Why can the capacity of a table with TTL grow continuously?
Posted by David chen <c7...@163.com>.
hbase.store.delete.expired.storefile is true in file hbase-0.98.5/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/compactions/CompactionConfiguration.java; the relevant code, at line 83, is:
shouldDeleteExpired = conf.getBoolean("hbase.store.delete.expired.storefile", true);
The region count has grown from 16 to 263 over the past seven months; maybe the hbase.hregion.max.filesize value (4G) is a bit small. It looks like the solution is to increase hbase.hregion.max.filesize and merge adjacent regions.
Any other ideas to suggest?
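For the record, the adjustment proposed above could be sketched from the HBase shell roughly as follows. This is a hedged sketch, not a tested recipe: 'tab_normal' and the 10G figure are taken from elsewhere in this thread, and a table-level MAX_FILESIZE overrides the cluster-wide hbase.hregion.max.filesize for that table only.

```ruby
# Raise the per-region split threshold for this table to 10G
# (10737418240 bytes = 10 * 1024^3).
alter 'tab_normal', MAX_FILESIZE => '10737418240'
```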
Re: Re: Re: Why can the capacity of a table with TTL grow continuously?
Posted by Ted Yu <yu...@gmail.com>.
w.r.t. hbase.store.delete.expired.storefile: I checked
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HStore.java
in the 0.98 branch and branch-1; the default value is true.
FYI
On Wed, Mar 11, 2015 at 3:12 PM, Alex Baranau <al...@gmail.com>
wrote:
Re: Re: Re: Why can the capacity of a table with TTL grow continuously?
Posted by Nick Dimiduk <nd...@gmail.com>.
On Wed, Mar 11, 2015 at 3:54 PM, Alex Baranau <al...@gmail.com>
wrote:
> Quick question: have you by any chance noticed the region number to grow a
> lot over the time of your measurements? Note that regions are not merged
> automatically back if they shrink (incl. due to TTL) after being split (
> http://hbase.apache.org/book.html#ops.regionmgt)
>
They're not currently, but we'd like to see it :)
https://issues.apache.org/jira/browse/HBASE-13103
Re: Re: Re: Why can the capacity of a table with TTL grow continuously?
Posted by Alex Baranau <al...@gmail.com>.
Quick question: have you by any chance noticed the region count grow a lot
over the time of your measurements? Note that regions are not automatically
merged back if they shrink (incl. due to TTL) after being split (
http://hbase.apache.org/book.html#ops.regionmgt)
Alex Baranau
--
http://cdap.io - open source framework to build and run data applications on
Hadoop & HBase
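As a reference for the manual merge mentioned above, the HBase shell has a merge_region command (online region merge; available in 0.98 as far as I can tell). The region names below are placeholders you would read from the web UI or hbase:meta, not real values:

```ruby
# Merge two adjacent regions of the table; the arguments are the
# *encoded* region names (the trailing hash of the full region name).
merge_region 'ENCODED_REGIONNAME_1', 'ENCODED_REGIONNAME_2'
```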
On Wed, Mar 11, 2015 at 3:12 PM, Alex Baranau <al...@gmail.com>
wrote:
Re: Re: Re: Why can the capacity of a table with TTL grow continuously?
Posted by Alex Baranau <al...@gmail.com>.
Expired rows are also deleted on minor compaction. But, depending on the
distribution of the writes, you may have some regions that don't get any
writes, and hence their files will stay in a "frozen" state without any
compaction being triggered on them, until a major compaction is fired for
that specific region or the whole table. Given that you reclaimed only a
bit of space, part of that could be due to this.
http://hbase.apache.org/book.html#ttl also
mentions the hbase.store.delete.expired.storefile config property. Be sure
it is set to true so that whole store files can be deleted (until files are
deleted, they occupy space in HDFS).
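If that property ever needs to be set explicitly, a minimal hbase-site.xml fragment would look like the sketch below. Note it already defaults to true, so this is only needed to re-enable it after it was turned off somewhere:

```xml
<property>
  <name>hbase.store.delete.expired.storefile</name>
  <value>true</value>
</property>
```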
Alex Baranau
http://cdap.io - open source framework to build and run data applications on
Hadoop & HBase
On Tue, Mar 10, 2015 at 9:15 PM, David chen <c7...@163.com> wrote:
Re: Re: Re: Why can the capacity of a table with TTL grow continuously?
Posted by David chen <c7...@163.com>.
Thanks Lars,
I have run scans several times to test the TTL; the expired data could not be seen.
In my application scenario, the volume of data collected each day should be roughly constant, so the newly collected data should not exceed the expired data.
Following your suggestion, I forced a major compaction this morning; the space dropped from 946G to 924G.
To reclaim the expired space, must a major compaction be forced?
Re: Re: Why can the capacity of a table with TTL grow continuously?
Posted by lars hofhansl <la...@apache.org>.
I agree, this looks as it should. You also mentioned that you have compactions enabled.
If you force a major_compact through the HBase shell, will some space be reclaimed? (Careful: this will compact everything in the table, which can put some load on the net/disks.) Lastly, did you stop collecting new data? Are you sure you're not just collecting more data? (Just asking :) )
To verify whether old data is still there or not, you can also run a raw scan. Something like this from the shell:
scan <table>, {RAW => true, VERSIONS => 10000, TIMERANGE => [0, 1408749519438]}
(1408749519438 is about 200 days before right now.)
If that query returns anything, then we'd have a problem.
-- Lars
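The magic number in Lars's TIMERANGE can be recomputed for any point in time. A small sketch (plain Ruby, not part of HBase; the 2015-03-10 epoch value is an assumption chosen for reproducibility, and the 186-day TTL is the one from this table's descriptor):

```ruby
# Epoch-millis upper bound for a raw scan that should only return
# cells old enough to be expired by the given TTL (in seconds).
def ttl_cutoff_millis(ttl_seconds, now_seconds = Time.now.to_i)
  (now_seconds - ttl_seconds) * 1000
end

# With this table's TTL of 16070400 seconds (186 days), evaluated at
# 2015-03-10 00:00 UTC (epoch 1425945600):
puts ttl_cutoff_millis(16070400, 1425945600)  # 1409875200000
```

Any cell whose timestamp is below the printed value should already have expired, so a raw scan with TIMERANGE => [0, cutoff] acts as the verification Lars describes.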
From: Jean-Marc Spaggiari <je...@spaggiari.org>
To: user <us...@hbase.apache.org>
Sent: Monday, March 9, 2015 9:08 PM
Subject: Re: Re: Why can the capacity of a table with TTL grow continuously?
Ok, nothing wrong there.
Have you disabled the daily (or weekly) major compactions on your cluster?
Which HBase version are you running?
JM
Re: Re: Re: Why can the capacity of a table with TTL grow continuously?
Posted by David chen <c7...@163.com>.
The version is 0.98.6+cdh5.2.0. Major compaction is enabled, but the hbase.hregion.majorcompaction property was changed from the default 7 days to 10 days approximately two months ago.
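For reference, hbase.hregion.majorcompaction is expressed in milliseconds, so a 10-day period corresponds to 10 * 24 * 3600 * 1000 = 864000000. A minimal hbase-site.xml fragment, assuming that is how it was set:

```xml
<property>
  <!-- Major compaction interval: 10 days, in milliseconds -->
  <name>hbase.hregion.majorcompaction</name>
  <value>864000000</value>
</property>
```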
Re: Re: Why can the capacity of a table with TTL grow continuously?
Posted by Jean-Marc Spaggiari <je...@spaggiari.org>.
Ok, nothing wrong there.
Have you disabled the daily (or weekly) major compactions on your cluster?
Which HBase version are you running?
JM
2015-03-09 23:53 GMT-04:00 David chen <c7...@163.com>:
Re: Re: Why can the capacity of a table with TTL grow continuously?
Posted by David chen <c7...@163.com>.
Thanks Jean-Marc, the table description is as follows:
'tab_normal',
{NAME => 'f1', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', COMPRESSION => 'SNAPPY', VERSIONS => '10000', TTL => '16070400 SECONDS (186 DAYS)', MIN_VERSIONS => '0', KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'},
{NAME => 'f2', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', COMPRESSION => 'SNAPPY', VERSIONS => '10000', TTL => '16070400 SECONDS (186 DAYS)', MIN_VERSIONS => '0', KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'},
{NAME => 'idx', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', COMPRESSION => 'SNAPPY', VERSIONS => '10000', TTL => '16070400 SECONDS (186 DAYS)', MIN_VERSIONS => '0', KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}
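As a quick sanity check of the TTL shown in the descriptor above, 16070400 seconds is exactly 186 days:

```ruby
# 186 days expressed in seconds matches the TTL in the table descriptor.
seconds_per_day = 24 * 60 * 60      # 86400
ttl_seconds = 186 * seconds_per_day
puts ttl_seconds                    # 16070400
```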
Re: Why can the capacity of a table with TTL grow continuously?
Posted by Jean-Marc Spaggiari <je...@spaggiari.org>.
Can you please paste the table description here?
Thanks,
JM
On 2015-03-09 21:56, "David chen" <c7...@163.com> wrote: