Posted to user@hbase.apache.org by lars hofhansl <lh...@yahoo.com> on 2011/08/15 00:33:48 UTC
When are versions collected?
Hi All,
The following is a session in hbase shell.
You can see that VERSIONS = 1, yet even after triggering a major compaction I can still retrieve versions 2, 3, and 4 when querying by timestamp.
Does "major_compact" from the shell not trigger a major compaction?
Or does it just not do anything because there's only one small file to begin with?
(HBase version is 0.90.3-cdh3u1 and it was running in local mode, in case that makes a difference).
I apologize in advance if this is documented somewhere already.
Thanks.
-- Lars
--------------
hbase(main):003:0> describe 'test'
DESCRIPTION                                                          ENABLED
 {NAME => 'test', FAMILIES => [{NAME => 'cf', BLOOMFILTER => 'NONE', true
 REPLICATION_SCOPE => '0', COMPRESSION => 'NONE', VERSIONS => '1',
 TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false',
 BLOCKCACHE => 'true'}]}
1 row(s) in 0.0270 seconds
hbase(main):004:0> scan 'test'
ROW COLUMN+CELL
0 row(s) in 0.1260 seconds
hbase(main):005:0> put 'test', 'r1', 'cf:x', 'val1'
0 row(s) in 0.0450 seconds
hbase(main):006:0> put 'test', 'r1', 'cf:x', 'val2'
0 row(s) in 0.0130 seconds
hbase(main):007:0> put 'test', 'r1', 'cf:x', 'val3'
0 row(s) in 0.0160 seconds
hbase(main):008:0> put 'test', 'r1', 'cf:x', 'val4'
0 row(s) in 0.0120 seconds
hbase(main):009:0> scan 'test'
ROW COLUMN+CELL
r1 column=cf:x, timestamp=1313360551865, value=val4
1 row(s) in 0.0380 seconds
hbase(main):010:0> scan 'test', {TIMERANGE => [0, 1313360551865]}
ROW COLUMN+CELL
r1 column=cf:x, timestamp=1313360550493, value=val3
1 row(s) in 0.0310 seconds
hbase(main):011:0> scan 'test', {TIMERANGE => [0, 1313360550493]}
ROW COLUMN+CELL
r1 column=cf:x, timestamp=1313360549220, value=val2
1 row(s) in 0.0310 seconds
hbase(main):012:0> scan 'test', {TIMERANGE => [0, 1313360549220]}
ROW COLUMN+CELL
r1 column=cf:x, timestamp=1313360545784, value=val1
1 row(s) in 0.0190 seconds
hbase(main):013:0> major_compact 'test'
0 row(s) in 0.0670 seconds
hbase(main):014:0> scan 'test', {TIMERANGE => [0, 1313360549220]}
ROW COLUMN+CELL
r1 column=cf:x, timestamp=1313360545784, value=val1
1 row(s) in 0.0230 seconds
hbase(main):015:0> major_compact 'test'
0 row(s) in 0.0450 seconds
hbase(main):016:0> scan 'test', {TIMERANGE => [0, 1313360549220]}
ROW COLUMN+CELL
r1 column=cf:x, timestamp=1313360545784, value=val1
1 row(s) in 0.0300 seconds
Re: When are versions collected?
Posted by Stack <st...@duboce.net>.
On Wed, Aug 17, 2011 at 3:54 AM, lars hofhansl <lh...@yahoo.com> wrote:
> Thanks J-D, that's exactly what I was looking for. I guess I was expecting that a manual major compaction would imply a cache flush. Should it?
>
It does not.
On the other hand, we have yet to implement a Bigtable optimization
where a flush is merged with a storefile; i.e. a new file is written
where the source is the in-memory flushed edits and a storefile that
was already in the filesystem (the original storefile is dropped on
flush completion).
St.Ack
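
As a rough illustration of the optimization described above, here is a small standalone Ruby sketch. It is not HBase code: flush_merged is a hypothetical helper, cells are modeled as [timestamp, value] pairs, and a "store file" is just an array of such pairs. It only shows the shape of the idea: the flushed edits and one existing file become a single new file, and the old file is dropped.

```ruby
# Hypothetical helper, not an HBase API: merge the flushed in-memory edits
# with one existing store file and write the result as a single new file.
def flush_merged(memstore, store_files, index)
  remaining = store_files.dup
  old_file = remaining.delete_at(index)   # dropped once the merge completes
  merged = (memstore + old_file).sort_by { |ts, _| -ts }  # newest cell first
  [[], remaining + [merged]]              # emptied memstore, new file list
end

memstore = [[30, 'val3'], [40, 'val4']]
files = [[[20, 'val2'], [10, 'val1']]]
memstore, files = flush_merged(memstore, files, 0)
puts files.length   # still one file on disk, now holding all four cells
```

The sketch says nothing about whether such a merged flush would also prune old versions; the message above does not specify that.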
Re: When are versions collected?
Posted by lars hofhansl <lh...@yahoo.com>.
Thanks J-D, that's exactly what I was looking for. I guess I was expecting that a manual major compaction would imply a cache flush. Should it?
-- Lars
________________________________
From: Jean-Daniel Cryans <jd...@apache.org>
To: user@hbase.apache.org; lars hofhansl <lh...@yahoo.com>
Sent: Monday, August 15, 2011 3:59 PM
Subject: Re: When are versions collected?
A major compaction only compacts data in the store files (i.e. on disk)
but not in the memstores (i.e. in memory). Force flush that table after
you do those puts and you should get what you expect.
J-D
Re: When are versions collected?
Posted by Jean-Daniel Cryans <jd...@apache.org>.
A major compaction only compacts data in the store files (i.e. on disk)
but not in the memstores (i.e. in memory). Force flush that table after
you do those puts and you should get what you expect.
J-D
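
To make the memstore versus store file distinction concrete, here is a toy model in plain Ruby. It is not HBase code: ToyStore and its methods are made-up names, and the model assumes that puts land only in the memstore, that a flush writes every version out as-is, and that a major compaction prunes only store files. In the shell, the corresponding step would be running flush 'test' before major_compact 'test'.

```ruby
# Toy model of a single column in a single store. NOT HBase code;
# ToyStore and its methods are hypothetical names used for illustration.
class ToyStore
  def initialize(max_versions)
    @max_versions = max_versions
    @memstore = []      # in-memory edits as [timestamp, value] pairs
    @store_files = []   # each "file" is an array of [timestamp, value] pairs
  end

  # Writes land in the memstore only.
  def put(timestamp, value)
    @memstore << [timestamp, value]
  end

  # Flush: the memstore becomes a new store file (all versions kept here).
  def flush
    return if @memstore.empty?
    @store_files << @memstore
    @memstore = []
  end

  # Major compaction: merge all store files and keep the newest
  # max_versions cells. The memstore is deliberately left untouched.
  def major_compact
    return if @store_files.empty?
    merged = @store_files.flatten(1).sort_by { |ts, _| -ts }
    @store_files = [merged.first(@max_versions)]
  end

  # Newest value with a timestamp strictly below max_ts, mirroring the
  # exclusive upper bound of TIMERANGE seen in the session.
  def read_before(max_ts)
    cells = @memstore + @store_files.flatten(1)
    hit = cells.select { |ts, _| ts < max_ts }.max_by { |ts, _| ts }
    hit && hit[1]
  end
end

store = ToyStore.new(1)
[[10, 'val1'], [20, 'val2'], [30, 'val3'], [40, 'val4']].each { |ts, v| store.put(ts, v) }

store.major_compact                  # nothing on disk yet, so nothing is pruned
puts store.read_before(20).inspect   # "val1" is still readable

store.flush
store.major_compact                  # now the compaction sees all four cells
puts store.read_before(20).inspect   # nil, the old version is gone
```

The first read corresponds to the session in the original post: with every edit still in the memstore, the compaction has nothing to prune.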