Posted to user@hbase.apache.org by lars hofhansl <lh...@yahoo.com> on 2011/08/15 00:33:48 UTC

When are versions collected?

Hi All,
The following is a session in hbase shell.

You can see that VERSIONS = 1, yet I can still retrieve versions 2, 3, and 4 when querying by timestamp, even after triggering a major compaction.

Does "major_compact" from the shell not trigger a major compaction?

Or does it just not do anything because there's only one small file to begin with?


(HBase version is 0.90.3-cdh3u1 and it was running in local mode, in case that makes a difference).


I apologize in advance if this is documented somewhere already.


Thanks.

-- Lars


--------------


hbase(main):003:0> describe 'test'
DESCRIPTION                                          ENABLED                    
 {NAME => 'test', FAMILIES => [{NAME => 'cf', BLOOMF true                       
 ILTER => 'NONE', REPLICATION_SCOPE => '0', COMPRESS                            
 ION => 'NONE', VERSIONS => '1', TTL => '2147483647'                            
 , BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCK                            
 CACHE => 'true'}]}                                                             
1 row(s) in 0.0270 seconds

hbase(main):004:0> scan 'test'
ROW                   COLUMN+CELL                                               
0 row(s) in 0.1260 seconds

hbase(main):005:0> put 'test', 'r1', 'cf:x', 'val1'
0 row(s) in 0.0450 seconds

hbase(main):006:0> put 'test', 'r1', 'cf:x', 'val2' 
0 row(s) in 0.0130 seconds

hbase(main):007:0> put 'test', 'r1', 'cf:x', 'val3'
0 row(s) in 0.0160 seconds

hbase(main):008:0> put 'test', 'r1', 'cf:x', 'val4'
0 row(s) in 0.0120 seconds

hbase(main):009:0> scan 'test'
ROW                   COLUMN+CELL                                               
 r1                   column=cf:x, timestamp=1313360551865, value=val4          
1 row(s) in 0.0380 seconds

hbase(main):010:0> scan 'test', {TIMERANGE => [0, 1313360551865]}
ROW                   COLUMN+CELL                                               
 r1                   column=cf:x, timestamp=1313360550493, value=val3          
1 row(s) in 0.0310 seconds

hbase(main):011:0> scan 'test', {TIMERANGE => [0, 1313360550493]}
ROW                   COLUMN+CELL                                               
 r1                   column=cf:x, timestamp=1313360549220, value=val2          
1 row(s) in 0.0310 seconds

hbase(main):012:0> scan 'test', {TIMERANGE => [0, 1313360549220]}
ROW                   COLUMN+CELL                                               
 r1                   column=cf:x, timestamp=1313360545784, value=val1          
1 row(s) in 0.0190 seconds

hbase(main):013:0> major_compact 'test'
0 row(s) in 0.0670 seconds

hbase(main):014:0> scan 'test', {TIMERANGE => [0, 1313360549220]}
ROW                   COLUMN+CELL                                               
 r1                   column=cf:x, timestamp=1313360545784, value=val1          
1 row(s) in 0.0230 seconds

hbase(main):015:0> major_compact 'test'                          
0 row(s) in 0.0450 seconds

hbase(main):016:0> scan 'test', {TIMERANGE => [0, 1313360549220]}
ROW                   COLUMN+CELL                                               
 r1                   column=cf:x, timestamp=1313360545784, value=val1          
1 row(s) in 0.0300 seconds

Re: When are versions collected?

Posted by Stack <st...@duboce.net>.
On Wed, Aug 17, 2011 at 3:54 AM, lars hofhansl <lh...@yahoo.com> wrote:
> Thanks J-D, that's exactly what I was looking for. I guess I was expecting that a manual major compaction would imply a cache flush. Should it?
>

It does not.

On the other hand, we have yet to implement the Bigtable optimization
where a flush is merged with a storefile; i.e., a new file is written
whose sources are the flushed in-memory edits and a storefile that
was already in the filesystem (the original storefile is dropped on
flush completion).

St.Ack
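
To illustrate why a major compaction without a preceding flush leaves the old versions readable, here is a toy Ruby model of a single column. It is not HBase code: the ToyStore class, its method names, and the timestamps 10 to 40 are all made up for illustration, and it assumes (as a simplification) that a flush writes every version to the new file and that only the major compaction enforces VERSIONS. The exclusive upper bound on the time range matches what the scans in the original session show.

```ruby
# Toy model of one column's storage. NOT HBase code; all names are hypothetical.
class ToyStore
  attr_reader :memstore, :store_files

  def initialize(max_versions)
    @max_versions = max_versions
    @memstore = []     # unflushed cells, each a [timestamp, value] pair
    @store_files = []  # flushed "files", each an array of cells
  end

  def put(timestamp, value)
    @memstore << [timestamp, value]
  end

  # Flush: the memstore becomes a new store file (assumed to keep all versions).
  def flush
    return if @memstore.empty?
    @store_files << @memstore
    @memstore = []
  end

  # Major compaction: rewrites store files only and keeps the newest
  # max_versions cells. The memstore is never touched.
  def major_compact
    return if @store_files.empty?
    newest_first = @store_files.flatten(1).sort_by { |ts, _| -ts }
    @store_files = [newest_first.first(@max_versions)]
  end

  # Newest value with min_ts <= timestamp < max_ts, or nil if none.
  def scan(min_ts, max_ts)
    cells = @memstore + @store_files.flatten(1)
    hit = cells.select { |ts, _| ts >= min_ts && ts < max_ts }.max_by { |ts, _| ts }
    hit ? hit[1] : nil
  end
end

store = ToyStore.new(1)
[[10, 'val1'], [20, 'val2'], [30, 'val3'], [40, 'val4']].each { |ts, v| store.put(ts, v) }

store.major_compact          # nothing on "disk" yet, so nothing to compact
before = store.scan(0, 20)   # old version is still in the memstore

store.flush
store.major_compact          # now the extra versions are dropped
after = store.scan(0, 20)
latest = store.scan(0, 41)

puts "before flush: #{before.inspect}, after flush: #{after.inspect}, latest: #{latest.inspect}"
```

In this model the first compaction is a no-op because all four cells are still in memory, which mirrors the behaviour seen in the original session.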

Re: When are versions collected?

Posted by lars hofhansl <lh...@yahoo.com>.
Thanks J-D, that's exactly what I was looking for. I guess I was expecting that a manual major compaction would imply a cache flush. Should it?


-- Lars



________________________________
From: Jean-Daniel Cryans <jd...@apache.org>
To: user@hbase.apache.org; lars hofhansl <lh...@yahoo.com>
Sent: Monday, August 15, 2011 3:59 PM
Subject: Re: When are versions collected?

A major compaction only compacts data in the store files (i.e., on disk)
but not in the memstores (i.e., in memory). Force a flush of that table after
you do those puts and you should get what you expect.

J-D


Re: When are versions collected?

Posted by Jean-Daniel Cryans <jd...@apache.org>.
A major compaction only compacts data in the store files (i.e., on disk)
but not in the memstores (i.e., in memory). Force a flush of that table after
you do those puts and you should get what you expect.

J-D
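
Applied to the session from the original post, the suggestion above might look like the following in the hbase shell. This is a sketch rather than captured output: flush and major_compact are standard shell commands, but no result is shown because it was not part of the thread. Both commands may return before the work has actually finished, so the scan may need to be repeated after a short wait.

```
flush 'test'
major_compact 'test'
scan 'test', {TIMERANGE => [0, 1313360549220]}
```

Once the flush and the compaction have completed, the scan for the old time range should, per the explanation above, no longer return val1.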
