You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@hbase.apache.org by "stack (JIRA)" <ji...@apache.org> on 2011/07/07 05:55:16 UTC

[jira] [Created] (HBASE-4071) Data GC: Remove all versions > TTL EXCEPT the last written version

Data GC: Remove all versions > TTL EXCEPT the last written version
------------------------------------------------------------------

                 Key: HBASE-4071
                 URL: https://issues.apache.org/jira/browse/HBASE-4071
             Project: HBase
          Issue Type: New Feature
            Reporter: stack


We were chatting today about our backup cluster.  What we want is to be able to restore the dataset from any point of time but only within a limited timeframe -- say one week.  Thereafter, if the versions are older than one week, rather than as we do with TTL where we let go of all versions older than TTL, instead, let go of all versions EXCEPT the last one written.  So, its like versions==1 when TTL > one week.  We want to allow that if an error is caught within a week of its happening -- user mistakenly removes a critical table -- then we'll be able to restore up the the moment just before catastrophe hit otherwise, we keep one version only.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-4071) Data GC: Remove all versions > TTL EXCEPT the last written version

Posted by "jiraposter@reviews.apache.org (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13089087#comment-13089087 ] 

jiraposter@reviews.apache.org commented on HBASE-4071:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1582/#review1591
-----------------------------------------------------------


Nice work!  Some formatting things (didn't comment on some lines > 80 chars) and need to update some of the javadocs for the new arguments.  Will look through the patch one more time before I ask questions.


http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/HColumnDescriptor.java
<https://reviews.apache.org/r/1582/#comment3577>

    line > 80 chars



http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/HColumnDescriptor.java
<https://reviews.apache.org/r/1582/#comment3578>

    javadoc



http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ColumnTracker.java
<https://reviews.apache.org/r/1582/#comment3582>

    seems like this change actually changes what the functionality / lines are between ColumnTracker vs. QueryMatcher.  should you update the javadoc here to describe what this is not responsible for?



http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ColumnTracker.java
<https://reviews.apache.org/r/1582/#comment3579>

    update the javadoc here... maybe explain why we pass ttl?  seems odd that this removed timestamp from the API



http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ExplicitColumnTracker.java
<https://reviews.apache.org/r/1582/#comment3581>

    update javadoc



http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ExplicitColumnTracker.java
<https://reviews.apache.org/r/1582/#comment3580>

    long line



http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java
<https://reviews.apache.org/r/1582/#comment3583>

    spaces between arguments:  (minVersions, maxVersions, ttl)



http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanWildcardColumnTracker.java
<https://reviews.apache.org/r/1582/#comment3584>

    update javadoc



http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanWildcardColumnTracker.java
<https://reviews.apache.org/r/1582/#comment3585>

    i'm a little confused about this.  this doesn't need to look at the number of included versions?  you are only ever done if minVersions is explicitly set to 0 (disabled)?



http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
<https://reviews.apache.org/r/1582/#comment3587>

    this is a nice comment, but is this true?
    
    can't i safely delete anything that is expired once i have hit my min for this file?  we may not evict _enough_ but we can at least evict something?
    
    unfortunately we don't use the query matcher here, but should be easy to add it here (someone else has added it here as part of some other changes, but they aren't committed yet and won't be for some time)



http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
<https://reviews.apache.org/r/1582/#comment3588>

    nice explanation


- Jonathan


On 2011-08-21 00:32:00, Lars Hofhansl wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/1582/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2011-08-21 00:32:00)
bq.  
bq.  
bq.  Review request for hbase, Todd Lipcon, Michael Stack, and Ian Varley.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  Allow enforcing a minimum number of versions when TTL is enable for a store.
bq.  The GC logic for both versions and TTL is unified inside the ColumnTrackers.
bq.  
bq.  
bq.  This addresses bug HBASE-4071.
bq.      https://issues.apache.org/jira/browse/HBASE-4071
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/HColumnDescriptor.java 1159836 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ColumnTracker.java 1159836 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ExplicitColumnTracker.java 1159836 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java 1159836 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java 1159836 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanWildcardColumnTracker.java 1159836 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java 1159836 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java 1159836 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/ruby/hbase/admin.rb 1159836 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/HBaseTestCase.java 1159836 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestExplicitColumnTracker.java 1159836 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestMinVersions.java PRE-CREATION 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestScanWildcardColumnTracker.java 1159836 
bq.  
bq.  Diff: https://reviews.apache.org/r/1582/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  Ran all tests. I get error (not failures) from two: TestDistributedLogSplitting and TestHTablePool. Both fail with or without my changes.
bq.  New tests: TestMinVersions.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Lars
bq.  
bq.



> Data GC: Remove all versions > TTL EXCEPT the last written version
> ------------------------------------------------------------------
>
>                 Key: HBASE-4071
>                 URL: https://issues.apache.org/jira/browse/HBASE-4071
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: stack
>         Attachments: MinVersions.diff
>
>
> We were chatting today about our backup cluster.  What we want is to be able to restore the dataset from any point of time but only within a limited timeframe -- say one week.  Thereafter, if the versions are older than one week, rather than as we do with TTL where we let go of all versions older than TTL, instead, let go of all versions EXCEPT the last one written.  So, its like versions==1 when TTL > one week.  We want to allow that if an error is caught within a week of its happening -- user mistakenly removes a critical table -- then we'll be able to restore up the the moment just before catastrophe hit otherwise, we keep one version only.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (HBASE-4071) Data GC: Remove all versions > TTL EXCEPT the last written version

Posted by "Lars Hofhansl (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lars Hofhansl updated HBASE-4071:
---------------------------------

    Attachment: MinVersions-v9.diff

Attaching the final diff for future. reference.

> Data GC: Remove all versions > TTL EXCEPT the last written version
> ------------------------------------------------------------------
>
>                 Key: HBASE-4071
>                 URL: https://issues.apache.org/jira/browse/HBASE-4071
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: stack
>            Assignee: Lars Hofhansl
>             Fix For: 0.92.0
>
>         Attachments: MinVersions-v9.diff, MinVersions.diff
>
>
> We were chatting today about our backup cluster.  What we want is to be able to restore the dataset from any point of time but only within a limited timeframe -- say one week.  Thereafter, if the versions are older than one week, rather than as we do with TTL where we let go of all versions older than TTL, instead, let go of all versions EXCEPT the last one written.  So, its like versions==1 when TTL > one week.  We want to allow that if an error is caught within a week of its happening -- user mistakenly removes a critical table -- then we'll be able to restore up the the moment just before catastrophe hit otherwise, we keep one version only.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-4071) Data GC: Remove all versions > TTL EXCEPT the last written version

Posted by "jiraposter@reviews.apache.org (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13087381#comment-13087381 ] 

jiraposter@reviews.apache.org commented on HBASE-4071:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1582/
-----------------------------------------------------------

(Updated 2011-08-18 23:01:12.689967)


Review request for hbase, Todd Lipcon, Michael Stack, and Ian Varley.


Summary (updated)
-------

Allow enforcing a minimum number of versions when TTL is enable for a store.
The GC logic for both versions and TTL is unified inside the ColumnTrackers.


This addresses bug HBASE-4071.
    https://issues.apache.org/jira/browse/HBASE-4071


Diffs
-----

  http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestScanWildcardColumnTracker.java 1159317 
  http://svn.apache.org/repos/asf/hbase/trunk/src/main/ruby/hbase/admin.rb 1159317 
  http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/HBaseTestCase.java 1159317 
  http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestExplicitColumnTracker.java 1159317 
  http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestMinVersions.java PRE-CREATION 
  http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java 1159317 
  http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java 1159317 
  http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanWildcardColumnTracker.java 1159317 
  http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java 1159317 
  http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java 1159317 
  http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/HColumnDescriptor.java 1159317 
  http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ColumnTracker.java 1159317 
  http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ExplicitColumnTracker.java 1159317 

Diff: https://reviews.apache.org/r/1582/diff


Testing
-------

Ran all tests. I get error (not failures) from two: TestDistributedLogSplitting and TestHTablePool. Both fail with or without my changes.
New tests: TestMinVersions.


Thanks,

Lars



> Data GC: Remove all versions > TTL EXCEPT the last written version
> ------------------------------------------------------------------
>
>                 Key: HBASE-4071
>                 URL: https://issues.apache.org/jira/browse/HBASE-4071
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: stack
>         Attachments: MinVersions.diff
>
>
> We were chatting today about our backup cluster.  What we want is to be able to restore the dataset from any point of time but only within a limited timeframe -- say one week.  Thereafter, if the versions are older than one week, rather than as we do with TTL where we let go of all versions older than TTL, instead, let go of all versions EXCEPT the last one written.  So, its like versions==1 when TTL > one week.  We want to allow that if an error is caught within a week of its happening -- user mistakenly removes a critical table -- then we'll be able to restore up the the moment just before catastrophe hit otherwise, we keep one version only.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-4071) Data GC: Remove all versions > TTL EXCEPT the last written version

Posted by "Lars Hofhansl (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13087156#comment-13087156 ] 

Lars Hofhansl commented on HBASE-4071:
--------------------------------------

Again sorry for he review churn. Updates were posted even after I removed the jira from the review :(

Quick update...
Since there are so many actors involved in this (Store, Region, ScanQueryMatcher, ColumnTrackers), all with slightly different intricate logic, I think abstracting this out into an interface will either not make it nicer, or require me to rewrite the entire logic.
Instead I unified both the TTL and Versioning logic inside the ColumnTrackers, while still giving the trackers a chance to bail out early. That made it simpler, and will hopefully make it easier in the future to abstract this further (I think that needs to be coordinated with the Compaction Coprocessor work).

One problem I encountered is Store.getRowKeyAtOrBefore. That currently honors TTL but not MaxVersions (which is strange). I'm thinking I'll either leave that alone, or have it also not honor TTL when the store has minversions.

Fixing this correctly in all cases would mean to scan all relevant KVs in the Memstore (i.e. ignoring TTL and version restrictions), then use those candidates to scan the storefiles (now honoring TTL and doing the version counting).

Added a new test that validates the basic behavior.
Running tests now. (I seems to have a hard time to get a full test run through locally - with or without my patch).

Will attach a new patch soon that should still be considered a sketch but should hold up to a bit more scrutiny.


> Data GC: Remove all versions > TTL EXCEPT the last written version
> ------------------------------------------------------------------
>
>                 Key: HBASE-4071
>                 URL: https://issues.apache.org/jira/browse/HBASE-4071
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: stack
>         Attachments: MinVersions.diff
>
>
> We were chatting today about our backup cluster.  What we want is to be able to restore the dataset from any point of time but only within a limited timeframe -- say one week.  Thereafter, if the versions are older than one week, rather than as we do with TTL where we let go of all versions older than TTL, instead, let go of all versions EXCEPT the last one written.  So, its like versions==1 when TTL > one week.  We want to allow that if an error is caught within a week of its happening -- user mistakenly removes a critical table -- then we'll be able to restore up the the moment just before catastrophe hit otherwise, we keep one version only.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-4071) Data GC: Remove all versions > TTL EXCEPT the last written version

Posted by "Hbase Build Acct (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13090459#comment-13090459 ] 

Hbase Build Acct commented on HBASE-4071:
-----------------------------------------

thank you



> Data GC: Remove all versions > TTL EXCEPT the last written version
> ------------------------------------------------------------------
>
>                 Key: HBASE-4071
>                 URL: https://issues.apache.org/jira/browse/HBASE-4071
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: stack
>            Assignee: Lars Hofhansl
>             Fix For: 0.92.0
>
>         Attachments: MinVersions.diff
>
>
> We were chatting today about our backup cluster.  What we want is to be able to restore the dataset from any point of time but only within a limited timeframe -- say one week.  Thereafter, if the versions are older than one week, rather than as we do with TTL where we let go of all versions older than TTL, instead, let go of all versions EXCEPT the last one written.  So, its like versions==1 when TTL > one week.  We want to allow that if an error is caught within a week of its happening -- user mistakenly removes a critical table -- then we'll be able to restore up the the moment just before catastrophe hit otherwise, we keep one version only.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-4071) Data GC: Remove all versions > TTL EXCEPT the last written version

Posted by "Lars Hofhansl (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13090475#comment-13090475 ] 

Lars Hofhansl commented on HBASE-4071:
--------------------------------------

Still strange that the test passes locally.





> Data GC: Remove all versions > TTL EXCEPT the last written version
> ------------------------------------------------------------------
>
>                 Key: HBASE-4071
>                 URL: https://issues.apache.org/jira/browse/HBASE-4071
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: stack
>            Assignee: Lars Hofhansl
>             Fix For: 0.92.0
>
>         Attachments: MinVersions.diff
>
>
> We were chatting today about our backup cluster.  What we want is to be able to restore the dataset from any point of time but only within a limited timeframe -- say one week.  Thereafter, if the versions are older than one week, rather than as we do with TTL where we let go of all versions older than TTL, instead, let go of all versions EXCEPT the last one written.  So, its like versions==1 when TTL > one week.  We want to allow that if an error is caught within a week of its happening -- user mistakenly removes a critical table -- then we'll be able to restore up the the moment just before catastrophe hit otherwise, we keep one version only.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-4071) Data GC: Remove all versions > TTL EXCEPT the last written version

Posted by "jiraposter@reviews.apache.org (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13089885#comment-13089885 ] 

jiraposter@reviews.apache.org commented on HBASE-4071:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1582/#review1602
-----------------------------------------------------------

Ship it!


One thing only in below... patch looks good to commit... just one question below (I can fix the other thing myself on commit -- the javadoc).


http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/HColumnDescriptor.java
<https://reviews.apache.org/r/1582/#comment3622>

    good



http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ColumnTracker.java
<https://reviews.apache.org/r/1582/#comment3623>

    javadoc doesn't match param name -- I can fix on commit.



http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ExplicitColumnTracker.java
<https://reviews.apache.org/r/1582/#comment3624>

    I defer to Jon's review here (he knows this stuff...)



http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ExplicitColumnTracker.java
<https://reviews.apache.org/r/1582/#comment3625>

    Why this change?


- Michael


On 2011-08-23 05:13:38, Lars Hofhansl wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/1582/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2011-08-23 05:13:38)
bq.  
bq.  
bq.  Review request for hbase, Todd Lipcon, Michael Stack, and Jonathan Gray.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  Allow enforcing a minimum number of versions when TTL is enable for a store.
bq.  The GC logic for both versions and TTL is unified inside the ColumnTrackers.
bq.  
bq.  
bq.  This addresses bug HBASE-4071.
bq.      https://issues.apache.org/jira/browse/HBASE-4071
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/HColumnDescriptor.java 1160440 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ColumnTracker.java 1160440 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ExplicitColumnTracker.java 1160440 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java 1160440 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java 1160440 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanWildcardColumnTracker.java 1160440 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java 1160440 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java 1160440 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/ruby/hbase/admin.rb 1160440 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/HBaseTestCase.java 1160440 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestExplicitColumnTracker.java 1160440 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestMinVersions.java PRE-CREATION 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestScanWildcardColumnTracker.java 1160440 
bq.  
bq.  Diff: https://reviews.apache.org/r/1582/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  Ran all tests. I get error (not failures) from two: TestDistributedLogSplitting and TestHTablePool. Both fail with or without my changes.
bq.  New tests: TestMinVersions.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Lars
bq.  
bq.



> Data GC: Remove all versions > TTL EXCEPT the last written version
> ------------------------------------------------------------------
>
>                 Key: HBASE-4071
>                 URL: https://issues.apache.org/jira/browse/HBASE-4071
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: stack
>            Assignee: Lars Hofhansl
>         Attachments: MinVersions.diff
>
>
> We were chatting today about our backup cluster.  What we want is to be able to restore the dataset from any point of time but only within a limited timeframe -- say one week.  Thereafter, if the versions are older than one week, rather than as we do with TTL where we let go of all versions older than TTL, instead, let go of all versions EXCEPT the last one written.  So, its like versions==1 when TTL > one week.  We want to allow that if an error is caught within a week of its happening -- user mistakenly removes a critical table -- then we'll be able to restore up the the moment just before catastrophe hit otherwise, we keep one version only.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-4071) Data GC: Remove all versions > TTL EXCEPT the last written version

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13085889#comment-13085889 ] 

stack commented on HBASE-4071:
------------------------------

bq. In both cases I think only at compaction time do we have enough information to remove expired cells (if minversions is >0).

Fair enough.

bq. So you think test coverage of the existing functionality is sufficient? That is very good to know.

IIRC, there are some decent tests for this stuff.  Also, the operation is so fundamental that if broke, it'd bubble up as a broken test somewhere (the connection back down into this code might be an indirection on top of an indirection, but a test will break I'd say).

bq. What's the general feeling? Should I aim for minimal intrusion or attempt to do a bit refactoring to abstract these policies into an interface? Leaning towards the latter, but on the other hand the change would be more risky.

I'm for the latter.  We're trying to push out next major revision of hbase.  It'll get some scrutiny and testing so if you've broken something it should show (or if not, our coverage needs improving).

Good stuff LarsH.

> Data GC: Remove all versions > TTL EXCEPT the last written version
> ------------------------------------------------------------------
>
>                 Key: HBASE-4071
>                 URL: https://issues.apache.org/jira/browse/HBASE-4071
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: stack
>         Attachments: MinVersions.diff
>
>
> We were chatting today about our backup cluster.  What we want is to be able to restore the dataset from any point of time but only within a limited timeframe -- say one week.  Thereafter, if the versions are older than one week, rather than as we do with TTL where we let go of all versions older than TTL, instead, let go of all versions EXCEPT the last one written.  So, its like versions==1 when TTL > one week.  We want to allow that if an error is caught within a week of its happening -- user mistakenly removes a critical table -- then we'll be able to restore up the the moment just before catastrophe hit otherwise, we keep one version only.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Resolved] (HBASE-4071) Data GC: Remove all versions > TTL EXCEPT the last written version

Posted by "stack (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack resolved HBASE-4071.
--------------------------

       Resolution: Fixed
    Fix Version/s: 0.92.0
     Release Note: Even if older than TTL, keep N versions; e.g. if N is 1, we'll purge all versions but the most recent written even if this one version is older than TTL.
     Hadoop Flags: [Reviewed]

Committed to TRUNK.  Thanks for sweet feature Lars (Nice reviewing by the other lads...)

> Data GC: Remove all versions > TTL EXCEPT the last written version
> ------------------------------------------------------------------
>
>                 Key: HBASE-4071
>                 URL: https://issues.apache.org/jira/browse/HBASE-4071
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: stack
>            Assignee: Lars Hofhansl
>             Fix For: 0.92.0
>
>         Attachments: MinVersions.diff
>
>
> We were chatting today about our backup cluster.  What we want is to be able to restore the dataset from any point of time but only within a limited timeframe -- say one week.  Thereafter, if the versions are older than one week, rather than as we do with TTL where we let go of all versions older than TTL, instead, let go of all versions EXCEPT the last one written.  So, its like versions==1 when TTL > one week.  We want to allow that if an error is caught within a week of its happening -- user mistakenly removes a critical table -- then we'll be able to restore up the the moment just before catastrophe hit otherwise, we keep one version only.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-4071) Data GC: Remove all versions > TTL EXCEPT the last written version

Posted by "Hadoop QA (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13253526#comment-13253526 ] 

Hadoop QA commented on HBASE-4071:
----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12491647/MinVersions-v9.diff
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 12 new or modified tests.

    -1 patch.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1513//console

This message is automatically generated.
                
> Data GC: Remove all versions > TTL EXCEPT the last written version
> ------------------------------------------------------------------
>
>                 Key: HBASE-4071
>                 URL: https://issues.apache.org/jira/browse/HBASE-4071
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: stack
>            Assignee: Lars Hofhansl
>             Fix For: 0.92.0
>
>         Attachments: MinVersions-v9.diff, MinVersions.diff
>
>
> We were chatting today about our backup cluster.  What we want is to be able to restore the dataset from any point of time but only within a limited timeframe -- say one week.  Thereafter, if the versions are older than one week, rather than as we do with TTL where we let go of all versions older than TTL, instead, let go of all versions EXCEPT the last one written.  So, its like versions==1 when TTL > one week.  We want to allow that if an error is caught within a week of its happening -- user mistakenly removes a critical table -- then we'll be able to restore up the the moment just before catastrophe hit otherwise, we keep one version only.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-4071) Data GC: Remove all versions > TTL EXCEPT the last written version

Posted by "Lars Hofhansl (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13087375#comment-13087375 ] 

Lars Hofhansl commented on HBASE-4071:
--------------------------------------

Please have a look at the review now (the same one), and let me know what you think.

> Data GC: Remove all versions > TTL EXCEPT the last written version
> ------------------------------------------------------------------
>
>                 Key: HBASE-4071
>                 URL: https://issues.apache.org/jira/browse/HBASE-4071
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: stack
>         Attachments: MinVersions.diff
>
>
> We were chatting today about our backup cluster.  What we want is to be able to restore the dataset from any point of time but only within a limited timeframe -- say one week.  Thereafter, if the versions are older than one week, rather than as we do with TTL where we let go of all versions older than TTL, instead, let go of all versions EXCEPT the last one written.  So, its like versions==1 when TTL > one week.  We want to allow that if an error is caught within a week of its happening -- user mistakenly removes a critical table -- then we'll be able to restore up the the moment just before catastrophe hit otherwise, we keep one version only.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-4071) Data GC: Remove all versions > TTL EXCEPT the last written version

Posted by "jiraposter@reviews.apache.org (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13087388#comment-13087388 ] 

jiraposter@reviews.apache.org commented on HBASE-4071:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1582/#review1537
-----------------------------------------------------------



http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
<https://reviews.apache.org/r/1582/#comment3503>

    HRegion.getClosestRowBefore() currently calls this method.


- Ted


On 2011-08-18 23:01:12, Lars Hofhansl wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/1582/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2011-08-18 23:01:12)
bq.  
bq.  
bq.  Review request for hbase, Todd Lipcon, Michael Stack, and Ian Varley.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  Allow enforcing a minimum number of versions when TTL is enable for a store.
bq.  The GC logic for both versions and TTL is unified inside the ColumnTrackers.
bq.  
bq.  
bq.  This addresses bug HBASE-4071.
bq.      https://issues.apache.org/jira/browse/HBASE-4071
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestScanWildcardColumnTracker.java 1159317 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/ruby/hbase/admin.rb 1159317 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/HBaseTestCase.java 1159317 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestExplicitColumnTracker.java 1159317 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestMinVersions.java PRE-CREATION 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java 1159317 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java 1159317 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanWildcardColumnTracker.java 1159317 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java 1159317 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java 1159317 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/HColumnDescriptor.java 1159317 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ColumnTracker.java 1159317 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ExplicitColumnTracker.java 1159317 
bq.  
bq.  Diff: https://reviews.apache.org/r/1582/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  Ran all tests. I get error (not failures) from two: TestDistributedLogSplitting and TestHTablePool. Both fail with or without my changes.
bq.  New tests: TestMinVersions.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Lars
bq.  
bq.



> Data GC: Remove all versions > TTL EXCEPT the last written version
> ------------------------------------------------------------------
>
>                 Key: HBASE-4071
>                 URL: https://issues.apache.org/jira/browse/HBASE-4071
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: stack
>         Attachments: MinVersions.diff
>
>
> We were chatting today about our backup cluster.  What we want is to be able to restore the dataset from any point of time but only within a limited timeframe -- say one week.  Thereafter, if the versions are older than one week, rather than as we do with TTL where we let go of all versions older than TTL, instead, let go of all versions EXCEPT the last one written.  So, its like versions==1 when TTL > one week.  We want to allow that if an error is caught within a week of its happening -- user mistakenly removes a critical table -- then we'll be able to restore up the the moment just before catastrophe hit otherwise, we keep one version only.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-4071) Data GC: Remove all versions > TTL EXCEPT the last written version

Posted by "Todd Lipcon (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13061402#comment-13061402 ] 

Todd Lipcon commented on HBASE-4071:
------------------------------------

Sure, could be another type of coprocessor. Certainly would be nice to dynamically load them and share the other CP infrastructure.

> Data GC: Remove all versions > TTL EXCEPT the last written version
> ------------------------------------------------------------------
>
>                 Key: HBASE-4071
>                 URL: https://issues.apache.org/jira/browse/HBASE-4071
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: stack
>
> We were chatting today about our backup cluster.  What we want is to be able to restore the dataset from any point of time but only within a limited timeframe -- say one week.  Thereafter, if the versions are older than one week, rather than as we do with TTL where we let go of all versions older than TTL, instead, let go of all versions EXCEPT the last one written.  So, its like versions==1 when TTL > one week.  We want to allow that if an error is caught within a week of its happening -- user mistakenly removes a critical table -- then we'll be able to restore up the the moment just before catastrophe hit otherwise, we keep one version only.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Assigned] (HBASE-4071) Data GC: Remove all versions > TTL EXCEPT the last written version

Posted by "stack (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack reassigned HBASE-4071:
----------------------------

    Assignee: Lars Hofhansl

Assigning to LarsH

> Data GC: Remove all versions > TTL EXCEPT the last written version
> ------------------------------------------------------------------
>
>                 Key: HBASE-4071
>                 URL: https://issues.apache.org/jira/browse/HBASE-4071
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: stack
>            Assignee: Lars Hofhansl
>         Attachments: MinVersions.diff
>
>
> We were chatting today about our backup cluster.  What we want is to be able to restore the dataset from any point of time but only within a limited timeframe -- say one week.  Thereafter, if the versions are older than one week, rather than as we do with TTL where we let go of all versions older than TTL, instead, let go of all versions EXCEPT the last one written.  So, its like versions==1 when TTL > one week.  We want to allow that if an error is caught within a week of its happening -- user mistakenly removes a critical table -- then we'll be able to restore up the the moment just before catastrophe hit otherwise, we keep one version only.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-4071) Data GC: Remove all versions > TTL EXCEPT the last written version

Posted by "Lars Hofhansl (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13090465#comment-13090465 ] 

Lars Hofhansl commented on HBASE-4071:
--------------------------------------

Thanks Ted!

> Data GC: Remove all versions > TTL EXCEPT the last written version
> ------------------------------------------------------------------
>
>                 Key: HBASE-4071
>                 URL: https://issues.apache.org/jira/browse/HBASE-4071
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: stack
>            Assignee: Lars Hofhansl
>             Fix For: 0.92.0
>
>         Attachments: MinVersions.diff
>
>
> We were chatting today about our backup cluster.  What we want is to be able to restore the dataset from any point of time but only within a limited timeframe -- say one week.  Thereafter, if the versions are older than one week, rather than as we do with TTL where we let go of all versions older than TTL, instead, let go of all versions EXCEPT the last one written.  So, its like versions==1 when TTL > one week.  We want to allow that if an error is caught within a week of its happening -- user mistakenly removes a critical table -- then we'll be able to restore up the the moment just before catastrophe hit otherwise, we keep one version only.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-4071) Data GC: Remove all versions > TTL EXCEPT the last written version

Posted by "jiraposter@reviews.apache.org (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13089268#comment-13089268 ] 

jiraposter@reviews.apache.org commented on HBASE-4071:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1582/
-----------------------------------------------------------

(Updated 2011-08-23 04:55:59.908653)


Review request for hbase, Todd Lipcon, Michael Stack, and Jonathan Gray.


Summary
-------

Allow enforcing a minimum number of versions when TTL is enable for a store.
The GC logic for both versions and TTL is unified inside the ColumnTrackers.


This addresses bug HBASE-4071.
    https://issues.apache.org/jira/browse/HBASE-4071


Diffs
-----

  http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/HColumnDescriptor.java 1160440 
  http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ColumnTracker.java 1160440 
  http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ExplicitColumnTracker.java 1160440 
  http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java 1160440 
  http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java 1160440 
  http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanWildcardColumnTracker.java 1160440 
  http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java 1160440 
  http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java 1160440 
  http://svn.apache.org/repos/asf/hbase/trunk/src/main/ruby/hbase/admin.rb 1160440 
  http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/HBaseTestCase.java 1160440 
  http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestExplicitColumnTracker.java 1160440 
  http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestMinVersions.java PRE-CREATION 
  http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestScanWildcardColumnTracker.java 1160440 

Diff: https://reviews.apache.org/r/1582/diff


Testing
-------

Ran all tests. I get error (not failures) from two: TestDistributedLogSplitting and TestHTablePool. Both fail with or without my changes.
New tests: TestMinVersions.


Thanks,

Lars



> Data GC: Remove all versions > TTL EXCEPT the last written version
> ------------------------------------------------------------------
>
>                 Key: HBASE-4071
>                 URL: https://issues.apache.org/jira/browse/HBASE-4071
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: stack
>         Attachments: MinVersions.diff
>
>
> We were chatting today about our backup cluster.  What we want is to be able to restore the dataset from any point of time but only within a limited timeframe -- say one week.  Thereafter, if the versions are older than one week, rather than as we do with TTL where we let go of all versions older than TTL, instead, let go of all versions EXCEPT the last one written.  So, its like versions==1 when TTL > one week.  We want to allow that if an error is caught within a week of its happening -- user mistakenly removes a critical table -- then we'll be able to restore up the the moment just before catastrophe hit otherwise, we keep one version only.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-4071) Data GC: Remove all versions > TTL EXCEPT the last written version

Posted by "jiraposter@reviews.apache.org (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13087384#comment-13087384 ] 

jiraposter@reviews.apache.org commented on HBASE-4071:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1582/#review1536
-----------------------------------------------------------



http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/HColumnDescriptor.java
<https://reviews.apache.org/r/1582/#comment3495>

    Will fix the white space



http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
<https://reviews.apache.org/r/1582/#comment3496>

    This means that expired rows are not removed on flush when minVersions is set.
    That is because at this point we do not have enough information.



http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
<https://reviews.apache.org/r/1582/#comment3498>

    Note the interface change here.
    KeyValue, could have a timestamp, in which case we'd look for that particular version.
    No caller used this, and passing byte[] avoids that problem completely.


- Lars


On 2011-08-18 23:01:12, Lars Hofhansl wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/1582/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2011-08-18 23:01:12)
bq.  
bq.  
bq.  Review request for hbase, Todd Lipcon, Michael Stack, and Ian Varley.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  Allow enforcing a minimum number of versions when TTL is enable for a store.
bq.  The GC logic for both versions and TTL is unified inside the ColumnTrackers.
bq.  
bq.  
bq.  This addresses bug HBASE-4071.
bq.      https://issues.apache.org/jira/browse/HBASE-4071
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestScanWildcardColumnTracker.java 1159317 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/ruby/hbase/admin.rb 1159317 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/HBaseTestCase.java 1159317 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestExplicitColumnTracker.java 1159317 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestMinVersions.java PRE-CREATION 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java 1159317 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java 1159317 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanWildcardColumnTracker.java 1159317 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java 1159317 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java 1159317 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/HColumnDescriptor.java 1159317 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ColumnTracker.java 1159317 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ExplicitColumnTracker.java 1159317 
bq.  
bq.  Diff: https://reviews.apache.org/r/1582/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  Ran all tests. I get error (not failures) from two: TestDistributedLogSplitting and TestHTablePool. Both fail with or without my changes.
bq.  New tests: TestMinVersions.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Lars
bq.  
bq.



> Data GC: Remove all versions > TTL EXCEPT the last written version
> ------------------------------------------------------------------
>
>                 Key: HBASE-4071
>                 URL: https://issues.apache.org/jira/browse/HBASE-4071
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: stack
>         Attachments: MinVersions.diff
>
>
> We were chatting today about our backup cluster.  What we want is to be able to restore the dataset from any point of time but only within a limited timeframe -- say one week.  Thereafter, if the versions are older than one week, rather than as we do with TTL where we let go of all versions older than TTL, instead, let go of all versions EXCEPT the last one written.  So, its like versions==1 when TTL > one week.  We want to allow that if an error is caught within a week of its happening -- user mistakenly removes a critical table -- then we'll be able to restore up the the moment just before catastrophe hit otherwise, we keep one version only.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-4071) Data GC: Remove all versions > TTL EXCEPT the last written version

Posted by "jiraposter@reviews.apache.org (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13089266#comment-13089266 ] 

jiraposter@reviews.apache.org commented on HBASE-4071:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1582/
-----------------------------------------------------------

(Updated 2011-08-23 04:53:14.501780)


Review request for hbase, Todd Lipcon, Michael Stack, and Ian Varley.


Changes
-------

More line length and comment fixes.


Summary
-------

Allow enforcing a minimum number of versions when TTL is enable for a store.
The GC logic for both versions and TTL is unified inside the ColumnTrackers.


This addresses bug HBASE-4071.
    https://issues.apache.org/jira/browse/HBASE-4071


Diffs (updated)
-----

  http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/HColumnDescriptor.java 1160440 
  http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ColumnTracker.java 1160440 
  http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ExplicitColumnTracker.java 1160440 
  http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java 1160440 
  http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java 1160440 
  http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanWildcardColumnTracker.java 1160440 
  http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java 1160440 
  http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java 1160440 
  http://svn.apache.org/repos/asf/hbase/trunk/src/main/ruby/hbase/admin.rb 1160440 
  http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/HBaseTestCase.java 1160440 
  http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestExplicitColumnTracker.java 1160440 
  http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestMinVersions.java PRE-CREATION 
  http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestScanWildcardColumnTracker.java 1160440 

Diff: https://reviews.apache.org/r/1582/diff


Testing
-------

Ran all tests. I get error (not failures) from two: TestDistributedLogSplitting and TestHTablePool. Both fail with or without my changes.
New tests: TestMinVersions.


Thanks,

Lars



> Data GC: Remove all versions > TTL EXCEPT the last written version
> ------------------------------------------------------------------
>
>                 Key: HBASE-4071
>                 URL: https://issues.apache.org/jira/browse/HBASE-4071
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: stack
>         Attachments: MinVersions.diff
>
>
> We were chatting today about our backup cluster.  What we want is to be able to restore the dataset from any point of time but only within a limited timeframe -- say one week.  Thereafter, if the versions are older than one week, rather than as we do with TTL where we let go of all versions older than TTL, instead, let go of all versions EXCEPT the last one written.  So, its like versions==1 when TTL > one week.  We want to allow that if an error is caught within a week of its happening -- user mistakenly removes a critical table -- then we'll be able to restore up the the moment just before catastrophe hit otherwise, we keep one version only.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-4071) Data GC: Remove all versions > TTL EXCEPT the last written version

Posted by "Lars Hofhansl (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13085853#comment-13085853 ] 

Lars Hofhansl commented on HBASE-4071:
--------------------------------------

Thanks Stack.

re if (currentCount <= minVersions): in this case the count was already incremented by the maxversions check (which is just outside of the context provided by the diff).

re expunging expired rows on flush: Currently it seems this is done as an optimization.
The reasons why believe this cannot be done if minversions>0 are: (1) The cache might not have all version, so I cannot count the versions to determine the cut-off point. (2) Even if we have minversions=1, there is no guarantee that the versions in the cache include the latest one (puts could have been backdated).
In both cases I think only at compaction time do we have enough information to remove expired cells (if minversions is >0).

re comments: Of course. That's why there is version control :)
I just left these in as comments to have a place where I can put a comment as to why I believe these are not necessary.

So you think test coverage of the existing functionality is sufficient? That is very good to know.
I'll add tests for the new functionality.

What's the general feeling? Should I aim for minimal intrusion or attempt to do a bit refactoring to abstract these policies into an interface? Leaning towards the latter, but on the other hand the change would be more risky.

> Data GC: Remove all versions > TTL EXCEPT the last written version
> ------------------------------------------------------------------
>
>                 Key: HBASE-4071
>                 URL: https://issues.apache.org/jira/browse/HBASE-4071
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: stack
>         Attachments: MinVersions.diff
>
>
> We were chatting today about our backup cluster.  What we want is to be able to restore the dataset from any point of time but only within a limited timeframe -- say one week.  Thereafter, if the versions are older than one week, rather than as we do with TTL where we let go of all versions older than TTL, instead, let go of all versions EXCEPT the last one written.  So, its like versions==1 when TTL > one week.  We want to allow that if an error is caught within a week of its happening -- user mistakenly removes a critical table -- then we'll be able to restore up the the moment just before catastrophe hit otherwise, we keep one version only.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-4071) Data GC: Remove all versions > TTL EXCEPT the last written version

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13085541#comment-13085541 ] 

stack commented on HBASE-4071:
------------------------------

@LarsH

Patch looks great.  Minor intrusion.  I'd say that best way to build confidence that all works as it did is to run all unit tests -- IIRC we should have coverage enough to notice if you've broken the default behavior -- and then for your additions, work up a few unit tests that exercise the new functionality (This will also uncover any other code your patch should have touched -- I don't think you've missed anything since same Scan code is used compacting.... this you might want to verify, that compacting we do right thing, especially major vs minor around this new minimum versions config).

Is this right?

+      if (currentCount <= minVersions)

The other two instances preincrement currentCount.  This instance doesn't.  Is there a reason (If so, probably deserves a comment).  Looks too like a bit of repeated code here that perhaps could be factored out to a method of its own?

This is sketch code but FYI, around these parts folks remove code rather than comment it out...

On 1. above, 'Expired rows can no longer be expunged...', why not?  Are we doing this now?  (I don't remember) If so, why can't we do this still?

On 2., it don't look too bad to me

On 7., yes.

Good on you LarsH.

> Data GC: Remove all versions > TTL EXCEPT the last written version
> ------------------------------------------------------------------
>
>                 Key: HBASE-4071
>                 URL: https://issues.apache.org/jira/browse/HBASE-4071
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: stack
>         Attachments: MinVersions.diff
>
>
> We were chatting today about our backup cluster.  What we want is to be able to restore the dataset from any point of time but only within a limited timeframe -- say one week.  Thereafter, if the versions are older than one week, rather than as we do with TTL where we let go of all versions older than TTL, instead, let go of all versions EXCEPT the last one written.  So, its like versions==1 when TTL > one week.  We want to allow that if an error is caught within a week of its happening -- user mistakenly removes a critical table -- then we'll be able to restore up the the moment just before catastrophe hit otherwise, we keep one version only.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-4071) Data GC: Remove all versions > TTL EXCEPT the last written version

Posted by "Ted Yu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13090449#comment-13090449 ] 

Ted Yu commented on HBASE-4071:
-------------------------------

Here is the addendum I committed:
{code}
Index: src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
===================================================================
--- src/main/java/org/apache/hadoop/hbase/regionserver/Store.java       (revision 1161190)
+++ src/main/java/org/apache/hadoop/hbase/regionserver/Store.java       (working copy)
@@ -1734,7 +1734,7 @@
   public static final long FIXED_OVERHEAD = ClassSize.align(
       ClassSize.OBJECT + (15 * ClassSize.REFERENCE) +
       (8 * Bytes.SIZEOF_LONG) + (1 * Bytes.SIZEOF_DOUBLE) +
-      (4 * Bytes.SIZEOF_INT) + (3 * Bytes.SIZEOF_BOOLEAN));
+      (5 * Bytes.SIZEOF_INT) + (3 * Bytes.SIZEOF_BOOLEAN));
 
   public static final long DEEP_OVERHEAD = ClassSize.align(FIXED_OVERHEAD +
       ClassSize.OBJECT + ClassSize.REENTRANT_LOCK +
{code}
Forgetting to update FIXED_OVERHEAD is very common.

> Data GC: Remove all versions > TTL EXCEPT the last written version
> ------------------------------------------------------------------
>
>                 Key: HBASE-4071
>                 URL: https://issues.apache.org/jira/browse/HBASE-4071
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: stack
>            Assignee: Lars Hofhansl
>             Fix For: 0.92.0
>
>         Attachments: MinVersions.diff
>
>
> We were chatting today about our backup cluster.  What we want is to be able to restore the dataset from any point of time but only within a limited timeframe -- say one week.  Thereafter, if the versions are older than one week, rather than as we do with TTL where we let go of all versions older than TTL, instead, let go of all versions EXCEPT the last one written.  So, its like versions==1 when TTL > one week.  We want to allow that if an error is caught within a week of its happening -- user mistakenly removes a critical table -- then we'll be able to restore up the the moment just before catastrophe hit otherwise, we keep one version only.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-4071) Data GC: Remove all versions > TTL EXCEPT the last written version

Posted by "Andrew Purtell (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13061396#comment-13061396 ] 

Andrew Purtell commented on HBASE-4071:
---------------------------------------

bq. Rather than hard-code these various policies, it would be very nice if the compaction dropping policy were pluggable. 

We could do this within the coprocessor framework, as one option. Or would you prefer to have core be pluggable in this respect?

> Data GC: Remove all versions > TTL EXCEPT the last written version
> ------------------------------------------------------------------
>
>                 Key: HBASE-4071
>                 URL: https://issues.apache.org/jira/browse/HBASE-4071
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: stack
>
> We were chatting today about our backup cluster.  What we want is to be able to restore the dataset from any point of time but only within a limited timeframe -- say one week.  Thereafter, if the versions are older than one week, rather than as we do with TTL where we let go of all versions older than TTL, instead, let go of all versions EXCEPT the last one written.  So, its like versions==1 when TTL > one week.  We want to allow that if an error is caught within a week of its happening -- user mistakenly removes a critical table -- then we'll be able to restore up the the moment just before catastrophe hit otherwise, we keep one version only.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-4071) Data GC: Remove all versions > TTL EXCEPT the last written version

Posted by "jiraposter@reviews.apache.org (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13086804#comment-13086804 ] 

jiraposter@reviews.apache.org commented on HBASE-4071:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1582/
-----------------------------------------------------------

(Updated 2011-08-18 05:01:59.054304)


Review request for Ian Varley.


Summary
-------

A min versions coupled with TTL.

Note that the I unified the GC logic inside the ColumnTrackers. Previously they did the versioning and TTL was handled outside.

What is still open what to do with Store.getKeyAtOrBefore(...)


Diffs
-----

  http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestMinVersions.java PRE-CREATION 
  http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestScanWildcardColumnTracker.java 1158860 
  http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanWildcardColumnTracker.java 1158860 
  http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java 1158860 
  http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java 1158860 
  http://svn.apache.org/repos/asf/hbase/trunk/src/main/ruby/hbase/admin.rb 1158860 
  http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/HBaseTestCase.java 1158860 
  http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestExplicitColumnTracker.java 1158860 
  http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/HColumnDescriptor.java 1158860 
  http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ColumnTracker.java 1158860 
  http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ExplicitColumnTracker.java 1158860 
  http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java 1158860 

Diff: https://reviews.apache.org/r/1582/diff


Testing
-------

See TestMinVersions.


Thanks,

Lars



> Data GC: Remove all versions > TTL EXCEPT the last written version
> ------------------------------------------------------------------
>
>                 Key: HBASE-4071
>                 URL: https://issues.apache.org/jira/browse/HBASE-4071
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: stack
>         Attachments: MinVersions.diff
>
>
> We were chatting today about our backup cluster.  What we want is to be able to restore the dataset from any point of time but only within a limited timeframe -- say one week.  Thereafter, if the versions are older than one week, rather than as we do with TTL where we let go of all versions older than TTL, instead, let go of all versions EXCEPT the last one written.  So, its like versions==1 when TTL > one week.  We want to allow that if an error is caught within a week of its happening -- user mistakenly removes a critical table -- then we'll be able to restore up the the moment just before catastrophe hit otherwise, we keep one version only.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-4071) Data GC: Remove all versions > TTL EXCEPT the last written version

Posted by "jiraposter@reviews.apache.org (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13087400#comment-13087400 ] 

jiraposter@reviews.apache.org commented on HBASE-4071:
------------------------------------------------------



bq.  On 2011-08-18 23:15:00, Ted Yu wrote:
bq.  > http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java, line 1277
bq.  > <https://reviews.apache.org/r/1582/diff/2/?file=33571#file33571line1277>
bq.  >
bq.  >     HRegion.getClosestRowBefore() currently calls this method.

Sorry what I meant to say is that no caller makes use of passing a backdated KeyValue to this method. Changing the signature here makes sure that nobody will. The caller in HRegion is also changed as part of this patch.


- Lars


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1582/#review1537
-----------------------------------------------------------


On 2011-08-18 23:01:12, Lars Hofhansl wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/1582/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2011-08-18 23:01:12)
bq.  
bq.  
bq.  Review request for hbase, Todd Lipcon, Michael Stack, and Ian Varley.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  Allow enforcing a minimum number of versions when TTL is enable for a store.
bq.  The GC logic for both versions and TTL is unified inside the ColumnTrackers.
bq.  
bq.  
bq.  This addresses bug HBASE-4071.
bq.      https://issues.apache.org/jira/browse/HBASE-4071
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestScanWildcardColumnTracker.java 1159317 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/ruby/hbase/admin.rb 1159317 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/HBaseTestCase.java 1159317 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestExplicitColumnTracker.java 1159317 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestMinVersions.java PRE-CREATION 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java 1159317 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java 1159317 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanWildcardColumnTracker.java 1159317 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java 1159317 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java 1159317 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/HColumnDescriptor.java 1159317 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ColumnTracker.java 1159317 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ExplicitColumnTracker.java 1159317 
bq.  
bq.  Diff: https://reviews.apache.org/r/1582/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  Ran all tests. I get error (not failures) from two: TestDistributedLogSplitting and TestHTablePool. Both fail with or without my changes.
bq.  New tests: TestMinVersions.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Lars
bq.  
bq.



> Data GC: Remove all versions > TTL EXCEPT the last written version
> ------------------------------------------------------------------
>
>                 Key: HBASE-4071
>                 URL: https://issues.apache.org/jira/browse/HBASE-4071
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: stack
>         Attachments: MinVersions.diff
>
>
> We were chatting today about our backup cluster.  What we want is to be able to restore the dataset from any point of time but only within a limited timeframe -- say one week.  Thereafter, if the versions are older than one week, rather than as we do with TTL where we let go of all versions older than TTL, instead, let go of all versions EXCEPT the last one written.  So, its like versions==1 when TTL > one week.  We want to allow that if an error is caught within a week of its happening -- user mistakenly removes a critical table -- then we'll be able to restore up the the moment just before catastrophe hit otherwise, we keep one version only.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-4071) Data GC: Remove all versions > TTL EXCEPT the last written version

Posted by "jiraposter@reviews.apache.org (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13088133#comment-13088133 ] 

jiraposter@reviews.apache.org commented on HBASE-4071:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1582/
-----------------------------------------------------------

(Updated 2011-08-20 05:05:27.147665)


Review request for hbase, Todd Lipcon, Michael Stack, and Ian Varley.


Changes
-------

o Added more comments.
o Added more tests.
o Added Apache license to test file.
o Remove some unused variable and imports from Store.java (while I'm changing it anyway).
o Removed some more extraneous whitespace.


Summary
-------

Allow enforcing a minimum number of versions when TTL is enable for a store.
The GC logic for both versions and TTL is unified inside the ColumnTrackers.


This addresses bug HBASE-4071.
    https://issues.apache.org/jira/browse/HBASE-4071


Diffs (updated)
-----

  http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/HColumnDescriptor.java 1159836 
  http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ColumnTracker.java 1159836 
  http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ExplicitColumnTracker.java 1159836 
  http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java 1159836 
  http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java 1159836 
  http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanWildcardColumnTracker.java 1159836 
  http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java 1159836 
  http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java 1159836 
  http://svn.apache.org/repos/asf/hbase/trunk/src/main/ruby/hbase/admin.rb 1159836 
  http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/HBaseTestCase.java 1159836 
  http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestExplicitColumnTracker.java 1159836 
  http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestMinVersions.java PRE-CREATION 
  http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestScanWildcardColumnTracker.java 1159836 

Diff: https://reviews.apache.org/r/1582/diff


Testing
-------

Ran all tests. I get error (not failures) from two: TestDistributedLogSplitting and TestHTablePool. Both fail with or without my changes.
New tests: TestMinVersions.


Thanks,

Lars



> Data GC: Remove all versions > TTL EXCEPT the last written version
> ------------------------------------------------------------------
>
>                 Key: HBASE-4071
>                 URL: https://issues.apache.org/jira/browse/HBASE-4071
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: stack
>         Attachments: MinVersions.diff
>
>
> We were chatting today about our backup cluster.  What we want is to be able to restore the dataset from any point of time but only within a limited timeframe -- say one week.  Thereafter, if the versions are older than one week, rather than as we do with TTL where we let go of all versions older than TTL, instead, let go of all versions EXCEPT the last one written.  So, its like versions==1 when TTL > one week.  We want to allow that if an error is caught within a week of its happening -- user mistakenly removes a critical table -- then we'll be able to restore up the the moment just before catastrophe hit otherwise, we keep one version only.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-4071) Data GC: Remove all versions > TTL EXCEPT the last written version

Posted by "jiraposter@reviews.apache.org (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13089940#comment-13089940 ] 

jiraposter@reviews.apache.org commented on HBASE-4071:
------------------------------------------------------



bq.  On 2011-08-24 00:03:43, Michael Stack wrote:
bq.  > http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ExplicitColumnTracker.java, line 127
bq.  > <https://reviews.apache.org/r/1582/diff/7/?file=34607#file34607line127>
bq.  >
bq.  >     I defer to Jon's review here (he knows this stuff...)

This basically just counts versions up instead of down (see next comment below as to why).


bq.  On 2011-08-24 00:03:43, Michael Stack wrote:
bq.  > http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ExplicitColumnTracker.java, line 205
bq.  > <https://reviews.apache.org/r/1582/diff/7/?file=34607#file34607line205>
bq.  >
bq.  >     Why this change?

ExplicitColumnTracker used to count versions down and ScanWildcardColumnTracker is counting version up.
This change goes to together with the previous change (increment vs decrement).

I did that for two reasons:

1. When both column trackers follow similar logic it will be easier to change them and to simplify them further in the future (such as extracting a GC policy).

2. Now the comparison for versions can be:
 if(count >= maxVersions || (count >= minVersions && isExpired(timestamp))
    // Done with versions for this column

whereas otherwise it would need to be:
 if(count == 0) || ((maxVersion-count) < minVersions && isExpired(timestamp))
    // Done with versions for this column

Which is harder to read and less intuitive, I think.


bq.  On 2011-08-24 00:03:43, Michael Stack wrote:
bq.  > http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ColumnTracker.java, line 47
bq.  > <https://reviews.apache.org/r/1582/diff/7/?file=34606#file34606line47>
bq.  >
bq.  >     javadoc doesn't match param name -- I can fix on commit.

Argh... One more that I missed. Thanks Stack!


- Lars


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1582/#review1602
-----------------------------------------------------------


On 2011-08-23 05:13:38, Lars Hofhansl wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/1582/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2011-08-23 05:13:38)
bq.  
bq.  
bq.  Review request for hbase, Todd Lipcon, Michael Stack, and Jonathan Gray.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  Allow enforcing a minimum number of versions when TTL is enable for a store.
bq.  The GC logic for both versions and TTL is unified inside the ColumnTrackers.
bq.  
bq.  
bq.  This addresses bug HBASE-4071.
bq.      https://issues.apache.org/jira/browse/HBASE-4071
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/HColumnDescriptor.java 1160440 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ColumnTracker.java 1160440 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ExplicitColumnTracker.java 1160440 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java 1160440 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java 1160440 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanWildcardColumnTracker.java 1160440 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java 1160440 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java 1160440 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/ruby/hbase/admin.rb 1160440 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/HBaseTestCase.java 1160440 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestExplicitColumnTracker.java 1160440 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestMinVersions.java PRE-CREATION 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestScanWildcardColumnTracker.java 1160440 
bq.  
bq.  Diff: https://reviews.apache.org/r/1582/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  Ran all tests. I get error (not failures) from two: TestDistributedLogSplitting and TestHTablePool. Both fail with or without my changes.
bq.  New tests: TestMinVersions.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Lars
bq.  
bq.



> Data GC: Remove all versions > TTL EXCEPT the last written version
> ------------------------------------------------------------------
>
>                 Key: HBASE-4071
>                 URL: https://issues.apache.org/jira/browse/HBASE-4071
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: stack
>            Assignee: Lars Hofhansl
>         Attachments: MinVersions.diff
>
>
> We were chatting today about our backup cluster.  What we want is to be able to restore the dataset from any point of time but only within a limited timeframe -- say one week.  Thereafter, if the versions are older than one week, rather than as we do with TTL where we let go of all versions older than TTL, instead, let go of all versions EXCEPT the last one written.  So, its like versions==1 when TTL > one week.  We want to allow that if an error is caught within a week of its happening -- user mistakenly removes a critical table -- then we'll be able to restore up the the moment just before catastrophe hit otherwise, we keep one version only.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-4071) Data GC: Remove all versions > TTL EXCEPT the last written version

Posted by "jiraposter@reviews.apache.org (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13088290#comment-13088290 ] 

jiraposter@reviews.apache.org commented on HBASE-4071:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1582/
-----------------------------------------------------------

(Updated 2011-08-21 00:32:00.549363)


Review request for hbase, Todd Lipcon, Michael Stack, and Ian Varley.


Changes
-------

o More tests for Filters.
o ExplicitColumnTracker was doing too much work with minversions > 0.


Summary
-------

Allow enforcing a minimum number of versions when TTL is enable for a store.
The GC logic for both versions and TTL is unified inside the ColumnTrackers.


This addresses bug HBASE-4071.
    https://issues.apache.org/jira/browse/HBASE-4071


Diffs (updated)
-----

  http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/HColumnDescriptor.java 1159836 
  http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ColumnTracker.java 1159836 
  http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ExplicitColumnTracker.java 1159836 
  http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java 1159836 
  http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java 1159836 
  http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanWildcardColumnTracker.java 1159836 
  http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java 1159836 
  http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java 1159836 
  http://svn.apache.org/repos/asf/hbase/trunk/src/main/ruby/hbase/admin.rb 1159836 
  http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/HBaseTestCase.java 1159836 
  http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestExplicitColumnTracker.java 1159836 
  http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestMinVersions.java PRE-CREATION 
  http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestScanWildcardColumnTracker.java 1159836 

Diff: https://reviews.apache.org/r/1582/diff


Testing
-------

Ran all tests. I get error (not failures) from two: TestDistributedLogSplitting and TestHTablePool. Both fail with or without my changes.
New tests: TestMinVersions.


Thanks,

Lars



> Data GC: Remove all versions > TTL EXCEPT the last written version
> ------------------------------------------------------------------
>
>                 Key: HBASE-4071
>                 URL: https://issues.apache.org/jira/browse/HBASE-4071
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: stack
>         Attachments: MinVersions.diff
>
>
> We were chatting today about our backup cluster.  What we want is to be able to restore the dataset from any point of time but only within a limited timeframe -- say one week.  Thereafter, if the versions are older than one week, rather than as we do with TTL where we let go of all versions older than TTL, instead, let go of all versions EXCEPT the last one written.  So, its like versions==1 when TTL > one week.  We want to allow that if an error is caught within a week of its happening -- user mistakenly removes a critical table -- then we'll be able to restore up the the moment just before catastrophe hit otherwise, we keep one version only.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-4071) Data GC: Remove all versions > TTL EXCEPT the last written version

Posted by "jiraposter@reviews.apache.org (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13086808#comment-13086808 ] 

jiraposter@reviews.apache.org commented on HBASE-4071:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1582/
-----------------------------------------------------------

(Updated 2011-08-18 05:07:35.742598)


Review request for Ian Varley.


Summary
-------

A min versions coupled with TTL.

Note that the I unified the GC logic inside the ColumnTrackers. Previously they did the versioning and TTL was handled outside.

What is still open what to do with Store.getKeyAtOrBefore(...)


Diffs
-----

  http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestMinVersions.java PRE-CREATION 
  http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestScanWildcardColumnTracker.java 1158860 
  http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanWildcardColumnTracker.java 1158860 
  http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java 1158860 
  http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java 1158860 
  http://svn.apache.org/repos/asf/hbase/trunk/src/main/ruby/hbase/admin.rb 1158860 
  http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/HBaseTestCase.java 1158860 
  http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestExplicitColumnTracker.java 1158860 
  http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/HColumnDescriptor.java 1158860 
  http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ColumnTracker.java 1158860 
  http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ExplicitColumnTracker.java 1158860 
  http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java 1158860 

Diff: https://reviews.apache.org/r/1582/diff


Testing
-------

See TestMinVersions.


Thanks,

Lars



> Data GC: Remove all versions > TTL EXCEPT the last written version
> ------------------------------------------------------------------
>
>                 Key: HBASE-4071
>                 URL: https://issues.apache.org/jira/browse/HBASE-4071
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: stack
>         Attachments: MinVersions.diff
>
>
> We were chatting today about our backup cluster.  What we want is to be able to restore the dataset from any point of time but only within a limited timeframe -- say one week.  Thereafter, if the versions are older than one week, rather than as we do with TTL where we let go of all versions older than TTL, instead, let go of all versions EXCEPT the last one written.  So, its like versions==1 when TTL > one week.  We want to allow that if an error is caught within a week of its happening -- user mistakenly removes a critical table -- then we'll be able to restore up the the moment just before catastrophe hit otherwise, we keep one version only.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-4071) Data GC: Remove all versions > TTL EXCEPT the last written version

Posted by "Andrew Purtell (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13084792#comment-13084792 ] 

Andrew Purtell commented on HBASE-4071:
---------------------------------------

TTLs were introduced to conveniently expire transient data and garbage collect it. Setting a TTL on a column family defines that column family as transient. Whatever changes are contemplated to be "hard coded", this use case must be maintained. I want to be able to set a TTL and versions to 1 or 3 or 10 or whatever and have ALL of the versions discarded when their timestamps indicate expiry.

> Data GC: Remove all versions > TTL EXCEPT the last written version
> ------------------------------------------------------------------
>
>                 Key: HBASE-4071
>                 URL: https://issues.apache.org/jira/browse/HBASE-4071
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: stack
>
> We were chatting today about our backup cluster.  What we want is to be able to restore the dataset from any point of time but only within a limited timeframe -- say one week.  Thereafter, if the versions are older than one week, rather than as we do with TTL where we let go of all versions older than TTL, instead, let go of all versions EXCEPT the last one written.  So, its like versions==1 when TTL > one week.  We want to allow that if an error is caught within a week of its happening -- user mistakenly removes a critical table -- then we'll be able to restore up the the moment just before catastrophe hit otherwise, we keep one version only.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-4071) Data GC: Remove all versions > TTL EXCEPT the last written version

Posted by "Lars Hofhansl (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13084783#comment-13084783 ] 

Lars Hofhansl commented on HBASE-4071:
--------------------------------------

Hmm except that that would change existing behavior, as # of versions is always set to something.

> Data GC: Remove all versions > TTL EXCEPT the last written version
> ------------------------------------------------------------------
>
>                 Key: HBASE-4071
>                 URL: https://issues.apache.org/jira/browse/HBASE-4071
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: stack
>
> We were chatting today about our backup cluster.  What we want is to be able to restore the dataset from any point of time but only within a limited timeframe -- say one week.  Thereafter, if the versions are older than one week, rather than as we do with TTL where we let go of all versions older than TTL, instead, let go of all versions EXCEPT the last one written.  So, its like versions==1 when TTL > one week.  We want to allow that if an error is caught within a week of its happening -- user mistakenly removes a critical table -- then we'll be able to restore up the the moment just before catastrophe hit otherwise, we keep one version only.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-4071) Data GC: Remove all versions > TTL EXCEPT the last written version

Posted by "Lars Hofhansl (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13085518#comment-13085518 ] 

Lars Hofhansl commented on HBASE-4071:
--------------------------------------

We could have a CompactionPolicy (or something similarly named) interface implemented in each Store. I need to spend some time looking at all the various cases to see how to abstract this.

In the test-patch above, did I miss any part that would need to be touched by this?


> Data GC: Remove all versions > TTL EXCEPT the last written version
> ------------------------------------------------------------------
>
>                 Key: HBASE-4071
>                 URL: https://issues.apache.org/jira/browse/HBASE-4071
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: stack
>         Attachments: MinVersions.diff
>
>
> We were chatting today about our backup cluster.  What we want is to be able to restore the dataset from any point of time but only within a limited timeframe -- say one week.  Thereafter, if the versions are older than one week, rather than as we do with TTL where we let go of all versions older than TTL, instead, let go of all versions EXCEPT the last one written.  So, its like versions==1 when TTL > one week.  We want to allow that if an error is caught within a week of its happening -- user mistakenly removes a critical table -- then we'll be able to restore up the the moment just before catastrophe hit otherwise, we keep one version only.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-4071) Data GC: Remove all versions > TTL EXCEPT the last written version

Posted by "jiraposter@reviews.apache.org (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13087379#comment-13087379 ] 

jiraposter@reviews.apache.org commented on HBASE-4071:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1582/
-----------------------------------------------------------

(Updated 2011-08-18 22:58:25.647922)


Review request for hbase and Ian Varley.


Summary (updated)
-------

Allow enforcing a minimum number of versions when TTL is enable for a store.
The GC logic for both versions and TTL is unified inside the ColumnTrackers.


This addresses bug HBASE-4071.
    https://issues.apache.org/jira/browse/HBASE-4071


Diffs (updated)
-----

  http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestScanWildcardColumnTracker.java 1159317 
  http://svn.apache.org/repos/asf/hbase/trunk/src/main/ruby/hbase/admin.rb 1159317 
  http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/HBaseTestCase.java 1159317 
  http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestExplicitColumnTracker.java 1159317 
  http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestMinVersions.java PRE-CREATION 
  http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java 1159317 
  http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java 1159317 
  http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanWildcardColumnTracker.java 1159317 
  http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java 1159317 
  http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java 1159317 
  http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/HColumnDescriptor.java 1159317 
  http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ColumnTracker.java 1159317 
  http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ExplicitColumnTracker.java 1159317 

Diff: https://reviews.apache.org/r/1582/diff


Testing (updated)
-------

Ran all tests. I get error (not failures) from two: TestDistributedLogSplitting and TestHTablePool. Both fail with or without my changes.
New tests: TestMinVersions.


Thanks,

Lars



> Data GC: Remove all versions > TTL EXCEPT the last written version
> ------------------------------------------------------------------
>
>                 Key: HBASE-4071
>                 URL: https://issues.apache.org/jira/browse/HBASE-4071
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: stack
>         Attachments: MinVersions.diff
>
>
> We were chatting today about our backup cluster.  What we want is to be able to restore the dataset from any point of time but only within a limited timeframe -- say one week.  Thereafter, if the versions are older than one week, rather than as we do with TTL where we let go of all versions older than TTL, instead, let go of all versions EXCEPT the last one written.  So, its like versions==1 when TTL > one week.  We want to allow that if an error is caught within a week of its happening -- user mistakenly removes a critical table -- then we'll be able to restore up the the moment just before catastrophe hit otherwise, we keep one version only.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-4071) Data GC: Remove all versions > TTL EXCEPT the last written version

Posted by "Lars Hofhansl (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13086803#comment-13086803 ] 

Lars Hofhansl commented on HBASE-4071:
--------------------------------------

Arggh... Please disregard the review request, I forgot it would automatically add it to the bug.

> Data GC: Remove all versions > TTL EXCEPT the last written version
> ------------------------------------------------------------------
>
>                 Key: HBASE-4071
>                 URL: https://issues.apache.org/jira/browse/HBASE-4071
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: stack
>         Attachments: MinVersions.diff
>
>
> We were chatting today about our backup cluster.  What we want is to be able to restore the dataset from any point of time but only within a limited timeframe -- say one week.  Thereafter, if the versions are older than one week, rather than as we do with TTL where we let go of all versions older than TTL, instead, let go of all versions EXCEPT the last one written.  So, its like versions==1 when TTL > one week.  We want to allow that if an error is caught within a week of its happening -- user mistakenly removes a critical table -- then we'll be able to restore up the the moment just before catastrophe hit otherwise, we keep one version only.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-4071) Data GC: Remove all versions > TTL EXCEPT the last written version

Posted by "Lars Hofhansl (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13087813#comment-13087813 ] 

Lars Hofhansl commented on HBASE-4071:
--------------------------------------

Any objections if I assigned this jira to me?

> Data GC: Remove all versions > TTL EXCEPT the last written version
> ------------------------------------------------------------------
>
>                 Key: HBASE-4071
>                 URL: https://issues.apache.org/jira/browse/HBASE-4071
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: stack
>         Attachments: MinVersions.diff
>
>
> We were chatting today about our backup cluster.  What we want is to be able to restore the dataset from any point of time but only within a limited timeframe -- say one week.  Thereafter, if the versions are older than one week, rather than as we do with TTL where we let go of all versions older than TTL, instead, let go of all versions EXCEPT the last one written.  So, its like versions==1 when TTL > one week.  We want to allow that if an error is caught within a week of its happening -- user mistakenly removes a critical table -- then we'll be able to restore up the the moment just before catastrophe hit otherwise, we keep one version only.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-4071) Data GC: Remove all versions > TTL EXCEPT the last written version

Posted by "Lars Hofhansl (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13084786#comment-13084786 ] 

Lars Hofhansl commented on HBASE-4071:
--------------------------------------

Having had a very brief look at the code it seems that wouldn't even be that hard.
If we had (say) another CF attributes called "minversions", we could check for that in ScanQueryMatcher.match(...) as such:
If minversions > 0 we'd still do the CF expiry check (isExpired), but instead of bailing right there, we'd remember the resulting MatchCode, and continue all the way to checkColumn(...) of the ColumnTracker. checkColumn would receive the MatchCode from the expiry check as additional argument. Deletes and filters should still behave correctly as far as I can see.

In ColumnTracker.checkColumn we now have three cases:
1. # version < minversions: behave as if the version check was positive
2. minversions < # versions < maxversion: return whatever the CF expiry check would have returned. (the details are probably a bit more tricky)
3. # versions > maxversions: same behavior as before.


> Data GC: Remove all versions > TTL EXCEPT the last written version
> ------------------------------------------------------------------
>
>                 Key: HBASE-4071
>                 URL: https://issues.apache.org/jira/browse/HBASE-4071
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: stack
>
> We were chatting today about our backup cluster.  What we want is to be able to restore the dataset from any point of time but only within a limited timeframe -- say one week.  Thereafter, if the versions are older than one week, rather than as we do with TTL where we let go of all versions older than TTL, instead, let go of all versions EXCEPT the last one written.  So, its like versions==1 when TTL > one week.  We want to allow that if an error is caught within a week of its happening -- user mistakenly removes a critical table -- then we'll be able to restore up the the moment just before catastrophe hit otherwise, we keep one version only.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-4071) Data GC: Remove all versions > TTL EXCEPT the last written version

Posted by "jiraposter@reviews.apache.org (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13089273#comment-13089273 ] 

jiraposter@reviews.apache.org commented on HBASE-4071:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1582/#review1596
-----------------------------------------------------------



http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestMinVersions.java
<https://reviews.apache.org/r/1582/#comment3603>

    Javadoc is also needed for this new test.



http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestMinVersions.java
<https://reviews.apache.org/r/1582/#comment3604>

    Did you mean 'getClosestRowBefore() can get'


- Ted


On 2011-08-23 04:55:59, Lars Hofhansl wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/1582/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2011-08-23 04:55:59)
bq.  
bq.  
bq.  Review request for hbase, Todd Lipcon, Michael Stack, and Jonathan Gray.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  Allow enforcing a minimum number of versions when TTL is enable for a store.
bq.  The GC logic for both versions and TTL is unified inside the ColumnTrackers.
bq.  
bq.  
bq.  This addresses bug HBASE-4071.
bq.      https://issues.apache.org/jira/browse/HBASE-4071
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/HColumnDescriptor.java 1160440 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ColumnTracker.java 1160440 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ExplicitColumnTracker.java 1160440 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java 1160440 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java 1160440 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanWildcardColumnTracker.java 1160440 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java 1160440 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java 1160440 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/ruby/hbase/admin.rb 1160440 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/HBaseTestCase.java 1160440 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestExplicitColumnTracker.java 1160440 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestMinVersions.java PRE-CREATION 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestScanWildcardColumnTracker.java 1160440 
bq.  
bq.  Diff: https://reviews.apache.org/r/1582/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  Ran all tests. I get error (not failures) from two: TestDistributedLogSplitting and TestHTablePool. Both fail with or without my changes.
bq.  New tests: TestMinVersions.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Lars
bq.  
bq.



> Data GC: Remove all versions > TTL EXCEPT the last written version
> ------------------------------------------------------------------
>
>                 Key: HBASE-4071
>                 URL: https://issues.apache.org/jira/browse/HBASE-4071
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: stack
>         Attachments: MinVersions.diff
>
>
> We were chatting today about our backup cluster.  What we want is to be able to restore the dataset from any point of time but only within a limited timeframe -- say one week.  Thereafter, if the versions are older than one week, rather than as we do with TTL where we let go of all versions older than TTL, instead, let go of all versions EXCEPT the last one written.  So, its like versions==1 when TTL > one week.  We want to allow that if an error is caught within a week of its happening -- user mistakenly removes a critical table -- then we'll be able to restore up the the moment just before catastrophe hit otherwise, we keep one version only.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-4071) Data GC: Remove all versions > TTL EXCEPT the last written version

Posted by "jiraposter@reviews.apache.org (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13089145#comment-13089145 ] 

jiraposter@reviews.apache.org commented on HBASE-4071:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1582/
-----------------------------------------------------------

(Updated 2011-08-22 23:24:57.784160)


Review request for hbase, Todd Lipcon, Michael Stack, and Ian Varley.


Changes
-------

Fixed comments, Javadocs, and linebreaks.


Summary
-------

Allow enforcing a minimum number of versions when TTL is enable for a store.
The GC logic for both versions and TTL is unified inside the ColumnTrackers.


This addresses bug HBASE-4071.
    https://issues.apache.org/jira/browse/HBASE-4071


Diffs (updated)
-----

  http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/HColumnDescriptor.java 1160440 
  http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ColumnTracker.java 1160440 
  http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ExplicitColumnTracker.java 1160440 
  http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java 1160440 
  http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java 1160440 
  http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanWildcardColumnTracker.java 1160440 
  http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java 1160440 
  http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java 1160440 
  http://svn.apache.org/repos/asf/hbase/trunk/src/main/ruby/hbase/admin.rb 1160440 
  http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/HBaseTestCase.java 1160440 
  http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestExplicitColumnTracker.java 1160440 
  http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestMinVersions.java PRE-CREATION 
  http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestScanWildcardColumnTracker.java 1160440 

Diff: https://reviews.apache.org/r/1582/diff


Testing
-------

Ran all tests. I get error (not failures) from two: TestDistributedLogSplitting and TestHTablePool. Both fail with or without my changes.
New tests: TestMinVersions.


Thanks,

Lars



> Data GC: Remove all versions > TTL EXCEPT the last written version
> ------------------------------------------------------------------
>
>                 Key: HBASE-4071
>                 URL: https://issues.apache.org/jira/browse/HBASE-4071
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: stack
>         Attachments: MinVersions.diff
>
>
> We were chatting today about our backup cluster.  What we want is to be able to restore the dataset from any point of time but only within a limited timeframe -- say one week.  Thereafter, if the versions are older than one week, rather than as we do with TTL where we let go of all versions older than TTL, instead, let go of all versions EXCEPT the last one written.  So, its like versions==1 when TTL > one week.  We want to allow that if an error is caught within a week of its happening -- user mistakenly removes a critical table -- then we'll be able to restore up the the moment just before catastrophe hit otherwise, we keep one version only.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-4071) Data GC: Remove all versions > TTL EXCEPT the last written version

Posted by "Ian Varley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13084778#comment-13084778 ] 

Ian Varley commented on HBASE-4071:
-----------------------------------

I like the idea of making this pluggable (via the coprocessor framework, or otherwise). But I also think this is a fundamental enough policy option that making it hard-coded might be a good idea. When I was talking to someone the other day about the current TTL policy, he was like, "WTF, who would want that, it eats your data?". There's no such thing as a "keep 0 versions" option, and thus no way to accidentally lose your most current data using that approach. But with the TTL version there is, which is (IMO) counter-intuitive for those coming from an RDBMS background.

> Data GC: Remove all versions > TTL EXCEPT the last written version
> ------------------------------------------------------------------
>
>                 Key: HBASE-4071
>                 URL: https://issues.apache.org/jira/browse/HBASE-4071
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: stack
>
> We were chatting today about our backup cluster.  What we want is to be able to restore the dataset from any point of time but only within a limited timeframe -- say one week.  Thereafter, if the versions are older than one week, rather than as we do with TTL where we let go of all versions older than TTL, instead, let go of all versions EXCEPT the last one written.  So, its like versions==1 when TTL > one week.  We want to allow that if an error is caught within a week of its happening -- user mistakenly removes a critical table -- then we'll be able to restore up the the moment just before catastrophe hit otherwise, we keep one version only.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-4071) Data GC: Remove all versions > TTL EXCEPT the last written version

Posted by "Todd Lipcon (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13061107#comment-13061107 ] 

Todd Lipcon commented on HBASE-4071:
------------------------------------

Rather than hard-code these various policies, it would be very nice if the compaction dropping policy were pluggable. I've often had people request the ability to do something like check the value of a column, and change TTL based on that value. Or, given that we have an entire row available when doing a major compaction, it would be possible to implement a feature like putting an expiration time as the value of one column, and having it expire the entire row's worth at that timestamp.

> Data GC: Remove all versions > TTL EXCEPT the last written version
> ------------------------------------------------------------------
>
>                 Key: HBASE-4071
>                 URL: https://issues.apache.org/jira/browse/HBASE-4071
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: stack
>
> We were chatting today about our backup cluster.  What we want is to be able to restore the dataset from any point of time but only within a limited timeframe -- say one week.  Thereafter, if the versions are older than one week, rather than as we do with TTL where we let go of all versions older than TTL, instead, let go of all versions EXCEPT the last one written.  So, its like versions==1 when TTL > one week.  We want to allow that if an error is caught within a week of its happening -- user mistakenly removes a critical table -- then we'll be able to restore up the the moment just before catastrophe hit otherwise, we keep one version only.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-4071) Data GC: Remove all versions > TTL EXCEPT the last written version

Posted by "Lars Hofhansl (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13084800#comment-13084800 ] 

Lars Hofhansl commented on HBASE-4071:
--------------------------------------

Of course you're right Andrew. We shouldn't mix usecases together.
What do you think of the "minversions" idea?
As long as minversions for a CF is 0 the behavior would be identical to the current 
behavior.
Also maybe what we have in mind is not a common scenario after all, and we shouldn't
complicate the core codebase with that.

> Data GC: Remove all versions > TTL EXCEPT the last written version
> ------------------------------------------------------------------
>
>                 Key: HBASE-4071
>                 URL: https://issues.apache.org/jira/browse/HBASE-4071
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: stack
>
> We were chatting today about our backup cluster.  What we want is to be able to restore the dataset from any point of time but only within a limited timeframe -- say one week.  Thereafter, if the versions are older than one week, rather than as we do with TTL where we let go of all versions older than TTL, instead, let go of all versions EXCEPT the last one written.  So, its like versions==1 when TTL > one week.  We want to allow that if an error is caught within a week of its happening -- user mistakenly removes a critical table -- then we'll be able to restore up the the moment just before catastrophe hit otherwise, we keep one version only.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-4071) Data GC: Remove all versions > TTL EXCEPT the last written version

Posted by "Lars Hofhansl (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13090009#comment-13090009 ] 

Lars Hofhansl commented on HBASE-4071:
--------------------------------------

Yeah! :)
Thanks Stack, and all the other folks who reviewed the code.

> Data GC: Remove all versions > TTL EXCEPT the last written version
> ------------------------------------------------------------------
>
>                 Key: HBASE-4071
>                 URL: https://issues.apache.org/jira/browse/HBASE-4071
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: stack
>            Assignee: Lars Hofhansl
>             Fix For: 0.92.0
>
>         Attachments: MinVersions.diff
>
>
> We were chatting today about our backup cluster.  What we want is to be able to restore the dataset from any point of time but only within a limited timeframe -- say one week.  Thereafter, if the versions are older than one week, rather than as we do with TTL where we let go of all versions older than TTL, instead, let go of all versions EXCEPT the last one written.  So, its like versions==1 when TTL > one week.  We want to allow that if an error is caught within a week of its happening -- user mistakenly removes a critical table -- then we'll be able to restore up the the moment just before catastrophe hit otherwise, we keep one version only.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-4071) Data GC: Remove all versions > TTL EXCEPT the last written version

Posted by "jiraposter@reviews.apache.org (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13088074#comment-13088074 ] 

jiraposter@reviews.apache.org commented on HBASE-4071:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1582/#review1578
-----------------------------------------------------------


this looks pretty good to me, but I don't know this area of the code that well. Hopefully we can get several eyes on it before commit - the FB guys know this code well, as does Ryan, iirc.


http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ColumnTracker.java
<https://reviews.apache.org/r/1582/#comment3562>

    nit: just delete these since they're obvious



http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
<https://reviews.apache.org/r/1582/#comment3563>

    maybe worth adding a good comment here explaining this logic - I think it makes sense but I can imagine being confused in a few months :)



http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestMinVersions.java
<https://reviews.apache.org/r/1582/#comment3564>

    add apache license


- Todd


On 2011-08-18 23:01:12, Lars Hofhansl wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/1582/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2011-08-18 23:01:12)
bq.  
bq.  
bq.  Review request for hbase, Todd Lipcon, Michael Stack, and Ian Varley.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  Allow enforcing a minimum number of versions when TTL is enable for a store.
bq.  The GC logic for both versions and TTL is unified inside the ColumnTrackers.
bq.  
bq.  
bq.  This addresses bug HBASE-4071.
bq.      https://issues.apache.org/jira/browse/HBASE-4071
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestScanWildcardColumnTracker.java 1159317 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/ruby/hbase/admin.rb 1159317 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/HBaseTestCase.java 1159317 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestExplicitColumnTracker.java 1159317 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestMinVersions.java PRE-CREATION 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java 1159317 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java 1159317 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanWildcardColumnTracker.java 1159317 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java 1159317 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java 1159317 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/HColumnDescriptor.java 1159317 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ColumnTracker.java 1159317 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ExplicitColumnTracker.java 1159317 
bq.  
bq.  Diff: https://reviews.apache.org/r/1582/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  Ran all tests. I get error (not failures) from two: TestDistributedLogSplitting and TestHTablePool. Both fail with or without my changes.
bq.  New tests: TestMinVersions.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Lars
bq.  
bq.



> Data GC: Remove all versions > TTL EXCEPT the last written version
> ------------------------------------------------------------------
>
>                 Key: HBASE-4071
>                 URL: https://issues.apache.org/jira/browse/HBASE-4071
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: stack
>         Attachments: MinVersions.diff
>
>
> We were chatting today about our backup cluster.  What we want is to be able to restore the dataset from any point of time but only within a limited timeframe -- say one week.  Thereafter, if the versions are older than one week, rather than as we do with TTL where we let go of all versions older than TTL, instead, let go of all versions EXCEPT the last one written.  So, its like versions==1 when TTL > one week.  We want to allow that if an error is caught within a week of its happening -- user mistakenly removes a critical table -- then we'll be able to restore up the the moment just before catastrophe hit otherwise, we keep one version only.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-4071) Data GC: Remove all versions > TTL EXCEPT the last written version

Posted by "jiraposter@reviews.apache.org (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13086799#comment-13086799 ] 

jiraposter@reviews.apache.org commented on HBASE-4071:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1582/
-----------------------------------------------------------

Review request for Ian Varley.


Summary
-------

A min versions coupled with TTL.

Note that the I unified the GC logic inside the ColumnTrackers. Previously they did the versioning and TTL was handled outside.

What is still open what to do with Store.getKeyAtOrBefore(...)


This addresses bug HBASE-4071.
    https://issues.apache.org/jira/browse/HBASE-4071


Diffs
-----

  http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestMinVersions.java PRE-CREATION 
  http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestScanWildcardColumnTracker.java 1158860 
  http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanWildcardColumnTracker.java 1158860 
  http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java 1158860 
  http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java 1158860 
  http://svn.apache.org/repos/asf/hbase/trunk/src/main/ruby/hbase/admin.rb 1158860 
  http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/HBaseTestCase.java 1158860 
  http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestExplicitColumnTracker.java 1158860 
  http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/HColumnDescriptor.java 1158860 
  http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ColumnTracker.java 1158860 
  http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ExplicitColumnTracker.java 1158860 
  http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java 1158860 

Diff: https://reviews.apache.org/r/1582/diff


Testing
-------

See TestMinVersions.


Thanks,

Lars



> Data GC: Remove all versions > TTL EXCEPT the last written version
> ------------------------------------------------------------------
>
>                 Key: HBASE-4071
>                 URL: https://issues.apache.org/jira/browse/HBASE-4071
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: stack
>         Attachments: MinVersions.diff
>
>
> We were chatting today about our backup cluster.  What we want is to be able to restore the dataset from any point of time but only within a limited timeframe -- say one week.  Thereafter, if the versions are older than one week, rather than as we do with TTL where we let go of all versions older than TTL, instead, let go of all versions EXCEPT the last one written.  So, its like versions==1 when TTL > one week.  We want to allow that if an error is caught within a week of its happening -- user mistakenly removes a critical table -- then we'll be able to restore up the the moment just before catastrophe hit otherwise, we keep one version only.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (HBASE-4071) Data GC: Remove all versions > TTL EXCEPT the last written version

Posted by "Lars Hofhansl (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lars Hofhansl updated HBASE-4071:
---------------------------------

    Attachment: MinVersions.diff

"A patch is worth a thousand words."

Here's an *idea* for patch.

Still getting familiar with the HBase code (so please cut me some slack, I may have missed an entire subsystem.)

I did some light testing both with Wildcard and Explicit Column Trackers, and things seem to work for me so far, including the shell.

Not too happy about the patch, though:
1. Expired rows can no longer be expunged when a Store's internal cache is flushed (and minversions > 0). Although I don't see how that can be avoided as not all versions are in the cache.
2. Makes the code harder to follow.
3. I just stubbed out the testing stuff. For now.
4. It's hardcoded (Todd won't like it :) )
5. Passes a MatchCode to ColumnTracker.checkColumn
6. The relation between TTL and Version just became more complicated.
7. Can I assume the Trackers deliver KV's in reverse time order?

Please let me know if this is off track.

Thanks...


> Data GC: Remove all versions > TTL EXCEPT the last written version
> ------------------------------------------------------------------
>
>                 Key: HBASE-4071
>                 URL: https://issues.apache.org/jira/browse/HBASE-4071
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: stack
>         Attachments: MinVersions.diff
>
>
> We were chatting today about our backup cluster.  What we want is to be able to restore the dataset from any point of time but only within a limited timeframe -- say one week.  Thereafter, if the versions are older than one week, rather than as we do with TTL where we let go of all versions older than TTL, instead, let go of all versions EXCEPT the last one written.  So, its like versions==1 when TTL > one week.  We want to allow that if an error is caught within a week of its happening -- user mistakenly removes a critical table -- then we'll be able to restore up the the moment just before catastrophe hit otherwise, we keep one version only.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-4071) Data GC: Remove all versions > TTL EXCEPT the last written version

Posted by "jiraposter@reviews.apache.org (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13087454#comment-13087454 ] 

jiraposter@reviews.apache.org commented on HBASE-4071:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1582/#review1542
-----------------------------------------------------------



http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestMinVersions.java
<https://reviews.apache.org/r/1582/#comment3513>

    Still using HBaseTestCase here, because it gave me all the lowlevel stuff that I needed.
    If folks feel strongly I'll add what I need to HBaseTestingUtility and write jUnit4 tests instead.


- Lars


On 2011-08-18 23:01:12, Lars Hofhansl wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/1582/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2011-08-18 23:01:12)
bq.  
bq.  
bq.  Review request for hbase, Todd Lipcon, Michael Stack, and Ian Varley.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  Allow enforcing a minimum number of versions when TTL is enable for a store.
bq.  The GC logic for both versions and TTL is unified inside the ColumnTrackers.
bq.  
bq.  
bq.  This addresses bug HBASE-4071.
bq.      https://issues.apache.org/jira/browse/HBASE-4071
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestScanWildcardColumnTracker.java 1159317 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/ruby/hbase/admin.rb 1159317 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/HBaseTestCase.java 1159317 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestExplicitColumnTracker.java 1159317 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestMinVersions.java PRE-CREATION 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java 1159317 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java 1159317 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanWildcardColumnTracker.java 1159317 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java 1159317 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java 1159317 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/HColumnDescriptor.java 1159317 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ColumnTracker.java 1159317 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ExplicitColumnTracker.java 1159317 
bq.  
bq.  Diff: https://reviews.apache.org/r/1582/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  Ran all tests. I get error (not failures) from two: TestDistributedLogSplitting and TestHTablePool. Both fail with or without my changes.
bq.  New tests: TestMinVersions.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Lars
bq.  
bq.



> Data GC: Remove all versions > TTL EXCEPT the last written version
> ------------------------------------------------------------------
>
>                 Key: HBASE-4071
>                 URL: https://issues.apache.org/jira/browse/HBASE-4071
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: stack
>         Attachments: MinVersions.diff
>
>
> We were chatting today about our backup cluster.  What we want is to be able to restore the dataset from any point of time but only within a limited timeframe -- say one week.  Thereafter, if the versions are older than one week, rather than as we do with TTL where we let go of all versions older than TTL, instead, let go of all versions EXCEPT the last one written.  So, its like versions==1 when TTL > one week.  We want to allow that if an error is caught within a week of its happening -- user mistakenly removes a critical table -- then we'll be able to restore up the the moment just before catastrophe hit otherwise, we keep one version only.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-4071) Data GC: Remove all versions > TTL EXCEPT the last written version

Posted by "Lars Hofhansl (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13084782#comment-13084782 ] 

Lars Hofhansl commented on HBASE-4071:
--------------------------------------

I also think a simple notion like "keep everything more recent than T, but at least N versions" seems to be a frequent enough option to be hard-coded.
Maybe that could be indicated simply by setting both # of versions AND TTL on a column family.


> Data GC: Remove all versions > TTL EXCEPT the last written version
> ------------------------------------------------------------------
>
>                 Key: HBASE-4071
>                 URL: https://issues.apache.org/jira/browse/HBASE-4071
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: stack
>
> We were chatting today about our backup cluster.  What we want is to be able to restore the dataset from any point of time but only within a limited timeframe -- say one week.  Thereafter, if the versions are older than one week, rather than as we do with TTL where we let go of all versions older than TTL, instead, let go of all versions EXCEPT the last one written.  So, its like versions==1 when TTL > one week.  We want to allow that if an error is caught within a week of its happening -- user mistakenly removes a critical table -- then we'll be able to restore up the the moment just before catastrophe hit otherwise, we keep one version only.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-4071) Data GC: Remove all versions > TTL EXCEPT the last written version

Posted by "Andrew Purtell (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13084872#comment-13084872 ] 

Andrew Purtell commented on HBASE-4071:
---------------------------------------

bq. What do you think of the "minversions" idea?

I think a new "minversions" in core is a fine idea. We can document how minversions, maxversions, and TTL settings interact with each other simply enough.

> Data GC: Remove all versions > TTL EXCEPT the last written version
> ------------------------------------------------------------------
>
>                 Key: HBASE-4071
>                 URL: https://issues.apache.org/jira/browse/HBASE-4071
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: stack
>
> We were chatting today about our backup cluster.  What we want is to be able to restore the dataset from any point of time but only within a limited timeframe -- say one week.  Thereafter, if the versions are older than one week, rather than as we do with TTL where we let go of all versions older than TTL, instead, let go of all versions EXCEPT the last one written.  So, its like versions==1 when TTL > one week.  We want to allow that if an error is caught within a week of its happening -- user mistakenly removes a critical table -- then we'll be able to restore up the the moment just before catastrophe hit otherwise, we keep one version only.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-4071) Data GC: Remove all versions > TTL EXCEPT the last written version

Posted by "Todd Lipcon (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13085175#comment-13085175 ] 

Todd Lipcon commented on HBASE-4071:
------------------------------------

It seems OK to hardcode this, but if it's easy to refactor this even to an internal (non-coprocessor non-loadable) interface, that would be cool.

> Data GC: Remove all versions > TTL EXCEPT the last written version
> ------------------------------------------------------------------
>
>                 Key: HBASE-4071
>                 URL: https://issues.apache.org/jira/browse/HBASE-4071
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: stack
>         Attachments: MinVersions.diff
>
>
> We were chatting today about our backup cluster.  What we want is to be able to restore the dataset from any point of time but only within a limited timeframe -- say one week.  Thereafter, if the versions are older than one week, rather than as we do with TTL where we let go of all versions older than TTL, instead, let go of all versions EXCEPT the last one written.  So, its like versions==1 when TTL > one week.  We want to allow that if an error is caught within a week of its happening -- user mistakenly removes a critical table -- then we'll be able to restore up the the moment just before catastrophe hit otherwise, we keep one version only.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-4071) Data GC: Remove all versions > TTL EXCEPT the last written version

Posted by "jiraposter@reviews.apache.org (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13089160#comment-13089160 ] 

jiraposter@reviews.apache.org commented on HBASE-4071:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1582/#review1595
-----------------------------------------------------------



http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/HColumnDescriptor.java
<https://reviews.apache.org/r/1582/#comment3601>

    these are flipped in order from the signature



http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/HColumnDescriptor.java
<https://reviews.apache.org/r/1582/#comment3602>

    @return javadoc doesn't include the variable name (on the @param does)


- Jonathan


On 2011-08-22 23:24:57, Lars Hofhansl wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/1582/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2011-08-22 23:24:57)
bq.  
bq.  
bq.  Review request for hbase, Todd Lipcon, Michael Stack, and Ian Varley.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  Allow enforcing a minimum number of versions when TTL is enable for a store.
bq.  The GC logic for both versions and TTL is unified inside the ColumnTrackers.
bq.  
bq.  
bq.  This addresses bug HBASE-4071.
bq.      https://issues.apache.org/jira/browse/HBASE-4071
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/HColumnDescriptor.java 1160440 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ColumnTracker.java 1160440 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ExplicitColumnTracker.java 1160440 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java 1160440 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java 1160440 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanWildcardColumnTracker.java 1160440 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java 1160440 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java 1160440 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/ruby/hbase/admin.rb 1160440 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/HBaseTestCase.java 1160440 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestExplicitColumnTracker.java 1160440 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestMinVersions.java PRE-CREATION 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestScanWildcardColumnTracker.java 1160440 
bq.  
bq.  Diff: https://reviews.apache.org/r/1582/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  Ran all tests. I get error (not failures) from two: TestDistributedLogSplitting and TestHTablePool. Both fail with or without my changes.
bq.  New tests: TestMinVersions.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Lars
bq.  
bq.



> Data GC: Remove all versions > TTL EXCEPT the last written version
> ------------------------------------------------------------------
>
>                 Key: HBASE-4071
>                 URL: https://issues.apache.org/jira/browse/HBASE-4071
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: stack
>         Attachments: MinVersions.diff
>
>
> We were chatting today about our backup cluster.  What we want is to be able to restore the dataset from any point of time but only within a limited timeframe -- say one week.  Thereafter, if the versions are older than one week, rather than as we do with TTL where we let go of all versions older than TTL, instead, let go of all versions EXCEPT the last one written.  So, its like versions==1 when TTL > one week.  We want to allow that if an error is caught within a week of its happening -- user mistakenly removes a critical table -- then we'll be able to restore up the the moment just before catastrophe hit otherwise, we keep one version only.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-4071) Data GC: Remove all versions > TTL EXCEPT the last written version

Posted by "jiraposter@reviews.apache.org (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13089276#comment-13089276 ] 

jiraposter@reviews.apache.org commented on HBASE-4071:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1582/
-----------------------------------------------------------

(Updated 2011-08-23 05:13:38.495607)


Review request for hbase, Todd Lipcon, Michael Stack, and Jonathan Gray.


Changes
-------

Added Javadoc to TestMinVersions.


Summary
-------

Allow enforcing a minimum number of versions when TTL is enable for a store.
The GC logic for both versions and TTL is unified inside the ColumnTrackers.


This addresses bug HBASE-4071.
    https://issues.apache.org/jira/browse/HBASE-4071


Diffs (updated)
-----

  http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/HColumnDescriptor.java 1160440 
  http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ColumnTracker.java 1160440 
  http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ExplicitColumnTracker.java 1160440 
  http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java 1160440 
  http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java 1160440 
  http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanWildcardColumnTracker.java 1160440 
  http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java 1160440 
  http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java 1160440 
  http://svn.apache.org/repos/asf/hbase/trunk/src/main/ruby/hbase/admin.rb 1160440 
  http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/HBaseTestCase.java 1160440 
  http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestExplicitColumnTracker.java 1160440 
  http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestMinVersions.java PRE-CREATION 
  http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestScanWildcardColumnTracker.java 1160440 

Diff: https://reviews.apache.org/r/1582/diff


Testing
-------

Ran all tests. I get error (not failures) from two: TestDistributedLogSplitting and TestHTablePool. Both fail with or without my changes.
New tests: TestMinVersions.


Thanks,

Lars



> Data GC: Remove all versions > TTL EXCEPT the last written version
> ------------------------------------------------------------------
>
>                 Key: HBASE-4071
>                 URL: https://issues.apache.org/jira/browse/HBASE-4071
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: stack
>         Attachments: MinVersions.diff
>
>
> We were chatting today about our backup cluster.  What we want is to be able to restore the dataset from any point of time but only within a limited timeframe -- say one week.  Thereafter, if the versions are older than one week, rather than as we do with TTL where we let go of all versions older than TTL, instead, let go of all versions EXCEPT the last one written.  So, its like versions==1 when TTL > one week.  We want to allow that if an error is caught within a week of its happening -- user mistakenly removes a critical table -- then we'll be able to restore up the the moment just before catastrophe hit otherwise, we keep one version only.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-4071) Data GC: Remove all versions > TTL EXCEPT the last written version

Posted by "Jonathan Gray (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13061102#comment-13061102 ] 

Jonathan Gray commented on HBASE-4071:
--------------------------------------

I like this idea.  It's somewhat related to an idea for a TTKAV (TimeToKeepAllValues) parameter that would allow a point-in-time SnapshotScanner.  See HBASE-2376

> Data GC: Remove all versions > TTL EXCEPT the last written version
> ------------------------------------------------------------------
>
>                 Key: HBASE-4071
>                 URL: https://issues.apache.org/jira/browse/HBASE-4071
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: stack
>
> We were chatting today about our backup cluster.  What we want is to be able to restore the dataset from any point of time but only within a limited timeframe -- say one week.  Thereafter, if the versions are older than one week, rather than as we do with TTL where we let go of all versions older than TTL, instead, let go of all versions EXCEPT the last one written.  So, its like versions==1 when TTL > one week.  We want to allow that if an error is caught within a week of its happening -- user mistakenly removes a critical table -- then we'll be able to restore up the the moment just before catastrophe hit otherwise, we keep one version only.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-4071) Data GC: Remove all versions > TTL EXCEPT the last written version

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13090619#comment-13090619 ] 

Hudson commented on HBASE-4071:
-------------------------------

Integrated in HBase-TRUNK #2141 (See [https://builds.apache.org/job/HBase-TRUNK/2141/])
    HBASE-4071 update FIXED_OVERHEAD for minVersions

tedyu : 
Files : 
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java


> Data GC: Remove all versions > TTL EXCEPT the last written version
> ------------------------------------------------------------------
>
>                 Key: HBASE-4071
>                 URL: https://issues.apache.org/jira/browse/HBASE-4071
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: stack
>            Assignee: Lars Hofhansl
>             Fix For: 0.92.0
>
>         Attachments: MinVersions.diff
>
>
> We were chatting today about our backup cluster.  What we want is to be able to restore the dataset from any point of time but only within a limited timeframe -- say one week.  Thereafter, if the versions are older than one week, rather than as we do with TTL where we let go of all versions older than TTL, instead, let go of all versions EXCEPT the last one written.  So, its like versions==1 when TTL > one week.  We want to allow that if an error is caught within a week of its happening -- user mistakenly removes a critical table -- then we'll be able to restore up the the moment just before catastrophe hit otherwise, we keep one version only.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-4071) Data GC: Remove all versions > TTL EXCEPT the last written version

Posted by "jiraposter@reviews.apache.org (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13089103#comment-13089103 ] 

jiraposter@reviews.apache.org commented on HBASE-4071:
------------------------------------------------------



bq.  On 2011-08-22 22:24:47, Jonathan Gray wrote:
bq.  > http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanWildcardColumnTracker.java, line 205
bq.  > <https://reviews.apache.org/r/1582/diff/4/?file=34041#file34041line205>
bq.  >
bq.  >     i'm a little confused about this.  this doesn't need to look at the number of included versions?  you are only ever done if minVersions is explicitly set to 0 (disabled)?

Yep... This is just to keep the current optimization where if we do not have minVersions to worry about (the default) we can bail early if the KV is expired.


bq.  On 2011-08-22 22:24:47, Jonathan Gray wrote:
bq.  > http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ColumnTracker.java, lines 24-41
bq.  > <https://reviews.apache.org/r/1582/diff/4/?file=34037#file34037line24>
bq.  >
bq.  >     seems like this change actually changes what the functionality / lines are between ColumnTracker vs. QueryMatcher.  should you update the javadoc here to describe what this is not responsible for?

Will fix this and all other Javadocs.


bq.  On 2011-08-22 22:24:47, Jonathan Gray wrote:
bq.  > http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java, line 499
bq.  > <https://reviews.apache.org/r/1582/diff/4/?file=34042#file34042line499>
bq.  >
bq.  >     this is a nice comment, but is this true?
bq.  >     
bq.  >     can't i safely delete anything that is expired once i have hit my min for this file?  we may not evict _enough_ but we can at least evict something?
bq.  >     
bq.  >     unfortunately we don't use the query matcher here, but should be easy to add it here (someone else has added it here as part of some other changes, but they aren't committed yet and won't be for some time)

You are right this can be optimized better as we discussed. Stack actually had the same comment on the issue.


- Lars


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1582/#review1591
-----------------------------------------------------------


On 2011-08-21 00:32:00, Lars Hofhansl wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/1582/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2011-08-21 00:32:00)
bq.  
bq.  
bq.  Review request for hbase, Todd Lipcon, Michael Stack, and Ian Varley.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  Allow enforcing a minimum number of versions when TTL is enable for a store.
bq.  The GC logic for both versions and TTL is unified inside the ColumnTrackers.
bq.  
bq.  
bq.  This addresses bug HBASE-4071.
bq.      https://issues.apache.org/jira/browse/HBASE-4071
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/HColumnDescriptor.java 1159836 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ColumnTracker.java 1159836 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ExplicitColumnTracker.java 1159836 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java 1159836 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java 1159836 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanWildcardColumnTracker.java 1159836 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java 1159836 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java 1159836 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/ruby/hbase/admin.rb 1159836 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/HBaseTestCase.java 1159836 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestExplicitColumnTracker.java 1159836 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestMinVersions.java PRE-CREATION 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestScanWildcardColumnTracker.java 1159836 
bq.  
bq.  Diff: https://reviews.apache.org/r/1582/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  Ran all tests. I get error (not failures) from two: TestDistributedLogSplitting and TestHTablePool. Both fail with or without my changes.
bq.  New tests: TestMinVersions.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Lars
bq.  
bq.



> Data GC: Remove all versions > TTL EXCEPT the last written version
> ------------------------------------------------------------------
>
>                 Key: HBASE-4071
>                 URL: https://issues.apache.org/jira/browse/HBASE-4071
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: stack
>         Attachments: MinVersions.diff
>
>
> We were chatting today about our backup cluster.  What we want is to be able to restore the dataset from any point of time but only within a limited timeframe -- say one week.  Thereafter, if the versions are older than one week, rather than as we do with TTL where we let go of all versions older than TTL, instead, let go of all versions EXCEPT the last one written.  So, its like versions==1 when TTL > one week.  We want to allow that if an error is caught within a week of its happening -- user mistakenly removes a critical table -- then we'll be able to restore up the the moment just before catastrophe hit otherwise, we keep one version only.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-4071) Data GC: Remove all versions > TTL EXCEPT the last written version

Posted by "jiraposter@reviews.apache.org (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13087380#comment-13087380 ] 

jiraposter@reviews.apache.org commented on HBASE-4071:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1582/
-----------------------------------------------------------

(Updated 2011-08-18 22:59:37.165069)


Review request for hbase, Todd Lipcon, Michael Stack, and Ian Varley.


Summary
-------

Allow enforcing a minimum number of versions when TTL is enable for a store.
The GC logic for both versions and TTL is unified inside the ColumnTrackers.


This addresses bug HBASE-4071.
    https://issues.apache.org/jira/browse/HBASE-4071


Diffs
-----

  http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestScanWildcardColumnTracker.java 1159317 
  http://svn.apache.org/repos/asf/hbase/trunk/src/main/ruby/hbase/admin.rb 1159317 
  http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/HBaseTestCase.java 1159317 
  http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestExplicitColumnTracker.java 1159317 
  http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestMinVersions.java PRE-CREATION 
  http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java 1159317 
  http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java 1159317 
  http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanWildcardColumnTracker.java 1159317 
  http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java 1159317 
  http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java 1159317 
  http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/HColumnDescriptor.java 1159317 
  http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ColumnTracker.java 1159317 
  http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ExplicitColumnTracker.java 1159317 

Diff: https://reviews.apache.org/r/1582/diff


Testing
-------

Ran all tests. I get error (not failures) from two: TestDistributedLogSplitting and TestHTablePool. Both fail with or without my changes.
New tests: TestMinVersions.


Thanks,

Lars



> Data GC: Remove all versions > TTL EXCEPT the last written version
> ------------------------------------------------------------------
>
>                 Key: HBASE-4071
>                 URL: https://issues.apache.org/jira/browse/HBASE-4071
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: stack
>         Attachments: MinVersions.diff
>
>
> We were chatting today about our backup cluster.  What we want is to be able to restore the dataset from any point of time but only within a limited timeframe -- say one week.  Thereafter, if the versions are older than one week, rather than as we do with TTL where we let go of all versions older than TTL, instead, let go of all versions EXCEPT the last one written.  So, its like versions==1 when TTL > one week.  We want to allow that if an error is caught within a week of its happening -- user mistakenly removes a critical table -- then we'll be able to restore up the the moment just before catastrophe hit otherwise, we keep one version only.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira