Posted to dev@hbase.apache.org by "Andrew Purtell (JIRA)" <ji...@apache.org> on 2008/04/27 20:46:56 UTC

[jira] Commented: (HBASE-47) option to set TTL for columns in hbase

    [ https://issues.apache.org/jira/browse/HBASE-47?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12592670#action_12592670 ] 

Andrew Purtell commented on HBASE-47:
-------------------------------------

I've started implementing the first approach. We'd like to use this feature. I have the following completed already:

- Add get and set TTL to HColumnDescriptor
- Change the shell's CREATE TABLE statement (and ALTER TABLE, etc.) to take a TTL parameter for column families, mostly so that all of the current code still compiles; I have heard that HQL is going away...

and am working on the remaining:

- Update HStore methods (get, put, etc.) to check HStoreKey timestamps against the TTL value on every operation
- Compactor should screen out cells past TTL

I have to juggle this with other things but should have a patch and a unit test in a couple of days. 
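
To make the plan above concrete, here is a minimal sketch of the TTL check in Java. It is not the patch itself and not HBase code: the class and method names (TtlSketch, ColumnFamily, Cell, isExpired, compact), the FOREVER sentinel, and the choice of a TTL in seconds compared against millisecond timestamps are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; none of these types are real HBase classes.
final class TtlSketch {
    /** Assumed sentinel meaning "cells never expire". */
    static final int FOREVER = -1;

    /** Hypothetical stand-in for a column family descriptor carrying a TTL. */
    static final class ColumnFamily {
        private final String name;
        private int timeToLiveSeconds = FOREVER;

        ColumnFamily(String name) {
            this.name = name;
        }

        String getName() {
            return name;
        }

        int getTimeToLive() {
            return timeToLiveSeconds;
        }

        void setTimeToLive(int seconds) {
            if (seconds != FOREVER && seconds <= 0) {
                throw new IllegalArgumentException("TTL must be positive or FOREVER");
            }
            this.timeToLiveSeconds = seconds;
        }
    }

    /** Hypothetical stand-in for a stored cell: a millisecond timestamp plus a value. */
    static final class Cell {
        final long timestampMillis;
        final String value;

        Cell(long timestampMillis, String value) {
            this.timestampMillis = timestampMillis;
            this.value = value;
        }
    }

    /** Returns true when the cell is older than the TTL at time nowMillis. */
    static boolean isExpired(long cellTimestampMillis, int ttlSeconds, long nowMillis) {
        if (ttlSeconds == FOREVER) {
            return false;
        }
        return nowMillis - cellTimestampMillis > ttlSeconds * 1000L;
    }

    /** Compaction-style pass: copies only the cells that are still within the TTL. */
    static List<Cell> compact(List<Cell> cells, ColumnFamily family, long nowMillis) {
        List<Cell> kept = new ArrayList<>();
        for (Cell cell : cells) {
            if (!isExpired(cell.timestampMillis, family.getTimeToLive(), nowMillis)) {
                kept.add(cell);
            }
        }
        return kept;
    }
}
```

In this sketch a read path would call isExpired before returning a cell, and the compactor would call compact so that expired cells are never rewritten into the new store file.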


> option to set TTL for columns in hbase
> --------------------------------------
>
>                 Key: HBASE-47
>                 URL: https://issues.apache.org/jira/browse/HBASE-47
>             Project: Hadoop HBase
>          Issue Type: New Feature
>          Components: hql, regionserver
>            Reporter: Billy Pearson
>            Priority: Minor
>
> I would like to see an option to set a TTL on columns in HBase. This feature could be helpful for removing stale data from large datasets without having to do a full scan of the dataset and then issue deletes.
> Examples:
> Say I am crawling pages and only refreshing pages based on a set score. If some pages do not get updated for X days, the old version of the page gets removed from the dataset.
> Say I am stripping links out of HTML and storing them. If a link is removed from a page, I would need to issue a delete statement to remove that link from the dataset; with a TTL, the link data would remove itself if not updated within X seconds. These are just examples based on crawling, as with Nutch, but I can foresee many apps using this option.
> This is a feature of Bigtable that is handled when Bigtable does garbage collection.
