Posted to user@hbase.apache.org by Yair Even-Zohar <ya...@revenuescience.com> on 2008/10/15 17:24:52 UTC

Deleting old versions from a table

I would like to delete old versions from a table on a daily basis and am
thinking of implementing one of the following:

1) Run a map/reduce job (similar to RowCounter) and, for each rowid,
execute a deleteall(rowid, timestamp).

2) Similar to (1), but with a scanner. I could also write a filter to
retrieve only the rowids that have data older than the timestamp.

Before I start writing code, I would like to know whether there is an
existing process for deleting old data.

 

Thanks

-Yair
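
As an illustration of the cutoff logic behind options (1) and (2), here is a
minimal in-memory sketch in Java. It does not use the HBase client API: the
table is modeled as a plain map from rowid to a timestamp-sorted map of
versions, and the names OldVersionPurge and purgeOlderThan are hypothetical.
It also assumes that "older than" means strictly older than the cutoff;
whether the real deleteall treats the boundary the same way should be checked
against the API docs.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Hypothetical in-memory model of "delete versions older than a timestamp". */
final class OldVersionPurge {

    /**
     * Removes every version whose timestamp is strictly older than cutoffMillis.
     * Rows left with no versions are dropped. Returns the number of versions removed.
     */
    static int purgeOlderThan(Map<String, NavigableMap<Long, String>> table,
                              long cutoffMillis) {
        int removed = 0;
        Iterator<Map.Entry<String, NavigableMap<Long, String>>> rows =
                table.entrySet().iterator();
        while (rows.hasNext()) {
            NavigableMap<Long, String> versions = rows.next().getValue();
            // headMap returns a live view, so clearing it deletes from the row itself.
            NavigableMap<Long, String> stale = versions.headMap(cutoffMillis, false);
            removed += stale.size();
            stale.clear();
            if (versions.isEmpty()) {
                rows.remove();
            }
        }
        return removed;
    }

    public static void main(String[] args) {
        Map<String, NavigableMap<Long, String>> table = new HashMap<>();
        NavigableMap<Long, String> row = new TreeMap<>();
        row.put(100L, "old");
        row.put(300L, "new");
        table.put("row1", row);

        int removed = purgeOlderThan(table, 200L);
        System.out.println("removed=" + removed + " remaining=" + table);
    }
}
```

In a real job the outer loop would be the map/reduce pass or the scanner, and
the per-row step would be the deleteall call; only the cutoff comparison
carries over from this sketch.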

  


Re: Deleting old versions from a table

Posted by stack <st...@duboce.net>.
Thanks for the feedback.  I added clarification to TRUNK (I didn't
change the names of the methods; I just updated the javadoc and added
comments around its use in HStore); see HBASE-929.
St.Ack




RE: Deleting old versions from a table

Posted by Yair Even-Zohar <ya...@revenuescience.com>.
It seems that you and Jim Kellerman are both correct (his reply is that
it is in milliseconds).

The timeToLive (ttl) is a long in HStore and is held there in
milliseconds, but I also found this in the code:

  if (ttl != HConstants.FOREVER)
      this.ttl *= 1000;

so I could only assume the parameter is passed to the HColumnDescriptor
in seconds.
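
To make the unit handling concrete, here is a minimal sketch in Java, assuming
the setter does take seconds as inferred above. It is not HBase code: TtlUnits,
daysToTtlSeconds and toStoreMillis are hypothetical names, and FOREVER is only
a stand-in for HConstants.FOREVER, whose actual value is not shown in this
thread (-1 is assumed here).

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical helper, not part of HBase: unit conversions for a TTL value. */
final class TtlUnits {

    /** Stand-in for HConstants.FOREVER; the value -1 is an assumption. */
    static final int FOREVER = -1;

    /** Converts a retention period in days to whole seconds for an int setter. */
    static int daysToTtlSeconds(int days) {
        // toIntExact throws ArithmeticException instead of silently overflowing.
        return Math.toIntExact(TimeUnit.DAYS.toSeconds(days));
    }

    /** Mirrors the quoted snippet: seconds become milliseconds unless FOREVER. */
    static long toStoreMillis(int ttlSeconds) {
        long ttl = ttlSeconds;
        if (ttl != FOREVER) {
            ttl *= 1000;
        }
        return ttl;
    }

    public static void main(String[] args) {
        int ttlSeconds = daysToTtlSeconds(30);
        System.out.println("ttl in seconds: " + ttlSeconds);
        System.out.println("ttl in millis:  " + toStoreMillis(ttlSeconds));
    }
}
```

With a 30-day TTL the millisecond value (2,592,000,000) no longer fits in an
int, which is consistent with HStore holding the value as a long while the
setter and getter use an int.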

Please note that "int getTimeToLive()" returns an int and not a long.
Likewise, "setTimeToLive(int timeToLive)" takes an int.


As for documentation, I looked pretty much everywhere and couldn't
find any reference to the granularity. It would have been sufficient to
rename the parameter. That is, instead of:
setTimeToLive(int timeToLive)
use
setTimeToLive(int timeToLiveInSec)

Additionally, stating the unit in a comment on the setter/getter in
HColumnDescriptor would be sufficient, as it would then be reflected in
the API docs.

Thanks
-Yair


Re: Deleting old versions from a table

Posted by stack <st...@duboce.net>.
Looks like ttl is in seconds (see the head of the HStore file).

Do you have a suggestion as to where we should document this? (Where did
you try looking?)

Thanks,
St.Ack



RE: Deleting old versions from a table

Posted by "Jim Kellerman (POWERSET)" <Ji...@microsoft.com>.
time to live units are milliseconds (a long)

---
Jim Kellerman, Powerset (Live Search, Microsoft Corporation)




RE: Deleting old versions from a table

Posted by Yair Even-Zohar <ya...@revenuescience.com>.
I need this feature because I'd like old data to expire after X days.
I now see that I can use HColumnDescriptor.setTimeToLive(int
timeToLive). So my question is: what is the granularity of the
"timeToLive" parameter (days, hours, or seconds)?

Thanks
-Yair 



Re: Deleting old versions from a table

Posted by Dingding Ye <ye...@gmail.com>.
Why do you want to do that?  I think limiting the column family with
VERSIONS is enough.
