Posted to user@hbase.apache.org by Yair Even-Zohar <ya...@revenuescience.com> on 2008/10/15 17:24:52 UTC
Deleting old versions from a table
I would like to delete old versions from a table on a daily basis and am
thinking of implementing one of the following:
1) Run a map/reduce job (similar to RowCounter) and, for each rowid, execute
a deleteall(rowid, timestamp)
2) Similar to (1), but with a scanner. I could also write a filter to
retrieve only rowids that have data older than the timestamp.
Before I start writing code, I would like to know if there is an
existing process to delete old data?
Thanks
-Yair
Re: Deleting old versions from a table
Posted by stack <st...@duboce.net>.
Thanks for the feedback. I added clarification to TRUNK (I didn't
change names of methods; just updated javadoc and added comments around
its use in HStore); hbase-929.
St.Ack
RE: Deleting old versions from a table
Posted by Yair Even-Zohar <ya...@revenuescience.com>.
It seems you and Jim Kellerman are both correct (his reply is that it
is in milliseconds).
The timeToLive (ttl) is a long in HStore and is represented there in
milliseconds, but I also found this in the code:
    if (ttl != HConstants.FOREVER)
      this.ttl *= 1000;
so I can only assume the parameter is passed to the HColumnDescriptor
in seconds.
Please note that "int getTimeToLive()" returns an int and not a
long. Likewise, "setTimeToLive(int timeToLive)" takes an int.
As for documentation, I looked pretty much everywhere and couldn't
find any reference to the granularity. It would have been sufficient if
the parameter name were changed. That is, instead of:
    setTimeToLive(int timeToLive)
use
    setTimeToLive(int timeToLiveInSec)
Additionally, adding this information as a comment on the setter/getter in
HColumnDescriptor would be sufficient, as it will be reflected in the API
docs.
Thanks
-Yair
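The unit handling described in this message can be sketched as follows. This is a minimal, self-contained illustration and not the HBase source: the class name, the helper methods, and the FOREVER value of -1 are assumptions made for the example. Only the "multiply by 1000 unless FOREVER" step comes from the thread.

```java
// Illustrative only: mirrors the seconds-to-milliseconds handling
// described in the thread; it is not the actual HStore code.
final class TtlUnitsSketch {
    // Hypothetical stand-in for HConstants.FOREVER; the real value may differ.
    static final int FOREVER = -1;

    // Converts a TTL given in seconds (the int passed to the column
    // descriptor) into the millisecond value used inside the store.
    static long toStoreMillis(int ttlSeconds) {
        long ttl = ttlSeconds; // widen first so the multiply cannot overflow an int
        if (ttl != FOREVER) {
            ttl *= 1000L;
        }
        return ttl;
    }

    // True when a cell written at cellTimestampMs is older than ttlMs.
    // A TTL of FOREVER never expires.
    static boolean isExpired(long cellTimestampMs, long nowMs, long ttlMs) {
        return ttlMs != FOREVER && nowMs - cellTimestampMs > ttlMs;
    }

    public static void main(String[] args) {
        long ttlMs = toStoreMillis(3600); // one hour, given in seconds
        System.out.println("ttl in ms: " + ttlMs);
        long now = System.currentTimeMillis();
        System.out.println("two-hour-old cell expired: "
                + isExpired(now - 2 * 3600 * 1000L, now, ttlMs));
    }
}
```

Widening to a long before multiplying matters because the setter takes an int: a large TTL in seconds would overflow if multiplied by 1000 as an int.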
Re: Deleting old versions from a table
Posted by stack <st...@duboce.net>.
It looks like ttl is in seconds (see the head of the HStore file).
Do you have a suggestion as to where we should document this? (Where did
you try looking?)
Thanks,
St.Ack
RE: Deleting old versions from a table
Posted by "Jim Kellerman (POWERSET)" <Ji...@microsoft.com>.
Time-to-live units are milliseconds (a long).
---
Jim Kellerman, Powerset (Live Search, Microsoft Corporation)
RE: Deleting old versions from a table
Posted by Yair Even-Zohar <ya...@revenuescience.com>.
I need this feature because I'd like old data to expire after X days.
I now see that I can use HColumnDescriptor.setTimeToLive(int
timeToLive). So my question is: what is the granularity of the
"timeToLive" parameter (days, hours, or seconds)?
Thanks
-Yair
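Going by the conclusion reached elsewhere in this thread (the setter takes seconds as an int), an "expire after X days" value could be computed as in the sketch below. The class and method names are hypothetical; only setTimeToLive(int) is taken from the thread, and it is mentioned in a comment rather than called so that the example runs without HBase on the classpath.

```java
import java.util.concurrent.TimeUnit;

// Illustrative helper for turning "X days" into a TTL in seconds.
final class TtlDaysSketch {
    // Returns the number of seconds in the given number of days as an int.
    // Math.toIntExact throws ArithmeticException if the result does not fit.
    static int daysToTtlSeconds(long days) {
        if (days <= 0) {
            throw new IllegalArgumentException("days must be positive");
        }
        return Math.toIntExact(TimeUnit.DAYS.toSeconds(days));
    }

    public static void main(String[] args) {
        int ttl = daysToTtlSeconds(30);
        // Assumption: this value would then be passed to
        // HColumnDescriptor.setTimeToLive(ttl) on the column family.
        System.out.println("30 days in seconds: " + ttl);
        System.out.println("largest whole-day TTL that fits in an int: "
                + Integer.MAX_VALUE / TimeUnit.DAYS.toSeconds(1) + " days");
    }
}
```

An int number of seconds covers roughly 68 years, so the int parameter is not a practical limit for day-scale expiry.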
Re: Deleting old versions from a table
Posted by Dingding Ye <ye...@gmail.com>.
Why do you want to do that? I think limiting the column family with VERSIONS
is enough.
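As a rough model of this suggestion, a VERSIONS limit bounds how many versions of a cell are kept, not how old they may be. The sketch below is a conceptual illustration only, with hypothetical names; it does not use or reproduce any HBase API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Conceptual model of a per-column version limit: keep the newest N
// timestamps and drop the rest, regardless of their age.
final class MaxVersionsSketch {
    // Returns the newest maxVersions timestamps, newest first.
    static List<Long> retain(List<Long> timestamps, int maxVersions) {
        List<Long> sorted = new ArrayList<>(timestamps);
        sorted.sort(Comparator.reverseOrder());
        int keep = Math.min(Math.max(maxVersions, 0), sorted.size());
        return new ArrayList<>(sorted.subList(0, keep));
    }

    public static void main(String[] args) {
        List<Long> versions = Arrays.asList(10L, 30L, 20L, 40L);
        System.out.println(retain(versions, 3));
    }
}
```

Under this model a rarely updated cell keeps its old versions indefinitely, which is why an age-based mechanism is a different requirement from a count-based one.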