Posted to user@accumulo.apache.org by z11373 <z1...@outlook.com> on 2015/06/10 19:53:37 UTC

question on AgeOffFilter

Hi,
I have some questions about using AgeOffFilter. Earlier I ran the following:

root@dev> createtable testiter
root@dev testiter> insert 1 cf col1 foo
root@dev testiter> scan
1 cf:col1 []    foo

Then two hours later, I ran:

root@dev testiter> setiter -ageoff -t testiter -p 15 -minc
AgeOffFilter removes entries with timestamps more than <ttl> milliseconds
old
----------> set AgeOffFilter parameter negate, default false keeps k/v that
pass accept method, true rejects k/v that pass accept method:
----------> set AgeOffFilter parameter ttl, time to live (milliseconds):
30000
----------> set AgeOffFilter parameter currentTime, if set, use the given
value as the absolute time in milliseconds as the current time of day:

root@dev testiter> scan
1 cf:col1 []    foo

root@dev testiter> flush -w
2015-06-10 17:10:12,124 [shell.Shell] INFO : Flush of table testiter
completed.

root@dev testiter> scan
1 cf:col1 []    foo

*First question*: why does that key/value still exist? Since I set the
TTL to 30 seconds and that key/value was created more than 2 hours
ago, I'd expect it to be gone after the table flush (minc).

Then later I did the following:

root@dev testiter> insert 2 cf col1 bar
root@dev testiter> scan
1 cf:col1 []    foo
2 cf:col1 []    bar

I waited for more than 30 seconds, then ran:

root@dev testiter> flush -w
2015-06-10 17:16:38,903 [shell.Shell] INFO : Flush of table testiter
completed.
root@dev testiter> scan
1 cf:col1 []    foo

This is correct in that the second key/value pair no longer exists, but why is
the first one still there?

*Second question*: I still don't fully understand the currentTime argument.
Since I didn't specify a long value when prompted, I'd assume it
took the current time at the moment I set the iterator. Is that true? I am not
sure, because if that were the case, key/value items inserted later would never
get aged off, since their timestamps would be later than the value fixed by the
iterator. That is not what happened in my example above (the second item was
removed). I hope someone can enlighten me on this.

*Third question*, which is related to the second: I want to
retain data in a table for 6 months, i.e. if compaction runs
every day, all key/value items with timestamps more than six months older than
that day should be removed. How can I achieve this? I guess that AgeOffFilter is
the right way to do it, but the results from #1 and #2 above confuse me, and
I think it doesn't work the way I want.


Thanks,
Z



--
View this message in context: http://apache-accumulo.1065345.n5.nabble.com/question-on-AgeOffFilter-tp14386.html
Sent from the Users mailing list archive at Nabble.com.

Re: question on AgeOffFilter

Posted by z11373 <z1...@outlook.com>.
Thanks Josh and John. Your answers make sense indeed.




Re: question on AgeOffFilter

Posted by John Vines <vi...@apache.org>.
First question -

You only attached the age-off filter to the minor compaction (minc) scope, not
the scan scope (nor the majc scope). This is why you still see the entry.

Second question -

currentTime is more applicable when you attach the filter at scan time, I think,
to ensure you have a consistent view across all servers.

Third question -

Attach the iterator at the scan and major compaction scopes and you should be
fine.
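
For the six-month retention case, the ttl prompt shown in the original
transcript takes a value in milliseconds. A rough sketch of that arithmetic
follows, assuming "six months" is approximated as 180 days (an assumption, not
something stated in the thread). The class and method names are hypothetical
and use only the Java standard library. The result is larger than a 32-bit int
can hold, so it should be treated as a long.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper: converts a retention period in days to milliseconds,
// the unit the AgeOffFilter ttl prompt asks for.
final class SixMonthTtl {

    // Assumption: the caller decides how many days "six months" means.
    static long ttlMillis(long days) {
        return TimeUnit.DAYS.toMillis(days);
    }

    public static void main(String[] args) {
        long ttl = ttlMillis(180); // 180 days as an approximation of six months
        System.out.println("ttl in milliseconds: " + ttl);
        System.out.println("fits in an int: " + (ttl <= Integer.MAX_VALUE));
    }
}
```

The printed value is what would be entered at the ttl prompt in place of the
30000 used in the test above.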


Re: question on AgeOffFilter

Posted by Josh Elser <jo...@gmail.com>.
Some answers inline:

z11373 wrote:
> Hi,
> I have some questions about using AgeOffFilter. Earlier I ran the following:
>
> root@dev> createtable testiter
> root@dev testiter> insert 1 cf col1 foo
> root@dev testiter> scan
> 1 cf:col1 []    foo
>
> Then two hours later, I ran:
>
> root@dev testiter> setiter -ageoff -t testiter -p 15 -minc
> AgeOffFilter removes entries with timestamps more than <ttl> milliseconds
> old
> ----------> set AgeOffFilter parameter negate, default false keeps k/v that
> pass accept method, true rejects k/v that pass accept method:
> ----------> set AgeOffFilter parameter ttl, time to live (milliseconds):
> 30000
> ----------> set AgeOffFilter parameter currentTime, if set, use the given
> value as the absolute time in milliseconds as the current time of day:
>
> root@dev testiter> scan
> 1 cf:col1 []    foo
>
> root@dev testiter> flush -w
> 2015-06-10 17:10:12,124 [shell.Shell] INFO : Flush of table testiter
> completed.
>
> root@dev testiter> scan
> 1 cf:col1 []    foo
>
> *First question*: why does that key/value still exist? Since I set the
> TTL to 30 seconds and that key/value was created more than 2 hours
> ago, I'd expect it to be gone after the table flush (minc).

The table might have already minc'ed before you configured the iterator.
In that case the entry was already in a file, so the minc-scope filter never
saw it and the ageoff did not happen. You likely want to configure the
scan+minc+majc scopes for ageoff.

> Then later I did the following:
>
> root@dev testiter> insert 2 cf col1 bar
> root@dev testiter> scan
> 1 cf:col1 []    foo
> 2 cf:col1 []    bar
>
> I waited for more than 30 seconds, then ran:
>
> root@dev testiter> flush -w
> 2015-06-10 17:16:38,903 [shell.Shell] INFO : Flush of table testiter
> completed.
> root@dev testiter> scan
> 1 cf:col1 []    foo
>
> This is correct in that the second key/value pair no longer exists, but why is
> the first one still there?

Same reason as earlier. The key with row "2" was caught by your 
flush/minc, but the first one was not.

> *Second question*: I still don't fully understand the currentTime argument.
> Since I didn't specify a long value when prompted, I'd assume it
> took the current time at the moment I set the iterator. Is that true? I am not
> sure, because if that were the case, key/value items inserted later would never
> get aged off, since their timestamps would be later than the value fixed by the
> iterator. That is not what happened in my example above (the second item was
> removed). I hope someone can enlighten me on this.

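It looks like the currentTime argument is there to let you 'override' what
would normally just be System.currentTimeMillis(). When you don't set this
value, it uses the current time, read when the filter is initialized for each
scan or compaction rather than fixed at the moment you ran setiter, which is
why your second entry could still age off.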
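
To make the role of currentTime concrete, here is a minimal, library-free
sketch of the age-off rule as described in this thread: an entry is kept while
(current time - entry timestamp) does not exceed the ttl. AgeOffModel and its
method names are hypothetical and this is not Accumulo's source; passing null
for the override stands in for leaving the currentTime prompt empty.

```java
// Hypothetical model of the age-off decision; not the Accumulo implementation.
final class AgeOffModel {
    private final long ttlMillis;
    private final long currentTimeMillis;

    // overrideCurrentTime == null models "currentTime not set": the clock is
    // read when the filter object is created, i.e. once per scan/compaction.
    AgeOffModel(long ttlMillis, Long overrideCurrentTime) {
        this.ttlMillis = ttlMillis;
        this.currentTimeMillis = overrideCurrentTime != null
                ? overrideCurrentTime
                : System.currentTimeMillis();
    }

    // Keep the entry only if its age is within the ttl.
    boolean accept(long entryTimestampMillis) {
        return currentTimeMillis - entryTimestampMillis <= ttlMillis;
    }

    public static void main(String[] args) {
        long now = 1_000_000L; // fixed "current time" so the run is repeatable
        AgeOffModel filter = new AgeOffModel(30_000L, now); // ttl of 30 seconds

        System.out.println(filter.accept(now - 10_000L));    // entry 10 s old
        System.out.println(filter.accept(now - 7_200_000L)); // entry 2 h old
    }
}
```

With the fixed clock above, the 10-second-old entry is accepted and the
2-hour-old entry is rejected. A filter like this only affects entries that
actually pass through it, which is why the scope it is attached to matters.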

> *Third question*, which is related to the second: I want to
> retain data in a table for 6 months, i.e. if compaction runs
> every day, all key/value items with timestamps more than six months older than
> that day should be removed. How can I achieve this? I guess that AgeOffFilter is
> the right way to do it, but the results from #1 and #2 above confuse me, and
> I think it doesn't work the way I want.

If things still aren't clear after the earlier explanation, please ask :)
