Posted to user@accumulo.apache.org by Lu Qin <lu...@gmail.com> on 2015/06/06 07:13:08 UTC

why not check TTL interval

I have a big table of about 38B entries, and I set an age-off iterator on it with a TTL of about 3 days. I set the iterator priority to 10 and applied it to all scopes.

I stopped writing data to the table about a week ago. Now when I scan it, the scan takes a very long time. The monitor page shows a scan rate of about 800,000 entries/s.

I think the age-off iterator is different from other iterators: if all the data is past the TTL, then a scan has to read all the data in the table and decide to drop each entry, right? Why not do this at a regular interval instead?

Thanks

Re: why not check TTL interval

Posted by Keith Turner <ke...@deenlo.com>.
I was thinking about how to generalize this in ACCUMULO-1266 [1].

[1]: https://issues.apache.org/jira/browse/ACCUMULO-1266

On Jun 6, 2015 1:50 AM, "Lu Qin" <lu...@gmail.com> wrote:

> Accumulo runs minor and major compactions based on threshold values, so
> why not run age-off automatically at an interval derived from the TTL value?
> If I schedule it with crontab, I have to update my crontab every time I
> add a new table.

Re: why not check TTL interval

Posted by Josh Elser <jo...@gmail.com>.
True, there would be a manual step required if you had new tables you
were adding (your original message said you only had one).

I'm not sure what you mean by interval-automatic by ttl value.
Compaction is the only operation that runs the iterators over the
stored files and permanently removes data past the TTL. Thus, the most
logical feature to consider adding would be scheduled compactions. You
could then configure compactions to run at certain intervals for each
table. I think this would be a good feature.
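
As a rough illustration of what per-table scheduled compactions could
look like, here is a minimal Python sketch. Accumulo has no such feature
as of this thread; the function name, the table names, and the intervals
are all hypothetical. It only shows the "is this table due yet?" decision,
not the compaction itself.

```python
def tables_due(intervals_s, last_compacted_s, now_s):
    """Return the tables whose compaction interval has elapsed, sorted by name.

    intervals_s:      hypothetical per-table interval in seconds
    last_compacted_s: time of the last compaction per table (missing = never)
    now_s:            current time in seconds
    """
    due = []
    for table, interval in intervals_s.items():
        last = last_compacted_s.get(table)  # None means never compacted
        if last is None or now_s - last >= interval:
            due.append(table)
    return sorted(due)


HOUR = 3600
# Hypothetical tables: two compacted daily, one every six hours.
intervals = {"events": 24 * HOUR, "metrics": 6 * HOUR, "audit": 24 * HOUR}
last_run = {"events": 0, "metrics": 20 * HOUR}

print(tables_due(intervals, last_run, now_s=25 * HOUR))
```

A table with no recorded compaction is treated as due immediately, so a
newly added table would be picked up without any extra manual step.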

The only notion of automatic-compactions we have now (that I'm aware
of) is relative to mutations being written to a table
(table.compaction.major.everything.idle).

On Sat, Jun 6, 2015 at 1:49 AM, Lu Qin <lu...@gmail.com> wrote:
> Accumulo runs minor and major compactions based on threshold values, so why not run age-off automatically at an interval derived from the TTL value?
> If I schedule it with crontab, I have to update my crontab every time I add a new table.

Re: why not check TTL interval

Posted by Lu Qin <lu...@gmail.com>.
Accumulo runs minor and major compactions based on threshold values, so why not run age-off automatically at an interval derived from the TTL value?
If I schedule it with crontab, I have to update my crontab every time I add a new table.


> On June 6, 2015, at 13:37, Josh Elser <jo...@gmail.com> wrote:
> 
> The decrease in performance you see is probably because the iterator must read a significant amount of old data. If you don't write new data to a table, Accumulo will not run any compactions and no data will age-off in the files on HDFS.
> 
> I think it would be fairly common to use crontab to regularly schedule compactions over your table so that data is automatically deleted (e.g. nightly). Accumulo doesn't contain any means to automate this internally.


Re: why not check TTL interval

Posted by Josh Elser <jo...@gmail.com>.
The decrease in performance you see is probably because the iterator 
must read a significant amount of old data. If you don't write new data 
to a table, Accumulo will not run any compactions and no data will 
age-off in the files on HDFS.
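
To make the difference concrete, here is a minimal Python sketch of the
idea, not Accumulo's actual implementation. The Entry type and the scan
and compact functions are simplified stand-ins invented for this example.
It shows that a scan still reads every stored entry and only hides the
expired ones, while a compaction rewrites the data without them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    key: str
    timestamp_ms: int  # write time in milliseconds


def is_expired(entry, now_ms, ttl_ms):
    """True when the entry is older than the TTL."""
    return now_ms - entry.timestamp_ms > ttl_ms


def scan(stored, now_ms, ttl_ms):
    """Scan-time filtering: every stored entry is read, expired ones are hidden."""
    entries_read = 0
    visible = []
    for entry in stored:
        entries_read += 1
        if not is_expired(entry, now_ms, ttl_ms):
            visible.append(entry)
    return visible, entries_read


def compact(stored, now_ms, ttl_ms):
    """Compaction: rewrite the stored data, dropping expired entries for good."""
    return [e for e in stored if not is_expired(e, now_ms, ttl_ms)]


DAY_MS = 24 * 60 * 60 * 1000
now = 10 * DAY_MS
ttl = 3 * DAY_MS

# Five entries written a week ago, one written a day ago.
stored = [Entry(f"row{i}", now - 7 * DAY_MS) for i in range(5)]
stored.append(Entry("fresh", now - 1 * DAY_MS))

visible, read_before = scan(stored, now, ttl)
stored = compact(stored, now, ttl)
_, read_after = scan(stored, now, ttl)

print(len(visible), read_before, read_after)  # prints: 1 6 1
```

Before the compaction the scan reads all six entries to return one; after
it, only the single live entry is left to read.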

I think it would be fairly common to use crontab to regularly schedule 
compactions over your table so that data is automatically deleted (e.g. 
nightly). Accumulo doesn't contain any means to automate this internally.
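
As a sketch of the crontab approach, the following Python snippet builds
a nightly crontab line that asks the Accumulo shell to compact one table.
The helper names, USER, PASSWORD, and the table name are placeholders
chosen for this example, and it assumes the shell's -e option (run a
single command) and the compact command's -t and -w options; check them
against your Accumulo version before relying on this.

```python
import shlex


def compact_command(table, user="USER", password="PASSWORD"):
    """Build the argv for a one-shot shell call that compacts one table.

    USER and PASSWORD are placeholders; how to supply credentials safely
    depends on your installation.
    """
    return ["accumulo", "shell", "-u", user, "-p", password,
            "-e", f"compact -t {table} -w"]


def crontab_line(table, hour):
    """Return a crontab entry that runs the compaction daily at the given hour."""
    return f"0 {hour} * * * " + shlex.join(compact_command(table))


# One line per table; "mytable" is a hypothetical table name.
for name in ["mytable"]:
    print(crontab_line(name, hour=2))
```

Each table needs its own line here, which is the manual step raised in
this thread; generating the lines from the current table list would be one
way to reduce it.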

Lu Qin wrote:
> I have a big table of about 38B entries, and I set an age-off iterator on it with a TTL of about 3 days. I set the iterator priority to 10 and applied it to all scopes.
>
> I stopped writing data to the table about a week ago. Now when I scan it, the scan takes a very long time. The monitor page shows a scan rate of about 800,000 entries/s.
>
> I think the age-off iterator is different from other iterators: if all the data is past the TTL, then a scan has to read all the data in the table and decide to drop each entry, right? Why not do this at a regular interval instead?
>
> Thanks