Posted to user@accumulo.apache.org by "Dickson, Matt MR" <ma...@defence.gov.au> on 2014/04/30 01:35:17 UTC

Identify tablets with no new data loaded [SEC=UNOFFICIAL]

UNOFFICIAL

Hi,

Is there a way to identify tablets that have had no data loaded into them for a period of time, e.g. 7 days? My guess is that this information is in the metadata table, but I'm not sure how to get it. The reason for asking is that I'd like to list these tablets and force a compaction on them to age off old data. Because no data is being added, the age-off never occurs and our disk usage continues to climb.

Thanks in advance,
Matt
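The mechanism described above can be pictured with a toy model. A filter with a time-to-live hides expired entries whenever data is read, but files are only rewritten, and expired entries only removed from disk, by a compaction. The following is a minimal Python sketch under that assumption, not Accumulo code. ToyTablet, age_off and DAY_MS are hypothetical names, and the keep/drop rule is only loosely modelled on Accumulo's AgeOffFilter with a TTL in milliseconds.

```python
from dataclasses import dataclass, field

DAY_MS = 24 * 60 * 60 * 1000  # hypothetical helper constant: one day in ms


def age_off(entries, now_ms, ttl_ms):
    """Keep entries no older than ttl_ms (loosely modelled on an age-off filter)."""
    return [(row, ts) for row, ts in entries if now_ms - ts <= ttl_ms]


@dataclass
class ToyTablet:
    # Each file is an immutable list of (row, timestamp_ms) entries.
    files: list = field(default_factory=list)

    def scan(self, now_ms, ttl_ms):
        # Reads apply the filter on the fly; the files themselves are unchanged.
        return age_off([e for f in self.files for e in f], now_ms, ttl_ms)

    def compact(self, now_ms, ttl_ms):
        # A compaction rewrites everything into one new file, so expired
        # entries are finally dropped from storage.
        self.files = [self.scan(now_ms, ttl_ms)]

    def stored_entries(self):
        return sum(len(f) for f in self.files)


if __name__ == "__main__":
    tablet = ToyTablet(files=[[("old", 0), ("new", 9 * DAY_MS)]])
    now, ttl = 10 * DAY_MS, 7 * DAY_MS
    print(len(tablet.scan(now, ttl)))  # 1: the expired entry is hidden
    print(tablet.stored_entries())     # 2: but it still occupies the file
    tablet.compact(now, ttl)
    print(tablet.stored_entries())     # 1: only the compaction reclaims it
```

In this model a tablet that never receives writes never compacts, so its expired entries stay on disk indefinitely, which is the situation described in the question.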

RE: Identify tablets with no new data loaded [SEC=UNOFFICIAL]

Posted by "Dickson, Matt MR" <ma...@defence.gov.au>.
UNOFFICIAL

What Accumulo events or actions, other than inserts, update the timestamp of an rfile's entry in the !METADATA table? Is it only changes to the rfile (i.e. inserts and compactions), or do reads, Accumulo restarts, etc. also update the timestamp?

________________________________
From: Dickson, Matt MR [mailto:matt.dickson@defence.gov.au]
Sent: Wednesday, 30 April 2014 13:17
To: 'user@accumulo.apache.org'
Subject: RE: Identify tablets with no new data loaded [SEC=UNOFFICIAL]


UNOFFICIAL

Based on this, can I query !METADATA to get the timestamps for each rfile in a specified table, filter them to timestamps older than a certain date, and then force a compaction on those?

From the shell, I ran "scan -b 2e -c file -st" to get the timestamps for the files. An example result is:
2e;aaadfdsssdf_2 file:/t-34234afafas.rf [] 312312 519,13
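The plan above (read each tablet's file entries from !METADATA, then keep the tablets whose files are all older than a cutoff) can be sketched in Python against the shell output. This is a hedged sketch. The line layout is inferred from the single example result above, newest_file_ts_per_tablet and stale_tablets are hypothetical helpers, and the cutoff has to be in whatever units the timestamp column uses, which David's listings suggest may not be wall-clock milliseconds.

```python
import re

# Assumed line layout, inferred from the single example result above:
#   <tableId>;<endRow> file:<path> [<visibility>] <timestamp> <size>,<entries>
# The table's last (default) tablet uses the row "<tableId><" instead.
LINE_RE = re.compile(
    r"^(?P<row>.+?) file:(?P<path>\S+) \[(?P<vis>[^\]]*)\] (?P<ts>\d+) (?P<value>.*)$"
)


def newest_file_ts_per_tablet(lines):
    """Map each metadata row (one row per tablet) to its newest file timestamp."""
    newest = {}
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m is None:
            continue  # skip blank lines, prompts and anything not a file entry
        row, ts = m.group("row"), int(m.group("ts"))
        newest[row] = max(ts, newest.get(row, ts))
    return newest


def stale_tablets(lines, cutoff_ts):
    """Rows of tablets whose newest file entry is older than cutoff_ts.

    cutoff_ts must use the same units as the timestamp column.
    """
    newest = newest_file_ts_per_tablet(lines)
    return sorted(row for row, ts in newest.items() if ts < cutoff_ts)


if __name__ == "__main__":
    # Hypothetical sample lines in the same shape as the example result.
    sample = [
        "2e;m file:/t-0001/A0001.rf [] 100 519,13",
        "2e;m file:/t-0001/A0002.rf [] 900 20,2",
        "2e< file:/default_tablet/A0003.rf [] 150 10,1",
    ]
    print(stale_tablets(sample, cutoff_ts=500))  # ['2e<']
```

The decision is made per tablet on its newest file, because a tablet with one old file and one recent file has still had a file written recently.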

So would "compact -b aaadfdsssdf_2 -e aaadfdsssdf_2" force a compaction on that rfile only?
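Compactions operate on whole tablets rather than on individual rfiles, so the practical question is which tablets a begin/end range selects. The Python sketch below assumes the range semantics documented for the Java TableOperations.compact(tableName, start, end, flush, wait) method: the first tablet compacted contains the row after start, the last contains end, and a missing value means the first or last tablet of the table. It also assumes the shell's -b and -e options map onto those parameters; check "help compact" in your shell version. tablets_for_compaction is a hypothetical helper.

```python
def tablets_for_compaction(split_rows, start=None, end=None):
    """End rows of the tablets a compaction over (start, end] would select.

    split_rows: the table's split points in sorted order. The table's last
    tablet has no end row and is reported as None.
    Assumed semantics: the first tablet selected contains the row just after
    `start`, the last one contains `end`; None means "first/last tablet".
    """
    selected = []
    prev = None  # previous tablet's end row; None means "beginning of table"
    for end_row in list(split_rows) + [None]:
        # This tablet holds rows in (prev, end_row]; end_row None is open-ended.
        at_or_after_first = start is None or end_row is None or end_row > start
        at_or_before_last = end is None or prev is None or prev < end
        if at_or_after_first and at_or_before_last:
            selected.append(end_row)
        prev = end_row
    return selected


if __name__ == "__main__":
    splits = ["1", "2", "3", "4"]  # like David's four-split test table
    print(tablets_for_compaction(splits))            # all five tablets
    print(tablets_for_compaction(splits, "1", "2"))  # ['2']: just (1, 2]
    print(tablets_for_compaction(splits, "2", "2"))  # []: empty range
```

Under those assumptions, a range whose begin and end are both a tablet's own end row selects no tablet (the real command may instead reject it as an empty range), and compacting just the tablet (P, X] means passing the previous end row P with -b and X with -e. With no range at all every tablet is selected, which matches the full-table compaction David describes.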



________________________________
From: David Medinets [mailto:david.medinets@gmail.com]
Sent: Wednesday, 30 April 2014 13:03
To: accumulo-user
Subject: Re: Identify tablets with no new data loaded [SEC=UNOFFICIAL]

Again, answering myself: I ran a major compaction after my insert but did not specify the start and end values. That's why all of the rfile names and timestamps changed.


On Tue, Apr 29, 2014 at 10:52 PM, David Medinets <da...@gmail.com> wrote:
Apparently using the timestamp of the !METADATA entries won't work. I created a table with four splits:

timestamp(56) row(2;1 file:/t-000008j/A000008n.rf [] 56 false)
timestamp(58) row(2;2 file:/t-000008g/A000004f.rf [] 58 false)
timestamp(57) row(2;3 file:/t-000008h/A000008o.rf [] 57 false)
timestamp(59) row(2;4 file:/t-000008k/A000004g.rf [] 59 false)
timestamp(60) row(2< file:/default_tablet/A000004h.rf [] 60 false)

Then I inserted data into the first split only, but the timestamps of all tablets changed:

timestamp(1345) row(2;1 file:/t-000008j/A00000kj.rf [] 1345 false)
timestamp(1347) row(2;2 file:/t-000008g/A00000ha.rf [] 1347 false)
timestamp(1346) row(2;3 file:/t-000008h/A00000kk.rf [] 1346 false)
timestamp(1348) row(2;4 file:/t-000008k/A00000h8.rf [] 1348 false)
timestamp(1349) row(2< file:/default_tablet/A00000h9.rf [] 1349 false)

Hmm. I just noticed that the rfiles also changed. I did not expect that.
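Comparing the set of file names per tablet between two listings shows directly which tablets had their files rewritten, without depending on what the timestamp column means. A minimal Python sketch, assuming the line layout of the listings above (the Key printed as row, family:qualifier, [visibility], timestamp and the deleted flag). files_by_tablet and changed_tablets are hypothetical helpers, not Accumulo APIs.

```python
import re
from collections import defaultdict

# Assumed layout, copied from the listings above: a metadata Key printed as
# "row family:qualifier [visibility] timestamp deleted", wrapped in row(...):
#   timestamp(56) row(2;1 file:/t-000008j/A000008n.rf [] 56 false)
LINE_RE = re.compile(r"row\((?P<row>.+?) file:(?P<path>\S+) \[[^\]]*\] \d+ ")


def files_by_tablet(lines):
    """Map each tablet's metadata row to the set of rfile paths it lists."""
    files = defaultdict(set)
    for line in lines:
        m = LINE_RE.search(line)
        if m:
            files[m.group("row")].add(m.group("path"))
    return files


def changed_tablets(before, after):
    """Rows of tablets whose set of files differs between the two listings."""
    b, a = files_by_tablet(before), files_by_tablet(after)
    return sorted(row for row in set(b) | set(a) if b.get(row) != a.get(row))


if __name__ == "__main__":
    before = [
        "timestamp(56) row(2;1 file:/t-000008j/A000008n.rf [] 56 false)",
        "timestamp(60) row(2< file:/default_tablet/A000004h.rf [] 60 false)",
    ]
    after = [
        "timestamp(1345) row(2;1 file:/t-000008j/A00000kj.rf [] 1345 false)",
        "timestamp(1349) row(2< file:/default_tablet/A00000h9.rf [] 1349 false)",
    ]
    print(changed_tablets(before, after))  # ['2;1', '2<']
```

Applied to the two five-line listings above, it reports every tablet, consistent with the full-table compaction David explains.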



On Tue, Apr 29, 2014 at 10:22 PM, David Medinets <da...@gmail.com> wrote:
Wouldn't the timestamp of the !METADATA entries for each tablet give the last time the tablet was compacted, since the number of entries in each tablet is tracked?


On Tue, Apr 29, 2014 at 9:41 PM, Mike Drob <md...@mdrob.com> wrote:

It's a bit crude, but you could look at the timestamps of the files in HDFS to get the time of the last minor compaction.
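Mike's suggestion can be sketched in Python by parsing the output of "hdfs dfs -ls -R" over a table's directory and keeping the newest rfile modification time per directory. Assumptions: the default ls columns (permissions, replication, owner, group, size, date, time, path) with minute-resolution times; a layout where each tablet writes files under its own directory, for example /accumulo/tables/<tableId>/<tabletDir>/ (the paths in the example are hypothetical); and newest_rfile_per_dir and stale_dirs are illustrative helpers. Tablets created by a split can keep referencing their parent's files, so a directory only approximates a tablet, which is part of why this approach is crude.

```python
import posixpath
from datetime import datetime, timedelta


def newest_rfile_per_dir(ls_lines):
    """Newest .rf modification time per directory, from `hdfs dfs -ls -R` lines.

    Assumed columns: permissions, replication, owner, group, size,
    date (yyyy-MM-dd), time (HH:mm), path.
    """
    newest = {}
    for line in ls_lines:
        parts = line.split(None, 7)
        if len(parts) != 8 or parts[0].startswith("d"):
            continue  # skip directories and lines such as "Found 3 items"
        path = parts[7].strip()
        if not path.endswith(".rf"):
            continue
        mtime = datetime.strptime(parts[5] + " " + parts[6], "%Y-%m-%d %H:%M")
        tablet_dir = posixpath.dirname(path)
        if tablet_dir not in newest or mtime > newest[tablet_dir]:
            newest[tablet_dir] = mtime
    return newest


def stale_dirs(ls_lines, now, max_age=timedelta(days=7)):
    """Directories whose newest rfile is older than now - max_age."""
    cutoff = now - max_age
    return sorted(d for d, t in newest_rfile_per_dir(ls_lines).items() if t < cutoff)


if __name__ == "__main__":
    # Hypothetical listing; real paths depend on your instance layout.
    listing = [
        "drwxr-xr-x   - accumulo supergroup          0 2014-04-29 21:41 /accumulo/tables/2e/t-0001",
        "-rw-r--r--   3 accumulo supergroup       1234 2014-04-01 10:00 /accumulo/tables/2e/t-0001/A0001.rf",
        "-rw-r--r--   3 accumulo supergroup       5678 2014-04-29 21:41 /accumulo/tables/2e/t-0002/F0002.rf",
    ]
    print(stale_dirs(listing, now=datetime(2014, 4, 30)))
    # ['/accumulo/tables/2e/t-0001']
```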
