Posted to user@accumulo.apache.org by "Hart, Andrew via user" <us...@accumulo.apache.org> on 2022/06/29 11:54:45 UTC

Un-referenced rfiles in hdfs

Hi,

I have some rfiles in hdfs that aren't referenced in the accumulo.metadata.
So there will be a file like   8500000000 2022-02-02 11:59 /accumulo/tables/3/t-1234567/Cabcdef.rf
but grep -t accumulo.metadata Cabcdef.rf doesn't find anything.

Is there any way to run the gc process so that it cleans up the orphan rfiles?
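For what it's worth, this is roughly how I am checking (a sketch; the shell credentials are placeholders, and table id 3 is just the one from the example above):

  # does the file actually exist in HDFS?
  hdfs dfs -ls /accumulo/tables/3/t-1234567/Cabcdef.rf

  # is it referenced anywhere in the metadata table?
  accumulo shell -u root -p secret -e "scan -t accumulo.metadata -c file" | grep Cabcdef.rf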

And.



Re: Un-referenced rfiles in hdfs

Posted by Christopher <ct...@apache.org>.
This question seems to be a log4j performance/behavior question. You
may have more luck at https://logging.apache.org/log4j for anything
specific to the behavior of log4j.

On Fri, Jul 8, 2022 at 2:44 AM Hart, Andrew via user
<us...@accumulo.apache.org> wrote:
>
>
>
> I am still trying to track down the cause of the un-referenced rfiles and unsplittable tablets that are causing hold-time tserver exits.
>
>
>
> In the logs I see (something like)
>
> 2022-07-07 12:00:00 ..splitting tablet x
>
> 2022-07-07 12:00:03 x was split size1 1000000000 size2 1000000001 time 300000ms
>
> So from the timestamps in the logs the split looks quick, but the reported duration in ms is over 5 minutes.
>
> Is there a way to make the logs work better?  I thought maybe log4j.appender.immediateFlush=true…does that work and is there a big performance penalty?
>
>
>

RE: Un-referenced rfiles in hdfs

Posted by "Hart, Andrew via user" <us...@accumulo.apache.org>.
I am still trying to track down the cause of the un-referenced rfiles and unsplittable tablets that are causing hold-time tserver exits.

In the logs I see (something like)
2022-07-07 12:00:00 ..splitting tablet x
2022-07-07 12:00:03 x was split size1 1000000000 size2 1000000001 time 300000ms
So from the timestamps in the logs the split looks quick, but the reported duration in ms is over 5 minutes.
Is there a way to make the logs work better?  I thought maybe log4j.appender.immediateFlush=true...does that work and is there a big performance penalty?
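For context, this is the sort of configuration I mean (just a sketch; the appender name and file path are made up, and I have not verified the exact log4j 1.x property spelling). The flush setting is applied per appender, so the key needs the appender name in it:

  # log4j 1.x properties sketch; "A1" stands in for whatever appender the tserver actually uses
  log4j.appender.A1=org.apache.log4j.RollingFileAppender
  log4j.appender.A1.File=/var/log/accumulo/tserver.log
  log4j.appender.A1.ImmediateFlush=true
  log4j.appender.A1.layout=org.apache.log4j.PatternLayout
  log4j.appender.A1.layout.ConversionPattern=%d{ISO8601} [%t] %-5p %c - %m%n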



RE: Un-referenced rfiles in hdfs

Posted by "Hart, Andrew via user" <us...@accumulo.apache.org>.
I found that if I split the last row off the tablet and merge it into the following tablet, the orphans are removed because the tablet directory gets added to the delete list.
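Roughly the shell commands I used (a sketch; the table name and row values are placeholders for my real ones):

  # split the last row off into its own tablet
  addsplits lastrow -t mytable
  # merge that small tablet into the following one; the now-unused tablet
  # directory ends up on the delete list and the orphans go with it
  merge -t mytable -b lastrow -e rowinnexttablet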

As to why it is crashing, I think that:
If there is a very large tablet and Accumulo tries to split it while ingest is causing minor compactions, then Accumulo somehow loses count of the minor compactions, and eventually the concurrent minor compaction slots are full and no more can be started.
The dashboard shows minor compactions as running but listcompactions does not show any.
At this point hold time is triggered, and if the tablet cannot be split within 5 minutes, the tserver exits.

I don't understand why that would orphan an rfile though.

As to the very large tablet: if you repeatedly add to the same row, the tablet can't be split, but if you then add some different row that ends up in the same tablet, Accumulo will attempt to split what is now a very large tablet.  Or is there a mechanism to prevent this?

For instance, I never see a log message like "Tablet x contains a large row y, isolating it in its own tablet, splitting x into x,y,z".

From: Christopher <ct...@apache.org>
Sent: 29 June 2022 13:23
To: accumulo-user <us...@accumulo.apache.org>; Hart, Andrew <an...@cgi.com>
Subject: Re: Un-referenced rfiles in hdfs



The Accumulo file garbage collection mechanism is designed to fail safe to only delete files it knows are no longer in use. It also tries to do this with minimal interaction with the hdfs name node (so, no scanning the entire file system to find files). It's possible that in some circumstances, servers can crash in a way that leaves a file on the file system that Accumulo is no longer using, but Accumulo does not have evidence of its existence and so does not know to clean it up. This failure scenario is preferable to accidentally and aggressively deleting files that could still be in use.

My recommendation is to periodically check your file system for such orphaned files, and determine if you wish to delete them based on their age or content. These should only appear after a server failure, so you could perform such tasks during triage/investigation of whatever failure occurred when it occurs in your system. You could also write a small trivial monitoring service to identify old unreferenced files and report them to you by whatever means you prefer. Since these should only appear after an unexpected failure, it's hard to provide a general solution within Accumulo itself.

On Wed, Jun 29, 2022, 07:54 Hart, Andrew via user <us...@accumulo.apache.org> wrote:
Hi,

I have some rfiles in hdfs that aren't referenced in the accumulo.metadata.
So there will be a file like   8500000000 2022-02-02 11:59 /accumulo/tables/3/t-1234567/Cabcdef.rf
but grep -t accumulo.metadata Cabcdef.rf doesn't find anything.

Is there any way to run the gc process so that it cleans up the orphan rfiles?

And.



Re: Un-referenced rfiles in hdfs

Posted by Christopher <ct...@apache.org>.
The Accumulo file garbage collection mechanism is designed to fail safe to
only delete files it knows are no longer in use. It also tries to do this
with minimal interaction with the hdfs name node (so, no scanning the
entire file system to find files). It's possible that in some
circumstances, servers can crash in a way that leaves a file on the file
system that Accumulo is no longer using, but Accumulo does not have
evidence of its existence and so does not know to clean it up. This
failure scenario is preferable to accidentally and aggressively deleting
files that could still be in use.

My recommendation is to periodically check your file system for such
orphaned files, and determine if you wish to delete them based on their age
or content. These should only appear after a server failure, so you could
perform such tasks during triage/investigation of whatever failure occurred
when it occurs in your system. You could also write a small trivial
monitoring service to identify old unreferenced files and report them to
you by whatever means you prefer. Since these should only appear after an
unexpected failure, it's hard to provide a general solution within Accumulo
itself.
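A trivial version of that kind of check might look something like this (only a sketch; the credentials and paths are placeholders, and you would want to verify a file really is unreferenced before deleting anything):

  #!/bin/sh
  # dump the file references currently recorded in the metadata table
  accumulo shell -u root -p secret -e "scan -t accumulo.metadata -c file" > /tmp/refs.txt

  # report rfiles in HDFS that never appear in the metadata dump, keeping
  # the modification date so the oldest candidates can be reviewed first
  hdfs dfs -ls -R /accumulo/tables | grep '\.rf$' | while read -r line; do
    path=$(echo "$line" | awk '{print $NF}')
    mtime=$(echo "$line" | awk '{print $6, $7}')
    grep -q "$(basename "$path")" /tmp/refs.txt \
      || echo "unreferenced (last modified $mtime): $path"
  done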

On Wed, Jun 29, 2022, 07:54 Hart, Andrew via user <us...@accumulo.apache.org>
wrote:

> Hi,
>
>
>
> I have some rfiles in hdfs that aren’t referenced in the accumulo.metadata.
>
> So there will be a file like   8500000000 2022-02-02 11:59
> /accumulo/tables/3/t-1234567/Cabcdef.rf
>
> but grep -t accumulo.metadata Cabcdef.rf doesn’t find anything.
>
>
>
> Is there any way to run the gc process so that it cleans up the orphan rfiles?
>
>
>
> And.
>
>