Posted to notifications@accumulo.apache.org by GitBox <gi...@apache.org> on 2019/06/20 19:46:28 UTC

[GitHub] [accumulo] ctubbsii edited a comment on issue #1227: GC process may be skipping deletion of some unreferenced files.

URL: https://github.com/apache/accumulo/issues/1227#issuecomment-504157810
 
 
   This isn't a bug; it is the intended design. The original garbage collector crawled all of HDFS, looked for referenced files, and deleted everything unreferenced. This was unsafe when failures occurred (some files could be deleted too aggressively, before a reference to them had been added), and it placed a heavy burden on the NameNode.
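To make the failure mode concrete, here is a deliberately simplified model of a crawl-style collector. It is only an illustration, not Accumulo's actual code: `crawl_and_delete`, `fs_files` (everything found by walking the filesystem) and `referenced` (file paths recorded as in use) are hypothetical names. If a new file has been written but its reference has not yet been recorded, the crawl treats it as garbage.

```python
def crawl_and_delete(fs_files: set[str], referenced: set[str]) -> set[str]:
    """Hypothetical crawl-style GC: every file on disk without a reference is deleted."""
    return fs_files - referenced


fs_files = {"A.rf", "B.rf", "new.rf"}
referenced = {"A.rf"}  # the reference to new.rf has not been recorded yet
deleted = crawl_and_delete(fs_files, referenced)
print(sorted(deleted))  # ['B.rf', 'new.rf']: new.rf is lost even though it was about to be referenced
```

Walking the whole filesystem on every pass is also what generates the NameNode load mentioned above.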
   
   The new garbage collector tries to delete only files that have been explicitly identified as candidates for deletion and that are provably safe to delete. It errs on the side of leaving things behind rather than deleting them. Of course, this means that system administrators need to watch their clouds for anything left behind unreferenced, especially after failures, but that is an intentional trade-off.
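The same toy model can sketch the candidate-based approach. Again, this is an assumption-laden illustration rather than Accumulo's implementation: `collect_garbage`, `find_leftovers`, and the `candidates` set (files explicitly flagged for deletion) are hypothetical. Only flagged candidates are ever considered. A candidate that is still referenced is kept, and an unreferenced file that was never flagged is never touched. That last case is what administrators have to find and clean up by hand.

```python
def collect_garbage(candidates: set[str], referenced: set[str]) -> set[str]:
    """Hypothetical candidate-based GC: only files explicitly flagged for deletion
    are considered, and a candidate is deleted only if nothing references it."""
    # A referenced candidate is simply skipped on this pass.
    return {path for path in candidates if path not in referenced}


def find_leftovers(fs_files: set[str], referenced: set[str], candidates: set[str]) -> set[str]:
    """Files the collector will never remove: unreferenced but never flagged."""
    return fs_files - referenced - candidates


fs_files = {"A.rf", "B.rf", "C.rf", "orphan.rf"}
referenced = {"A.rf", "C.rf"}
candidates = {"B.rf", "C.rf"}  # C.rf was flagged but is still referenced

deletable = collect_garbage(candidates, referenced)
leftovers = find_leftovers(fs_files, referenced, candidates)
print(sorted(deletable))  # ['B.rf']
print(sorted(leftovers))  # ['orphan.rf']
```

The trade-off is visible in the output. Nothing outside `candidates` can be deleted by mistake, at the cost of `orphan.rf` staying on disk until someone notices it.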

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services