Posted to notifications@accumulo.apache.org by GitBox <gi...@apache.org> on 2019/09/27 17:25:10 UTC

[GitHub] [accumulo] keith-turner opened a new issue #1377: Add table ID sanity checks to garbage collector

URL: https://github.com/apache/accumulo/issues/1377
 
 
   Currently, the Accumulo GC checks that each user table seen in the metadata table is properly formed (this check was recently improved by #1266). However, there is no check to ensure that all expected user tables are seen in the metadata table. So if an error causes nothing to be seen for a user table in the metadata table, the Accumulo GC will not know there is a problem.
   
   The garbage collection algorithm reads a set of delete candidates into memory and then scans the metadata table to remove any candidates that are referenced. Sanity checks could be added to cross-reference the table ids seen in the metadata table with ZooKeeper.
   
   One possible way to do this is with the following three sets:
   
    * **BSTI** : Table ids in zookeeper before the scan
    * **UMTI** : Table ids seen while scanning metadata table
    * **ASTI** : Table ids in zookeeper after the scan
    
   If (BSTI ∩ ASTI) ⊆ UMTI is true, then all expected table ids were seen. The intersection excludes tables created or deleted while the scan was running, since those may legitimately be absent from UMTI. If it's not true, then it's not safe to delete files. Building these sets and checking them in the GC before deleting could make the Accumulo GC more robust against unknown errors when scanning the metadata table.
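
As a rough illustration of the check, here is a minimal sketch using plain `java.util.Set` operations. The class name `TableIdSanityCheck`, the method `allExpectedTablesSeen`, and the use of `String` for table ids are all hypothetical and chosen only for this example; none of them come from the Accumulo code base, and the real GC would obtain the three sets from ZooKeeper and the metadata scan.

```java
import java.util.HashSet;
import java.util.Set;

public class TableIdSanityCheck {

  // Hypothetical helper: returns true when (BSTI ∩ ASTI) ⊆ UMTI, i.e. every
  // table id present in ZooKeeper both before and after the metadata scan
  // was also seen during the scan.
  public static boolean allExpectedTablesSeen(Set<String> bsti, Set<String> umti,
      Set<String> asti) {
    // Copy BSTI so the caller's set is not modified, then intersect with ASTI.
    Set<String> expected = new HashSet<>(bsti);
    expected.retainAll(asti);
    // Subset test: every expected id must appear among the ids seen in the scan.
    return umti.containsAll(expected);
  }

  public static void main(String[] args) {
    // Table "3" was deleted and table "4" was created during the scan, so
    // neither is expected; tables "1" and "2" are expected.
    Set<String> before = Set.of("1", "2", "3");
    Set<String> after = Set.of("1", "2", "4");

    // Both expected tables were seen: safe to proceed.
    System.out.println(allExpectedTablesSeen(before, Set.of("1", "2"), after)); // true

    // Table "2" was never seen in the metadata table: not safe to delete files.
    System.out.println(allExpectedTablesSeen(before, Set.of("1"), after)); // false
  }
}
```

In this sketch a `false` result would mean the GC should skip the delete step for that cycle rather than risk removing files that are still referenced.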
   
   
