Posted to notifications@accumulo.apache.org by GitBox <gi...@apache.org> on 2020/09/17 21:08:30 UTC

[GitHub] [accumulo] EdColeman commented on issue #1689: Tserver in bad state may be writing corrupted files during compactions.

EdColeman commented on issue #1689:
URL: https://github.com/apache/accumulo/issues/1689#issuecomment-694500736


   (Using the 1.10 code.) When the tserver gets into a bad state, it looks like zooCache may be returning null in the Tables.exists() check (Tables, line 147).
   
   In TabletServerResourceManager, line 451 catches Throwable and does nothing but log it.  That code runs in a continuous loop, and I believe the code after the error is correctly guarded, but the loop will never end.
   
   I don't think that killing the runnable would work - the tserver might never notice it lost the memory manager thread.  
   
   I think ZooKeeper is still available - how bad would it be if, on catching the exception, the code just deleted the tablet server lock and thereby killed the server?  That would be preferable to writing corrupt data, but maybe there are other "recoverable errors"?
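
To make the idea concrete, here is a rough sketch of a loop that stops swallowing errors forever. It is not the actual TabletServerResourceManager code: the names `Work`, `FatalHandler`, and `runLoop` are hypothetical, the `halt` callback only stands in for "delete the tablet server lock", and the consecutive-failure threshold is just one assumed way to leave room for recoverable errors (a threshold of 1 gives the "die on the first exception" behavior).

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class MemoryManagerLoopSketch {

  /** Hypothetical stand-in for one pass of the memory-manager work. */
  interface Work {
    void runOnce() throws Exception;
  }

  /** Hypothetical stand-in for deleting the tablet server lock, which would kill the server. */
  interface FatalHandler {
    void halt(Throwable cause);
  }

  /**
   * Runs the work repeatedly. Rather than catching Throwable, logging, and looping
   * forever, it counts consecutive failures and calls the fatal handler once
   * maxFailures is reached. maxIterations only bounds the loop for this demo.
   * Returns the number of passes attempted.
   */
  static int runLoop(Work work, FatalHandler onFatal, int maxFailures, int maxIterations) {
    int consecutiveFailures = 0;
    int iterations = 0;
    while (iterations < maxIterations) {
      iterations++;
      try {
        work.runOnce();
        consecutiveFailures = 0; // a good pass resets the count, so transient errors are tolerated
      } catch (Throwable t) {
        consecutiveFailures++;
        System.err.println("memory manager pass failed (" + consecutiveFailures + "): " + t);
        if (consecutiveFailures >= maxFailures) {
          onFatal.halt(t); // assumption: in a real server this would give up the lock
          break;
        }
      }
    }
    return iterations;
  }

  public static void main(String[] args) {
    AtomicBoolean halted = new AtomicBoolean(false);
    // Simulates the bad state: every pass fails the same way.
    Work alwaysFails = () -> {
      throw new IllegalStateException("table lookup returned null");
    };
    int passes = runLoop(alwaysFails, cause -> halted.set(true), 3, 100);
    System.out.println("passes=" + passes + " halted=" + halted.get());
  }
}
```

Whether any threshold above 1 is safe depends on whether a failed pass can still lead to bad writes, which is the open question above.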
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org