Posted to notifications@accumulo.apache.org by GitBox <gi...@apache.org> on 2021/06/01 16:01:21 UTC

[GitHub] [accumulo] keith-turner opened a new issue #2125: Make detection of dead external compactions more resilient to network hiccups

keith-turner opened a new issue #2125:
URL: https://github.com/apache/accumulo/issues/2125


   The code that detects dead compactions [reaches out to each compactor to ask what it's running](https://github.com/apache/accumulo/blob/8a636a3ba91f5dae1d8b09b095178889a7d79c1d/server/compaction-coordinator/src/main/java/org/apache/accumulo/coordinator/DeadCompactionDetector.java#L83-L84).  If there are transient network issues and a compactor cannot be reached, running external compactions could be marked as failed, resulting in lost work.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] dlmarion commented on issue #2125: Make detection of dead external compactions more resilient to network hiccups

Posted by GitBox <gi...@apache.org>.
dlmarion commented on issue #2125:
URL: https://github.com/apache/accumulo/issues/2125#issuecomment-852254130


   > The code that detects dead compactions reaches out to each compactor to ask what its running
   
   This only happens once, on Coordinator startup. https://github.com/apache/accumulo/blob/main/server/compaction-coordinator/src/main/java/org/apache/accumulo/coordinator/CompactionCoordinator.java#L249
   





[GitHub] [accumulo] keith-turner commented on issue #2125: Make detection of dead external compactions more resilient to network hiccups

Posted by GitBox <gi...@apache.org>.
keith-turner commented on issue #2125:
URL: https://github.com/apache/accumulo/issues/2125#issuecomment-852247696


   One possible solution is to do the following when a possible dead compaction is detected.
   
    * Remember the possible dead compaction and its first seen and last seen time in a map in the DeadCompactionDetector.
    * When last seen - first seen > 10 minutes, then actually mark the compaction as dead.
   
   So don't mark a compaction as dead the first time a problem is seen; only do so after a problem has been seen repeatedly.  Also, anything that was seen as running or that no longer exists should be cleared from the set of possible dead compactions.
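
A minimal sketch of the bookkeeping described above, not the actual DeadCompactionDetector code. The class name `SuspectTracker`, the method `update`, and the use of plain strings as external compaction ids are all assumptions made for illustration. It assumes the detector calls `update` once per pass with the ids that looked dead in that pass.

```java
import java.time.Duration;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper: tracks possibly dead compactions across detection passes. */
class SuspectTracker {

  /** First-seen and last-seen times (millis) for one possibly dead compaction. */
  private static final class Sighting {
    final long firstSeen;
    long lastSeen;

    Sighting(long now) {
      this.firstSeen = now;
      this.lastSeen = now;
    }
  }

  private final Map<String, Sighting> suspects = new HashMap<>();
  private final long thresholdMillis;

  SuspectTracker(Duration threshold) {
    this.thresholdMillis = threshold.toMillis();
  }

  /**
   * Records one detection pass. possiblyDead holds the ids that looked dead in
   * this pass. Returns the ids that have looked dead for longer than the threshold.
   */
  Set<String> update(Set<String> possiblyDead, long nowMillis) {
    // Clear out anything that was seen running or no longer exists.
    suspects.keySet().retainAll(possiblyDead);

    Set<String> confirmed = new HashSet<>();
    for (String id : possiblyDead) {
      Sighting s = suspects.computeIfAbsent(id, k -> new Sighting(nowMillis));
      s.lastSeen = nowMillis;
      if (s.lastSeen - s.firstSeen > thresholdMillis) {
        confirmed.add(id);
      }
    }
    // Confirmed ids are handed to the caller, so stop tracking them.
    suspects.keySet().removeAll(confirmed);
    return confirmed;
  }
}
```

With a 10 minute threshold, a single failed pass never marks anything dead, and a compaction that shows up as running in any pass starts over with a fresh first-seen time if it looks dead again later.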





[GitHub] [accumulo] keith-turner commented on issue #2125: Make detection of dead external compactions more resilient to network hiccups

Posted by GitBox <gi...@apache.org>.
keith-turner commented on issue #2125:
URL: https://github.com/apache/accumulo/issues/2125#issuecomment-852285250


   > This only happens once, on Coordinator startup.
   
   That is a different call site for that thrift code than the one I was thinking about.  The dead compaction detector [runs repeatedly](https://github.com/apache/accumulo/blob/8a636a3ba91f5dae1d8b09b095178889a7d79c1d/server/compaction-coordinator/src/main/java/org/apache/accumulo/coordinator/DeadCompactionDetector.java#L113-L119).





[GitHub] [accumulo] dlmarion closed issue #2125: Make detection of dead external compactions more resilient to network hiccups

Posted by GitBox <gi...@apache.org>.
dlmarion closed issue #2125:
URL: https://github.com/apache/accumulo/issues/2125


   

