You are viewing a plain text version of this content. The canonical link for it is here.
Posted to notifications@accumulo.apache.org by GitBox <gi...@apache.org> on 2021/05/06 16:52:54 UTC

[GitHub] [accumulo] Manno15 opened a new pull request #2084: Add retry counter for log recovery with MinC

Manno15 opened a new pull request #2084:
URL: https://github.com/apache/accumulo/pull/2084


   Potential fix for #2035. 
   
   The issue with #2035 occurs at https://github.com/apache/accumulo/blob/30ce59fd94f51b60a30ced328156f02d3223330b/server/tserver/src/main/java/org/apache/accumulo/tserver/AssignmentHandler.java#L175
   
   If an iterator is misconfigured, this line will never complete and hang in `MinorCompactor.java`. This potential fix adds a retry counter (only a small amount of retries for testing purposes), solely used when the reason for the minor compaction is recovery. The exception thrown is just a placeholder for now. 
   
   The result from this change is the exception thrown allows the assignments to get rescheduled so other tablets can be loaded in. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] Manno15 commented on a change in pull request #2084: Add retry counter for log recovery with MinC

Posted by GitBox <gi...@apache.org>.
Manno15 commented on a change in pull request #2084:
URL: https://github.com/apache/accumulo/pull/2084#discussion_r632426779



##########
File path: server/tserver/src/main/java/org/apache/accumulo/tserver/tablet/MinorCompactor.java
##########
@@ -131,12 +134,21 @@ public CompactionStats call() {
           reportedProblem = true;
         } catch (RuntimeException | NoClassDefFoundError e) {
           // if this is coming from a user iterator, it is possible that the user could change the
-          // iterator config and that the
-          // minor compaction would succeed
+          // iterator config and that the minor compaction would succeed
+          // If the minor compaction stalls for too long during recovery, it can interfere with
+          // other tables loading
+          // Throw exception if this happens so assignments can be rescheduled.
+          if (retryCounter >= 4 && mincReason.equals(MinorCompactionReason.RECOVERY)) {
+            log.warn("Minc is stuck for too long during recovery, throwing error for reschedule.");

Review comment:
       Good idea, I will revise the logging.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] Manno15 commented on a change in pull request #2084: Add retry counter for log recovery with MinC

Posted by GitBox <gi...@apache.org>.
Manno15 commented on a change in pull request #2084:
URL: https://github.com/apache/accumulo/pull/2084#discussion_r632529021



##########
File path: server/tserver/src/main/java/org/apache/accumulo/tserver/tablet/MinorCompactor.java
##########
@@ -131,12 +134,21 @@ public CompactionStats call() {
           reportedProblem = true;
         } catch (RuntimeException | NoClassDefFoundError e) {
           // if this is coming from a user iterator, it is possible that the user could change the
-          // iterator config and that the
-          // minor compaction would succeed
+          // iterator config and that the minor compaction would succeed
+          // If the minor compaction stalls for too long during recovery, it can interfere with
+          // other tables loading
+          // Throw exception if this happens so assignments can be rescheduled.
+          if (retryCounter >= 4 && mincReason.equals(MinorCompactionReason.RECOVERY)) {
+            log.warn("Minc is stuck for too long during recovery, throwing error for reschedule.");

Review comment:
       It does seem that the warn/error messaging in `AssignmentHandler` and `MinorCompationTask` does include the key extent in the logging. I can still include it in this log message to help track it. I do worry about log overload throughout the whole process of rescheduling but I am not sure if that is an actual issue or not.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] Manno15 commented on pull request #2084: Add retry counter for log recovery with MinC

Posted by GitBox <gi...@apache.org>.
Manno15 commented on pull request #2084:
URL: https://github.com/apache/accumulo/pull/2084#issuecomment-857774100


   After looking through more JIRA tickets, I ran into [ACCUMULO-1570](https://issues.apache.org/jira/projects/ACCUMULO/issues/ACCUMULO-1570?filter=allopenissues&orderby=cf%5B12310200%5D+ASC%2C+priority+DESC%2C+updated+DESC). It is kind of similar but the idea was to offline a tablet and have it marked as broken when it kept failing to minor compact due to user iterator. Could do something similar for recovery in this PR. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] keith-turner commented on a change in pull request #2084: Add retry counter for log recovery with MinC

Posted by GitBox <gi...@apache.org>.
keith-turner commented on a change in pull request #2084:
URL: https://github.com/apache/accumulo/pull/2084#discussion_r652239219



##########
File path: server/tserver/src/main/java/org/apache/accumulo/tserver/tablet/MinorCompactor.java
##########
@@ -131,12 +134,21 @@ public CompactionStats call() {
           reportedProblem = true;
         } catch (RuntimeException | NoClassDefFoundError e) {
           // if this is coming from a user iterator, it is possible that the user could change the
-          // iterator config and that the
-          // minor compaction would succeed
-          log.warn("MinC failed ({}) to create {} retrying ...", e.getMessage(), outputFileName, e);
+          // iterator config and that the minor compaction would succeed
+          // If the minor compaction stalls for too long during recovery, it can interfere with
+          // other tables loading
+          // Throw exception if this happens so assignments can be rescheduled.
           ProblemReports.getInstance(tabletServer.getContext()).report(
               new ProblemReport(getExtent().tableId(), ProblemType.FILE_WRITE, outputFileName, e));
+          if (retryCounter >= 4 && mincReason.equals(MinorCompactionReason.RECOVERY)) {

Review comment:
       ok I see a diff now, the 2nd catch block passes an exception to log.warn().  Still wondering if the prev catch block should do the same check.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] keith-turner commented on a change in pull request #2084: Add retry counter for log recovery with MinC

Posted by GitBox <gi...@apache.org>.
keith-turner commented on a change in pull request #2084:
URL: https://github.com/apache/accumulo/pull/2084#discussion_r632132187



##########
File path: server/tserver/src/main/java/org/apache/accumulo/tserver/tablet/MinorCompactor.java
##########
@@ -131,12 +134,21 @@ public CompactionStats call() {
           reportedProblem = true;
         } catch (RuntimeException | NoClassDefFoundError e) {
           // if this is coming from a user iterator, it is possible that the user could change the
-          // iterator config and that the
-          // minor compaction would succeed
+          // iterator config and that the minor compaction would succeed
+          // If the minor compaction stalls for too long during recovery, it can interfere with
+          // other tables loading
+          // Throw exception if this happens so assignments can be rescheduled.
+          if (retryCounter >= 4 && mincReason.equals(MinorCompactionReason.RECOVERY)) {
+            log.warn("Minc is stuck for too long during recovery, throwing error for reschedule.");
+            ProblemReports.getInstance(tabletServer.getContext()).report(new ProblemReport(
+                getExtent().tableId(), ProblemType.FILE_WRITE, outputFileName, e));
+            throw new IllegalStateException(e);

Review comment:
       IllegalStateException does not feel like the right exception to me, please ignore if it you don't agree.
   
   ```suggestion
               throw new RuntimeException(e);
   ```

##########
File path: server/tserver/src/main/java/org/apache/accumulo/tserver/tablet/MinorCompactor.java
##########
@@ -131,12 +134,21 @@ public CompactionStats call() {
           reportedProblem = true;
         } catch (RuntimeException | NoClassDefFoundError e) {
           // if this is coming from a user iterator, it is possible that the user could change the
-          // iterator config and that the
-          // minor compaction would succeed
+          // iterator config and that the minor compaction would succeed
+          // If the minor compaction stalls for too long during recovery, it can interfere with
+          // other tables loading
+          // Throw exception if this happens so assignments can be rescheduled.
+          if (retryCounter >= 4 && mincReason.equals(MinorCompactionReason.RECOVERY)) {
+            log.warn("Minc is stuck for too long during recovery, throwing error for reschedule.");
+            ProblemReports.getInstance(tabletServer.getContext()).report(new ProblemReport(

Review comment:
       Maybe the entire if stmt could be moved after the existing ProblemReport code so that it does not need to be repeated.

##########
File path: server/tserver/src/main/java/org/apache/accumulo/tserver/tablet/MinorCompactor.java
##########
@@ -131,12 +134,21 @@ public CompactionStats call() {
           reportedProblem = true;
         } catch (RuntimeException | NoClassDefFoundError e) {
           // if this is coming from a user iterator, it is possible that the user could change the
-          // iterator config and that the
-          // minor compaction would succeed
+          // iterator config and that the minor compaction would succeed
+          // If the minor compaction stalls for too long during recovery, it can interfere with
+          // other tables loading
+          // Throw exception if this happens so assignments can be rescheduled.
+          if (retryCounter >= 4 && mincReason.equals(MinorCompactionReason.RECOVERY)) {
+            log.warn("Minc is stuck for too long during recovery, throwing error for reschedule.");

Review comment:
       Including the key extent in log messages can be invaluable for debugging.  The existing log messages around failures don't include the extent.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] keith-turner commented on a change in pull request #2084: Add retry counter for log recovery with MinC

Posted by GitBox <gi...@apache.org>.
keith-turner commented on a change in pull request #2084:
URL: https://github.com/apache/accumulo/pull/2084#discussion_r652239219



##########
File path: server/tserver/src/main/java/org/apache/accumulo/tserver/tablet/MinorCompactor.java
##########
@@ -131,12 +134,21 @@ public CompactionStats call() {
           reportedProblem = true;
         } catch (RuntimeException | NoClassDefFoundError e) {
           // if this is coming from a user iterator, it is possible that the user could change the
-          // iterator config and that the
-          // minor compaction would succeed
-          log.warn("MinC failed ({}) to create {} retrying ...", e.getMessage(), outputFileName, e);
+          // iterator config and that the minor compaction would succeed
+          // If the minor compaction stalls for too long during recovery, it can interfere with
+          // other tables loading
+          // Throw exception if this happens so assignments can be rescheduled.
           ProblemReports.getInstance(tabletServer.getContext()).report(
               new ProblemReport(getExtent().tableId(), ProblemType.FILE_WRITE, outputFileName, e));
+          if (retryCounter >= 4 && mincReason.equals(MinorCompactionReason.RECOVERY)) {

Review comment:
       ok I see a diff now, the 2nd catch block passes an exception to log.warn() and the 1st does not.  Still wondering if the prev catch block should do the same check.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] Manno15 commented on pull request #2084: Add retry counter for log recovery with MinC

Posted by GitBox <gi...@apache.org>.
Manno15 commented on pull request #2084:
URL: https://github.com/apache/accumulo/pull/2084#issuecomment-841152044


   > Have you been able to test this change?
   
   I have tested this on a small cluster. It did allow the other tablets to be loaded while rescheduling the recovering of the tablet with the misconfigured iterator. My main concern was handling any possible edge case or any scenario that could disrupt normal recovery on a larger cluster. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] keith-turner commented on a change in pull request #2084: Add retry counter for log recovery with MinC

Posted by GitBox <gi...@apache.org>.
keith-turner commented on a change in pull request #2084:
URL: https://github.com/apache/accumulo/pull/2084#discussion_r652234916



##########
File path: server/tserver/src/main/java/org/apache/accumulo/tserver/tablet/MinorCompactor.java
##########
@@ -131,12 +134,21 @@ public CompactionStats call() {
           reportedProblem = true;
         } catch (RuntimeException | NoClassDefFoundError e) {
           // if this is coming from a user iterator, it is possible that the user could change the
-          // iterator config and that the
-          // minor compaction would succeed
-          log.warn("MinC failed ({}) to create {} retrying ...", e.getMessage(), outputFileName, e);
+          // iterator config and that the minor compaction would succeed
+          // If the minor compaction stalls for too long during recovery, it can interfere with
+          // other tables loading
+          // Throw exception if this happens so assignments can be rescheduled.
           ProblemReports.getInstance(tabletServer.getContext()).report(
               new ProblemReport(getExtent().tableId(), ProblemType.FILE_WRITE, outputFileName, e));
+          if (retryCounter >= 4 && mincReason.equals(MinorCompactionReason.RECOVERY)) {

Review comment:
       Should the same thing be done in the preceding catch block?  Also before this change, it seems like this catch block and the preceding catch block did the exact same thing.  Curios if you see any diffs @Manno15 w/ the previous catch block.  If they were the same and we want to do this check in both, then maybe the two catch blocks could be collapsed into one.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] keith-turner commented on a change in pull request #2084: Add retry counter for log recovery with MinC

Posted by GitBox <gi...@apache.org>.
keith-turner commented on a change in pull request #2084:
URL: https://github.com/apache/accumulo/pull/2084#discussion_r652234916



##########
File path: server/tserver/src/main/java/org/apache/accumulo/tserver/tablet/MinorCompactor.java
##########
@@ -131,12 +134,21 @@ public CompactionStats call() {
           reportedProblem = true;
         } catch (RuntimeException | NoClassDefFoundError e) {
           // if this is coming from a user iterator, it is possible that the user could change the
-          // iterator config and that the
-          // minor compaction would succeed
-          log.warn("MinC failed ({}) to create {} retrying ...", e.getMessage(), outputFileName, e);
+          // iterator config and that the minor compaction would succeed
+          // If the minor compaction stalls for too long during recovery, it can interfere with
+          // other tables loading
+          // Throw exception if this happens so assignments can be rescheduled.
           ProblemReports.getInstance(tabletServer.getContext()).report(
               new ProblemReport(getExtent().tableId(), ProblemType.FILE_WRITE, outputFileName, e));
+          if (retryCounter >= 4 && mincReason.equals(MinorCompactionReason.RECOVERY)) {

Review comment:
       Should the same thing be done in the preceding catch block?  Also before this change, it seems like this catch block and the preceding catch block did the exact same thing.  Curios if you see any diffs @Manno15 w/ the previous catch block.  If they were the same and we want to do this check in both, then maybe the two catch blocks could be collapsed into one?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] ctubbsii commented on a change in pull request #2084: Add retry counter for log recovery with MinC

Posted by GitBox <gi...@apache.org>.
ctubbsii commented on a change in pull request #2084:
URL: https://github.com/apache/accumulo/pull/2084#discussion_r632137724



##########
File path: server/tserver/src/main/java/org/apache/accumulo/tserver/tablet/MinorCompactor.java
##########
@@ -131,12 +134,21 @@ public CompactionStats call() {
           reportedProblem = true;
         } catch (RuntimeException | NoClassDefFoundError e) {
           // if this is coming from a user iterator, it is possible that the user could change the
-          // iterator config and that the
-          // minor compaction would succeed
+          // iterator config and that the minor compaction would succeed
+          // If the minor compaction stalls for too long during recovery, it can interfere with
+          // other tables loading
+          // Throw exception if this happens so assignments can be rescheduled.
+          if (retryCounter >= 4 && mincReason.equals(MinorCompactionReason.RECOVERY)) {
+            log.warn("Minc is stuck for too long during recovery, throwing error for reschedule.");
+            ProblemReports.getInstance(tabletServer.getContext()).report(new ProblemReport(
+                getExtent().tableId(), ProblemType.FILE_WRITE, outputFileName, e));
+            throw new IllegalStateException(e);

Review comment:
       A more specific exception is usually preferable to a generic "RuntimeException". However, the more specific exception should actually make sense for the situation. Exceeding the max retries here does seem to be a *bad* state, but perhaps not an *illegal* one. I don't have a better suggestion... just thinking "out loud".




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] Manno15 commented on a change in pull request #2084: Add retry counter for log recovery with MinC

Posted by GitBox <gi...@apache.org>.
Manno15 commented on a change in pull request #2084:
URL: https://github.com/apache/accumulo/pull/2084#discussion_r632426384



##########
File path: server/tserver/src/main/java/org/apache/accumulo/tserver/tablet/MinorCompactor.java
##########
@@ -131,12 +134,21 @@ public CompactionStats call() {
           reportedProblem = true;
         } catch (RuntimeException | NoClassDefFoundError e) {
           // if this is coming from a user iterator, it is possible that the user could change the
-          // iterator config and that the
-          // minor compaction would succeed
+          // iterator config and that the minor compaction would succeed
+          // If the minor compaction stalls for too long during recovery, it can interfere with
+          // other tables loading
+          // Throw exception if this happens so assignments can be rescheduled.
+          if (retryCounter >= 4 && mincReason.equals(MinorCompactionReason.RECOVERY)) {
+            log.warn("Minc is stuck for too long during recovery, throwing error for reschedule.");
+            ProblemReports.getInstance(tabletServer.getContext()).report(new ProblemReport(
+                getExtent().tableId(), ProblemType.FILE_WRITE, outputFileName, e));
+            throw new IllegalStateException(e);

Review comment:
       I used IllegalStateException just as a placeholder since I saw it elsewhere in that file. That, along with the number of retries, needs to be discussed on what should be best. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org