You are viewing a plain text version of this content. The canonical link for it is here.
Posted to pr@cassandra.apache.org by GitBox <gi...@apache.org> on 2022/10/25 05:34:21 UTC

[GitHub] [cassandra] lmtrombone opened a new pull request, #1949: CASSANDRA-16491 track and handle errors during nodetool bootstrap resume properly

lmtrombone opened a new pull request, #1949:
URL: https://github.com/apache/cassandra/pull/1949

   There is some issue where the `nodetool bootstrap resume` command returns success successfully even when some error happens during the operation.  To address that, we now keep track of the errors when they occur in `BootstrapMonitor` and now throw them after signaling.  Also as part of this MR, I made a change so we don't send a `ProgressEventType.COMPLETE` event anymore when we see an error as that seems to be a bit misleading.
   
   To reproduce a scenario, I killed one of the nodes involved in the bootstrap resume process and checked if errors were thrown in the terminal.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: pr-unsubscribe@cassandra.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@cassandra.apache.org
For additional commands, e-mail: pr-help@cassandra.apache.org


[GitHub] [cassandra] maedhroz commented on a diff in pull request #1949: CASSANDRA-16491 track and handle errors during nodetool bootstrap resume properly

Posted by GitBox <gi...@apache.org>.
maedhroz commented on code in PR #1949:
URL: https://github.com/apache/cassandra/pull/1949#discussion_r1005193092


##########
src/java/org/apache/cassandra/service/StorageService.java:
##########
@@ -2074,7 +2074,6 @@ public void onFailure(Throwable e)
                     }
                     logger.error(message, e);
                     progressSupport.progress("bootstrap", new ProgressEvent(ProgressEventType.ERROR, 1, 1, message));
-                    progressSupport.progress("bootstrap", new ProgressEvent(ProgressEventType.COMPLETE, 1, 1, "Resume bootstrap complete"));

Review Comment:
   The thing that worries me about this is that it will hang older versions of `nodetool` when an error occurs. They will be waiting for a `COMPLETE` that will never arrive after printing the error message. The fact that we're changing the behavior of `nodetool` itself on error in this Jira probably means that we only need to patch trunk, but it's probably not worth it to remove the `COMPLETE` here.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: pr-unsubscribe@cassandra.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@cassandra.apache.org
For additional commands, e-mail: pr-help@cassandra.apache.org


[GitHub] [cassandra] maedhroz commented on a diff in pull request #1949: CASSANDRA-16491 track and handle errors during nodetool bootstrap resume properly

Posted by GitBox <gi...@apache.org>.
maedhroz commented on code in PR #1949:
URL: https://github.com/apache/cassandra/pull/1949#discussion_r1005181203


##########
src/java/org/apache/cassandra/tools/BootstrapMonitor.java:
##########
@@ -34,6 +34,7 @@ public class BootstrapMonitor extends JMXNotificationProgressListener
     private final SimpleDateFormat format = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss,SSS");
     private final PrintStream out;
     private final Condition condition = newOneTimeCondition();
+    private Exception error;

Review Comment:
   I think this is going to need to be `volatile`, as it's written by one thread and accessed by another.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: pr-unsubscribe@cassandra.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@cassandra.apache.org
For additional commands, e-mail: pr-help@cassandra.apache.org


[GitHub] [cassandra] maedhroz commented on pull request #1949: CASSANDRA-16491 track and handle errors during nodetool bootstrap resume properly

Posted by GitBox <gi...@apache.org>.
maedhroz commented on PR #1949:
URL: https://github.com/apache/cassandra/pull/1949#issuecomment-1295586988

   Committed in https://github.com/apache/cassandra/commit/8ec04361b9e098430023e4776baf1941be958475


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: pr-unsubscribe@cassandra.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@cassandra.apache.org
For additional commands, e-mail: pr-help@cassandra.apache.org


[GitHub] [cassandra] maedhroz commented on a diff in pull request #1949: CASSANDRA-16491 track and handle errors during nodetool bootstrap resume properly

Posted by GitBox <gi...@apache.org>.
maedhroz commented on code in PR #1949:
URL: https://github.com/apache/cassandra/pull/1949#discussion_r1007119006


##########
CHANGES.txt:
##########
@@ -68,6 +68,7 @@
  * Add new CQL function maxWritetime (CASSANDRA-17425)
  * Add guardrail for ALTER TABLE ADD / DROP / REMOVE column operations (CASSANDRA-17495)
  * Rename DisableFlag class to EnableFlag on guardrails (CASSANDRA-17544)
+ * Nodetool bootstrap resume will now return an error if the operation fails (CASSANDRA-16491)

Review Comment:
   Committer can fix this up on commit too, but make sure CHANGES bits go a the top of the appropriate section, not the bottom. (For a hint, look at the date progression of the `git blame` annotation in your IDE, etc.)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: pr-unsubscribe@cassandra.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@cassandra.apache.org
For additional commands, e-mail: pr-help@cassandra.apache.org


[GitHub] [cassandra] lmtrombone commented on a diff in pull request #1949: CASSANDRA-16491 track and handle errors during nodetool bootstrap resume properly

Posted by GitBox <gi...@apache.org>.
lmtrombone commented on code in PR #1949:
URL: https://github.com/apache/cassandra/pull/1949#discussion_r1006468044


##########
src/java/org/apache/cassandra/tools/BootstrapMonitor.java:
##########
@@ -82,9 +83,19 @@ public void progress(String tag, ProgressEvent event)
             message = message + " (progress: " + (int)event.getProgressPercentage() + "%)";
         }
         out.println(message);
+        if (type == ProgressEventType.ERROR)
+        {
+            error = new RuntimeException(String.format("Bootstrap resume has failed with error: %s", message));
+            condition.signalAll();

Review Comment:
   @maedhroz Since I reverted my changes where I removed the `ProgressEventType.COMPLETE` event that was being sent, I assume we can just rely on the logic below (lines 91-94) where it already calls `condition.signalAll();`?   When we encounter an issue, I noticed that we always both `ProgressEventType.ERROR` and `ProgressEventType.COMPLETE` events so it seems a little redundant and unnecessary now.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: pr-unsubscribe@cassandra.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@cassandra.apache.org
For additional commands, e-mail: pr-help@cassandra.apache.org


[GitHub] [cassandra] maedhroz commented on pull request #1949: CASSANDRA-16491 track and handle errors during nodetool bootstrap resume properly

Posted by GitBox <gi...@apache.org>.
maedhroz commented on PR #1949:
URL: https://github.com/apache/cassandra/pull/1949#issuecomment-1293963983

   https://app.circleci.com/pipelines/github/maedhroz/cassandra?branch=CASSANDRA-16491-circle


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: pr-unsubscribe@cassandra.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@cassandra.apache.org
For additional commands, e-mail: pr-help@cassandra.apache.org


[GitHub] [cassandra] maedhroz commented on a diff in pull request #1949: CASSANDRA-16491 track and handle errors during nodetool bootstrap resume properly

Posted by GitBox <gi...@apache.org>.
maedhroz commented on code in PR #1949:
URL: https://github.com/apache/cassandra/pull/1949#discussion_r1005195717


##########
src/java/org/apache/cassandra/tools/NodeProbe.java:
##########
@@ -1992,6 +1992,8 @@ public void resumeBootstrap(PrintStream out) throws IOException
             {
                 out.println("Resuming bootstrap");
                 monitor.awaitCompletion();
+                if (monitor.getError() != null)
+                    throw monitor.getError();

Review Comment:
   This is obviously going to change the return contract for `resume`, but it's really just a bug fix. I don't see it requiring a `NEWS.txt` entry. (Just `CHANGES.txt`.)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: pr-unsubscribe@cassandra.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@cassandra.apache.org
For additional commands, e-mail: pr-help@cassandra.apache.org


[GitHub] [cassandra] maedhroz commented on a diff in pull request #1949: CASSANDRA-16491 track and handle errors during nodetool bootstrap resume properly

Posted by GitBox <gi...@apache.org>.
maedhroz commented on code in PR #1949:
URL: https://github.com/apache/cassandra/pull/1949#discussion_r1007268481


##########
src/java/org/apache/cassandra/tools/BootstrapMonitor.java:
##########
@@ -82,9 +83,19 @@ public void progress(String tag, ProgressEvent event)
             message = message + " (progress: " + (int)event.getProgressPercentage() + "%)";
         }
         out.println(message);
+        if (type == ProgressEventType.ERROR)
+        {
+            error = new RuntimeException(String.format("Bootstrap resume has failed with error: %s", message));
+            condition.signalAll();

Review Comment:
   I definitely wouldn't do it this way from scratch, but it's probably just not worth the compatibility headache. What we have now LGTM



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: pr-unsubscribe@cassandra.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@cassandra.apache.org
For additional commands, e-mail: pr-help@cassandra.apache.org