Posted to issues@flink.apache.org by GitBox <gi...@apache.org> on 2020/05/21 13:09:13 UTC

[GitHub] [flink] rkhachatryan commented on a change in pull request #12269: [FLINK-17351] [runtime] Increase `continuousFailureCounter` in `CheckpointFailureManager` for CHECKPOINT_EXPIRED

rkhachatryan commented on a change in pull request #12269:
URL: https://github.com/apache/flink/pull/12269#discussion_r428630224



##########
File path: flink-runtime/src/test/java/org/apache/flink/runtime/checkpoint/CheckpointCoordinatorTest.java
##########
@@ -262,6 +251,40 @@ public void failJobDueToTaskFailure(Throwable cause, ExecutionAttemptID failingT
 		}
 	}
 
+	@Test
+	public void testExpiredCheckpointExceedsTolerableFailureNumber() {
+		// create some mock Execution vertices that receive the checkpoint trigger messages
+		ExecutionVertex vertex1 = mockExecutionVertex(new ExecutionAttemptID());
+		ExecutionVertex vertex2 = mockExecutionVertex(new ExecutionAttemptID());
+
+		final String errorMsg = "Exceeded checkpoint failure tolerance number!";
+		CheckpointFailureManager checkpointFailureManager = getCheckpointFailureManager(errorMsg);
+		CheckpointCoordinator coord = getCheckpointCoordinator(new JobID(), vertex1, vertex2, checkpointFailureManager);
+
+		try {
+			// trigger the checkpoint. this should succeed
+			final CompletableFuture<CompletedCheckpoint> checkPointFuture = coord.triggerCheckpoint(false);
+			manuallyTriggeredScheduledExecutor.triggerAll();
+			assertFalse(checkPointFuture.isCompletedExceptionally());
+
+			coord.abortPendingCheckpoints(new CheckpointException(CHECKPOINT_EXPIRED));
+
+			fail("Test failed.");
+		}
+		catch (Exception e) {
+			//expected
+			assertTrue(e instanceof RuntimeException);
+			assertEquals(errorMsg, e.getMessage());

Review comment:
       I think using `org.junit.Test#expected` (and a specific exception class) would be more expressive and less verbose here.
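
A minimal sketch of the two styles side by side, assuming JUnit 4 is on the classpath. The class name, the helper `abortPendingCheckpoints()`, and the choice of `IllegalStateException` are hypothetical stand-ins for illustration; they are not the Flink code from this pull request.

```java
import static org.junit.Assert.assertEquals;
import static org.junit.Assert.fail;

import org.junit.Test;

public class ExpectedExceptionSketchTest {

    private static final String ERROR_MSG = "Exceeded checkpoint failure tolerance number!";

    // Hypothetical stand-in for the call that is expected to throw.
    private static void abortPendingCheckpoints() {
        throw new IllegalStateException(ERROR_MSG);
    }

    // Style used in the patch: try / fail / catch with manual assertions.
    @Test
    public void verboseStyle() {
        try {
            abortPendingCheckpoints();
            fail("Expected an IllegalStateException");
        } catch (IllegalStateException e) {
            assertEquals(ERROR_MSG, e.getMessage());
        }
    }

    // Style suggested in the review: the runner passes the test only if
    // an exception of the declared type escapes the method.
    @Test(expected = IllegalStateException.class)
    public void expectedAttributeStyle() {
        abortPendingCheckpoints();
    }
}
```

One trade-off to keep in mind: the `expected` attribute checks only the exception type, not its message. If the message assertion matters, `org.junit.rules.ExpectedException` or `Assert.assertThrows` (JUnit 4.13 and later) may be a closer fit.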

##########
File path: flink-runtime/src/test/java/org/apache/flink/runtime/checkpoint/CheckpointCoordinatorTest.java
##########
@@ -262,6 +251,40 @@ public void failJobDueToTaskFailure(Throwable cause, ExecutionAttemptID failingT
 		}
 	}
 
+	@Test
+	public void testExpiredCheckpointExceedsTolerableFailureNumber() {
+		// create some mock Execution vertices that receive the checkpoint trigger messages
+		ExecutionVertex vertex1 = mockExecutionVertex(new ExecutionAttemptID());
+		ExecutionVertex vertex2 = mockExecutionVertex(new ExecutionAttemptID());
+
+		final String errorMsg = "Exceeded checkpoint failure tolerance number!";
+		CheckpointFailureManager checkpointFailureManager = getCheckpointFailureManager(errorMsg);
+		CheckpointCoordinator coord = getCheckpointCoordinator(new JobID(), vertex1, vertex2, checkpointFailureManager);
+
+		try {
+			// trigger the checkpoint. this should succeed

Review comment:
       nit: this comment basically repeats what the code does; I think it's unnecessary.

##########
File path: flink-runtime/src/test/java/org/apache/flink/runtime/checkpoint/CheckpointCoordinatorTest.java
##########
@@ -262,6 +251,40 @@ public void failJobDueToTaskFailure(Throwable cause, ExecutionAttemptID failingT
 		}
 	}
 
+	@Test
+	public void testExpiredCheckpointExceedsTolerableFailureNumber() {
+		// create some mock Execution vertices that receive the checkpoint trigger messages
+		ExecutionVertex vertex1 = mockExecutionVertex(new ExecutionAttemptID());
+		ExecutionVertex vertex2 = mockExecutionVertex(new ExecutionAttemptID());
+
+		final String errorMsg = "Exceeded checkpoint failure tolerance number!";
+		CheckpointFailureManager checkpointFailureManager = getCheckpointFailureManager(errorMsg);
+		CheckpointCoordinator coord = getCheckpointCoordinator(new JobID(), vertex1, vertex2, checkpointFailureManager);
+
+		try {
+			// trigger the checkpoint. this should succeed
+			final CompletableFuture<CompletedCheckpoint> checkPointFuture = coord.triggerCheckpoint(false);
+			manuallyTriggeredScheduledExecutor.triggerAll();
+			assertFalse(checkPointFuture.isCompletedExceptionally());
+
+			coord.abortPendingCheckpoints(new CheckpointException(CHECKPOINT_EXPIRED));
+
+			fail("Test failed.");
+		}
+		catch (Exception e) {
+			//expected
+			assertTrue(e instanceof RuntimeException);
+			assertEquals(errorMsg, e.getMessage());
+		} finally {
+			try {
+				coord.shutdown(JobStatus.FINISHED);
+			} catch (Exception e) {
+				e.printStackTrace();
+				fail(e.getMessage());

Review comment:
       Why do we need to handle this error? Won't its stack trace be printed and the test fail anyway?
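
The point can be sketched in plain Java, without any test framework. `FailingCoordinator` and `runAndReport` are hypothetical names: the first stands in for a coordinator whose `shutdown` throws, the second plays the role of a test runner. The sketch assumes the test method is allowed to declare `throws Exception`.

```java
class FinallyPropagationSketch {

    // Hypothetical stand-in for a coordinator whose shutdown fails.
    static class FailingCoordinator {
        void shutdown() throws Exception {
            throw new Exception("shutdown failed");
        }
    }

    // Test body without the inner try/catch: the method simply
    // declares the exception and lets it escape from finally.
    static void testBody(FailingCoordinator coord) throws Exception {
        try {
            // ... test logic would go here ...
        } finally {
            coord.shutdown();
        }
    }

    // Plays the role of the test runner: it receives the original
    // exception, so nothing is lost by not catching it earlier.
    static String runAndReport(FailingCoordinator coord) {
        try {
            testBody(coord);
            return "passed";
        } catch (Exception e) {
            return "failed: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        // prints "failed: shutdown failed"
        System.out.println(runAndReport(new FailingCoordinator()));
    }
}
```

A JUnit 4 runner behaves similarly for uncaught exceptions: it records them as a test error together with the stack trace, which is why the extra `printStackTrace()` and `fail(...)` add little.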

##########
File path: flink-runtime/src/test/java/org/apache/flink/runtime/checkpoint/CheckpointCoordinatorTest.java
##########
@@ -262,6 +251,40 @@ public void failJobDueToTaskFailure(Throwable cause, ExecutionAttemptID failingT
 		}
 	}
 
+	@Test
+	public void testExpiredCheckpointExceedsTolerableFailureNumber() {
+		// create some mock Execution vertices that receive the checkpoint trigger messages
+		ExecutionVertex vertex1 = mockExecutionVertex(new ExecutionAttemptID());
+		ExecutionVertex vertex2 = mockExecutionVertex(new ExecutionAttemptID());
+
+		final String errorMsg = "Exceeded checkpoint failure tolerance number!";
+		CheckpointFailureManager checkpointFailureManager = getCheckpointFailureManager(errorMsg);
+		CheckpointCoordinator coord = getCheckpointCoordinator(new JobID(), vertex1, vertex2, checkpointFailureManager);
+
+		try {
+			// trigger the checkpoint. this should succeed
+			final CompletableFuture<CompletedCheckpoint> checkPointFuture = coord.triggerCheckpoint(false);
+			manuallyTriggeredScheduledExecutor.triggerAll();
+			assertFalse(checkPointFuture.isCompletedExceptionally());

Review comment:
       I think this check is not necessary here because triggering should be tested separately
   (and the next check should fail anyway if the trigger failed).

##########
File path: flink-runtime/src/test/java/org/apache/flink/runtime/checkpoint/CheckpointCoordinatorTest.java
##########
@@ -2292,6 +2315,22 @@ private CheckpointCoordinator getCheckpointCoordinator() {
 			.build();
 	}
 
+	private CheckpointFailureManager getCheckpointFailureManager(String errorMsg) {

Review comment:
       :+1: for extracting shared code



