Posted to issues@flink.apache.org by "Yanfei Lei (Jira)" <ji...@apache.org> on 2023/04/26 03:51:00 UTC

[jira] [Comment Edited] (FLINK-30644) ChangelogCompatibilityITCase.testRestore fails due to CheckpointCoordinator being shutdown

    [ https://issues.apache.org/jira/browse/FLINK-30644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17716514#comment-17716514 ] 

Yanfei Lei edited comment on FLINK-30644 at 4/26/23 3:50 AM:
-------------------------------------------------------------

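The stack trace in [https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=48321&view=logs&j=a549b384-c55a-52c0-c451-00e0477ab6db&t=eef5922c-08d9-5ba3-7299-8393476594e7&l=10691] shows that the root cause is a {{FileNotFoundException}}: the {{chk-1}} directory cannot be found during {{CheckpointCoordinator.restoreSavepoint}}: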
{code:java}
Apr 21 01:35:03 	at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:273)
Apr 21 01:35:03 	at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:280)
Apr 21 01:35:03 	at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1606)
Apr 21 01:35:03 	... 3 more
Apr 21 01:35:03 Caused by: java.lang.RuntimeException: java.lang.RuntimeException: java.io.FileNotFoundException: Cannot find checkpoint or savepoint file/directory 'file:/tmp/junit862341719583315537/junit8040524335885911429/e0cfadc575a94b10511f5ef02629fb30/chk-1' on file system 'file'.
Apr 21 01:35:03 	at org.apache.flink.util.ExceptionUtils.rethrow(ExceptionUtils.java:321)
Apr 21 01:35:03 	at org.apache.flink.util.function.FunctionUtils.lambda$uncheckedSupplier$4(FunctionUtils.java:114)
Apr 21 01:35:03 	at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604)
Apr 21 01:35:03 	... 3 more
Apr 21 01:35:03 Caused by: java.io.FileNotFoundException: java.io.FileNotFoundException: Cannot find checkpoint or savepoint file/directory 'file:/tmp/junit862341719583315537/junit8040524335885911429/e0cfadc575a94b10511f5ef02629fb30/chk-1' on file system 'file'.
Apr 21 01:35:03 	at org.apache.flink.runtime.state.filesystem.AbstractFsCheckpointStorageAccess.resolveCheckpointPointer(AbstractFsCheckpointStorageAccess.java:275)
Apr 21 01:35:03 	at org.apache.flink.runtime.state.filesystem.AbstractFsCheckpointStorageAccess.resolveCheckpoint(AbstractFsCheckpointStorageAccess.java:136)
Apr 21 01:35:03 	at org.apache.flink.runtime.checkpoint.CheckpointCoordinator.restoreSavepoint(CheckpointCoordinator.java:1824)
Apr 21 01:35:03 	at org.apache.flink.runtime.scheduler.DefaultExecutionGraphFactory.tryRestoreExecutionGraphFromSavepoint(DefaultExecutionGraphFactory.java:223)
Apr 21 01:35:03 	at org.apache.flink.runtime.scheduler.DefaultExecutionGraphFactory.createAndRestoreExecutionGraph(DefaultExecutionGraphFactory.java:198)
Apr 21 01:35:03 	at org.apache.flink.runtime.scheduler.SchedulerBase.createAndRestoreExecutionGraph(SchedulerBase.java:365)
Apr 21 01:35:03 	at org.apache.flink.runtime.scheduler.SchedulerBase.<init>(SchedulerBase.java:210)
Apr 21 01:35:03 	at org.apache.flink.runtime.scheduler.DefaultScheduler.<init>(DefaultScheduler.java:136)
Apr 21 01:35:03 	at org.apache.flink.runtime.scheduler.DefaultSchedulerFactory.createInstance(DefaultSchedulerFactory.java:152)
Apr 21 01:35:03 	at org.apache.flink.runtime.jobmaster.DefaultSlotPoolServiceSchedulerFactory.createScheduler(DefaultSlotPoolServiceSchedulerFactory.java:119)
Apr 21 01:35:03 	at org.apache.flink.runtime.jobmaster.JobMaster.createScheduler(JobMaster.java:371)
Apr 21 01:35:03 	at org.apache.flink.runtime.jobmaster.JobMaster.<init>(JobMaster.java:348)
Apr 21 01:35:03 	at org.apache.flink.runtime.jobmaster.factories.DefaultJobMasterServiceFactory.internalCreateJobMasterService(DefaultJobMasterServiceFactory.java:123)
Apr 21 01:35:03 	at org.apache.flink.runtime.jobmaster.factories.DefaultJobMasterServiceFactory.lambda$createJobMasterService$0(DefaultJobMasterServiceFactory.java:95)
Apr 21 01:35:03 	at org.apache.flink.util.function.FunctionUtils.lambda$uncheckedSupplier$4(FunctionUtils.java:112)
Apr 21 01:35:03 	... 4 more
{code}

was (Author: yanfei lei):
[https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=48321&view=logs&j=a549b384-c55a-52c0-c451-00e0477ab6db&t=eef5922c-08d9-5ba3-7299-8393476594e7&l=10691] stack trace shows that the reason is fileNotFound:
{code:java}
Apr 21 01:35:03 	at org.apache.flink.runtime.checkpoint.CheckpointCoordinator.restoreSavepoint(CheckpointCoordinator.java:1824)
Apr 21 01:35:03 	at org.apache.flink.runtime.scheduler.DefaultExecutionGraphFactory.tryRestoreExecutionGraphFromSavepoint(DefaultExecutionGraphFactory.java:223)
Apr 21 01:35:03 	at org.apache.flink.runtime.scheduler.DefaultExecutionGraphFactory.createAndRestoreExecutionGraph(DefaultExecutionGraphFactory.java:198)
Apr 21 01:35:03 	at org.apache.flink.runtime.scheduler.SchedulerBase.createAndRestoreExecutionGraph(SchedulerBase.java:365)
Apr 21 01:35:03 	at org.apache.flink.runtime.scheduler.SchedulerBase.<init>(SchedulerBase.java:210)
Apr 21 01:35:03 	at org.apache.flink.runtime.scheduler.DefaultScheduler.<init>(DefaultScheduler.java:136)
Apr 21 01:35:03 	at org.apache.flink.runtime.scheduler.DefaultSchedulerFactory.createInstance(DefaultSchedulerFactory.java:152)
Apr 21 01:35:03 	at org.apache.flink.runtime.jobmaster.DefaultSlotPoolServiceSchedulerFactory.createScheduler(DefaultSlotPoolServiceSchedulerFactory.java:119)
Apr 21 01:35:03 	at org.apache.flink.runtime.jobmaster.JobMaster.createScheduler(JobMaster.java:371)
Apr 21 01:35:03 	at org.apache.flink.runtime.jobmaster.JobMaster.<init>(JobMaster.java:348)
Apr 21 01:35:03 	at org.apache.flink.runtime.jobmaster.factories.DefaultJobMasterServiceFactory.internalCreateJobMasterService(DefaultJobMasterServiceFactory.java:123)
Apr 21 01:35:03 	at org.apache.flink.runtime.jobmaster.factories.DefaultJobMasterServiceFactory.lambda$createJobMasterService$0(DefaultJobMasterServiceFactory.java:95)
Apr 21 01:35:03 	at org.apache.flink.util.function.FunctionUtils.lambda$uncheckedSupplier$4(FunctionUtils.java:112)
Apr 21 01:35:03 	... 4 more
{code}

> ChangelogCompatibilityITCase.testRestore fails due to CheckpointCoordinator being shutdown
> ------------------------------------------------------------------------------------------
>
>                 Key: FLINK-30644
>                 URL: https://issues.apache.org/jira/browse/FLINK-30644
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Coordination, Runtime / State Backends
>    Affects Versions: 1.17.0
>            Reporter: Matthias Pohl
>            Priority: Critical
>              Labels: test-stability
>
> We observe a build failure in {{ChangelogCompatibilityITCase.testRestore}} due to the {{CheckpointCoordinator}} being shut down:
> {code:java}
> [...]
> Caused by: org.apache.flink.runtime.checkpoint.CheckpointException: CheckpointCoordinator shutdown.
> Jan 12 02:37:37 	at org.apache.flink.runtime.checkpoint.PendingCheckpoint.abort(PendingCheckpoint.java:544)
> Jan 12 02:37:37 	at org.apache.flink.runtime.checkpoint.CheckpointCoordinator.abortPendingCheckpoint(CheckpointCoordinator.java:2140)
> Jan 12 02:37:37 	at org.apache.flink.runtime.checkpoint.CheckpointCoordinator.abortPendingCheckpoint(CheckpointCoordinator.java:2127)
> Jan 12 02:37:37 	at org.apache.flink.runtime.checkpoint.CheckpointCoordinator.abortPendingCheckpoints(CheckpointCoordinator.java:2004)
> Jan 12 02:37:37 	at org.apache.flink.runtime.checkpoint.CheckpointCoordinator.abortPendingCheckpoints(CheckpointCoordinator.java:1987)
> Jan 12 02:37:37 	at org.apache.flink.runtime.checkpoint.CheckpointCoordinator.abortPendingAndQueuedCheckpoints(CheckpointCoordinator.java:2183)
> Jan 12 02:37:37 	at org.apache.flink.runtime.checkpoint.CheckpointCoordinator.shutdown(CheckpointCoordinator.java:426)
> Jan 12 02:37:37 	at org.apache.flink.runtime.executiongraph.DefaultExecutionGraph.onTerminalState(DefaultExecutionGraph.java:1329)
> [...]{code}
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=44731&view=logs&j=2c3cbe13-dee0-5837-cf47-3053da9a8a78&t=b78d9d30-509a-5cea-1fef-db7abaa325ae&l=9255


