You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@flink.apache.org by "Yun Gao (Jira)" <ji...@apache.org> on 2022/02/18 06:28:00 UTC

[jira] [Created] (FLINK-26239) EventTimeWindowCheckpointingITCase.testSlidingTimeWindow failed on azure

Yun Gao created FLINK-26239:
-------------------------------

             Summary: EventTimeWindowCheckpointingITCase.testSlidingTimeWindow failed on azure
                 Key: FLINK-26239
                 URL: https://issues.apache.org/jira/browse/FLINK-26239
             Project: Flink
          Issue Type: Bug
          Components: Runtime / Checkpointing
    Affects Versions: 1.15.0
            Reporter: Yun Gao


{code:java}
2022-02-17T11:46:39.1850375Z Feb 17 11:46:39 Starting org.apache.flink.test.checkpointing.EventTimeWindowCheckpointingITCase#testSlidingTimeWindow[statebackend type =ROCKSDB_INCREMENTAL, buffersPerChannel = 2].
2022-02-17T11:46:39.1854584Z org.apache.flink.runtime.client.JobExecutionException: Job execution failed.
2022-02-17T11:46:39.1855470Z 	at org.apache.flink.runtime.jobmaster.JobResult.toJobExecutionResult(JobResult.java:144)
2022-02-17T11:46:39.1856444Z 	at org.apache.flink.runtime.minicluster.MiniClusterJobClient.lambda$getJobExecutionResult$3(MiniClusterJobClient.java:141)
2022-02-17T11:46:39.1857393Z 	at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:616)
2022-02-17T11:46:39.1858400Z 	at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:591)
2022-02-17T11:46:39.1865249Z 	at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
2022-02-17T11:46:39.1866299Z 	at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1975)
2022-02-17T11:46:39.1867590Z 	at org.apache.flink.runtime.rpc.akka.AkkaInvocationHandler.lambda$invokeRpc$1(AkkaInvocationHandler.java:259)
2022-02-17T11:46:39.1868546Z 	at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774)
2022-02-17T11:46:39.1869254Z 	at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750)
2022-02-17T11:46:39.1869828Z 	at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
2022-02-17T11:46:39.1870367Z 	at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1975)
2022-02-17T11:46:39.1871131Z 	at org.apache.flink.util.concurrent.FutureUtils.doForward(FutureUtils.java:1389)
2022-02-17T11:46:39.1872123Z 	at org.apache.flink.runtime.concurrent.akka.ClassLoadingUtils.lambda$null$1(ClassLoadingUtils.java:93)
2022-02-17T11:46:39.1875765Z 	at org.apache.flink.runtime.concurrent.akka.ClassLoadingUtils.runWithContextClassLoader(ClassLoadingUtils.java:68)
2022-02-17T11:46:39.1877055Z 	at org.apache.flink.runtime.concurrent.akka.ClassLoadingUtils.lambda$guardCompletionWithContextClassLoader$2(ClassLoadingUtils.java:92)
2022-02-17T11:46:39.1878032Z 	at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774)
2022-02-17T11:46:39.1879084Z 	at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750)
2022-02-17T11:46:39.1879697Z 	at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
2022-02-17T11:46:39.1880252Z 	at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1975)
2022-02-17T11:46:39.1880840Z 	at org.apache.flink.runtime.concurrent.akka.AkkaFutureUtils$1.onComplete(AkkaFutureUtils.java:47)
2022-02-17T11:46:39.1881357Z 	at akka.dispatch.OnComplete.internal(Future.scala:300)
2022-02-17T11:46:39.1881788Z 	at akka.dispatch.OnComplete.internal(Future.scala:297)

...

2022-02-17T11:46:39.1915003Z Caused by: org.apache.flink.util.FlinkRuntimeException: Exceeded checkpoint tolerable failure threshold.
2022-02-17T11:46:39.1915653Z 	at org.apache.flink.runtime.checkpoint.CheckpointFailureManager.checkFailureAgainstCounter(CheckpointFailureManager.java:160)
2022-02-17T11:46:39.1916393Z 	at org.apache.flink.runtime.checkpoint.CheckpointFailureManager.handleJobLevelCheckpointException(CheckpointFailureManager.java:123)
2022-02-17T11:46:39.1917125Z 	at org.apache.flink.runtime.checkpoint.CheckpointFailureManager.handleCheckpointException(CheckpointFailureManager.java:90)
2022-02-17T11:46:39.1917819Z 	at org.apache.flink.runtime.checkpoint.CheckpointCoordinator.abortPendingCheckpoint(CheckpointCoordinator.java:2046)
2022-02-17T11:46:39.1918594Z 	at org.apache.flink.runtime.checkpoint.CheckpointCoordinator.abortPendingCheckpoint(CheckpointCoordinator.java:2025)
2022-02-17T11:46:39.1919268Z 	at org.apache.flink.runtime.checkpoint.CheckpointCoordinator.access$600(CheckpointCoordinator.java:98)
2022-02-17T11:46:39.1920052Z 	at org.apache.flink.runtime.checkpoint.CheckpointCoordinator$CheckpointCanceller.run(CheckpointCoordinator.java:2104)
2022-02-17T11:46:39.1920852Z 	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
2022-02-17T11:46:39.1921390Z 	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
2022-02-17T11:46:39.1922079Z 	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
2022-02-17T11:46:39.1922785Z 	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
2022-02-17T11:46:39.1923541Z 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
2022-02-17T11:46:39.1924108Z 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
2022-02-17T11:46:39.1924585Z 	at java.lang.Thread.run(Thread.java:748) {code}
From the log, the checkpoint seems to fail due to 

 
{code:java}
java.lang.IllegalStateException: Attempt to reference unknown state: f1a3e68c-3bac-4bd9-b68f-7968a1411a06-KeyGroupRange{startKeyGroup=0, endKeyGroup=1}-000058.sst
    at org.apache.flink.util.Preconditions.checkState(Preconditions.java:193) ~[flink-core-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
    at org.apache.flink.runtime.state.SharedStateRegistryImpl.registerReference(SharedStateRegistryImpl.java:82) ~[flink-runtime-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
    at org.apache.flink.runtime.state.IncrementalRemoteKeyedStateHandle.registerSharedStates(IncrementalRemoteKeyedStateHandle.java:317) ~[flink-runtime-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
    at org.apache.flink.runtime.state.SharedStateRegistryImpl.registerAll(SharedStateRegistryImpl.java:172) ~[flink-runtime-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
    at org.apache.flink.runtime.state.changelog.ChangelogStateBackendHandle$ChangelogStateBackendHandleImpl.registerSharedStates(ChangelogStateBackendHandle.java:124) ~[flink-runtime-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
    at org.apache.flink.runtime.checkpoint.OperatorSubtaskState.registerSharedState(OperatorSubtaskState.java:229) ~[flink-runtime-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
    at org.apache.flink.runtime.checkpoint.OperatorSubtaskState.registerSharedStates(OperatorSubtaskState.java:219) ~[flink-runtime-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
    at org.apache.flink.runtime.checkpoint.TaskStateSnapshot.registerSharedStates(TaskStateSnapshot.java:189) ~[flink-runtime-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
    at org.apache.flink.runtime.checkpoint.CheckpointCoordinator.receiveAcknowledgeMessage(CheckpointCoordinator.java:1114) ~[flink-runtime-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
    at org.apache.flink.runtime.scheduler.ExecutionGraphHandler.lambda$acknowledgeCheckpoint$1(ExecutionGraphHandler.java:89) ~[flink-runtime-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
    at org.apache.flink.runtime.scheduler.ExecutionGraphHandler.lambda$processCheckpointCoordinatorMessage$3(ExecutionGraphHandler.java:119) ~[flink-runtime-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_292]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_292]
    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_292]
11:36:48,770 [jobmanager-io-thread-14] WARN  org.apache.flink.runtime.jobmaster.JobMaster                 [] - Error while processing AcknowledgeCheckpoint message 
java.lang.IllegalStateException: Attempt to reference unknown state: e0c386e6-8fdd-4277-8f5c-d4eb942c97e0-KeyGroupRange{startKeyGroup=6, endKeyGroup=7}-000057.sst
    at org.apache.flink.util.Preconditions.checkState(Preconditions.java:193) ~[flink-core-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
    at org.apache.flink.runtime.state.SharedStateRegistryImpl.registerReference(SharedStateRegistryImpl.java:82) ~[flink-runtime-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
    at org.apache.flink.runtime.state.IncrementalRemoteKeyedStateHandle.registerSharedStates(IncrementalRemoteKeyedStateHandle.java:317) ~[flink-runtime-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
    at org.apache.flink.runtime.state.SharedStateRegistryImpl.registerAll(SharedStateRegistryImpl.java:172) ~[flink-runtime-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
    at org.apache.flink.runtime.state.changelog.ChangelogStateBackendHandle$ChangelogStateBackendHandleImpl.registerSharedStates(ChangelogStateBackendHandle.java:124) ~[flink-runtime-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
    at org.apache.flink.runtime.checkpoint.OperatorSubtaskState.registerSharedState(OperatorSubtaskState.java:229) ~[flink-runtime-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
    at org.apache.flink.runtime.checkpoint.OperatorSubtaskState.registerSharedStates(OperatorSubtaskState.java:219) ~[flink-runtime-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
    at org.apache.flink.runtime.checkpoint.TaskStateSnapshot.registerSharedStates(TaskStateSnapshot.java:189) ~[flink-runtime-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
    at org.apache.flink.runtime.checkpoint.CheckpointCoordinator.receiveAcknowledgeMessage(CheckpointCoordinator.java:1114) ~[flink-runtime-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
    at org.apache.flink.runtime.scheduler.ExecutionGraphHandler.lambda$acknowledgeCheckpoint$1(ExecutionGraphHandler.java:89) ~[flink-runtime-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
    at org.apache.flink.runtime.scheduler.ExecutionGraphHandler.lambda$processCheckpointCoordinatorMessage$3(ExecutionGraphHandler.java:119) ~[flink-runtime-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_292]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_292]
    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_292] {code}
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=31736&view=logs&j=5c8e7682-d68f-54d1-16a2-a09310218a49&t=86f654fa-ab48-5c1a-25f4-7e7f6afb9bba&l=6325



--
This message was sent by Atlassian Jira
(v8.20.1#820001)