You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@flink.apache.org by Parth Sarathy <pa...@microfocus.com> on 2019/10/31 14:26:26 UTC

State restoration from checkpoint

_metadata._metadata
<http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/file/t1672/_metadata._metadata> 
Hi,  Recently we upgraded our application in which flink windowed
transformation is used. The earlier version of the application used flink
1.7.2 while the new version uses flink 1.8.2. While submitting new job the
application sets path to the latest checkpoint directory as restore path.
After upgrade we see below statement in job manager log which was as
expected:
2019-10-30 13:04:46.895 [flink-akka.actor.default-dispatcher-12] INFO 
org.apache.flink.runtime.checkpoint.CheckpointCoordinator  - Starting job
b8370f434690bd82ae413183b0826567 from savepoint
/data-processor/data/checkpoints/c7775bdc5d9f51f41576342946cf1037/chk-19089
(allowing non restored state)
But then the following error comes which prevents the new job to run:  
2019-10-30 13:05:23.500 [flink-akka.actor.default-dispatcher-12] INFO 
org.apache.flink.runtime.executiongraph.ExecutionGraph  -
graph-updates-stream (1/8) (2d07dc9bf1ef6ef3e4b7fdbf32943e4e) switched from
RUNNING to FAILED.java.lang.Exception: Exception while creating
StreamOperatorStateContext.	at
org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:195)
~[flink-dist_2.11-1.8.2.jar:1.8.2]	at
org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:250)
~[flink-dist_2.11-1.8.2.jar:1.8.2]	at
org.apache.flink.streaming.runtime.tasks.StreamTask.initializeState(StreamTask.java:740)
~[flink-dist_2.11-1.8.2.jar:1.8.2]	at
org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:291)
~[flink-dist_2.11-1.8.2.jar:1.8.2]	at
org.apache.flink.runtime.taskmanager.Task.run(Task.java:711)
~[flink-dist_2.11-1.8.2.jar:1.8.2]	at java.lang.Thread.run(Thread.java:748)
~[na:1.8.0_212]Caused by: org.apache.flink.util.FlinkException: Could not
restore keyed state backend for
WindowOperator_6d6ae3c2fa6c08ed7dd466debdde6743_(1/8) from any of the 1
provided restore options.	at
org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:135)
~[flink-dist_2.11-1.8.2.jar:1.8.2]	at
org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.keyedStatedBackend(StreamTaskStateInitializerImpl.java:307)
~[flink-dist_2.11-1.8.2.jar:1.8.2]	at
org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:135)
~[flink-dist_2.11-1.8.2.jar:1.8.2]	... 5 common frames omittedCaused by:
org.apache.flink.runtime.state.BackendBuildingException: Caught unexpected
exception.	at
org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackendBuilder.build(RocksDBKeyedStateBackendBuilder.java:324)
~[flink-dist_2.11-1.8.2.jar:1.8.2]	at
org.apache.flink.contrib.streaming.state.RocksDBStateBackend.createKeyedStateBackend(RocksDBStateBackend.java:520)
~[flink-dist_2.11-1.8.2.jar:1.8.2]	at
org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.lambda$keyedStatedBackend$1(StreamTaskStateInitializerImpl.java:291)
~[flink-dist_2.11-1.8.2.jar:1.8.2]	at
org.apache.flink.streaming.api.operators.BackendRestorerProcedure.attemptCreateAndRestore(BackendRestorerProcedure.java:142)
~[flink-dist_2.11-1.8.2.jar:1.8.2]	at
org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:121)
~[flink-dist_2.11-1.8.2.jar:1.8.2]	... 7 common frames omitted
*Caused by: java.io.FileNotFoundException:
/data-processor/data/checkpoints/6f8c503c32329e7ed1df3029ba1e2c74/shared/75d656bd-450e-42eb-ab40-edde993e3f12
(No such file or directory)*	at java.io.FileInputStream.open0(Native Method)
~[na:1.8.0_212]...
Why is it looking for files in some other job directory for restoring
state?I have attached metadata file from
/data-processor/data/checkpoints/c7775bdc5d9f51f41576342946cf1037/chk-19089.
It refers to multiple directories in
/data-processor/data/checkpoints/6f8c503c32329e7ed1df3029ba1e2c74/shared/.
 Please elaborate on which all checkpoint directories/files are used for
state restoration. Our requirement is to clean up all the job/checkpoint
directories which are not required. 
Thanks,
Parth Sarathy



--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/