Posted to issues@flink.apache.org by "Abdullah alkhawatrah (Jira)" <ji...@apache.org> on 2022/04/07 13:00:00 UTC
[jira] [Created] (FLINK-27127) Local recovery on task manager process restart

Abdullah alkhawatrah created FLINK-27127:
--------------------------------------------

             Summary: Local recovery on task manager process restart
                 Key: FLINK-27127
                 URL: https://issues.apache.org/jira/browse/FLINK-27127
             Project: Flink
          Issue Type: Bug
          Components: Runtime / Checkpointing
    Affects Versions: 1.15.0
            Reporter: Abdullah alkhawatrah


Hey,

I am experimenting with the support for local recovery after a process restart, introduced in 1.15. I am trying this on minikube.

So far, it seems that every time a pod restarts, remote recovery is triggered.

I have created a repo with everything needed to test it locally with minikube: [https://github.com/akhawatrahTW/flink-local-recovery-test].

The README contains the steps to reproduce.

Based on the documentation, I expected local recovery to be triggered on pod restarts, since the required configs are set: [https://github.com/akhawatrahTW/flink-local-recovery-test/blob/bfef14e45f475ba953a05b50b8829d9d33bdcec6/k8s/flink-configuration-configmap.yaml#L27].

So I was expecting to see something similar to this in the logs of the recreated task manager pod:

{code:java}
2022-04-07 09:17:17,637 INFO  org.apache.flink.contrib.streaming.state.restore.RocksDBIncrementalRestoreOperation [] - Starting to restore from state handle: IncrementalLocalKeyedStateHandle{metaDataState=File State: file:/pv/tm_flink-taskmanager-2/localState/aid_e56a834e076a6d8f9dc1a2997e97a91a/jid_f88542b420546fadbc94db66b00cb5a0/vtx_20ba6b65f97481d5570070de90e4e791_sti_2/chk_1208/c2756339-8938-4949-84ff-d7ee3f4c55cf [479 bytes]} DirectoryKeyedStateHandle{directoryStateHandle=DirectoryStateHandle{directory=/pv/tm_flink-taskmanager-2/localState/aid_e56a834e076a6d8f9dc1a2997e97a91a/jid_f88542b420546fadbc94db66b00cb5a0/vtx_20ba6b65f97481d5570070de90e4e791_sti_2/chk_1208/5455302ce9554a1f81365aee368f267e}, keyGroupRange=KeyGroupRange{startKeyGroup=86, endKeyGroup=127}} without rescaling.{code}
 

 

But for some reason, remote recovery is triggered:

*Actual*
{code:java}
2022-04-07 09:17:18,405 INFO  org.apache.flink.contrib.streaming.state.restore.RocksDBIncrementalRestoreOperation [] - Finished restoring from state handle: IncrementalRemoteKeyedStateHandle{backendIdentifier=544f3300-36bd-40a6-9ee3-f78b0e47dfd6, stateHandleId=c2753d01-2f6b-49f0-9ca1-df6b54c61490, keyGroupRange=KeyGroupRange{startKeyGroup=0, endKeyGroup=42}, checkpointId=1208, sharedState={001526.sst=ByteStreamStateHandle{handleName='f5a113d0-8094-40e7-a1b1-adc4cfc690c2', dataBytes=23107}, 001527.sst=ByteStreamStateHandle{handleName='3806411e-8213-406a-bbd8-e498ab19d118', dataBytes=15579}, 001528.sst=ByteStreamStateHandle{handleName='4fef6ead-1522-4f61-a6ad-399b334b41ca', dataBytes=15839}, 001529.sst=ByteStreamStateHandle{handleName='f1324a0c-3eae-46b0-acc2-c03d32b0c24a', dataBytes=16055}}, privateState={OPTIONS-001237=ByteStreamStateHandle{handleName='2e36d07b-5f91-4c9d-9778-5a16bb6254d5', dataBytes=9924}, MANIFEST-001234=ByteStreamStateHandle{handleName='4c95b38a-4afa-4154-9c89-9518d6384a25', dataBytes=27356}, CURRENT=ByteStreamStateHandle{handleName='17bd5bab-c369-470a-bf29-e76279cef2ba', dataBytes=16}}, metaStateHandle=ByteStreamStateHandle{handleName='15827f44-0ab2-4562-b8eb-812b8d260206', dataBytes=479}, registered=false} without rescaling.{code}
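
The two cases can be told apart by the handle class name in the restore messages: IncrementalLocalKeyedStateHandle for local recovery, IncrementalRemoteKeyedStateHandle for remote recovery. A minimal sketch in Python for checking this across task manager logs, assuming the log format matches the two excerpts above. The helper names classify_restore and summarize are hypothetical and not part of Flink:

```python
import re

# Handle class names as they appear in the two log excerpts above.
LOCAL_HANDLE = "IncrementalLocalKeyedStateHandle"
REMOTE_HANDLE = "IncrementalRemoteKeyedStateHandle"

# Matches the restore messages logged by RocksDBIncrementalRestoreOperation
# and captures the class name of the state handle being restored.
RESTORE_LINE = re.compile(
    r"(?:Starting to restore|Finished restoring) from state handle: (\w+)"
)


def classify_restore(line):
    """Return 'local', 'remote', or None for a single log line."""
    match = RESTORE_LINE.search(line)
    if match is None:
        return None
    handle = match.group(1)
    if handle == LOCAL_HANDLE:
        return "local"
    if handle == REMOTE_HANDLE:
        return "remote"
    return None


def summarize(lines):
    """Count local and remote restore messages in an iterable of log lines."""
    counts = {"local": 0, "remote": 0}
    for line in lines:
        kind = classify_restore(line)
        if kind is not None:
            counts[kind] += 1
    return counts
```

Passing the lines of a recreated pod's log to summarize gives a count per recovery type. With local recovery working, the "remote" count for that pod should stay at zero.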
 

 



--
This message was sent by Atlassian Jira
(v8.20.1#820001)