You are viewing a plain text version of this content. The canonical link for it is here.

Posted to user@flink.apache.org by Sonam Mandal <so...@linkedin.com> on 2021/04/01 16:38:51 UTC

Question about setting up Task-local recovery with a RocksDB state backend

Hello,

I've been going through the documentation for task-local recovery and came across this section<https://ci.apache.org/projects/flink/flink-docs-stable/ops/state/large_state_tuning.html#details-on-task-local-recovery-for-different-state-backends> which discusses that with incremental checkpoints enabled the task-local recovery incurs no additional storage cost. The caveat mentioned indicates that the task local recovery state and all the rocks DB local state must be on a single physical device to allow the use of hard links. I wanted to understand how to ensure that our RocksDB local state is on the same physical device as the task-local recovery data.

I came across a couple of config options we can set to point the RocksDB local state to a directory of our choosing, along with the task local recovery directory. Do I need to set both up for task local recovery to work correctly? What are the default paths if I don't set up these configs? (we are using Kubernetes - assume that /opt/flink/local-state below corresponds to a given physical drive)


    state.backend.rocksdb.localdir: /opt/flink/local-state/rocksdblocaldir

    taskmanager.state.local.root-dirs: /opt/flink/local-state/tasklocaldir

Do these configs make any difference if we turn off incremental checkpointing for RocksDB? Also, setting up this localdir for RocksDB won't affect checkpointing and where the checkpoints are stored, right?

After setting up the above two configs, I ran into some issues where the job would just disappear (or fail) if the Task Manager pod got killed (whereas without this, the job resumed correctly from the last checkpoint after the task manager pod was killed).

Thanks,
Sonam