Posted to issues@flink.apache.org by "Chesnay Schepler (JIRA)" <ji...@apache.org> on 2018/02/06 21:29:00 UTC
[jira] [Closed] (FLINK-8559) Exceptions in
RocksDBIncrementalSnapshotOperation#takeSnapshot cause job to get stuck
[ https://issues.apache.org/jira/browse/FLINK-8559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Chesnay Schepler closed FLINK-8559.
-----------------------------------
Resolution: Fixed
master: dbb81acb5a1d0f2a9521c6eef7eeb2436bb8004d
> Exceptions in RocksDBIncrementalSnapshotOperation#takeSnapshot cause job to get stuck
> -------------------------------------------------------------------------------------
>
> Key: FLINK-8559
> URL: https://issues.apache.org/jira/browse/FLINK-8559
> Project: Flink
> Issue Type: Bug
> Components: State Backends, Checkpointing, Tests
> Affects Versions: 1.4.0, 1.5.0
> Reporter: Chesnay Schepler
> Assignee: Chesnay Schepler
> Priority: Blocker
> Fix For: 1.5.0, 1.4.1
>
>
> In {{RocksDBKeyedStateBackend#snapshotIncrementally}} we find the following code:
>
> {code:java}
> final RocksDBIncrementalSnapshotOperation<K> snapshotOperation =
>     new RocksDBIncrementalSnapshotOperation<>(
>         this,
>         checkpointStreamFactory,
>         checkpointId,
>         checkpointTimestamp);
>
> snapshotOperation.takeSnapshot();
>
> return new FutureTask<KeyedStateHandle>(
>     new Callable<KeyedStateHandle>() {
>         @Override
>         public KeyedStateHandle call() throws Exception {
>             return snapshotOperation.materializeSnapshot();
>         }
>     }
> ) {
>     @Override
>     public boolean cancel(boolean mayInterruptIfRunning) {
>         snapshotOperation.stop();
>         return super.cancel(mayInterruptIfRunning);
>     }
>
>     @Override
>     protected void done() {
>         snapshotOperation.releaseResources(isCancelled());
>     }
> };
> {code}
> In the constructor of {{RocksDBIncrementalSnapshotOperation}} we call {{acquireResource()}} on the RocksDB {{ResourceGuard}}. If {{snapshotOperation.takeSnapshot()}} fails with an exception, these resources are never released, because the {{FutureTask}} whose {{done()}} callback releases them is never created. When the task is then shut down because of the exception, it gets stuck while releasing RocksDB.
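
The leak described above can be illustrated with a small standalone sketch. This is not Flink code and not necessarily how the referenced commit fixed the issue: LeaseCounter and SnapshotOperation are hypothetical stand-ins for ResourceGuard and RocksDBIncrementalSnapshotOperation, and the only point is that a failure in takeSnapshot() has to release what the constructor acquired, since the FutureTask that would normally do so in done() is never created.

```java
import java.util.concurrent.FutureTask;
import java.util.concurrent.atomic.AtomicInteger;

class SnapshotGuardSketch {

    /** Hypothetical lease counter; a simplified stand-in, not Flink's ResourceGuard. */
    static final class LeaseCounter {
        private final AtomicInteger leases = new AtomicInteger();
        void acquire() { leases.incrementAndGet(); }
        void release() { leases.decrementAndGet(); }
        int openLeases() { return leases.get(); }
    }

    /** Hypothetical snapshot operation that takes a lease in its constructor. */
    static final class SnapshotOperation {
        private final LeaseCounter guard;
        private final boolean failOnTake;
        private boolean released;

        SnapshotOperation(LeaseCounter guard, boolean failOnTake) {
            this.guard = guard;
            this.failOnTake = failOnTake;
            guard.acquire(); // resource is held from construction onwards
        }

        void takeSnapshot() throws Exception {
            if (failOnTake) {
                throw new Exception("simulated failure in takeSnapshot");
            }
        }

        String materializeSnapshot() { return "state-handle"; }

        /** Idempotent, so a double release cannot drive the counter negative. */
        void releaseResources() {
            if (!released) {
                released = true;
                guard.release();
            }
        }
    }

    static FutureTask<String> snapshot(LeaseCounter guard, boolean failOnTake) throws Exception {
        final SnapshotOperation op = new SnapshotOperation(guard, failOnTake);
        try {
            op.takeSnapshot();
        } catch (Exception e) {
            // No FutureTask exists yet, so done() will never run: release here.
            op.releaseResources();
            throw e;
        }
        return new FutureTask<String>(op::materializeSnapshot) {
            @Override
            protected void done() {
                op.releaseResources();
            }
        };
    }

    public static void main(String[] args) throws Exception {
        LeaseCounter guard = new LeaseCounter();
        try {
            snapshot(guard, true);
        } catch (Exception e) {
            System.out.println("snapshot failed: " + e.getMessage());
        }
        System.out.println("open leases after failure: " + guard.openLeases());

        FutureTask<String> task = snapshot(guard, false);
        task.run();
        System.out.println(task.get() + ", open leases after success: " + guard.openLeases());
    }
}
```

Without the catch block in snapshot(), the counter would stay at 1 after the failed attempt, which corresponds to the situation in which shutdown waits indefinitely for the outstanding lease.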