Posted to issues@flink.apache.org by "Chesnay Schepler (JIRA)" <ji...@apache.org> on 2018/02/05 11:55:02 UTC

[jira] [Assigned] (FLINK-8559) Exceptions in RocksDBIncrementalSnapshotOperation#takeSnapshot cause job to get stuck

     [ https://issues.apache.org/jira/browse/FLINK-8559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chesnay Schepler reassigned FLINK-8559:
---------------------------------------

    Assignee: Chesnay Schepler

> Exceptions in RocksDBIncrementalSnapshotOperation#takeSnapshot cause job to get stuck
> -------------------------------------------------------------------------------------
>
>                 Key: FLINK-8559
>                 URL: https://issues.apache.org/jira/browse/FLINK-8559
>             Project: Flink
>          Issue Type: Bug
>          Components: State Backends, Checkpointing, Tests
>    Affects Versions: 1.5.0
>            Reporter: Chesnay Schepler
>            Assignee: Chesnay Schepler
>            Priority: Blocker
>
> In {{RocksDBKeyedStateBackend#snapshotIncrementally}} we can find this code:
>  
> {code:java}
> final RocksDBIncrementalSnapshotOperation<K> snapshotOperation =
> 	new RocksDBIncrementalSnapshotOperation<>(
> 		this,
> 		checkpointStreamFactory,
> 		checkpointId,
> 		checkpointTimestamp);
> snapshotOperation.takeSnapshot();
> return new FutureTask<KeyedStateHandle>(
> 	new Callable<KeyedStateHandle>() {
> 		@Override
> 		public KeyedStateHandle call() throws Exception {
> 			return snapshotOperation.materializeSnapshot();
> 		}
> 	}
> ) {
> 	@Override
> 	public boolean cancel(boolean mayInterruptIfRunning) {
> 		snapshotOperation.stop();
> 		return super.cancel(mayInterruptIfRunning);
> 	}
> 	@Override
> 	protected void done() {
> 		snapshotOperation.releaseResources(isCancelled());
> 	}
> };
> {code}
> In the constructor of {{RocksDBIncrementalSnapshotOperation}} we call {{acquireResource()}} on the RocksDB {{ResourceGuard}}. If {{snapshotOperation.takeSnapshot()}} fails with an exception, these resources are never released. When the task is then shut down because of the exception, it gets stuck releasing RocksDB.
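
The leak described above can be illustrated with a minimal, self-contained sketch. It is not the fix committed for this issue and it does not use Flink classes: {{Guard}} and {{SnapshotOperation}} are hypothetical stand-ins for the RocksDB {{ResourceGuard}} and {{RocksDBIncrementalSnapshotOperation}}, and the boolean {{fail}} flag exists only to simulate an exception in {{takeSnapshot()}}. One possible remedy, assumed here, is to release the resources in a catch block before rethrowing, since {{done()}} never runs when no {{FutureTask}} is created.

```java
import java.util.concurrent.FutureTask;
import java.util.concurrent.atomic.AtomicInteger;

public class SnapshotReleaseSketch {

    /** Hypothetical stand-in for the RocksDB ResourceGuard: counts open leases. */
    static final class Guard {
        private final AtomicInteger leases = new AtomicInteger();
        void acquire() { leases.incrementAndGet(); }
        void release() { leases.decrementAndGet(); }
        int openLeases() { return leases.get(); }
    }

    /** Hypothetical stand-in for RocksDBIncrementalSnapshotOperation. */
    static final class SnapshotOperation {
        private final Guard guard;
        private final boolean fail;
        private boolean released;

        SnapshotOperation(Guard guard, boolean fail) {
            this.guard = guard;
            this.fail = fail;
            guard.acquire(); // mirrors the acquire done in the constructor
        }

        void takeSnapshot() throws Exception {
            if (fail) {
                throw new Exception("simulated takeSnapshot failure");
            }
        }

        String materializeSnapshot() { return "state-handle"; }

        void stop() { /* nothing to stop in this sketch */ }

        /** Idempotent, so a double call cannot release the lease twice. */
        synchronized void releaseResources(boolean canceled) {
            if (!released) {
                released = true;
                guard.release();
            }
        }
    }

    static FutureTask<String> snapshotIncrementally(Guard guard, boolean fail) throws Exception {
        final SnapshotOperation op = new SnapshotOperation(guard, fail);
        try {
            op.takeSnapshot();
        } catch (Exception e) {
            // Without this block the lease acquired in the constructor leaks,
            // because done() below is never reached.
            op.stop();
            op.releaseResources(true);
            throw e;
        }
        return new FutureTask<String>(op::materializeSnapshot) {
            @Override
            public boolean cancel(boolean mayInterruptIfRunning) {
                op.stop();
                return super.cancel(mayInterruptIfRunning);
            }

            @Override
            protected void done() {
                op.releaseResources(isCancelled());
            }
        };
    }

    public static void main(String[] args) throws Exception {
        Guard guard = new Guard();
        try {
            snapshotIncrementally(guard, true);
        } catch (Exception e) {
            System.out.println("failure path, open leases: " + guard.openLeases());
        }
        FutureTask<String> task = snapshotIncrementally(guard, false);
        task.run();
        System.out.println(task.get() + ", open leases: " + guard.openLeases());
    }
}
```

In this sketch, a shutdown that waits for all leases to be returned would no longer block after a failed {{takeSnapshot()}}, because the count is back to zero on both the failure path and the normal path.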


