Posted to issues@flink.apache.org by "Arvid Heise (Jira)" <ji...@apache.org> on 2020/10/08 06:15:00 UTC
[jira] [Updated] (FLINK-19385) Channel recovery may deadlock
[ https://issues.apache.org/jira/browse/FLINK-19385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Arvid Heise updated FLINK-19385:
--------------------------------
Parent: FLINK-19442
Issue Type: Sub-task (was: Bug)
> Channel recovery may deadlock
> -----------------------------
>
> Key: FLINK-19385
> URL: https://issues.apache.org/jira/browse/FLINK-19385
> Project: Flink
> Issue Type: Sub-task
> Components: Runtime / Network, Runtime / Task
> Affects Versions: 1.12.0
> Reporter: Roman Khachatryan
> Assignee: Roman Khachatryan
> Priority: Minor
> Labels: pull-request-available
> Fix For: 1.12.0
>
>
> Consider the following case:
> * Two InputGates
> * Input selection is not ALL (say FIRST initially)
> * Unaligned Checkpoints ON
> * On recovery, there are "parts" of records in all channels (actually, one is enough, I think)
> What happens:
> # StreamTask initiates recovery and schedules partition requests upon its end
> # All gates and channels will receive buffers from StateReader
> # All channels of a single gate will consume those state buffers - completing that gate's StateConsumedFuture
> # InputProcessor will return NOTHING_AVAILABLE (see StreamTwoInputProcessor.getInputStatus)
> # StreamTask will suspend its default action
> # State of the 2nd gate won't be consumed - so its StateConsumedFutures won't be completed - so no partitions will be requested
> Solution: request partitions independently for each channel.
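The scenario and the proposed solution can be illustrated with a minimal, self-contained sketch. It is not Flink code: the Channel class, its stateConsumed future, the partitionRequested flag, and both request methods are hypothetical stand-ins chosen for illustration. The sketch assumes one channel per gate and models the suspended task by completing only the first gate's future.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;

class ChannelRecoverySketch {

    /** Hypothetical stand-in for an input channel that holds recovered state. */
    static final class Channel {
        final String name;
        final CompletableFuture<Void> stateConsumed = new CompletableFuture<>();
        boolean partitionRequested = false;

        Channel(String name) {
            this.name = name;
        }
    }

    /** Coupled scheme: partitions are requested only after ALL channels consumed their state. */
    static void requestAfterAll(List<Channel> channels) {
        CompletableFuture<?>[] futures =
                channels.stream().map(c -> c.stateConsumed).toArray(CompletableFuture[]::new);
        CompletableFuture.allOf(futures)
                .thenRun(() -> channels.forEach(c -> c.partitionRequested = true));
    }

    /** Independent scheme: each channel requests its partition once its own state is consumed. */
    static void requestPerChannel(List<Channel> channels) {
        for (Channel c : channels) {
            c.stateConsumed.thenRun(() -> c.partitionRequested = true);
        }
    }

    /** Returns how many channels requested their partition after only gate 1 was drained. */
    static long simulate(boolean perChannel) {
        List<Channel> channels =
                Arrays.asList(new Channel("gate1-ch0"), new Channel("gate2-ch0"));
        if (perChannel) {
            requestPerChannel(channels);
        } else {
            requestAfterAll(channels);
        }
        // Input selection is FIRST: only gate 1's state is consumed. The task then
        // suspends, so gate 2's future is never completed in this model.
        channels.get(0).stateConsumed.complete(null);
        return channels.stream().filter(c -> c.partitionRequested).count();
    }

    public static void main(String[] args) {
        System.out.println("coupled: " + simulate(false));
        System.out.println("per-channel: " + simulate(true));
    }
}
```

In the coupled variant no partition is ever requested, so no new data can arrive to wake the task. In the per-channel variant the first gate requests its partition as soon as its own state is consumed, which is the property the proposed solution relies on.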
--
This message was sent by Atlassian Jira
(v8.3.4#803005)