Posted to dev@flink.apache.org by "Till Rohrmann (Jira)" <ji...@apache.org> on 2020/05/28 14:17:00 UTC
[jira] [Created] (FLINK-18012) Deactivate slot timeout if TaskSlotTable.tryMarkSlotActive is called
Till Rohrmann created FLINK-18012:
-------------------------------------
Summary: Deactivate slot timeout if TaskSlotTable.tryMarkSlotActive is called
Key: FLINK-18012
URL: https://issues.apache.org/jira/browse/FLINK-18012
Project: Flink
Issue Type: Bug
Components: Runtime / Coordination
Affects Versions: 1.10.1, 1.9.3, 1.11.0
Reporter: Till Rohrmann
Assignee: Till Rohrmann
Fix For: 1.11.0, 1.10.2, 1.9.4
With FLINK-9932 we loosened the slot allocation protocol so that the {{JobMaster}} can deploy {{Tasks}} into a slot which has only been {{ALLOCATED}}, but not yet {{ACTIVATED}}, for a given job. This made it possible to better handle the case where the {{JobMasterGateway#offerSlots}} response arrived so late that the call timed out. The solution was to offer a {{TaskSlotTable#tryMarkSlotActive}} method which, in contrast to {{TaskSlotTable#markSlotActive}}, does not fail if the requested slot is not available.
However, {{TaskSlotTable#tryMarkSlotActive}} does not deactivate the slot timeout. Hence, if the {{offerSlots}} response never arrives at the {{TaskExecutor}}, an {{ACTIVATED}} slot can still time out.
To fix the problem, we should also deactivate the slot timeout when {{TaskSlotTable#tryMarkSlotActive}} is called.
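
The proposed behavior can be illustrated with a minimal sketch. This is not Flink's actual {{TaskSlotTable}}: the class {{SlotTableSketch}}, the integer slot ids, the {{pendingTimeouts}} set and the {{fireTimeout}} helper are all hypothetical simplifications, assumed here only to show that both activation paths should cancel the pending timeout.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical, simplified stand-in for a slot table; not Flink's real TaskSlotTable. */
class SlotTableSketch {
    enum SlotState { ALLOCATED, ACTIVE }

    private final Map<Integer, SlotState> slots = new HashMap<>();
    // Slots whose allocation timeout is still armed (assumed model of a timer service).
    private final Set<Integer> pendingTimeouts = new HashSet<>();

    /** Allocates a slot and arms its timeout. */
    void allocateSlot(int slotId) {
        slots.put(slotId, SlotState.ALLOCATED);
        pendingTimeouts.add(slotId);
    }

    /** Strict variant: fails if the slot is unknown. */
    void markSlotActive(int slotId) {
        if (!slots.containsKey(slotId)) {
            throw new IllegalStateException("Unknown slot " + slotId);
        }
        activate(slotId);
    }

    /** Lenient variant: returns false instead of failing if the slot is unknown. */
    boolean tryMarkSlotActive(int slotId) {
        if (!slots.containsKey(slotId)) {
            return false;
        }
        activate(slotId);
        return true;
    }

    /** Shared activation path: marks the slot active AND disarms its timeout. */
    private void activate(int slotId) {
        slots.put(slotId, SlotState.ACTIVE);
        pendingTimeouts.remove(slotId);
    }

    /** Simulates the timer firing: frees the slot only if its timeout is still armed. */
    boolean fireTimeout(int slotId) {
        if (pendingTimeouts.remove(slotId)) {
            slots.remove(slotId);
            return true;
        }
        return false;
    }

    boolean isActive(int slotId) {
        return slots.get(slotId) == SlotState.ACTIVE;
    }
}
```

In this sketch, routing both {{markSlotActive}} and {{tryMarkSlotActive}} through the same private {{activate}} step means a slot activated via either method can no longer be freed by a late-firing timeout. Whether the actual fix shares code this way is an assumption.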