Posted to issues@flink.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2018/07/22 20:01:00 UTC
[jira] [Commented] (FLINK-9911) SlotPool#failAllocation is called outside of main thread
[ https://issues.apache.org/jira/browse/FLINK-9911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16552139#comment-16552139 ]
ASF GitHub Bot commented on FLINK-9911:
---------------------------------------
GitHub user tillrohrmann opened a pull request:
https://github.com/apache/flink/pull/6386
[FLINK-9911][JM] Use SlotPoolGateway to call failAllocation
## What is the purpose of the change
Since the SlotPool is an actor, we must use the SlotPoolGateway to interact with
the SlotPool. Otherwise, we might risk an inconsistent state since there are
multiple threads modifying the component.
This PR is based on #6385.
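
To illustrate the reasoning above, here is a minimal sketch of the gateway idea. It is not Flink's `SlotPool` or `SlotPoolGateway`; the `Pool` and `PoolGateway` classes and their methods are hypothetical stand-ins. The point is only that every state-changing call is enqueued on one thread, so the non-thread-safe state is never modified concurrently.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class GatewaySketch {

    /** Hypothetical component whose state must only be touched by one thread. */
    static final class Pool {
        // Deliberately not thread-safe: safety comes from single-threaded access.
        private final Set<String> pendingAllocations = new HashSet<>();

        void addPending(String allocationId) {
            pendingAllocations.add(allocationId);
        }

        boolean failAllocation(String allocationId) {
            return pendingAllocations.remove(allocationId);
        }
    }

    /** Hypothetical gateway: each call is run on the pool's single main thread. */
    static final class PoolGateway implements AutoCloseable {
        private final Pool pool = new Pool();
        private final ExecutorService mainThread = Executors.newSingleThreadExecutor();

        CompletableFuture<Void> addPending(String allocationId) {
            return CompletableFuture.runAsync(() -> pool.addPending(allocationId), mainThread);
        }

        CompletableFuture<Boolean> failAllocation(String allocationId) {
            return CompletableFuture.supplyAsync(() -> pool.failAllocation(allocationId), mainThread);
        }

        @Override
        public void close() {
            mainThread.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        try (PoolGateway gateway = new PoolGateway()) {
            gateway.addPending("alloc-1").get();
            // The first failure removes the pending allocation, the second finds nothing.
            System.out.println(gateway.failAllocation("alloc-1").get());
            System.out.println(gateway.failAllocation("alloc-1").get());
        }
    }
}
```

Callers on any thread only ever see futures; calling `Pool#failAllocation` directly from another thread would bypass the queue, which is the kind of call this PR removes.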
## Verifying this change
- Trivial change
## Does this pull request potentially affect one of the following parts:
- Dependencies (does it add or upgrade a dependency): (no)
- The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (no)
- The serializers: (no)
- The runtime per-record code paths (performance sensitive): (no)
- Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (yes)
- The S3 file system connector: (no)
## Documentation
- Does this pull request introduce a new feature? (no)
- If yes, how is the feature documented? (not applicable)
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/tillrohrmann/flink fixFailAllocation
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/flink/pull/6386.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #6386
----
commit f85ec37cc3ad21998eabad45a6dcb46e8efc62fb
Author: Till Rohrmann <tr...@...>
Date: 2018-07-19T11:07:44Z
[FLINK-9838][logging] Don't log slot request failures on the ResourceManager
commit 7c703fb3b350ef5b02b01d621c3a16d4bca6f707
Author: Till Rohrmann <tr...@...>
Date: 2018-07-19T11:41:03Z
[hotfix] Improve logging of SlotPool and SlotSharingManager
commit 414a8d231a5b6cdc2d5db0c1d35a79ff584c1cd0
Author: Till Rohrmann <tr...@...>
Date: 2018-07-22T18:05:05Z
[FLINK-9908][scheduling] Do not cancel individual scheduling future
Since the individual scheduling futures contain logic to release the slot if it cannot
be assigned to the Execution, we must not cancel them. Otherwise we might risk that
slots are not returned to the SlotPool leaving it in an inconsistent state.
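
A minimal sketch of the problem described in this commit message, using the JDK's `CompletableFuture` rather than Flink's scheduling code. The names `slotFuture` and `schedulingFuture` are hypothetical. The dependent stage stands in for the logic that assigns the slot or releases it; once that stage has been cancelled, its handler is skipped when the slot arrives.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicBoolean;

public class CancelledStageSketch {

    /** Returns whether the assign-or-release handler ran once the slot became available. */
    static boolean handlerRan(boolean cancelSchedulingFuture) {
        CompletableFuture<String> slotFuture = new CompletableFuture<>();
        AtomicBoolean handlerRan = new AtomicBoolean(false);

        // Dependent stage standing in for "assign the slot, or release it if that fails".
        CompletableFuture<Void> schedulingFuture =
            slotFuture.thenAccept(slot -> handlerRan.set(true));

        if (cancelSchedulingFuture) {
            schedulingFuture.cancel(false);
        }

        // The slot arrives only afterwards.
        slotFuture.complete("slot-1");
        return handlerRan.get();
    }

    public static void main(String[] args) {
        System.out.println(handlerRan(false));
        System.out.println(handlerRan(true));
    }
}
```

In the cancelled case nothing is left to hand the slot back, which matches the inconsistency the commit message warns about.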
commit 8f4471339db3a2df01c1cc61e03eb0881f98dd4f
Author: Till Rohrmann <tr...@...>
Date: 2018-07-22T18:17:11Z
[FLINK-9909][core] ConjunctFuture does not cancel input futures
If a ConjunctFuture is cancelled, then it won't cancel all of its input
futures automatically. If users need this behaviour, they have to
implement it explicitly. The reason for this change is that an implicit
cancellation can have unwanted side effects, because all of the cancelled
input futures' producers won't be executed.
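
The behaviour described above can be sketched with the JDK's `CompletableFuture.allOf`, used here as a stand-in for Flink's `ConjunctFuture` (whose API is not shown in this message). Cancelling the combined future leaves the inputs untouched; a caller who wants propagation has to add it explicitly.

```java
import java.util.concurrent.CompletableFuture;

public class ConjunctCancelSketch {

    /** Cancels only the combined future and reports whether any input was cancelled. */
    static boolean inputsCancelledImplicitly() {
        CompletableFuture<String> a = new CompletableFuture<>();
        CompletableFuture<String> b = new CompletableFuture<>();
        CompletableFuture<Void> all = CompletableFuture.allOf(a, b);

        all.cancel(false);
        return a.isCancelled() || b.isCancelled();
    }

    /** Adds explicit propagation and reports whether both inputs were cancelled. */
    static boolean inputsCancelledExplicitly() {
        CompletableFuture<String> a = new CompletableFuture<>();
        CompletableFuture<String> b = new CompletableFuture<>();
        CompletableFuture<Void> all = CompletableFuture.allOf(a, b);

        // Opt-in propagation: forward a cancellation of the combined future to the inputs.
        all.whenComplete((ignored, failure) -> {
            if (all.isCancelled()) {
                a.cancel(false);
                b.cancel(false);
            }
        });

        all.cancel(false);
        return a.isCancelled() && b.isCancelled();
    }

    public static void main(String[] args) {
        System.out.println(inputsCancelledImplicitly());
        System.out.println(inputsCancelledExplicitly());
    }
}
```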
commit c606145182c0531a8239decdc52ceeccdb81ca73
Author: Till Rohrmann <tr...@...>
Date: 2018-07-22T18:20:53Z
[hotfix] Fix checkstyle violations in FutureUtils
commit c296d8b146cd08367329226b9ecaa28bd86ba1ed
Author: Till Rohrmann <tr...@...>
Date: 2018-07-22T18:34:33Z
[hotfix] Replace check state condition in Execution#tryAssignResource with if check
Instead of risking an IllegalStateException, it is better to check that the
taskManagerLocationFuture has not been completed yet. If it has, then we also reject
the assignment of the LogicalSlot to the Execution. That way, the slot is still
released if an exception occurs in
Execution#allocateAndAssignSlotForExecution.
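
A minimal sketch of the pattern described above, assuming a simplified class that is not Flink's `Execution`. The field `locationFuture` and the `String` slot type are hypothetical. Returning `false` instead of throwing lets the caller release the slot on the rejected path.

```java
import java.util.concurrent.CompletableFuture;

public class TryAssignSketch {
    private final CompletableFuture<String> locationFuture = new CompletableFuture<>();
    private String assignedSlot;

    /** Rejects the assignment with {@code false} rather than an IllegalStateException. */
    boolean tryAssignResource(String slot) {
        if (locationFuture.isDone()) {
            // Already completed: reject, so the caller can hand the slot back.
            return false;
        }
        assignedSlot = slot;
        locationFuture.complete("location-of-" + slot);
        return true;
    }

    String assignedSlot() {
        return assignedSlot;
    }

    public static void main(String[] args) {
        TryAssignSketch execution = new TryAssignSketch();
        System.out.println(execution.tryAssignResource("slot-1"));
        System.out.println(execution.tryAssignResource("slot-2"));
    }
}
```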
commit 69b8c7c7b5905be83c7c393423c064de9b78375f
Author: Till Rohrmann <tr...@...>
Date: 2018-07-22T18:43:44Z
[hotfix] Fix checkstyle violations in ExecutionVertex
commit 6e018cfdf84192041a4b1ba27dcbdbf645e8d40b
Author: Till Rohrmann <tr...@...>
Date: 2018-07-22T18:46:37Z
[hotfix] Fix checkstyle violations in ExecutionJobVertex
commit f8805be13d2c0c2da58e0e7ecc6dc102953fc0c5
Author: Till Rohrmann <tr...@...>
Date: 2018-07-22T18:48:53Z
[hotfix] Fix checkstyle violations in Execution
commit 0e9fbf8157e45d260a1a418c25871031a98a4995
Author: Till Rohrmann <tr...@...>
Date: 2018-07-22T19:38:42Z
[FLINK-9910][scheduling] Execution#scheduleForExecution does not cancel slot future
In order to properly give back an allocated slot to the SlotPool, one must not complete
the result future of Execution#allocateAndAssignSlotForExecution. This commit changes the
behaviour in Execution#scheduleForExecution accordingly.
commit 1b221e062e5c800f4ce8e716e36f67abcbd75394
Author: Till Rohrmann <tr...@...>
Date: 2018-07-22T19:57:59Z
[FLINK-9911][JM] Use SlotPoolGateway to call failAllocation
Since the SlotPool is an actor, we must use the SlotPoolGateway to interact with
the SlotPool. Otherwise, we might risk an inconsistent state since there are
multiple threads modifying the component.
----
> SlotPool#failAllocation is called outside of main thread
> --------------------------------------------------------
>
> Key: FLINK-9911
> URL: https://issues.apache.org/jira/browse/FLINK-9911
> Project: Flink
> Issue Type: Bug
> Components: JobManager
> Affects Versions: 1.5.1, 1.6.0, 1.7.0
> Reporter: Till Rohrmann
> Assignee: Till Rohrmann
> Priority: Blocker
> Labels: pull-request-available
> Fix For: 1.5.2, 1.6.0
>
>
> The {{JobMaster}} calls {{SlotPool#failAllocation}} directly in the method {{JobMaster#notifyAllocationFailure}}. This can cause the {{SlotPool}} to go into an inconsistent state.