Posted to issues@flink.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2018/02/13 11:33:00 UTC
[jira] [Commented] (FLINK-8529) Let Yarn entry points use
YarnConfigOptions#APPLICATION_MASTER_PORT
[ https://issues.apache.org/jira/browse/FLINK-8529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16362169#comment-16362169 ]
ASF GitHub Bot commented on FLINK-8529:
---------------------------------------
GitHub user zentol opened a pull request:
https://github.com/apache/flink/pull/5474
[FLINK-8529][cassandra] Fix race condition
## What is the purpose of the change
This PR fixes a deadlock that could happen if a callback is executed during `CassandraSinkBase#waitForPendingUpdates`.
waitForPendingUpdates:
```
U1: while (updatesPending.get() > 0)
U2:     synchronized (updatesPending)
U3:         updatesPending.wait();
```
callback:
```
C1: int pending = updatesPending.decrementAndGet();
C2: if (pending == 0)
C3:     synchronized (updatesPending)
C4:         updatesPending.notifyAll();
```
Sequence causing deadlock: U1 -> C1 ... C4 -> U2 -> U3
(`updatesPending == 1` at the start of sequence)
This was fixed by switching lines U1 and U2:
```
U2: synchronized (updatesPending)
U1:     while (updatesPending.get() > 0)
U3:         updatesPending.wait();
```
If C1 runs
* before U2, then waitForPendingUpdates sees that `updatesPending == 0` and exits without waiting
* after U2, then waitForPendingUpdates either sees `updatesPending == 0` at U1 and exits, or calls wait() before the callback can enter its synchronized block and call notifyAll()
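For illustration, the corrected ordering can be sketched as a small standalone class. This is not the actual `CassandraSinkBase` code: the class name `PendingUpdateTracker` and the methods `register`, `complete` and `pending` are hypothetical, and wrapping `InterruptedException` in an unchecked exception is a simplification made for this sketch.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for the pending-update bookkeeping; not the real sink class. */
final class PendingUpdateTracker {

    // Serves both as the counter and as the monitor for wait()/notifyAll().
    private final AtomicInteger updatesPending = new AtomicInteger();

    /** Called when an asynchronous update is submitted. */
    void register() {
        updatesPending.incrementAndGet();
    }

    /** Callback (C1..C4): runs on another thread when an update completes. */
    void complete() {
        int pending = updatesPending.decrementAndGet();   // C1
        if (pending == 0) {                               // C2
            synchronized (updatesPending) {               // C3
                updatesPending.notifyAll();               // C4
            }
        }
    }

    /** Fixed ordering (U2, U1, U3): the counter is checked while holding the monitor. */
    void waitForPendingUpdates() {
        synchronized (updatesPending) {                   // U2
            while (updatesPending.get() > 0) {            // U1
                try {
                    updatesPending.wait();                // U3, releases the monitor while waiting
                } catch (InterruptedException e) {
                    // Simplification for the sketch: restore the flag and fail fast.
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while waiting", e);
                }
            }
        }
    }

    /** Number of updates still outstanding. */
    int pending() {
        return updatesPending.get();
    }
}
```

Because the check at U1 and the call at U3 both happen while the monitor is held, the callback cannot reach `notifyAll()` in the gap between them, which is the gap the original ordering left open.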
## Verifying this change
The deadlock was reproduced by introducing `OneShotLatch`es into the callback and waitForPendingUpdates to force the above execution sequence.
I don't think we can test this properly, since it's a timing problem.
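A rough sketch of how such a forced interleaving might look is given below. It is not the code used for this PR: it substitutes `java.util.concurrent.CountDownLatch` for Flink's `OneShotLatch`, the class and method names are hypothetical, and it uses a bounded `wait(timeout)` with a single `if` instead of the `while` loop so that the demo terminates rather than hanging.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical demo of the original (buggy) ordering; names are illustrative only. */
final class LostWakeupDemo {

    /**
     * Forces U1 -> C1..C4 -> U2 -> U3 using latches.
     * Returns true if the waiter reached wait() after the only notifyAll() had already fired.
     * timeoutMillis must be greater than 0; it only keeps the demo from blocking forever.
     */
    static boolean forceLostNotification(long timeoutMillis) {
        final AtomicInteger updatesPending = new AtomicInteger(1);
        final CountDownLatch conditionChecked = new CountDownLatch(1);  // signals "U1 has run"
        final CountDownLatch callbackFinished = new CountDownLatch(1);  // signals "C1..C4 have run"

        Thread callback = new Thread(() -> {
            try {
                conditionChecked.await();                        // hold the callback back until after U1
                if (updatesPending.decrementAndGet() == 0) {     // C1, C2
                    synchronized (updatesPending) {              // C3
                        updatesPending.notifyAll();              // C4: nobody is waiting yet
                    }
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                callbackFinished.countDown();
            }
        });
        callback.start();

        boolean notificationLost = false;
        try {
            if (updatesPending.get() > 0) {                      // U1, checked outside the monitor
                conditionChecked.countDown();
                callbackFinished.await();                        // force C1..C4 to complete before U2
                synchronized (updatesPending) {                  // U2
                    // The counter is already 0, so no further notifyAll() will ever arrive.
                    notificationLost = updatesPending.get() == 0;
                    updatesPending.wait(timeoutMillis);          // U3, bounded for the demo
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return notificationLost;
    }
}
```

With an unbounded `wait()` in place of `wait(timeoutMillis)`, the waiting thread in this interleaving would block indefinitely, which matches the deadlock described above.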
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/zentol/flink 8520
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/flink/pull/5474.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #5474
----
commit 88b928b16228245756de12094e1d8a116caf0843
Author: zentol <ch...@...>
Date: 2018-02-13T11:10:22Z
[FLINK-8529][cassandra] Fix race condition
----
> Let Yarn entry points use YarnConfigOptions#APPLICATION_MASTER_PORT
> -------------------------------------------------------------------
>
> Key: FLINK-8529
> URL: https://issues.apache.org/jira/browse/FLINK-8529
> Project: Flink
> Issue Type: Improvement
> Components: Distributed Coordination, YARN
> Affects Versions: 1.5.0
> Reporter: Till Rohrmann
> Assignee: Till Rohrmann
> Priority: Major
> Labels: flip-6
> Fix For: 1.5.0
>
>
> The Yarn cluster entry points should use `YarnConfigOptions#APPLICATION_MASTER_PORT` in order to select the common {{RpcService}} port.