Posted to issues@flink.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2018/02/13 11:33:00 UTC
[jira] [Commented] (FLINK-8529) Let Yarn entry points use YarnConfigOptions#APPLICATION_MASTER_PORT

    [ https://issues.apache.org/jira/browse/FLINK-8529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16362169#comment-16362169 ] 

ASF GitHub Bot commented on FLINK-8529:
---------------------------------------

GitHub user zentol opened a pull request:

    https://github.com/apache/flink/pull/5474

    [FLINK-8529][cassandra] Fix race condition

    ## What is the purpose of the change
    
    This PR fixes a deadlock that can occur when a callback runs while `CassandraSinkBase#waitForPendingUpdates` is between its check of `updatesPending` and its call to `wait()`.
    
    waitForPendingUpdates:
    ```
    U1: while (updatesPending.get() > 0)
    U2:     synchronized (updatesPending)
    U3:         updatesPending.wait();
    ```
    
    callback:
    ```
    C1: int pending = updatesPending.decrementAndGet();
    C2: if (pending == 0)
    C3:     synchronized (updatesPending)
    C4:         updatesPending.notifyAll();
    ```
    
    Sequence causing deadlock: U1 -> C1 ... C4 -> U2 -> U3
    (`updatesPending == 1` at the start of sequence)
    
    This was fixed by switching lines U1 and U2:
    ```
    U2: synchronized (updatesPending)
    U1:     while (updatesPending.get() > 0)
    U3:         updatesPending.wait();
    ```
    
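    If C1 runs
    * before U2, then waitForPendingUpdates sees `updatesPending == 0` at U1 and exits without waiting
    * after U2, then waitForPendingUpdates either sees `updatesPending == 0` at U1 and exits, or calls wait() before the callback can call notifyAll(), because the callback cannot enter its synchronized block (C3) until wait() releases the monitor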
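    
    The reordered waiter can be sketched as a small standalone class. This is an illustration only, not the actual `CassandraSinkBase` code: the class name `PendingUpdatesSketch` and the methods `updateStarted`, `updateCompleted`, and `runDemo` are hypothetical names introduced for this example.
    
    ```java
    import java.util.concurrent.atomic.AtomicInteger;

    /** Hypothetical stand-in for the counter handling described above; not the Flink class. */
    class PendingUpdatesSketch {
        private final AtomicInteger updatesPending = new AtomicInteger();

        /** Records that an asynchronous update has been sent (assumed helper). */
        void updateStarted() {
            updatesPending.incrementAndGet();
        }

        /** Callback (C1..C4): decrement, then notify under the monitor once nothing is pending. */
        void updateCompleted() {
            int pending = updatesPending.decrementAndGet();
            if (pending == 0) {
                synchronized (updatesPending) {
                    updatesPending.notifyAll();
                }
            }
        }

        /** Fixed order (U2, U1, U3): the check and wait() both happen while holding the monitor. */
        void waitForPendingUpdates() throws InterruptedException {
            synchronized (updatesPending) {
                while (updatesPending.get() > 0) {
                    updatesPending.wait();
                }
            }
        }

        /** Starts the given number of updates, completes them on another thread, returns the final count. */
        static int runDemo(int updates) {
            PendingUpdatesSketch sink = new PendingUpdatesSketch();
            for (int i = 0; i < updates; i++) {
                sink.updateStarted();
            }
            Thread callbacks = new Thread(() -> {
                for (int i = 0; i < updates; i++) {
                    sink.updateCompleted();
                }
            });
            callbacks.start();
            try {
                sink.waitForPendingUpdates();
                callbacks.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
            return sink.updatesPending.get();
        }

        public static void main(String[] args) {
            System.out.println("pending after wait: " + runDemo(3));
        }
    }
    ```
    
    Because the waiter holds the monitor from the check until `wait()` releases it, the callback's `notifyAll()` cannot slip in between the two.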
    
    ## Verifying this change
    
    The deadlock was reproduced by introducing `OneShotLatch` instances into the callback and `waitForPendingUpdates` to force the execution sequence above.
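    
    A rough sketch of such a forced interleaving, under some assumptions: it uses `java.util.concurrent.CountDownLatch` as a stand-in for Flink's `OneShotLatch`, it runs against the original (unfixed) ordering, and it uses a timed `wait` so the demo terminates instead of hanging. The class `LostWakeupSketch` and the method `reproduce` are hypothetical names, not part of this PR.
    
    ```java
    import java.util.concurrent.CountDownLatch;
    import java.util.concurrent.atomic.AtomicInteger;

    /** Hypothetical reproduction of the sequence U1 -> C1 ... C4 -> U2 -> U3. */
    class LostWakeupSketch {

        /** Returns true if the waiter reached wait() although nothing was pending any more. */
        static boolean reproduce(long timeoutMillis) {
            AtomicInteger updatesPending = new AtomicInteger(1);
            CountDownLatch checked = new CountDownLatch(1);   // waiter has passed U1
            CountDownLatch notified = new CountDownLatch(1);  // callback has passed C4

            Thread callback = new Thread(() -> {
                try {
                    checked.await();                                  // hold C1 back until U1 ran
                    int pending = updatesPending.decrementAndGet();   // C1
                    if (pending == 0) {                               // C2
                        synchronized (updatesPending) {               // C3
                            updatesPending.notifyAll();               // C4: nobody is waiting yet
                        }
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    notified.countDown();
                }
            });
            callback.start();

            boolean blockedWithNothingPending = false;
            try {
                while (updatesPending.get() > 0) {                    // U1 (original order)
                    checked.countDown();
                    notified.await();                                 // let C1..C4 finish first
                    synchronized (updatesPending) {                   // U2
                        blockedWithNothingPending = updatesPending.get() == 0;
                        updatesPending.wait(timeoutMillis);           // U3, timed only for the demo
                    }
                }
                callback.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
            return blockedWithNothingPending;
        }

        public static void main(String[] args) {
            System.out.println("waiter blocked with nothing pending: " + reproduce(100));
        }
    }
    ```
    
    With an untimed `wait()`, as in the original code, the waiter in this sequence would never be woken.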
    
    I don't think we can test this properly, since it is a timing problem.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/zentol/flink 8520

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/5474.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #5474
    
----
commit 88b928b16228245756de12094e1d8a116caf0843
Author: zentol <ch...@...>
Date:   2018-02-13T11:10:22Z

    [FLINK-8529][cassandra] Fix race condition

----


> Let Yarn entry points use YarnConfigOptions#APPLICATION_MASTER_PORT
> -------------------------------------------------------------------
>
>                 Key: FLINK-8529
>                 URL: https://issues.apache.org/jira/browse/FLINK-8529
>             Project: Flink
>          Issue Type: Improvement
>          Components: Distributed Coordination, YARN
>    Affects Versions: 1.5.0
>            Reporter: Till Rohrmann
>            Assignee: Till Rohrmann
>            Priority: Major
>              Labels: flip-6
>             Fix For: 1.5.0
>
>
> The Yarn cluster entry points should use `YarnConfigOptions#APPLICATION_MASTER_PORT` in order to select the common {{RpcService}} port.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)