Posted to issues@flink.apache.org by zentol <gi...@git.apache.org> on 2018/02/13 11:32:14 UTC

[GitHub] flink pull request #5474: [FLINK-8529][cassandra] Fix race condition

GitHub user zentol opened a pull request:

    https://github.com/apache/flink/pull/5474

    [FLINK-8529][cassandra] Fix race condition

    ## What is the purpose of the change
    
    This PR fixes a deadlock that can occur if a callback is executed concurrently with `CassandraSinkBase#waitForPendingUpdates`.
    
    waitForPendingUpdates:
    ```java
    while (updatesPending.get() > 0) {          // U1
        synchronized (updatesPending) {         // U2
            updatesPending.wait();              // U3
        }
    }
    ```
    
    callback:
    ```java
    int pending = updatesPending.decrementAndGet();   // C1
    if (pending == 0) {                               // C2
        synchronized (updatesPending) {               // C3
            updatesPending.notifyAll();               // C4
        }
    }
    ```
    
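    Sequence causing the deadlock: U1 -> C1 ... C4 -> U2 -> U3
    (with `updatesPending == 1` at the start of the sequence). At U1, `waitForPendingUpdates` sees one pending update. The callback then decrements the counter to 0 and calls `notifyAll()` while nobody is waiting yet. Afterwards, `waitForPendingUpdates` enters `wait()` with no callback left to wake it up, so it blocks forever (a lost wakeup).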
    
    This is fixed by swapping lines U1 and U2, so that `updatesPending` is checked while holding its monitor:
    ```java
    synchronized (updatesPending) {                   // U2
        while (updatesPending.get() > 0) {            // U1
            updatesPending.wait();                    // U3
        }
    }
    ```
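    For context, here is a minimal sketch of how the fixed ordering might sit in a small, self-contained tracker. `PendingUpdates`, `beforeSend()`, `onComplete()` and `pending()` are hypothetical names, not the actual `CassandraSinkBase` code. The sketch assumes the counter is incremented before each asynchronous write is submitted and decremented in that write's completion callback.

    ```java
    import java.util.concurrent.atomic.AtomicInteger;

    /** Hypothetical stand-in for a sink's pending-write bookkeeping (not CassandraSinkBase). */
    final class PendingUpdates {
        private final AtomicInteger updatesPending = new AtomicInteger();

        /** Assumed to be called before an asynchronous write is submitted. */
        void beforeSend() {
            updatesPending.incrementAndGet();
        }

        /** Assumed to be called from the write's completion callback (C1..C4). */
        void onComplete() {
            int pending = updatesPending.decrementAndGet();   // C1
            if (pending == 0) {                               // C2
                synchronized (updatesPending) {               // C3
                    updatesPending.notifyAll();               // C4
                }
            }
        }

        /** Fixed ordering: the check (U1) runs while holding the monitor (U2). */
        void waitForPendingUpdates() throws InterruptedException {
            synchronized (updatesPending) {                   // U2
                while (updatesPending.get() > 0) {            // U1
                    updatesPending.wait();                    // U3
                }
            }
        }

        int pending() {
            return updatesPending.get();
        }

        public static void main(String[] args) throws InterruptedException {
            PendingUpdates tracker = new PendingUpdates();
            Thread[] callbacks = new Thread[4];
            for (int i = 0; i < callbacks.length; i++) {
                tracker.beforeSend();                          // count the write first
                callbacks[i] = new Thread(tracker::onComplete); // simulated completion callback
                callbacks[i].start();
            }
            tracker.waitForPendingUpdates();                   // returns once all callbacks ran
            System.out.println("pending after flush: " + tracker.pending());
        }
    }
    ```

    Waiting on the `AtomicInteger` itself mirrors the snippet above; a dedicated lock object would work equally well, as long as the check, `wait()` and `notifyAll()` all use the same monitor.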
    
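    If C1 runs
    * before U2, then waitForPendingUpdates sees `updatesPending == 0` at U1 and exits without waiting
    * after U2, then the callback cannot enter its synchronized block (C3) until waitForPendingUpdates releases the monitor, which only happens inside wait() or when U1 sees `updatesPending == 0` and the block is exited; either way, the notifyAll() call cannot be missed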
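    The second case can be exercised deterministically. In the sketch below, the class and method names are hypothetical, and the latch between U1 and U3 is an illustrative test seam built on the JDK's `CountDownLatch`. The callback starts C1 only after `waitForPendingUpdates` has taken the monitor and seen one pending update. C3 then blocks until `wait()` releases the monitor, so `notifyAll()` reaches a thread that is already waiting.

    ```java
    import java.util.concurrent.CountDownLatch;
    import java.util.concurrent.atomic.AtomicInteger;

    /** Hypothetical check of the FIXED ordering under a forced interleaving. */
    final class FixedOrderingCheck {

        /** Returns true if the waiter finishes within timeoutMs. */
        static boolean fixedOrderingCompletes(long timeoutMs) throws InterruptedException {
            AtomicInteger updatesPending = new AtomicInteger(1); // one write in flight
            CountDownLatch checked = new CountDownLatch(1);      // waiter passed U1 under the lock

            Thread waiter = new Thread(() -> {
                try {
                    synchronized (updatesPending) {              // U2
                        while (updatesPending.get() > 0) {       // U1: sees 1
                            checked.countDown();                 // test seam: release the callback
                            updatesPending.wait();               // U3: releases the monitor
                        }
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();          // restore flag; thread then exits
                }
            });
            waiter.setDaemon(true);
            waiter.start();

            // Callback, run on this thread, strictly after the waiter passed U1.
            checked.await();
            int pending = updatesPending.decrementAndGet();      // C1
            if (pending == 0) {                                  // C2
                synchronized (updatesPending) {                  // C3: blocks until U3 releases the monitor
                    updatesPending.notifyAll();                  // C4: waiter is already in wait()
                }
            }

            waiter.join(timeoutMs);
            return !waiter.isAlive();
        }

        public static void main(String[] args) throws InterruptedException {
            System.out.println("fixed ordering completes: " + fixedOrderingCompletes(5_000));
        }
    }
    ```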
    
    ## Verifying this change
    
    The deadlock was reproduced by introducing `OneShotLatch`es into the callback and `waitForPendingUpdates` to force the execution sequence above.
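    Here is a sketch of that kind of reproduction. It assumes the JDK's `CountDownLatch` as a stand-in for Flink's `OneShotLatch`, and the hook placement and names are hypothetical. The waiter is paused between U1 and U2 until the callback has run C1..C4. After that, the waiter enters `wait()` with nobody left to notify it. The waiter thread is joined with a timeout and then interrupted so the demo terminates.

    ```java
    import java.util.concurrent.CountDownLatch;
    import java.util.concurrent.atomic.AtomicInteger;

    /** Hypothetical forced reproduction of U1 -> C1..C4 -> U2 -> U3 (ORIGINAL ordering). */
    final class LostWakeupRepro {

        /** Returns true if the waiter is still blocked after timeoutMs. */
        static boolean originalOrderingHangs(long timeoutMs) throws InterruptedException {
            AtomicInteger updatesPending = new AtomicInteger(1); // one write in flight
            CountDownLatch checked = new CountDownLatch(1);      // waiter passed U1
            CountDownLatch notified = new CountDownLatch(1);     // callback finished C4

            Thread waiter = new Thread(() -> {
                try {
                    while (updatesPending.get() > 0) {           // U1: sees 1
                        checked.countDown();                     // test seam: let the callback run
                        notified.await();                        // test seam: wait for C1..C4
                        synchronized (updatesPending) {          // U2
                            updatesPending.wait();               // U3: nobody left to notify
                        }
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();          // restore flag; thread then exits
                }
            });
            waiter.setDaemon(true);
            waiter.start();

            // Callback, run on this thread, between U1 and U2.
            checked.await();
            int pending = updatesPending.decrementAndGet();      // C1
            if (pending == 0) {                                  // C2
                synchronized (updatesPending) {                  // C3
                    updatesPending.notifyAll();                  // C4: no one is waiting yet
                }
            }
            notified.countDown();

            waiter.join(timeoutMs);
            boolean hung = waiter.isAlive();
            waiter.interrupt();                                  // clean up the stuck waiter
            return hung;
        }

        public static void main(String[] args) throws InterruptedException {
            System.out.println("original ordering hangs: " + originalOrderingHangs(500));
        }
    }
    ```

    Apart from spurious wakeups, which the JVM permits but rarely produces, the waiter in this sketch stays blocked regardless of the timeout, because the only `notifyAll()` happened before it started waiting.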
    
    I don't think we can add a proper regression test for this, since it's a timing problem.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/zentol/flink 8520

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/5474.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #5474
    
----
commit 88b928b16228245756de12094e1d8a116caf0843
Author: zentol <ch...@...>
Date:   2018-02-13T11:10:22Z

    [FLINK-8529][cassandra] Fix race condition

----


---

[GitHub] flink issue #5474: [FLINK-8520][cassandra] Fix race condition

Posted by tillrohrmann <gi...@git.apache.org>.
Github user tillrohrmann commented on the issue:

    https://github.com/apache/flink/pull/5474
  
    I think the commit has the wrong Flink tag.


---

[GitHub] flink issue #5474: [FLINK-8520][cassandra] Fix race condition

Posted by zentol <gi...@git.apache.org>.
Github user zentol commented on the issue:

    https://github.com/apache/flink/pull/5474
  
    Yes, it should be 8520 (good catch!). I'll fix it while merging.


---

[GitHub] flink pull request #5474: [FLINK-8520][cassandra] Fix race condition

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/flink/pull/5474


---