Posted to issues@flink.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2018/02/22 14:36:00 UTC

[jira] [Commented] (FLINK-8694) Make notifyDataAvailable call reliable

    [ https://issues.apache.org/jira/browse/FLINK-8694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16372868#comment-16372868 ] 

ASF GitHub Bot commented on FLINK-8694:
---------------------------------------

GitHub user pnowojski opened a pull request:

    https://github.com/apache/flink/pull/5557

    [FLINK-8694][runtime] Walkaround notifyDataAvailable race condition

    Currently there is a race condition that may result in some notifyDataAvailable calls being ignored.
    This is not a big problem as long as the OutputFlusher flushes the records in its next iteration. However,
    in the flushAlways case, where the OutputFlusher is turned off, it can lead to data never being sent over
    the network.
    
    This fix works around the problem by enabling the OutputFlusher for flushAlways as well, and adds a stress test
    for flushAlways (without this fix the test deadlocks).
    
    This race condition has no effect on non-streaming cases.
    
    ## Verifying this change
    
    This change added a small stress test.
    
    ## Does this pull request potentially affect one of the following parts:
    
      - Dependencies (does it add or upgrade a dependency): (yes / **no**)
      - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (yes / **no**)
      - The serializers: (yes / **no** / don't know)
      - The runtime per-record code paths (performance sensitive): (yes / **no** / don't know)
      - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (yes / **no** / don't know)
      - The S3 file system connector: (yes / **no** / don't know)
    
    ## Documentation
    
      - Does this pull request introduce a new feature? (yes / **no**)
      - If yes, how is the feature documented? (**not applicable** / docs / JavaDocs / not documented)


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/pnowojski/flink f8694-walkaround

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/5557.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #5557
    
----
commit 8a7a0d3104bd3ff0598c76edc916d87f2e5c85ac
Author: Piotr Nowojski <pi...@...>
Date:   2018-02-15T09:54:11Z

    [FLINK-8694][runtime] Walkaround notifyDataAvailable race condition
    
    Currently there is a race condition that may result in some notifyDataAvailable calls being ignored.
    This is not a big problem as long as the OutputFlusher flushes the records in its next iteration. However,
    in the flushAlways case, where the OutputFlusher is turned off, it can lead to data never being sent over
    the network.
    
    This fix works around the problem by enabling the OutputFlusher for flushAlways as well, and adds a stress test
    for flushAlways (without this fix the test deadlocks).
    
    This race condition has no effect on non-streaming cases.

----


> Make notifyDataAvailable call reliable
> --------------------------------------
>
>                 Key: FLINK-8694
>                 URL: https://issues.apache.org/jira/browse/FLINK-8694
>             Project: Flink
>          Issue Type: Sub-task
>            Reporter: Piotr Nowojski
>            Assignee: Piotr Nowojski
>            Priority: Major
>
> After FLINK-8591, calls to org.apache.flink.runtime.io.network.netty.SequenceNumberingViewReader#notifyDataAvailable (and its counterpart for credit-based flow control) can sometimes be ignored due to a race condition.
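
To illustrate why a periodic flusher masks a lost notification, here is a minimal, self-contained sketch. It is not Flink code: the `Reader` class, the `dropNext` flag that simulates the race, the `runOnce` helper, and the 5 ms flush interval are all hypothetical names and values chosen for the example. It only models the idea from the pull request description: if a single notification is ignored and nothing ever retries it, the data stays unsent, whereas a periodic re-notification eventually delivers it.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical model (not Flink code): a notification that can be lost, plus a periodic flusher as a safety net. */
class PeriodicFlushSketch {

    /** Stands in for the reader side of a channel. */
    static final class Reader {
        final AtomicInteger pending = new AtomicInteger();  // records written but not yet sent
        final CountDownLatch sent = new CountDownLatch(1);  // released once the data goes out
        final AtomicBoolean dropNext = new AtomicBoolean(); // simulates the race: the next notification is ignored

        void write() {
            pending.incrementAndGet();
        }

        void notifyDataAvailable() {
            if (dropNext.compareAndSet(true, false)) {
                return; // this notification is lost
            }
            if (pending.getAndSet(0) > 0) {
                sent.countDown(); // pending data is "sent over the network"
            }
        }
    }

    /** Writes one record whose notification is lost; returns whether it was sent within the timeout. */
    static boolean runOnce(boolean periodicFlusher, long timeoutMillis) {
        Reader reader = new Reader();
        ScheduledExecutorService flusher = Executors.newSingleThreadScheduledExecutor();
        try {
            if (periodicFlusher) {
                // Assumed interval of 5 ms; each tick simply repeats the notification.
                flusher.scheduleAtFixedRate(reader::notifyDataAvailable, 5, 5, TimeUnit.MILLISECONDS);
            }
            reader.dropNext.set(true);
            reader.write();
            reader.notifyDataAvailable(); // flush-always style: notify after every record; this call is dropped
            return reader.sent.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            flusher.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("without flusher: sent=" + runOnce(false, 200));
        System.out.println("with flusher:    sent=" + runOnce(true, 2000));
    }
}
```

In this model the run without the periodic flusher times out with the record still pending, while the run with the flusher delivers it on a later tick. That is the sense in which the change is a workaround: the race itself is untouched, and the periodic flush only bounds how long a missed notification can delay the data.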



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)