You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@flink.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2015/02/02 07:42:34 UTC
[jira] [Commented] (FLINK-1350) Add blocking intermediate result partitions

    [ https://issues.apache.org/jira/browse/FLINK-1350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14300932#comment-14300932 ] 

ASF GitHub Bot commented on FLINK-1350:
---------------------------------------

GitHub user uce opened a pull request:

    https://github.com/apache/flink/pull/356

    [FLINK-1350][FLINK-1359][Distributed runtime] Add blocking result partition variant

    - **Renames** runtime intermediate result classes (after an offline discussion with @StephanEwen I realized that calling subpartitions *queue* was rather confusing):
      a) Removes "Intermediate" prefix
      b) Queue => Subpartition
      c) Iterator => View
    
      Documentation is coming up soon in FLINK-1373.
    
    - **[FLINK-1350](https://issues.apache.org/jira/browse/FLINK-1350)**: Adds a *spillable result subpartition variant* for **BLOCKING results**, which writes data to memory first and starts to spill (asynchronously) if not enough memory is available to produce the result in-memory only.
    
      Receiving tasks of BLOCKING results are only deployed after *all* partitions have been fully produced. PIPELINED and BLOCKING results can not be mixed.
    
    @StephanEwen, what is your opinion on these two points? We could also start deploying receivers of blocking results as soon as a single partition is finished and then send the update RPC calls later. The current solution is simpler and results in less RPC calls (single per parallel producer).
    
    - **[FLINK-1359](https://issues.apache.org/jira/browse/FLINK-1359)**: Adds **simple state tracking** to result partitions with notifications after partitions/subpartitions have been consumed. Each partition has to be consumed at least once before it can be released.
    
    ---
    
    @StephanEwen, I need some hints on how to integrate this with the NepheleJobGraphGenerator and the BinaryUnionNode to set the blocking result variants for results with multiple consumers only. We also need to sync for upcoming fault tolerance features (Do we have a seperate issue for this?).

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/uce/incubator-flink flink-1350-blocking

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/356.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #356
    
----
commit 73460bc88e5b447a7f7f29b499683f403ba0dd93
Author: Ufuk Celebi <uc...@apache.org>
Date:   2015-01-06T16:11:08Z

    [FLINK-1350][FLINK-1359][Distributed runtime] Add blocking result partition variant
    
    - Renames runtime intermediate result classes:
      a) Removes "Intermediate" prefix
      b) Queue => Subpartition
      c) Iterator => View
    
    - [FLINK-1350] Adds a spillable result subpartition variant for BLOCKING
      results, which writes data to memory first and starts to spill
      (asynchronously) if not enough memory is available to produce the
      result in-memory only.
    
      Receiving tasks of BLOCKING results are only deployed after *all*
      partitions have been fully produced. PIPELINED and BLOCKING results can not
      be mixed.
    
    - [FLINK-1359] Adds simple state tracking to result partitions with
      notifications after partitions/subpartitions have been consumed. Each
      partition has to be consumed at least once before it can be released.
    
      Currently there is no notion of historic intermediate results, i.e. results
      are released as soon as they are consumed.

----


> Add blocking intermediate result partitions
> -------------------------------------------
>
>                 Key: FLINK-1350
>                 URL: https://issues.apache.org/jira/browse/FLINK-1350
>             Project: Flink
>          Issue Type: Improvement
>          Components: Distributed Runtime
>            Reporter: Ufuk Celebi
>            Assignee: Ufuk Celebi
>
> The current state of runtime support for intermediate results (see https://github.com/apache/incubator-flink/pull/254 and FLINK-986) only supports pipelined intermediate results (with back pressure), which are consumed as they are being produced.
> The next variant we need to support are blocking intermediate results (without back pressure), which are fully produced before being consumed. This is for example desirable in situations, where we currently may run into deadlocks when running pipelined.
> I will start working on this on top of my pending pull request.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)