Posted to commits@beam.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2017/05/15 23:12:04 UTC

[jira] [Commented] (BEAM-2301) Standard expansion of SDF should be in runners-core-construction

    [ https://issues.apache.org/jira/browse/BEAM-2301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16011483#comment-16011483 ] 

ASF GitHub Bot commented on BEAM-2301:
--------------------------------------

GitHub user jkff opened a pull request:

    https://github.com/apache/beam/pull/3156

    [BEAM-2301] Splits SplittableParDo into a core-construction part and a runners-core part

    SplittableParDo itself goes into core-construction, and expands into a slightly different transform.
    
    This change consists almost entirely of moving code around.
    
    Before:
    ```
    elements: InputT
    | pair with restriction -> ElementAndRestriction<InputT, RestrictionT>
    | split restriction -> same
    | explode windows -> same
    | assign unique key -> KV<String, ElementAndRestriction<InputT, RestrictionT>>
    | GBKIntoKeyedWorkItems -> KeyedWorkItem<String, ElementAndRestriction<InputT, RestrictionT>>
    | ProcessElements -> PCollection<OutputT>
    ```
    
    After:
    ```
    elements: InputT
    | ...
    | assign unique key -> KV<String, ElementAndRestriction<InputT, RestrictionT>>
    | SplittableProcessKeyed -> PCollection<OutputT>
    ```
    
    Most runners (except Dataflow) will still want to go through KeyedWorkItem. That part is encapsulated in `SplittableParDoViaKeyedWorkItems`, which has an `OverrideFactory` for `SplittableProcessKeyed` expanding it into the good old `GBKIntoKeyedWorkItems` and `ProcessElements`. So runner changes are very minor.
    
    Dataflow, however, cannot use runners-core during expansion, so it will translate `SplittableProcessKeyed` directly and perform its expansion service-side, and will instantiate `ProcessFn` worker-side.
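    
    As a rough illustration of the override described above (a toy model, not code from this pull request): plain strings stand in for transforms, and the class and method names `ExpansionSketch`, `coreConstructionTail` and `applyKeyedWorkItemOverride` are hypothetical. Only the last two steps of the "After" expansion are modelled; the earlier steps are left out. A runner that goes through KeyedWorkItem would replace `SplittableProcessKeyed` with the two older steps, while a runner like Dataflow would leave the list untouched and translate `SplittableProcessKeyed` itself.
    
    ```java
    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.List;

    /** Toy model of the expansion; strings stand in for Beam transforms. */
    public class ExpansionSketch {

        /** Tail of the "After" expansion as produced by the core-construction part. */
        static List<String> coreConstructionTail() {
            return Arrays.asList("assign unique key", "SplittableProcessKeyed");
        }

        /**
         * Mimics what an override for SplittableProcessKeyed would do in a runner
         * that uses KeyedWorkItem: swap the single step for the two older ones.
         */
        static List<String> applyKeyedWorkItemOverride(List<String> steps) {
            List<String> result = new ArrayList<>();
            for (String step : steps) {
                if (step.equals("SplittableProcessKeyed")) {
                    result.add("GBKIntoKeyedWorkItems");
                    result.add("ProcessElements");
                } else {
                    result.add(step);
                }
            }
            return result;
        }

        public static void main(String[] args) {
            List<String> tail = coreConstructionTail();
            System.out.println("core-construction:  " + tail);
            System.out.println("via KeyedWorkItems: " + applyKeyedWorkItemOverride(tail));
        }
    }
    ```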
    
    R: @tgroh 

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jkff/incubator-beam sdf-expansion

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/beam/pull/3156.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #3156
    
----
commit be95bdd679fba755785a8e35a87eb1ec6c440882
Author: Eugene Kirpichov <ki...@google.com>
Date:   2017-05-15T22:54:03Z

    Splits SplittableParDo into a core-construction part and a KWI-related part

----


> Standard expansion of SDF should be in runners-core-construction
> ----------------------------------------------------------------
>
>                 Key: BEAM-2301
>                 URL: https://issues.apache.org/jira/browse/BEAM-2301
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-java-core
>            Reporter: Eugene Kirpichov
>            Assignee: Eugene Kirpichov
>
> As should standard expansions of everything else.
> Since SplittableParDo (the standard expansion of SDF) uses KeyedWorkItem and other things in runners-core that are not available in runners-core-construction, it needs to be refactored somewhat.


