Posted to commits@beam.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2017/05/15 23:12:04 UTC
[jira] [Commented] (BEAM-2301) Standard expansion of SDF should be in runners-core-construction
[ https://issues.apache.org/jira/browse/BEAM-2301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16011483#comment-16011483 ]
ASF GitHub Bot commented on BEAM-2301:
--------------------------------------
GitHub user jkff opened a pull request:
https://github.com/apache/beam/pull/3156
[BEAM-2301] Splits SplittableParDo into a core-construction part and a runners-core part
SplittableParDo itself goes into core-construction, and expands into a slightly different transform.
This change consists almost entirely of moving code around.
Before:
```
elements: InputT
| pair with restriction -> ElementAndRestriction<InputT, RestrictionT>
| split restriction -> same
| explode windows -> same
| assign unique key -> KV<String, ElementAndRestriction<InputT, RestrictionT>>
| GBKIntoKeyedWorkItems -> KeyedWorkItem<String, ElementAndRestriction<InputT, RestrictionT>>
| ProcessElements -> PCollection<OutputT>
```
After:
```
elements: InputT
| ...
| assign unique key -> KV<String, ElementAndRestriction<InputT, RestrictionT>>
| SplittableProcessKeyed -> PCollection<OutputT>
```
Most runners (except Dataflow) will still want to go through KeyedWorkItem. That part is encapsulated in `SplittableParDoViaKeyedWorkItems`, which has an `OverrideFactory` for `SplittableProcessKeyed` expanding it into the good old `GBKIntoKeyedWorkItems` and `ProcessElements`. So runner changes are very minor.
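To illustrate the override idea, here is a minimal sketch that does not depend on Beam. It models a pipeline as a list of step names and an override as a map from one step name to its replacement steps. The class `OverrideSketch` and its methods are hypothetical and are not Beam's `OverrideFactory` API. Only the step names come from the description above, and only the tail of the expansion is modeled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model, not Beam code: steps are plain strings.
final class OverrideSketch {

    // Tail of the standard expansion after this change (earlier steps omitted).
    static List<String> standardTail() {
        return Arrays.asList("assign unique key", "SplittableProcessKeyed");
    }

    // What a KeyedWorkItem-based runner would register: replace the single
    // SplittableProcessKeyed step with the two older steps.
    static Map<String, List<String>> keyedWorkItemOverrides() {
        Map<String, List<String>> overrides = new HashMap<>();
        overrides.put(
            "SplittableProcessKeyed",
            Arrays.asList("GBKIntoKeyedWorkItems", "ProcessElements"));
        return overrides;
    }

    // Replace each step that has an override; pass all other steps through.
    static List<String> applyOverrides(
            List<String> steps, Map<String, List<String>> overrides) {
        List<String> result = new ArrayList<>();
        for (String step : steps) {
            List<String> replacement = overrides.get(step);
            if (replacement == null) {
                result.add(step);
            } else {
                result.addAll(replacement);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Runner that goes through KeyedWorkItem.
        System.out.println(applyOverrides(standardTail(), keyedWorkItemOverrides()));
        // Runner with no override keeps SplittableProcessKeyed as-is.
        System.out.println(
            applyOverrides(standardTail(), Collections.<String, List<String>>emptyMap()));
    }
}
```

With the override registered, the tail becomes `assign unique key`, `GBKIntoKeyedWorkItems`, `ProcessElements`, which matches the "Before" shape. Without it, `SplittableProcessKeyed` is left in place for the runner to translate itself.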
Dataflow, however, cannot use runners-core during expansion, so it will translate `SplittableProcessKeyed` directly, perform the expansion service-side, and instantiate `ProcessFn` worker-side.
R: @tgroh
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/jkff/incubator-beam sdf-expansion
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/beam/pull/3156.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #3156
----
commit be95bdd679fba755785a8e35a87eb1ec6c440882
Author: Eugene Kirpichov <ki...@google.com>
Date: 2017-05-15T22:54:03Z
Splits SplittableParDo into a core-construction part and a KWI-related part
----
> Standard expansion of SDF should be in runners-core-construction
> ----------------------------------------------------------------
>
> Key: BEAM-2301
> URL: https://issues.apache.org/jira/browse/BEAM-2301
> Project: Beam
> Issue Type: Bug
> Components: sdk-java-core
> Reporter: Eugene Kirpichov
> Assignee: Eugene Kirpichov
>
> As should standard expansions of everything else.
> Since SplittableParDo (the standard expansion of SDF) uses KeyedWorkItem and other things in runners-core that are not available in runners-core-construction, it needs to be refactored somewhat.