Posted to issues@beam.apache.org by "Pablo Estrada (Jira)" <ji...@apache.org> on 2021/09/29 00:45:00 UTC
[jira] [Updated] (BEAM-3566) Replace Python DirectRunner apply_* hooks with PTransformOverrides
[ https://issues.apache.org/jira/browse/BEAM-3566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Pablo Estrada updated BEAM-3566:
--------------------------------
Resolution: Fixed
Status: Resolved (was: Open)
> Replace Python DirectRunner apply_* hooks with PTransformOverrides
> ------------------------------------------------------------------
>
> Key: BEAM-3566
> URL: https://issues.apache.org/jira/browse/BEAM-3566
> Project: Beam
> Issue Type: Improvement
> Components: sdk-py-core
> Affects Versions: 2.2.0
> Reporter: Charles Chen
> Priority: P3
> Fix For: 2.4.0
>
>
> In the Python DirectRunner, we currently use apply_* override hooks to replace the default .expand() behavior of certain transforms. For example, GroupByKey has a special implementation in the DirectRunner, so an apply_* hook replaces the implementation of GroupByKey.expand().
> However, this strategy has drawbacks. Because this override operation happens eagerly during graph construction, the pipeline graph is specialized and modified before a specific runner is bound to the pipeline's execution. This makes the pipeline graph non-portable and blocks full migration to using the Runner API pipeline representation in the DirectRunner.
> By contrast, the SDK's PTransformOverride mechanism allows the expression of matchers that operate on the unspecialized graph, replacing PTransforms as necessary to produce a DirectRunner-specialized pipeline graph for execution.
> We therefore want to replace these eager apply_* overrides with PTransformOverrides that operate on the completely constructed graph.
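
To make the contrast concrete, here is a minimal, self-contained sketch of matcher-based replacement on a fully constructed graph. It is a toy model only: Transform, Graph, Override, DirectGroupByKeyOverride and the "DirectGroupByKey" kind are hypothetical names invented for illustration, and the code neither imports nor reproduces Beam's actual PTransformOverride API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Transform:
    """Hypothetical stand-in for a transform node; not a Beam class."""
    label: str
    kind: str


class Override:
    """Hypothetical override: a matcher plus a replacement factory."""

    def matches(self, transform: Transform) -> bool:
        raise NotImplementedError

    def get_replacement(self, transform: Transform) -> Transform:
        raise NotImplementedError


class DirectGroupByKeyOverride(Override):
    """Swaps a generic GroupByKey for an assumed runner-specific variant."""

    def matches(self, transform: Transform) -> bool:
        return transform.kind == "GroupByKey"

    def get_replacement(self, transform: Transform) -> Transform:
        # "DirectGroupByKey" is an illustrative name, not a real class.
        return Transform(transform.label, "DirectGroupByKey")


@dataclass
class Graph:
    """Hypothetical, flat stand-in for a constructed pipeline graph."""
    transforms: List[Transform] = field(default_factory=list)

    def apply(self, transform: Transform) -> "Graph":
        # Construction is runner-agnostic: nothing is specialized eagerly.
        self.transforms.append(transform)
        return self

    def replace_all(self, overrides: List[Override]) -> "Graph":
        # Build a new, specialized graph; the original stays untouched.
        specialized = Graph()
        for transform in self.transforms:
            for override in overrides:
                if override.matches(transform):
                    transform = override.get_replacement(transform)
                    break  # first matching override wins in this sketch
            specialized.transforms.append(transform)
        return specialized


# The graph is built once, without knowing which runner will execute it.
portable = Graph()
portable.apply(Transform("read", "Read"))
portable.apply(Transform("group", "GroupByKey"))

# Specialization happens later, when a runner is chosen.
specialized = portable.replace_all([DirectGroupByKeyOverride()])
```

In this sketch the unspecialized graph survives the replacement step, which is the property the description relies on: the same constructed graph could be handed to a different runner with a different override list.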
--
This message was sent by Atlassian Jira
(v8.3.4#803005)