Posted to issues@beam.apache.org by "Pablo Estrada (Jira)" <ji...@apache.org> on 2021/09/29 00:45:00 UTC

[jira] [Updated] (BEAM-3566) Replace Python DirectRunner apply_* hooks with PTransformOverrides

     [ https://issues.apache.org/jira/browse/BEAM-3566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pablo Estrada updated BEAM-3566:
--------------------------------
    Resolution: Fixed
        Status: Resolved  (was: Open)

> Replace Python DirectRunner apply_* hooks with PTransformOverrides
> ------------------------------------------------------------------
>
>                 Key: BEAM-3566
>                 URL: https://issues.apache.org/jira/browse/BEAM-3566
>             Project: Beam
>          Issue Type: Improvement
>          Components: sdk-py-core
>    Affects Versions: 2.2.0
>            Reporter: Charles Chen
>            Priority: P3
>             Fix For: 2.4.0
>
>
> In the Python DirectRunner, we currently use apply_* hooks to override the default .expand() behavior of certain transforms. For example, GroupByKey has a special implementation in the DirectRunner, so an apply_* hook replaces the implementation of GroupByKey.expand().
> However, this strategy has drawbacks. Because the override happens eagerly during graph construction, the pipeline graph is specialized and modified before a specific runner is bound to the pipeline's execution. This makes the pipeline graph non-portable and blocks full migration to the Runner API pipeline representation in the DirectRunner.
> By contrast, the SDK's PTransformOverride mechanism lets us express matchers that operate on the unspecialized graph, replacing PTransforms as necessary to produce a DirectRunner-specialized pipeline graph for execution.
> We therefore want to replace these eager apply_* overrides with PTransformOverrides that operate on the fully constructed graph.
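The difference between eager hooks and post-construction overrides can be illustrated with a small toy model. This is only a sketch using the standard library, not Beam's actual API: the names Node, Override, Graph, and "DirectGroupByKey" are hypothetical stand-ins, and returning a specialized copy from replace_all is a simplification chosen here to make the portability point visible.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Node:
    """Hypothetical stand-in for an applied transform in a pipeline graph."""
    label: str
    kind: str  # e.g. "Read", "GroupByKey"


@dataclass
class Override:
    """Hypothetical analogue of an override: a matcher plus a replacement."""
    matches: Callable[[Node], bool]
    replacement: Callable[[Node], Node]


@dataclass
class Graph:
    """Toy pipeline graph: an ordered list of nodes."""
    nodes: List[Node] = field(default_factory=list)

    def apply(self, node: Node) -> "Graph":
        # Construction is runner-agnostic: nothing is specialized here.
        self.nodes.append(node)
        return self

    def replace_all(self, overrides: List[Override]) -> "Graph":
        # Runs on the completely constructed graph and builds a
        # runner-specialized copy; the original graph is left untouched.
        specialized = Graph()
        for node in self.nodes:
            for override in overrides:
                if override.matches(node):
                    node = override.replacement(node)
                    break  # first matching override wins
            specialized.nodes.append(node)
        return specialized


# Build the unspecialized (portable) graph first.
portable = Graph().apply(Node("read", "Read")).apply(Node("group", "GroupByKey"))

# A runner-specific override, applied only once a runner has been chosen.
direct_gbk = Override(
    matches=lambda n: n.kind == "GroupByKey",
    replacement=lambda n: Node(n.label, "DirectGroupByKey"),
)

specialized = portable.replace_all([direct_gbk])
```

In this model the portable graph still contains the plain GroupByKey node after specialization, which is the property the eager apply_* hooks lose by rewriting the transform at construction time.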



--
This message was sent by Atlassian Jira
(v8.3.4#803005)