Posted to commits@beam.apache.org by "Chamikara Jayalath (JIRA)" <ji...@apache.org> on 2017/04/10 17:43:41 UTC

[jira] [Created] (BEAM-1925) Make DoFn invocation logic of Python SDK more extensible

Chamikara Jayalath created BEAM-1925:
----------------------------------------

             Summary: Make DoFn invocation logic of Python SDK more extensible
                 Key: BEAM-1925
                 URL: https://issues.apache.org/jira/browse/BEAM-1925
             Project: Beam
          Issue Type: Improvement
          Components: sdk-py
            Reporter: Chamikara Jayalath
            Assignee: Chamikara Jayalath


The DoFn invocation logic of the Python SDK currently lives in the DoFnRunner class.

https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/common.py#L54

When a DoFnRunner is initialized, it parses the given DoFn and creates local state, which it then uses when invoking the DoFn methods process, start_bundle, and finish_bundle. For example, DoFnRunner stores a list of ArgPlaceholder objects as part of its state to facilitate invocation of the process method.
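To make the "parse once, reuse the state" idea concrete, here is a minimal toy sketch. It is not Beam's DoFnRunner: the classes ToyDoFnRunner, ArgPlaceholder, and TagWithWindow below are hypothetical stand-ins, and the rule that unbound parameters become per-element placeholders is an assumption made for illustration. The runner inspects process() once at construction, stores known arguments directly and placeholders for the rest, and fills the placeholders on each call.

```python
import inspect


class ArgPlaceholder:
    """Hypothetical marker for an argument resolved per element (e.g. a window)."""

    def __init__(self, name):
        self.name = name


class ToyDoFnRunner:
    """Hypothetical runner: parses process() once, then reuses the parsed state."""

    def __init__(self, fn, bound_args):
        self.fn = fn
        # Parse the process() signature once. On a bound method, inspect
        # omits `self`, so [1:] skips only the `element` parameter.
        names = list(inspect.signature(fn.process).parameters)[1:]
        # Arguments known now are stored as-is; the rest become placeholders.
        self.arg_template = [
            bound_args[n] if n in bound_args else ArgPlaceholder(n) for n in names
        ]

    def process(self, element, context):
        # Per call: fill each placeholder from the per-element context dict.
        args = [
            context[a.name] if isinstance(a, ArgPlaceholder) else a
            for a in self.arg_template
        ]
        return list(self.fn.process(element, *args))


class TagWithWindow:
    """Hypothetical DoFn: `prefix` is fixed up front, `window` varies per element."""

    def process(self, element, prefix, window):
        yield f"{prefix}{element}@{window}"


runner = ToyDoFnRunner(TagWithWindow(), {"prefix": "x-"})
print(runner.process(1, {"window": "w0"}))  # ['x-1@w0']
```

The point of the design is that signature inspection happens once per DoFn rather than once per element; every new DoFn feature, however, has to be threaded through this single class, which is the extensibility problem described next.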

We will need to extend this functionality when adding new features to the DoFn class (for example, to support Splittable DoFn [1]), so I think it's worth refactoring this code to make it more extensible.

I think a good approach is to add DoFnInvoker and DoFnSignature classes similar to those in the Java SDK [2].

In this approach:
A DoFnSignature captures the signature of a DoFn, including its methods and their arguments.
A DoFnInvoker implements a particular strategy for executing DoFn methods (initially we'll have simple and per-window invokers [3]).
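One possible shape for this split, sketched in Python under stated assumptions: the class names DoFnSignature, SimpleInvoker, and PerWindowInvoker follow the proposal, but their fields, the create_invoker helper, and the rule "a DoFn that declares a `window` parameter needs per-window invocation" are hypothetical choices for illustration, not Beam's actual API. The signature is computed once, and the choice of invoker replaces per-call branching.

```python
import inspect


class DoFnSignature:
    """Hypothetical: parsed once, describes the DoFn's process() arguments."""

    def __init__(self, fn):
        self.fn = fn
        # Parameter names of process() after `element` (bound method, so no self).
        self.process_args = list(inspect.signature(fn.process).parameters)[1:]
        # Assumption: declaring a `window` argument marks the DoFn as window-aware.
        self.needs_window = "window" in self.process_args


class SimpleInvoker:
    """Calls process() once per element; windows are not passed to the DoFn."""

    def __init__(self, signature):
        self.fn = signature.fn

    def invoke_process(self, element, windows):
        return list(self.fn.process(element))


class PerWindowInvoker:
    """Calls process() once for each window the element belongs to."""

    def __init__(self, signature):
        self.fn = signature.fn

    def invoke_process(self, element, windows):
        out = []
        for w in windows:
            out.extend(self.fn.process(element, window=w))
        return out


def create_invoker(fn):
    """Hypothetical helper: pick an invoker from the signature, once per DoFn."""
    signature = DoFnSignature(fn)
    if signature.needs_window:
        return PerWindowInvoker(signature)
    return SimpleInvoker(signature)


class Double:
    def process(self, element):
        yield element * 2


class TagWindow:
    def process(self, element, window):
        yield (element, window)


print(create_invoker(Double()).invoke_process(3, ["w0", "w1"]))     # [6]
print(create_invoker(TagWindow()).invoke_process(3, ["w0", "w1"]))  # [(3, 'w0'), (3, 'w1')]
```

Adding a new DoFn feature in this layout would mean extending the signature and, if needed, adding another invoker class, rather than editing one monolithic runner.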

A runner uses DoFnRunner to execute the methods of a given DoFn. At initialization, DoFnRunner creates a DoFnSignature and a DoFnInvoker for the given DoFn.
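The wiring described above can be sketched as follows. This is a hypothetical, reduced illustration rather than the SDK's code: the DoFnSignature here records only which bundle methods exist, the invoker method names (invoke_start_bundle, invoke_process, invoke_finish_bundle) and the run_bundle helper are assumptions, and CountingFn is a made-up DoFn. The runner builds the signature and invoker once, then only delegates.

```python
class DoFnSignature:
    """Hypothetical, reduced: records which lifecycle methods the DoFn defines."""

    def __init__(self, fn):
        self.fn = fn
        self.has_start_bundle = callable(getattr(fn, "start_bundle", None))
        self.has_finish_bundle = callable(getattr(fn, "finish_bundle", None))


class DoFnInvoker:
    """Hypothetical invoker: calls optional methods only if the signature has them."""

    def __init__(self, signature):
        self.signature = signature

    def invoke_start_bundle(self):
        if self.signature.has_start_bundle:
            self.signature.fn.start_bundle()

    def invoke_process(self, element):
        return list(self.signature.fn.process(element))

    def invoke_finish_bundle(self):
        if self.signature.has_finish_bundle:
            self.signature.fn.finish_bundle()


class DoFnRunner:
    """Hypothetical runner: creates signature and invoker at init, then delegates."""

    def __init__(self, fn):
        self.invoker = DoFnInvoker(DoFnSignature(fn))

    def run_bundle(self, elements):
        # One bundle: start_bundle, process each element, finish_bundle.
        self.invoker.invoke_start_bundle()
        out = []
        for element in elements:
            out.extend(self.invoker.invoke_process(element))
        self.invoker.invoke_finish_bundle()
        return out


class CountingFn:
    """Made-up DoFn that counts the elements seen in a bundle."""

    def start_bundle(self):
        self.seen = 0

    def process(self, element):
        self.seen += 1
        yield element.upper()

    def finish_bundle(self):
        print("processed", self.seen)


fn = CountingFn()
print(DoFnRunner(fn).run_bundle(["a", "b"]))  # prints "processed 2", then ['A', 'B']
```

Because the runner only talks to the invoker, a different invocation strategy (for example, one needed by Splittable DoFn) could be swapped in without changing DoFnRunner itself.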

The Splittable DoFn implementation will use DoFnSignature and DoFnInvoker methods as well.


[1] https://docs.google.com/document/d/1h_zprJrOilivK2xfvl4L42vaX4DMYGfH1YDmi-s_ozM/edit#heading=h.e6patunrpiql

[2] https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/reflect/DoFnSignature.java

[3] https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/common.py#L200


