Posted to commits@beam.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2016/07/19 18:47:20 UTC

[jira] [Commented] (BEAM-452) Implement DoFn per-instance setup and teardown methods

    [ https://issues.apache.org/jira/browse/BEAM-452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15384666#comment-15384666 ] 

ASF GitHub Bot commented on BEAM-452:
-------------------------------------

GitHub user tgroh opened a pull request:

    https://github.com/apache/incubator-beam/pull/690

    [BEAM-452] Add DoFn setup and teardown methods


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/tgroh/incubator-beam dofn_setup_teardown

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-beam/pull/690.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #690
    
----
commit d7c4440d23278135c86b193c2d25ac512d5aa5d2
Author: Thomas Groh <tg...@google.com>
Date:   2016-06-28T22:44:49Z

    Use the ParDo Application to Cache DoFns
    
    A DoFn application is the scope of reuse.
    
    Factor CloningThreadLocal as the top-level class instead of
    SerializableCloningThreadLocalCacheLoader, and extract the Fn from the
    AppliedPTransform when loading an absent element.

commit 6f7d10e303a0cb3d86ad0f2c60db5ed1918420d1
Author: Thomas Groh <tg...@google.com>
Date:   2016-07-15T17:51:24Z

    Make TransformEvaluatorFactory reuse Explicit
    
    Transform Evaluator Factories must be reused for the entire execution of
    a Pipeline and must not be reused across pipelines.
    
    Remove EvaluatorKey, and key explicitly by the transform application.

commit f2c0ba67920ba2e2772ddacc808c5adf38949bc7
Author: Thomas Groh <tg...@google.com>
Date:   2016-07-15T18:27:00Z

    Add TransformEvaluatorFactory#cleanup
    
    This cleans up any state stored within the Transform Evaluator Factory.

commit 1f35c4b64aae264d800326421db475be260de2c9
Author: Thomas Groh <tg...@google.com>
Date:   2016-07-14T21:51:02Z

    Add DoFn#setup and DoFn#teardown
    
    These methods are called to do expensive setup work, and to clean up a
    DoFn before it is discarded.

commit 797633a2209a59736650e255be517ec73137e94d
Author: Thomas Groh <tg...@google.com>
Date:   2016-07-19T18:03:15Z

    Replace CloningThreadLocal with DoFnLifecycleManager
    
    This is a more focused interface that interacts with a DoFn before it
    is available for use and after it has completed and the reference is
    lost. It is required to properly support setup and teardown, as the
    fields in a ThreadLocal cannot all be cleaned up without additional
    tracking.
    
    Part of BEAM-452.

commit 7bf0b4185d8303b03d47fb99691fd63ae57ad887
Author: Thomas Groh <tg...@google.com>
Date:   2016-07-19T18:08:18Z

    fixup! Add DoFn#setup and DoFn#teardown
    
    Handle DoFn setup and teardown in DoFnLifecycleManager
    
    This ensures that the DirectRunner properly interacts with DoFn setup
    and teardown methods.

commit 9d1b2c142aff0cb638c027567dda18169b2f8795
Author: Thomas Groh <tg...@google.com>
Date:   2016-07-19T18:06:21Z

    fixup! Add DoFn#setup and DoFn#teardown
    
    Call DoFn#setup and #teardown in Flink and Spark

----


> Implement DoFn per-instance setup and teardown methods
> ------------------------------------------------------
>
>                 Key: BEAM-452
>                 URL: https://issues.apache.org/jira/browse/BEAM-452
>             Project: Beam
>          Issue Type: Bug
>          Components: runner-dataflow, runner-direct, runner-flink, runner-spark, sdk-java-core
>            Reporter: Thomas Groh
>            Assignee: Thomas Groh
>
> https://docs.google.com/document/d/1LLQqggSePURt3XavKBGV7SZJYQ4NW8yCu63lBchzMRk/edit
> BEAM-38 permits DoFns to be reused across bundles. DoFn instances may need per-instance setup and teardown. To avoid redoing that work for every bundle, the system should provide hooks that are called before a DoFn is first used and after its last use.
> DoFn#setup is called before any other calls to DoFn methods. DoFn#teardown is called after any method throws an exception, or when the runner will no longer use a DoFn instance (e.g. when it evicts it from a cache).
> Runners must call these methods appropriately in all cases (including if a DoFn is used exactly once, for a single bundle, and discarded).
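
The contract described in the issue can be sketched from the runner's side as below. This is a minimal illustration only, not the code in the pull request and not the Beam SDK API: SketchFn, SketchFnHolder, RecordingFn and LifecycleSketch are hypothetical names invented for this example, and processBundle stands in for whatever per-bundle calls a runner makes. It assumes that setup runs once before any other call, that teardown runs once when the instance is discarded or after any call throws, and that a failed instance is never reused.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for a DoFn; this is NOT the Beam SDK class. */
interface SketchFn {
    void setup() throws Exception;
    void processBundle(List<String> elements) throws Exception;
    void teardown() throws Exception;
}

/** Hypothetical runner-side wrapper that owns one SketchFn instance. */
final class SketchFnHolder {
    private final SketchFn fn;
    private boolean setUp = false;
    private boolean tornDown = false;

    SketchFnHolder(SketchFn fn) {
        this.fn = fn;
    }

    /** Processes one bundle, calling setup() first if the instance is new. */
    void runBundle(List<String> elements) throws Exception {
        if (tornDown) {
            throw new IllegalStateException("instance was already torn down");
        }
        try {
            if (!setUp) {
                fn.setup();
                setUp = true;
            }
            fn.processBundle(elements);
        } catch (Exception e) {
            // Any failure ends the instance's life: tear it down, then rethrow.
            try {
                discard();
            } catch (Exception teardownFailure) {
                e.addSuppressed(teardownFailure);
            }
            throw e;
        }
    }

    /** Called when the runner will no longer use the instance, e.g. on cache eviction. */
    void discard() throws Exception {
        if (!tornDown) {
            tornDown = true;
            fn.teardown();
        }
    }
}

/** Hypothetical function that records the order of lifecycle calls. */
final class RecordingFn implements SketchFn {
    final List<String> events = new ArrayList<>();

    public void setup() {
        events.add("setup");
    }

    public void processBundle(List<String> elements) {
        if (elements.contains("bad")) {
            throw new IllegalArgumentException("bad element");
        }
        events.add("bundle:" + elements.size());
    }

    public void teardown() {
        events.add("teardown");
    }
}

class LifecycleSketch {
    public static void main(String[] args) throws Exception {
        RecordingFn fn = new RecordingFn();
        SketchFnHolder holder = new SketchFnHolder(fn);
        holder.runBundle(Arrays.asList("a", "b")); // first use triggers setup
        holder.runBundle(Arrays.asList("c"));      // reuse: no second setup
        holder.discard();                          // eviction triggers teardown
        System.out.println(fn.events); // prints [setup, bundle:2, bundle:1, teardown]
    }
}
```

The single-bundle case from the last sentence of the issue falls out of the same path: one runBundle call followed by discard still yields exactly one setup and one teardown.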



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)