Posted to commits@beam.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2016/07/19 18:47:20 UTC
[jira] [Commented] (BEAM-452) Implement DoFn per-instance setup and teardown methods
[ https://issues.apache.org/jira/browse/BEAM-452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15384666#comment-15384666 ]
ASF GitHub Bot commented on BEAM-452:
-------------------------------------
GitHub user tgroh opened a pull request:
https://github.com/apache/incubator-beam/pull/690
[BEAM-452] Add DoFn setup and teardown methods
---
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/tgroh/incubator-beam dofn_setup_teardown
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/incubator-beam/pull/690.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #690
----
commit d7c4440d23278135c86b193c2d25ac512d5aa5d2
Author: Thomas Groh <tg...@google.com>
Date: 2016-06-28T22:44:49Z
Use the ParDo Application to Cache DoFns
A DoFn application is the scope of reuse.
Factor CloningThreadLocal as the top-level class instead of
SerializableCloningThreadLocalCacheLoader, and extract the Fn from the
AppliedPTransform when loading an absent element.
commit 6f7d10e303a0cb3d86ad0f2c60db5ed1918420d1
Author: Thomas Groh <tg...@google.com>
Date: 2016-07-15T17:51:24Z
Make TransformEvaluatorFactory reuse Explicit
Transform Evaluator Factories must be reused for the entire execution of
a Pipeline and must not be reused across pipelines.
Remove EvaluatorKey, and key explicitly by the transform application.
commit f2c0ba67920ba2e2772ddacc808c5adf38949bc7
Author: Thomas Groh <tg...@google.com>
Date: 2016-07-15T18:27:00Z
Add TransformEvaluatorFactory#cleanup
This cleans up any state stored within the Transform Evaluator Factory.
commit 1f35c4b64aae264d800326421db475be260de2c9
Author: Thomas Groh <tg...@google.com>
Date: 2016-07-14T21:51:02Z
Add DoFn#setup and DoFn#teardown
These methods are called to do expensive setup work, and to clean up a
DoFn before it is discarded.
commit 797633a2209a59736650e255be517ec73137e94d
Author: Thomas Groh <tg...@google.com>
Date: 2016-07-19T18:03:15Z
Replace CloningThreadLocal with DoFnLifecycleManager
This is a more focused interface that interacts with a DoFn before it
is available for use and after it has completed and the reference is
lost. It is required to properly support setup and teardown, as the
fields in a ThreadLocal cannot all be cleaned up without additional
tracking.
Part of BEAM-452.
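The tracking problem described in the commit message above can be illustrated with a small sketch. This is not the DoFnLifecycleManager from the pull request; the names `PerThreadLifecycleSketch`, `Manager`, `Fn` and `CountingFn` are hypothetical, and the design (a map keyed by thread in place of a ThreadLocal) is an assumption chosen to show why extra tracking makes a complete teardown possible.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

public class PerThreadLifecycleSketch {
  /** Hypothetical stand-in for a DoFn with lifecycle hooks; not the Beam API. */
  interface Fn {
    void setup();
    void teardown();
  }

  /** Hands out one instance per thread and remembers every instance it created. */
  static class Manager<T extends Fn> {
    private final Supplier<T> factory;
    // Unlike a ThreadLocal, this map can be enumerated from any thread.
    private final Map<Thread, T> instances = new ConcurrentHashMap<>();

    Manager(Supplier<T> factory) {
      this.factory = factory;
    }

    /** Returns the calling thread's instance, creating and setting it up on first use. */
    T get() {
      return instances.computeIfAbsent(Thread.currentThread(), thread -> {
        T fn = factory.get();
        fn.setup();
        return fn;
      });
    }

    /** Tears down every tracked instance and returns how many were removed. */
    int removeAll() {
      int removed = 0;
      for (Thread thread : instances.keySet()) {
        T fn = instances.remove(thread);
        if (fn != null) {
          fn.teardown();
          removed++;
        }
      }
      return removed;
    }
  }

  /** Counts lifecycle calls so the behaviour can be observed. */
  static class CountingFn implements Fn {
    private final AtomicInteger setups;
    private final AtomicInteger teardowns;

    CountingFn(AtomicInteger setups, AtomicInteger teardowns) {
      this.setups = setups;
      this.teardowns = teardowns;
    }

    public void setup() {
      setups.incrementAndGet();
    }

    public void teardown() {
      teardowns.incrementAndGet();
    }
  }

  public static void main(String[] args) throws InterruptedException {
    AtomicInteger setups = new AtomicInteger();
    AtomicInteger teardowns = new AtomicInteger();
    Manager<CountingFn> manager = new Manager<>(() -> new CountingFn(setups, teardowns));

    manager.get();
    manager.get(); // Same thread: the existing instance is reused, no second setup.

    Thread worker = new Thread(() -> manager.get());
    worker.start();
    worker.join(); // The worker has exited, but its instance is still tracked.

    int removed = manager.removeAll();
    System.out.println(setups.get() + " setups, " + teardowns.get() + " teardowns, " + removed + " removed");
    // prints "2 setups, 2 teardowns, 2 removed"
  }
}
```

The instance created on the worker thread is still reachable through the map after that thread exits, so it can be torn down from the main thread. With a bare ThreadLocal, only the owning thread could reach it.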
commit 7bf0b4185d8303b03d47fb99691fd63ae57ad887
Author: Thomas Groh <tg...@google.com>
Date: 2016-07-19T18:08:18Z
fixup! Add DoFn#setup and DoFn#teardown
Handle DoFn setup and teardown in DoFnLifecycleManager
This ensures that the DirectRunner properly interacts with DoFn setup
and teardown methods.
commit 9d1b2c142aff0cb638c027567dda18169b2f8795
Author: Thomas Groh <tg...@google.com>
Date: 2016-07-19T18:06:21Z
fixup! Add DoFn#setup and DoFn#teardown
Call DoFn#setup and #teardown in Flink and Spark
----
> Implement DoFn per-instance setup and teardown methods
> ------------------------------------------------------
>
> Key: BEAM-452
> URL: https://issues.apache.org/jira/browse/BEAM-452
> Project: Beam
> Issue Type: Bug
> Components: runner-dataflow, runner-direct, runner-flink, runner-spark, sdk-java-core
> Reporter: Thomas Groh
> Assignee: Thomas Groh
>
> https://docs.google.com/document/d/1LLQqggSePURt3XavKBGV7SZJYQ4NW8yCu63lBchzMRk/edit
> BEAM-38 permits DoFns to be reused across bundles. DoFn instances may need to do per-instance setup and teardown, and to avoid redoing the work per-bundle, the system should provide hooks to call before a DoFn is first used and after it will no longer be used.
> DoFn#setup is called before any other calls to DoFn methods. DoFn#teardown is called after any method throws an exception, or when the runner will no longer use a DoFn instance (e.g. when it evicts it from a cache).
> Runners must call these methods appropriately in all cases (including if a DoFn is used exactly once, for a single bundle, and discarded).
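The contract in the issue description above can be sketched from the runner's side. This is a minimal illustration under stated assumptions, not Beam code: `LifecycleSketch`, `Fn`, `RecordingFn` and `runBundles` are hypothetical names, and the sketch covers only the simplest case of one instance that is used for a series of bundles and then discarded.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class LifecycleSketch {
  /** Hypothetical stand-in for a DoFn; the real Beam interface differs. */
  interface Fn {
    void setup();
    void processBundle(List<String> bundle);
    void teardown();
  }

  /** Records each lifecycle call so the ordering can be inspected. */
  static class RecordingFn implements Fn {
    final List<String> calls = new ArrayList<>();

    public void setup() {
      calls.add("setup");
    }

    public void processBundle(List<String> bundle) {
      if (bundle.contains("boom")) {
        calls.add("fail");
        throw new IllegalStateException("element failed");
      }
      calls.add("bundle:" + bundle.size());
    }

    public void teardown() {
      calls.add("teardown");
    }
  }

  /**
   * Calls setup once before any bundle, reuses the instance across bundles,
   * and calls teardown exactly once: either after the last bundle or as soon
   * as any method throws. The instance is not used again afterwards.
   */
  static void runBundles(Fn fn, List<List<String>> bundles) {
    try {
      fn.setup();
      for (List<String> bundle : bundles) {
        fn.processBundle(bundle);
      }
    } finally {
      fn.teardown();
    }
  }

  public static void main(String[] args) {
    RecordingFn fn = new RecordingFn();
    runBundles(fn, Arrays.asList(Arrays.asList("a", "b"), Arrays.asList("c")));
    System.out.println(fn.calls);
    // prints "[setup, bundle:2, bundle:1, teardown]"
  }
}
```

Placing `setup` inside the `try` block means a failing setup is also followed by teardown, which matches the "after any method throws an exception" wording. A production runner would likely also need to handle a teardown that itself throws; that is left out of this sketch.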
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)