You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@beam.apache.org by me...@apache.org on 2017/11/06 18:42:17 UTC

[beam-site] branch mergebot updated (75b2d8b -> dafa288)

This is an automated email from the ASF dual-hosted git repository.

mergebot-role pushed a change to branch mergebot
in repository https://gitbox.apache.org/repos/asf/beam-site.git.


 discard 75b2d8b  This closes #302
 discard 90a0460  Update with Java snippet tags
 discard 33ac606  [BEAM-1934] Add more CoGroupByKey content/examples
     new 147e380  Add page for the portability framework
     new dafa288  This closes #340

This update added new revisions after undoing existing revisions.
That is to say, some revisions that were in the old version of the
branch are not in the new version.  This situation occurs
when a user --force pushes a change and generates a repository
containing something like this:

 * -- * -- B -- O -- O -- O   (75b2d8b)
            \
             N -- N -- N   refs/heads/mergebot (dafa288)

You should already have received notification emails for all of the O
revisions, and so the following emails describe only the N revisions
from the common base, B.

Any revisions marked "omit" are not gone; other references still
refer to them.  Any revisions marked "discard" are gone forever.

The 2 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 src/contribute/portability.md          | 167 +++++++++++++++++++++++++++++++++
 src/documentation/programming-guide.md | 145 +++++++---------------------
 2 files changed, 201 insertions(+), 111 deletions(-)
 create mode 100644 src/contribute/portability.md

-- 
To stop receiving notification emails like this one, please contact
['"commits@beam.apache.org" <co...@beam.apache.org>'].

[beam-site] 01/02: Add page for the portability framework

Posted by me...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

mergebot-role pushed a commit to branch mergebot
in repository https://gitbox.apache.org/repos/asf/beam-site.git

commit 147e38088f424ef7e69f95d421d16a1a490f2203
Author: Henning Rohde <he...@google.com>
AuthorDate: Wed Nov 1 10:02:45 2017 -0700

    Add page for the portability framework
---
 src/contribute/portability.md | 167 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 167 insertions(+)

diff --git a/src/contribute/portability.md b/src/contribute/portability.md
new file mode 100644
index 0000000..f8e00fd
--- /dev/null
+++ b/src/contribute/portability.md
@@ -0,0 +1,167 @@
+---
+layout: section
+title: "Portability Framework"
+permalink: /contribute/portability/
+section_menu: section-menu/contribute.html
+redirect_from: /portability/
+---
+
+# Portability Framework
+
+* TOC
+{:toc}
+
+## Overview
+
+Interoperability between SDKs and runners is a key aspect of Apache
+Beam. So far, however, the reality is that most runners support the
+Java SDK only, because each SDK-runner combination requires non-trivial
+work on both sides. All runners are also currently written in Java,
+which makes support of non-Java SDKs far more expensive. The
+_portability framework_ aims to rectify this situation and provide
+full interoperability across the Beam ecosystem.
+
+The portability framework introduces well-defined, language-neutral
+data structures and protocols between the SDK and runner. This interop
+layer -- called the _portability API_ -- ensures that SDKs and runners
+can work with each other uniformly, reducing the interoperability
+burden for both SDKs and runners to a constant effort.  It notably
+ensures that _new_ SDKs automatically work with existing runners and
+vice versa.  The framework introduces a new runner, the _Universal
+Local Runner (ULR)_, as a practical reference implementation that
+complements the direct runners. Finally, it enables cross-language
+pipelines (sharing I/O or transformations across SDKs) and
+user-customized execution environments ("custom containers").
+
+The portability API consists of a set of smaller contracts that
+isolate SDKs and runners for job submission, management and
+execution. These contracts use protobufs and gRPC for broad language
+support.
+
+ * **Job submission and management**: The _Runner API_ defines a
+   language-neutral pipeline representation with transformations
+   specifying the execution environment as a docker container
+   image. The latter both allows the execution side to set up the
+   right environment as well as opens the door for custom containers
+   and cross-environment pipelines. The _Job API_ allows pipeline
+   execution and configuration to be managed uniformly.
+
+ * **Job execution**: The _SDK harness_ is a SDK-provided
+   program responsible for executing user code and is run separately
+   from the runner.  The _Fn API_ defines an execution-time binary
+   contract between the SDK harness and the runner that describes how
+   execution tasks are managed and how data is transferred. In
+   addition, the runner needs to handle progress and monitoring in an
+   efficient and language-neutral way. SDK harness initialization
+   relies on the _Provision_ and _Artifact APIs_ for obtaining staged
+   files, pipeline options and environment information. Docker
+   provides isolation between the runner and SDK/user environments to
+   the benefit of both as defined by the _container contract_. The
+   containerization of the SDK gives it (and the user, unless the SDK
+   is closed) full control over its own environment without risk of
+   dependency conflicts. The runner has significant freedom regarding
+   how it manages the SDK harness containers.
+
+The goal is that all (non-direct) runners and SDKs eventually support
+the portability API, perhaps exclusively.
+
+## Design
+
+The [model protos](https://github.com/apache/beam/tree/master/model)
+contain all aspects of the portability API and is the truth on the
+ground. The proto definitions supercede any design documents. The main
+design documents are the following:
+
+ * [Runner API](https://s.apache.org/beam-runner-api). Pipeline
+   representation and discussion on primitive/composite transforms and
+   optimizations.
+   
+ * [Job API](https://s.apache.org/beam-job-api). Job submission and
+   management protocol.
+
+ * [Fn API](https://s.apache.org/beam-fn-api). Execution-side control
+   and data protocols and overview.
+
+ * [Container
+   contract](https://s.apache.org/beam-fn-api-container-contract).
+   Execution-side docker container invocation and provisioning
+   protocols. See
+   [CONTAINERS.md](https://github.com/apache/beam/blob/master/sdks/CONTAINERS.md)
+   for how to build container images.
+
+In discussion:
+
+ * [Cross
+   language](https://s.apache.org/beam-mixed-language-pipelines). Options
+   and tradeoffs for how to handle various kinds of
+   multi-language/multi-SDK pipelines.
+
+## Development
+
+The portability framework is a substantial effort that touches every
+Beam component. In addition to the sheer magnitude, a major challenge
+is engineering an interop layer that does not significantly compromise
+performance due to the additional serialization overhead of a
+language-neutral protocol.
+
+### Roadmap
+
+The proposed project phases are roughly as follows and are not
+strictly sequential, as various components will likely move at
+different speeds. Additionally, there have been (and continues to be)
+supporting refactorings that are not always tracked as part of the
+portability effort. Work already done is not tracked here either.
+
+ * **P1 [MVP]**: Implement the fundamental plumbing for portable SDKs
+   and runners for batch and streaming, including containers and the
+   ULR
+   [[BEAM-2899](https://issues.apache.org/jira/browse/BEAM-2899)]. Each
+   SDK and runner should use the portability framework at least to the
+   extent that wordcount
+   [[BEAM-2896](https://issues.apache.org/jira/browse/BEAM-2896)] and
+   windowed wordcount
+   [[BEAM-2941](https://issues.apache.org/jira/browse/BEAM-2941)] run
+   portably.
+    
+ * **P2 [Feature complete]**: Design and implement portability support
+   for remaining execution-side features, so that any pipeline from
+   any SDK can run portably on any runner. These features include side
+   inputs
+   [[BEAM-2863](https://issues.apache.org/jira/browse/BEAM-2863)], User
+   timers
+   [[BEAM-2925](https://issues.apache.org/jira/browse/BEAM-2925)],
+   Splittable DoFn
+   [[BEAM-2896](https://issues.apache.org/jira/browse/BEAM-2896)] and
+   more.  Each SDK and runner should use the portability framework at
+   least to the extent that the mobile gaming examples
+   [[BEAM-2940](https://issues.apache.org/jira/browse/BEAM-2940)] run
+   portably.
+    
+ * **P3 [Performance]**: Measure and tune performance of portable
+   pipelines using benchmarks such as Nexmark. Features such as
+   progress reporting
+   [[BEAM-2940](https://issues.apache.org/jira/browse/BEAM-2940)],
+   combiner lifting
+   [[BEAM-2937](https://issues.apache.org/jira/browse/BEAM-2937)] and
+   fusion are expected to be needed.
+
+ * **P4 [Cross language]**: Design and implement cross-language
+   pipeline support, including how the ecosystem of shared transforms
+   should work.
+
+### Issues
+
+The portability effort touches every component, so the "portability"
+label is used to identify all portability-related issues. Pure
+design or proto definitions should use the "beam-model" component. A
+common pattern for new portability features is that the overall
+feature is in "beam-model" with subtasks for each SDK and runner in
+their respective components.
+
+**JIRA:** [query](https://issues.apache.org/jira/issues/?filter=12341256)
+
+### Status
+
+MVP in progress. No SDK or runner supports the full portability API
+yet, but once that happens a more detailed progress table will be
+added here.

-- 
To stop receiving notification emails like this one, please contact
"commits@beam.apache.org" <co...@beam.apache.org>.

[beam-site] 02/02: This closes #340

Posted by me...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

mergebot-role pushed a commit to branch mergebot
in repository https://gitbox.apache.org/repos/asf/beam-site.git

commit dafa288130f1e40130277cf98aa4bd96f624cb23
Merge: 0679868 147e380
Author: Mergebot <me...@apache.org>
AuthorDate: Mon Nov 6 18:42:01 2017 +0000

    This closes #340

 src/contribute/portability.md | 167 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 167 insertions(+)

-- 
To stop receiving notification emails like this one, please contact
"commits@beam.apache.org" <co...@beam.apache.org>.