
Posted to commits@beam.apache.org by da...@apache.org on 2017/04/21 18:13:49 UTC

[1/3] beam-site git commit: [BEAM-1452] Add composite transforms section to programming guide

Repository: beam-site
Updated Branches:
  refs/heads/asf-site 35d630627 -> 973853241


[BEAM-1452] Add composite transforms section to programming guide


Project: http://git-wip-us.apache.org/repos/asf/beam-site/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam-site/commit/e98da810
Tree: http://git-wip-us.apache.org/repos/asf/beam-site/tree/e98da810
Diff: http://git-wip-us.apache.org/repos/asf/beam-site/diff/e98da810

Branch: refs/heads/asf-site
Commit: e98da81006d8a4605548b7d040ccc4ea5d3bd8c7
Parents: 35d6306
Author: melissa <me...@google.com>
Authored: Mon Apr 10 17:08:50 2017 -0700
Committer: Davor Bonaci <da...@google.com>
Committed: Fri Apr 21 11:13:19 2017 -0700

----------------------------------------------------------------------
 src/documentation/programming-guide.md | 95 ++++++++++++++++++++++++++++-
 1 file changed, 92 insertions(+), 3 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/beam-site/blob/e98da810/src/documentation/programming-guide.md
----------------------------------------------------------------------
diff --git a/src/documentation/programming-guide.md b/src/documentation/programming-guide.md
index 2e10884..37a80ac 100644
--- a/src/documentation/programming-guide.md
+++ b/src/documentation/programming-guide.md
@@ -224,7 +224,7 @@ In Beam SDK each transform has a generic `apply` method <span class="language-py
 [Output PCollection] = [Input PCollection] | [Transform]
 ```
 
-Because Beam uses a generic `apply` method for `PCollection`, you can both chain transforms sequentially and also apply transforms that contain other transforms nested within (called **composite transforms** in the Beam SDKs).
+Because Beam uses a generic `apply` method for `PCollection`, you can both chain transforms sequentially and also apply transforms that contain other transforms nested within (called [composite transforms](#transforms-composite) in the Beam SDKs).
 
 How you apply your pipeline's transforms determines the structure of your pipeline. The best way to think of your pipeline is as a directed acyclic graph, where the nodes are `PCollection`s and the edges are transforms. For example, you can chain transforms to create a sequential pipeline, like this one:
 
@@ -260,7 +260,7 @@ The resulting workflow graph from the branching pipeline above looks like this:
 
 [Branching Graph Graphic]
 
-You can also build your own composite transforms that nest multiple sub-steps inside a single, larger transform. Composite transforms are particularly useful for building a reusable sequence of simple steps that get used in a lot of different places.
+You can also build your own [composite transforms](#transforms-composite) that nest multiple sub-steps inside a single, larger transform. Composite transforms are particularly useful for building a reusable sequence of simple steps that get used in a lot of different places.
 
 ### Transforms in the Beam SDK
 
@@ -943,7 +943,96 @@ While `ParDo` always produces a main output `PCollection` (as the return value f
 
 ## <a name="transforms-composite"></a>Composite Transforms
 
-> **Note:** This section is in progress ([BEAM-1452](https://issues.apache.org/jira/browse/BEAM-1452)).
+Transforms can have a nested structure, where a complex transform performs multiple simpler transforms (such as more than one `ParDo`, `Combine`, `GroupByKey`, or even other composite transforms). These transforms are called composite transforms. Nesting multiple transforms inside a single composite transform can make your code more modular and easier to understand.
+
+The Beam SDK comes packed with many useful composite transforms. See the API reference pages for a list of transforms:
+  * [Pre-written Beam transforms for Java]({{ site.baseurl }}/documentation/sdks/javadoc/{{ site.release_latest }}/index.html?org/apache/beam/sdk/transforms/package-summary.html)
+  * [Pre-written Beam transforms for Python]({{ site.baseurl }}/documentation/sdks/pydoc/{{ site.release_latest }}/apache_beam.transforms.html)
+
+### An example of a composite transform
+
+The `CountWords` transform in the [WordCount example program]({{ site.baseurl }}/get-started/wordcount-example/) is an example of a composite transform. `CountWords` is a `PTransform` subclass that consists of multiple nested transforms.
+
+In its `expand` method, the `CountWords` transform applies the following transform operations:
+
+  1. It applies a `ParDo` on the input `PCollection` of text lines, producing an output `PCollection` of individual words.
+  2. It applies the Beam SDK library transform `Count` on the `PCollection` of words, producing a `PCollection` of key/value pairs. Each key represents a word in the text, and each value represents the number of times that word appeared in the original data.
+
+Note that this is also an example of nested composite transforms, as `Count` is, by itself, a composite transform.
+
+Your composite transform's parameters and return value must match the initial input type and final return type for the entire transform, even if the transform's intermediate data changes type multiple times.
+
+```java
+  public static class CountWords extends PTransform<PCollection<String>,
+      PCollection<KV<String, Long>>> {
+    @Override
+    public PCollection<KV<String, Long>> expand(PCollection<String> lines) {
+
+      // Convert lines of text into individual words.
+      PCollection<String> words = lines.apply(
+          ParDo.of(new ExtractWordsFn()));
+
+      // Count the number of times each word occurs.
+      PCollection<KV<String, Long>> wordCounts =
+          words.apply(Count.<String>perElement());
+
+      return wordCounts;
+    }
+  }
+```
+
+```py
+  Python code snippet coming soon (BEAM-1926)
+```
+
+### Creating a composite transform
+
+To create your own composite transform, create a subclass of the `PTransform` class and override the `expand` method to specify the actual processing logic. You can then use this transform just as you would a built-in transform from the Beam SDK.
+
+{:.language-java}
+For the `PTransform` class type parameters, you pass the `PCollection` types that your transform takes as input, and produces as output. To take multiple `PCollection`s as input, or produce multiple `PCollection`s as output, use one of the multi-collection types for the relevant type parameter.
+
+The following code sample shows how to declare a `PTransform` that accepts a `PCollection` of `String`s for input, and outputs a `PCollection` of `Integer`s:
+
+```java
+  static class ComputeWordLengths
+    extends PTransform<PCollection<String>, PCollection<Integer>> {
+    ...
+  }
+```
+
+```py
+  Python code snippet coming soon (BEAM-1926)
+```
+
+#### Overriding the expand method
+
+Within your `PTransform` subclass, you'll need to override the `expand` method. The `expand` method is where you add the processing logic for the `PTransform`. Your override of `expand` must accept the appropriate type of input `PCollection` as a parameter, and specify the output `PCollection` as the return value.
+
+The following code sample shows how to override `expand` for the `ComputeWordLengths` class declared in the previous example:
+
+```java
+  static class ComputeWordLengths
+      extends PTransform<PCollection<String>, PCollection<Integer>> {
+    @Override
+    public PCollection<Integer> expand(PCollection<String> words) {
+      ...
+      // transform logic goes here
+      ...
+    }
+```
+
+```py
+  Python code snippet coming soon (BEAM-1926)
+```
+
+As long as you override the `expand` method in your `PTransform` subclass to accept the appropriate input `PCollection`(s) and return the corresponding output `PCollection`(s), you can include as many transforms as you want. These transforms can include core transforms, composite transforms, or the transforms included in the Beam SDK libraries.
+
+**Note:** The `expand` method of a `PTransform` is not meant to be invoked directly by the user of a transform. Instead, you should call the `apply` method on the `PCollection` itself, with the transform as an argument. This allows transforms to be nested within the structure of your pipeline.
+
+#### PTransform Style Guide
+
+When you create a new `PTransform`, be sure to read the [PTransform Style Guide]({{ site.baseurl }}/contribute/ptransform-style-guide/). The guide contains additional helpful information such as style guidelines, logging and testing guidance, and language-specific considerations.
 
 ## <a name="io"></a>Pipeline I/O

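The relationship between `apply` and `expand` described in the new section can be illustrated without the Beam SDK. The following is a minimal sketch, not Beam code: `Transform`, `Coll`, `ExtractWords`, and `CountPerElement` are hypothetical in-memory stand-ins for `PTransform`, `PCollection`, the `ParDo` step, and `Count`, and only mimic the shape of the pattern (a composite's `expand` chains `apply` calls, and callers use `apply` rather than calling `expand` themselves).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy model of the composite-transform pattern. None of these types are Beam classes. */
public class CompositeSketch {

  /** Hypothetical stand-in for PTransform: expand() holds the processing logic. */
  interface Transform<InT, OutT> {
    OutT expand(InT input);
  }

  /** Hypothetical stand-in for PCollection: an in-memory list with an apply() method. */
  static final class Coll<T> {
    final List<T> items;

    Coll(List<T> items) {
      this.items = items;
    }

    // Callers go through apply(); apply() is the only place that calls expand().
    <OutT> OutT apply(Transform<Coll<T>, OutT> transform) {
      return transform.expand(this);
    }
  }

  /** Simple step: split each line into whitespace-separated words. */
  static final class ExtractWords implements Transform<Coll<String>, Coll<String>> {
    @Override
    public Coll<String> expand(Coll<String> lines) {
      List<String> words = new ArrayList<>();
      for (String line : lines.items) {
        for (String word : line.split("\\s+")) {
          if (!word.isEmpty()) {
            words.add(word);
          }
        }
      }
      return new Coll<>(words);
    }
  }

  /** Simple step: count occurrences of each word, sorted by word. */
  static final class CountPerElement
      implements Transform<Coll<String>, Coll<Map.Entry<String, Long>>> {
    @Override
    public Coll<Map.Entry<String, Long>> expand(Coll<String> words) {
      Map<String, Long> counts = new TreeMap<>();
      for (String word : words.items) {
        counts.merge(word, 1L, Long::sum);
      }
      List<Map.Entry<String, Long>> entries = new ArrayList<>(counts.entrySet());
      return new Coll<>(entries);
    }
  }

  /** Composite: its input and output types match the first input and the final output. */
  static final class CountWords
      implements Transform<Coll<String>, Coll<Map.Entry<String, Long>>> {
    @Override
    public Coll<Map.Entry<String, Long>> expand(Coll<String> lines) {
      // The intermediate type (Coll<String> of words) never appears in the signature.
      return lines.apply(new ExtractWords()).apply(new CountPerElement());
    }
  }

  public static void main(String[] args) {
    Coll<String> lines = new Coll<>(Arrays.asList("to be or not", "to be"));
    for (Map.Entry<String, Long> entry : lines.apply(new CountWords()).items) {
      System.out.println(entry.getKey() + ": " + entry.getValue());
    }
  }
}
```

In this sketch `CountWords` is used exactly like either of the simple steps, which is the property the section attributes to composite transforms. A real Beam `apply` does more than this toy version, which only delegates to `expand`.
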
[3/3] beam-site git commit: This closes #206

Posted by da...@apache.org.

This closes #206


Project: http://git-wip-us.apache.org/repos/asf/beam-site/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam-site/commit/97385324
Tree: http://git-wip-us.apache.org/repos/asf/beam-site/tree/97385324
Diff: http://git-wip-us.apache.org/repos/asf/beam-site/diff/97385324

Branch: refs/heads/asf-site
Commit: 973853241995348376c60c05fff842bdae343c6e
Parents: 35d6306 5b11965
Author: Davor Bonaci <da...@google.com>
Authored: Fri Apr 21 11:13:42 2017 -0700
Committer: Davor Bonaci <da...@google.com>
Committed: Fri Apr 21 11:13:42 2017 -0700

----------------------------------------------------------------------
 .../documentation/programming-guide/index.html  | 100 ++++++++++++++++++-
 src/documentation/programming-guide.md          |  95 +++++++++++++++++-
 2 files changed, 187 insertions(+), 8 deletions(-)
----------------------------------------------------------------------

[2/3] beam-site git commit: Regenerate website

Posted by da...@apache.org.

Regenerate website


Project: http://git-wip-us.apache.org/repos/asf/beam-site/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam-site/commit/5b11965c
Tree: http://git-wip-us.apache.org/repos/asf/beam-site/tree/5b11965c
Diff: http://git-wip-us.apache.org/repos/asf/beam-site/diff/5b11965c

Branch: refs/heads/asf-site
Commit: 5b11965c209c3d5fe08a0b93776d2b749ef63e82
Parents: e98da81
Author: Davor Bonaci <da...@google.com>
Authored: Fri Apr 21 11:13:41 2017 -0700
Committer: Davor Bonaci <da...@google.com>
Committed: Fri Apr 21 11:13:41 2017 -0700

----------------------------------------------------------------------
 .../documentation/programming-guide/index.html  | 100 ++++++++++++++++++-
 1 file changed, 95 insertions(+), 5 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/beam-site/blob/5b11965c/content/documentation/programming-guide/index.html
----------------------------------------------------------------------
diff --git a/content/documentation/programming-guide/index.html b/content/documentation/programming-guide/index.html
index edb184b..38f7bfc 100644
--- a/content/documentation/programming-guide/index.html
+++ b/content/documentation/programming-guide/index.html
@@ -398,7 +398,7 @@
 </code></pre>
 </div>
 
-<p>Because Beam uses a generic <code class="highlighter-rouge">apply</code> method for <code class="highlighter-rouge">PCollection</code>, you can both chain transforms sequentially and also apply transforms that contain other transforms nested within (called <strong>composite transforms</strong> in the Beam SDKs).</p>
+<p>Because Beam uses a generic <code class="highlighter-rouge">apply</code> method for <code class="highlighter-rouge">PCollection</code>, you can both chain transforms sequentially and also apply transforms that contain other transforms nested within (called <a href="#transforms-composite">composite transforms</a> in the Beam SDKs).</p>
 
 <p>How you apply your pipeline’s transforms determines the structure of your pipeline. The best way to think of your pipeline is as a directed acyclic graph, where the nodes are <code class="highlighter-rouge">PCollection</code>s and the edges are transforms. For example, you can chain transforms to create a sequential pipeline, like this one:</p>
 
@@ -434,7 +434,7 @@
 
 <p>[Branching Graph Graphic]</p>
 
-<p>You can also build your own composite transforms that nest multiple sub-steps inside a single, larger transform. Composite transforms are particularly useful for building a reusable sequence of simple steps that get used in a lot of different places.</p>
+<p>You can also build your own <a href="#transforms-composite">composite transforms</a> that nest multiple sub-steps inside a single, larger transform. Composite transforms are particularly useful for building a reusable sequence of simple steps that get used in a lot of different places.</p>
 
 <h3 id="transforms-in-the-beam-sdk">Transforms in the Beam SDK</h3>
 
@@ -1242,9 +1242,99 @@ guest, [[], [order4]]
 
 <h2 id="a-nametransforms-compositeacomposite-transforms"><a name="transforms-composite"></a>Composite Transforms</h2>
 
-<blockquote>
-  <p><strong>Note:</strong> This section is in progress (<a href="https://issues.apache.org/jira/browse/BEAM-1452">BEAM-1452</a>).</p>
-</blockquote>
+<p>Transforms can have a nested structure, where a complex transform performs multiple simpler transforms (such as more than one <code class="highlighter-rouge">ParDo</code>, <code class="highlighter-rouge">Combine</code>, <code class="highlighter-rouge">GroupByKey</code>, or even other composite transforms). These transforms are called composite transforms. Nesting multiple transforms inside a single composite transform can make your code more modular and easier to understand.</p>
+
+<p>The Beam SDK comes packed with many useful composite transforms. See the API reference pages for a list of transforms:</p>
+<ul>
+  <li><a href="/documentation/sdks/javadoc/0.6.0/index.html?org/apache/beam/sdk/transforms/package-summary.html">Pre-written Beam transforms for Java</a></li>
+  <li><a href="/documentation/sdks/pydoc/0.6.0/apache_beam.transforms.html">Pre-written Beam transforms for Python</a></li>
+</ul>
+
+<h3 id="an-example-of-a-composite-transform">An example of a composite transform</h3>
+
+<p>The <code class="highlighter-rouge">CountWords</code> transform in the <a href="/get-started/wordcount-example/">WordCount example program</a> is an example of a composite transform. <code class="highlighter-rouge">CountWords</code> is a <code class="highlighter-rouge">PTransform</code> subclass that consists of multiple nested transforms.</p>
+
+<p>In its <code class="highlighter-rouge">expand</code> method, the <code class="highlighter-rouge">CountWords</code> transform applies the following transform operations:</p>
+
+<ol>
+  <li>It applies a <code class="highlighter-rouge">ParDo</code> on the input <code class="highlighter-rouge">PCollection</code> of text lines, producing an output <code class="highlighter-rouge">PCollection</code> of individual words.</li>
+  <li>It applies the Beam SDK library transform <code class="highlighter-rouge">Count</code> on the <code class="highlighter-rouge">PCollection</code> of words, producing a <code class="highlighter-rouge">PCollection</code> of key/value pairs. Each key represents a word in the text, and each value represents the number of times that word appeared in the original data.</li>
+</ol>
+
+<p>Note that this is also an example of nested composite transforms, as <code class="highlighter-rouge">Count</code> is, by itself, a composite transform.</p>
+
+<p>Your composite transform’s parameters and return value must match the initial input type and final return type for the entire transform, even if the transform’s intermediate data changes type multiple times.</p>
+
+<div class="language-java highlighter-rouge"><pre class="highlight"><code>  <span class="kd">public</span> <span class="kd">static</span> <span class="kd">class</span> <span class="nc">CountWords</span> <span class="kd">extends</span> <span class="n">PTransform</span><span class="o">&lt;</span><span class="n">PCollection</span><span class="o">&lt;</span><span class="n">String</span><span class="o">&gt;,</span>
+      <span class="n">PCollection</span><span class="o">&lt;</span><span class="n">KV</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">Long</span><span class="o">&gt;&gt;&gt;</span> <span class="o">{</span>
+    <span class="nd">@Override</span>
+    <span class="kd">public</span> <span class="n">PCollection</span><span class="o">&lt;</span><span class="n">KV</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">Long</span><span class="o">&gt;&gt;</span> <span class="nf">expand</span><span class="o">(</span><span class="n">PCollection</span><span class="o">&lt;</span><span class="n">String</span><span class="o">&gt;</span> <span class="n">lines</span><span class="o">)</span> <span class="o">{</span>
+
+      <span class="c1">// Convert lines of text into individual words.</span>
+      <span class="n">PCollection</span><span class="o">&lt;</span><span class="n">String</span><span class="o">&gt;</span> <span class="n">words</span> <span class="o">=</span> <span class="n">lines</span><span class="o">.</span><span class="na">apply</span><span class="o">(</span>
+          <span class="n">ParDo</span><span class="o">.</span><span class="na">of</span><span class="o">(</span><span class="k">new</span> <span class="n">ExtractWordsFn</span><span class="o">()));</span>
+
+      <span class="c1">// Count the number of times each word occurs.</span>
+      <span class="n">PCollection</span><span class="o">&lt;</span><span class="n">KV</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">Long</span><span class="o">&gt;&gt;</span> <span class="n">wordCounts</span> <span class="o">=</span>
+          <span class="n">words</span><span class="o">.</span><span class="na">apply</span><span class="o">(</span><span class="n">Count</span><span class="o">.&lt;</span><span class="n">String</span><span class="o">&gt;</span><span class="n">perElement</span><span class="o">());</span>
+
+      <span class="k">return</span> <span class="n">wordCounts</span><span class="o">;</span>
+    <span class="o">}</span>
+  <span class="o">}</span>
+</code></pre>
+</div>
+
+<div class="language-py highlighter-rouge"><pre class="highlight"><code>  <span class="n">Python</span> <span class="n">code</span> <span class="n">snippet</span> <span class="n">coming</span> <span class="n">soon</span> <span class="p">(</span><span class="n">BEAM</span><span class="o">-</span><span class="mi">1926</span><span class="p">)</span>
+</code></pre>
+</div>
+
+<h3 id="creating-a-composite-transform">Creating a composite transform</h3>
+
+<p>To create your own composite transform, create a subclass of the <code class="highlighter-rouge">PTransform</code> class and override the <code class="highlighter-rouge">expand</code> method to specify the actual processing logic. You can then use this transform just as you would a built-in transform from the Beam SDK.</p>
+
+<p class="language-java">For the <code class="highlighter-rouge">PTransform</code> class type parameters, you pass the <code class="highlighter-rouge">PCollection</code> types that your transform takes as input, and produces as output. To take multiple <code class="highlighter-rouge">PCollection</code>s as input, or produce multiple <code class="highlighter-rouge">PCollection</code>s as output, use one of the multi-collection types for the relevant type parameter.</p>
+
+<p>The following code sample shows how to declare a <code class="highlighter-rouge">PTransform</code> that accepts a <code class="highlighter-rouge">PCollection</code> of <code class="highlighter-rouge">String</code>s for input, and outputs a <code class="highlighter-rouge">PCollection</code> of <code class="highlighter-rouge">Integer</code>s:</p>
+
+<div class="language-java highlighter-rouge"><pre class="highlight"><code>  <span class="kd">static</span> <span class="kd">class</span> <span class="nc">ComputeWordLengths</span>
+    <span class="kd">extends</span> <span class="n">PTransform</span><span class="o">&lt;</span><span class="n">PCollection</span><span class="o">&lt;</span><span class="n">String</span><span class="o">&gt;,</span> <span class="n">PCollection</span><span class="o">&lt;</span><span class="n">Integer</span><span class="o">&gt;&gt;</span> <span class="o">{</span>
+    <span class="o">...</span>
+  <span class="o">}</span>
+</code></pre>
+</div>
+
+<div class="language-py highlighter-rouge"><pre class="highlight"><code>  <span class="n">Python</span> <span class="n">code</span> <span class="n">snippet</span> <span class="n">coming</span> <span class="n">soon</span> <span class="p">(</span><span class="n">BEAM</span><span class="o">-</span><span class="mi">1926</span><span class="p">)</span>
+</code></pre>
+</div>
+
+<h4 id="overriding-the-expand-method">Overriding the expand method</h4>
+
+<p>Within your <code class="highlighter-rouge">PTransform</code> subclass, you’ll need to override the <code class="highlighter-rouge">expand</code> method. The <code class="highlighter-rouge">expand</code> method is where you add the processing logic for the <code class="highlighter-rouge">PTransform</code>. Your override of <code class="highlighter-rouge">expand</code> must accept the appropriate type of input <code class="highlighter-rouge">PCollection</code> as a parameter, and specify the output <code class="highlighter-rouge">PCollection</code> as the return value.</p>
+
+<p>The following code sample shows how to override <code class="highlighter-rouge">expand</code> for the <code class="highlighter-rouge">ComputeWordLengths</code> class declared in the previous example:</p>
+
+<div class="language-java highlighter-rouge"><pre class="highlight"><code>  <span class="kd">static</span> <span class="kd">class</span> <span class="nc">ComputeWordLengths</span>
+      <span class="kd">extends</span> <span class="n">PTransform</span><span class="o">&lt;</span><span class="n">PCollection</span><span class="o">&lt;</span><span class="n">String</span><span class="o">&gt;,</span> <span class="n">PCollection</span><span class="o">&lt;</span><span class="n">Integer</span><span class="o">&gt;&gt;</span> <span class="o">{</span>
+    <span class="nd">@Override</span>
+    <span class="kd">public</span> <span class="n">PCollection</span><span class="o">&lt;</span><span class="n">Integer</span><span class="o">&gt;</span> <span class="nf">expand</span><span class="o">(</span><span class="n">PCollection</span><span class="o">&lt;</span><span class="n">String</span><span class="o">&gt;</span> <span class="n">words</span><span class="o">)</span> <span class="o">{</span>
+      <span class="o">...</span>
+      <span class="c1">// transform logic goes here</span>
+      <span class="o">...</span>
+    <span class="o">}</span>
+</code></pre>
+</div>
+
+<div class="language-py highlighter-rouge"><pre class="highlight"><code>  <span class="n">Python</span> <span class="n">code</span> <span class="n">snippet</span> <span class="n">coming</span> <span class="n">soon</span> <span class="p">(</span><span class="n">BEAM</span><span class="o">-</span><span class="mi">1926</span><span class="p">)</span>
+</code></pre>
+</div>
+
+<p>As long as you override the <code class="highlighter-rouge">expand</code> method in your <code class="highlighter-rouge">PTransform</code> subclass to accept the appropriate input <code class="highlighter-rouge">PCollection</code>(s) and return the corresponding output <code class="highlighter-rouge">PCollection</code>(s), you can include as many transforms as you want. These transforms can include core transforms, composite transforms, or the transforms included in the Beam SDK libraries.</p>
+
+<p><strong>Note:</strong> The <code class="highlighter-rouge">expand</code> method of a <code class="highlighter-rouge">PTransform</code> is not meant to be invoked directly by the user of a transform. Instead, you should call the <code class="highlighter-rouge">apply</code> method on the <code class="highlighter-rouge">PCollection</code> itself, with the transform as an argument. This allows transforms to be nested within the structure of your pipeline.</p>
+
+<h4 id="ptransform-style-guide">PTransform Style Guide</h4>
+
+<p>When you create a new <code class="highlighter-rouge">PTransform</code>, be sure to read the <a href="/contribute/ptransform-style-guide/">PTransform Style Guide</a>. The guide contains additional helpful information such as style guidelines, logging and testing guidance, and language-specific considerations.</p>
 
 <h2 id="a-nameioapipeline-io"><a name="io"></a>Pipeline I/O</h2>