Posted to commits@beam.apache.org by jk...@apache.org on 2017/08/10 23:55:46 UTC

[beam-site] branch asf-site updated (4878272 -> 1b92d41)

This is an automated email from the ASF dual-hosted git repository.

jkff pushed a change to branch asf-site
in repository https://gitbox.apache.org/repos/asf/beam-site.git.


    from 4878272  This closes #286: A couple of fixes to the website
     new 0f5e5e7  Expands the sections on Coders and Validation in Style Guide
     new 30a5df4  Regenerates website
     new 1b92d41  This closes #279: Expands the section on Coders in Style Guide

The 3 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 .../contribute/ptransform-style-guide/index.html   | 79 ++++++++++++++++------
 src/contribute/ptransform-style-guide.md           | 73 +++++++++++++++-----
 2 files changed, 115 insertions(+), 37 deletions(-)

-- 
To stop receiving notification emails like this one, please contact
"commits@beam.apache.org" <co...@beam.apache.org>.

[beam-site] 03/03: This closes #279: Expands the section on Coders in Style Guide

Posted by jk...@apache.org.

jkff pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/beam-site.git

commit 1b92d41a0dcff5fd82711518f888874277f517f7
Merge: 4878272 30a5df4
Author: Eugene Kirpichov <ki...@google.com>
AuthorDate: Thu Aug 10 16:55:28 2017 -0700

    This closes #279: Expands the section on Coders in Style Guide

 .../contribute/ptransform-style-guide/index.html   | 79 ++++++++++++++++------
 src/contribute/ptransform-style-guide.md           | 73 +++++++++++++++-----
 2 files changed, 115 insertions(+), 37 deletions(-)


[beam-site] 01/03: Expands the sections on Coders and Validation in Style Guide

Posted by jk...@apache.org.

jkff pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/beam-site.git

commit 0f5e5e746b98cc6596cb7ab74eb752878e8f153c
Author: Eugene Kirpichov <ki...@google.com>
AuthorDate: Thu Aug 3 23:13:07 2017 -0700

    Expands the sections on Coders and Validation in Style Guide
    
    * Provides the new guidance that a transform should set coders
      on all collections
    * Provides guidance on choosing and inferring coders
    * Recommends less verbose validation error messages
    * Clarifies what validation goes in expand() vs validate()
---
 src/contribute/ptransform-style-guide.md | 73 ++++++++++++++++++++++++--------
 1 file changed, 55 insertions(+), 18 deletions(-)

diff --git a/src/contribute/ptransform-style-guide.md b/src/contribute/ptransform-style-guide.md
index 3624b02..db69c5a 100644
--- a/src/contribute/ptransform-style-guide.md
+++ b/src/contribute/ptransform-style-guide.md
@@ -452,39 +452,76 @@ public static class FooV2 {
 
 #### Validation
 
-* Validate individual parameters in `.withBlah()` methods. Error messages should mention the method being called, the actual value and the range of valid values.
-* Validate inter-parameter invariants in the `PTransform`'s `.validate()` method.
+* Validate individual parameters in `.withBlah()` methods using `checkArgument()`. Error messages should mention the name of the parameter, the actual value, and the range of valid values.
+* Validate parameter combinations and missing required parameters in the `PTransform`'s `.expand()` method.
+* Validate parameters that the `PTransform` takes from `PipelineOptions` in the `PTransform`'s `.validate(PipelineOptions)` method.
+  These validations will be executed when the pipeline is already fully constructed/expanded and is about to be run with a particular `PipelineOptions`.
+  Most `PTransform`s do not use `PipelineOptions` and thus don't need a `validate()` method - instead, they should perform their validation via the two other methods above.
 
 ```java
 @AutoValue
 public abstract class TwiddleThumbs
     extends PTransform<PCollection<Foo>, PCollection<Bar>> {
   abstract int getMoo();
-  abstract int getBoo();
+  abstract String getBoo();
 
   ...
   // Validating individual parameters
   public TwiddleThumbs withMoo(int moo) {
-    checkArgument(moo >= 0 && moo < 100,
-      "TwiddleThumbs.withMoo() called with an invalid moo of %s. "
-              + "Valid values are 0 (exclusive) to 100 (exclusive)",
-              moo);
-        return toBuilder().setMoo(moo).build();
+    checkArgument(
+        moo >= 0 && moo < 100,
+        "Moo must be between 0 (inclusive) and 100 (exclusive), but was: %s",
+        moo);
+    return toBuilder().setMoo(moo).build();
   }
 
-  // Validating cross-parameter invariants
-  public void validate(PCollection<Foo> input) {
-    checkArgument(getMoo() == 0 || getBoo() == 0,
-      "TwiddleThumbs created with both .withMoo(%s) and .withBoo(%s). "
-      + "Only one of these must be specified.",
-      getMoo(), getBoo());
+  public TwiddleThumbs withBoo(String boo) {
+    checkArgument(boo != null, "Boo can not be null");
+    checkArgument(!boo.isEmpty(), "Boo can not be empty");
+    return toBuilder().setBoo(boo).build();
+  }
+
+  @Override
+  public void validate(PipelineOptions options) {
+    int woo = options.as(TwiddleThumbsOptions.class).getWoo();
+    checkArgument(
+        woo > getMoo(),
+        "Woo (%s) must be greater than moo (%s)",
+        woo, getMoo());
+  }
+
+  @Override
+  public PCollection<Bar> expand(PCollection<Foo> input) {
+    // Validating that a required parameter is present
+    checkArgument(getBoo() != null, "Must specify boo");
+
+    // Validating a combination of parameters
+    checkArgument(
+        getMoo() == 0 || getBoo() == null,
+        "Must specify at most one of moo or boo, but was: moo = %s, boo = %s",
+        getMoo(), getBoo());
+
+    ...
   }
 }
 ```
 
 #### Coders
 
-* Use `Coder`s only for setting the coder on a `PCollection` or a mutable state cell.
-* When available, use a specific most efficient coder for the datatype (e.g. `StringUtf8Coder.of()` for strings, `ByteArrayCoder.of()` for byte arrays, etc.), rather than using a generic coder like `SerializableCoder`. Develop efficient coders for types that can be elements of `PCollection`s.
-* Do not use coders as a general serialization or parsing mechanism for arbitrary raw byte data. (anti-examples that should be fixed: `TextIO`, `KafkaIO`).
-* In general, any transform that outputs a user-controlled type (that is not its input type) needs to accept a coder in the transform configuration (example: the `Create.of()` transform). This gives the user the ability to control the coder no matter how the transform is structured: e.g., purely letting the user specify the coder on the output `PCollection` of the transform is insufficient in case the transform internally uses intermediate `PCollection`s of this type.
+`Coder`s are a way for a Beam runner to materialize intermediate data or transmit it between workers when necessary. `Coder` should not be used as a general-purpose API for parsing or writing binary formats because the particular binary encoding of a `Coder` is intended to be its private implementation detail.
+
+##### Providing default coders for types
+
+Provide default `Coder`s for all new data types. Use `@DefaultCoder` annotations or `CoderProviderRegistrar` classes annotated with `@AutoService`: see usages of these classes in the SDK for examples. If performance is not important, you can use `SerializableCoder` or `AvroCoder`. Otherwise, develop an efficient custom coder (subclass `AtomicCoder` for concrete types, `StructuredCoder` for generic types).
+
+##### Setting coders on output collections
+
+All `PCollection`s created by your `PTransform` (both output and intermediate collections) must have a `Coder` set on them: a user should never need to call `.setCoder()` to "fix up" a coder on a `PCollection` produced by your `PTransform` (in fact, Beam intends to eventually deprecate `setCoder`). In some cases, coder inference will be sufficient to achieve this; in other cases, your transform will need to explicitly call `setCoder` on its collections.
+
+If the collection is of a concrete type, that type usually has a corresponding coder. Use the most efficient coder specific to that type (e.g. `StringUtf8Coder.of()` for strings, `ByteArrayCoder.of()` for byte arrays, etc.), rather than a general-purpose coder like `SerializableCoder`.
+
+If the type of the collection involves generic type variables, the situation is more complex:
+* If it coincides with the transform's input type or is a simple wrapper over it, you can reuse the coder of the input `PCollection`, available via `input.getCoder()`.
+* Attempt to infer the coder via `input.getPipeline().getCoderRegistry().getCoder(TypeDescriptor)`. Use utilities in `TypeDescriptors` to obtain the `TypeDescriptor` for the generic type. For an example of this approach, see the implementation of `AvroIO.parseGenericRecords()`. However, coder inference for generic types is best-effort and in some cases it may fail due to Java type erasure.
+* Always make it possible for the user to explicitly specify a `Coder` for the relevant type variable(s) as a configuration parameter of your `PTransform` (e.g. `AvroIO.<T>parseGenericRecords().withCoder(Coder<T>)`). Fall back to inference if the coder was not explicitly specified.
+
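The validation rules in the style-guide diff above (check each parameter in its `.withBlah()` method, check required parameters later) can be sketched without any Beam dependency. This is a minimal sketch under stated assumptions: `ThumbsConfig`, its fields, and the local `checkArgument` helper are hypothetical stand-ins. The guide's own example uses an AutoValue builder, and `validateRequired()` here only plays the role that the start of `expand()` plays in a real `PTransform`.

```java
/** Hypothetical stand-in for a PTransform's configuration; not part of Beam. */
final class ThumbsConfig {
  private final int moo;
  private final String boo; // null means "not specified yet"

  private ThumbsConfig(int moo, String boo) {
    this.moo = moo;
    this.boo = boo;
  }

  static ThumbsConfig create() {
    return new ThumbsConfig(0, null);
  }

  // Local helper that mimics a checkArgument-style precondition (assumption:
  // no precondition library is on the classpath in this sketch).
  private static void checkArgument(boolean ok, String template, Object... args) {
    if (!ok) {
      throw new IllegalArgumentException(String.format(template, args));
    }
  }

  // Per-parameter validation: names the parameter, the valid range, and the actual value.
  ThumbsConfig withMoo(int moo) {
    checkArgument(
        moo >= 0 && moo < 100,
        "Moo must be between 0 (inclusive) and 100 (exclusive), but was: %s",
        moo);
    return new ThumbsConfig(moo, boo);
  }

  ThumbsConfig withBoo(String boo) {
    checkArgument(boo != null, "Boo can not be null");
    checkArgument(!boo.isEmpty(), "Boo can not be empty");
    return new ThumbsConfig(moo, boo);
  }

  // Required-parameter validation, deferred until the configuration is used.
  void validateRequired() {
    checkArgument(boo != null, "Must specify boo");
  }

  int getMoo() {
    return moo;
  }

  String getBoo() {
    return boo;
  }
}
```

Calling `ThumbsConfig.create().withMoo(100)` fails immediately at the call site, while a missing `boo` is only reported once `validateRequired()` runs, which mirrors the split the guide describes.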

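The last two bullets of the Coders diff above (let the user specify a coder, fall back to inference otherwise) describe a resolution order that can be modeled in plain Java. The sketch below is deliberately Beam-free and every name in it is hypothetical: `FakeCoder` stands in for `Coder<T>`, `FakeRegistry` for the coder registry (which in Beam is queried with a `TypeDescriptor`, not a `Class`), and `resolve` for the logic a transform might run inside `expand()`.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical, Beam-free model of "explicit coder first, inference second". */
final class CoderFallbackSketch {

  /** Stand-in for a coder; only a name is modeled. */
  static final class FakeCoder<T> {
    final String name;

    FakeCoder(String name) {
      this.name = name;
    }
  }

  /** Stand-in for a coder registry, keyed by class for simplicity. */
  static final class FakeRegistry {
    private final Map<Class<?>, FakeCoder<?>> coders = new HashMap<>();

    <T> void register(Class<T> type, FakeCoder<T> coder) {
      coders.put(type, coder);
    }

    // Inference is best-effort: an empty result means no coder is known for the type.
    @SuppressWarnings("unchecked")
    <T> Optional<FakeCoder<T>> infer(Class<T> type) {
      return Optional.ofNullable((FakeCoder<T>) coders.get(type));
    }
  }

  /** Prefers the user-supplied coder; otherwise tries inference; otherwise fails clearly. */
  static <T> FakeCoder<T> resolve(FakeCoder<T> explicit, FakeRegistry registry, Class<T> type) {
    if (explicit != null) {
      return explicit;
    }
    return registry
        .infer(type)
        .orElseThrow(
            () ->
                new IllegalStateException(
                    "Unable to infer a coder for " + type.getName()
                        + "; specify one explicitly"));
  }
}
```

The design point is that the explicit coder always wins, so a user can still make the transform work when inference fails, for example because of type erasure.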

[beam-site] 02/03: Regenerates website

Posted by jk...@apache.org.

jkff pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/beam-site.git

commit 30a5df4d85ad236a7d83e05589917a9a03d2cd37
Author: Eugene Kirpichov <ki...@google.com>
AuthorDate: Thu Aug 10 16:54:41 2017 -0700

    Regenerates website
---
 .../contribute/ptransform-style-guide/index.html   | 79 ++++++++++++++++------
 1 file changed, 60 insertions(+), 19 deletions(-)

diff --git a/content/contribute/ptransform-style-guide/index.html b/content/contribute/ptransform-style-guide/index.html
index 56381bb..f351250 100644
--- a/content/contribute/ptransform-style-guide/index.html
+++ b/content/contribute/ptransform-style-guide/index.html
@@ -183,7 +183,11 @@
           <li><a href="#immutability" id="markdown-toc-immutability">Immutability</a></li>
           <li><a href="#serialization" id="markdown-toc-serialization">Serialization</a></li>
           <li><a href="#validation" id="markdown-toc-validation">Validation</a></li>
-          <li><a href="#coders" id="markdown-toc-coders">Coders</a></li>
+          <li><a href="#coders" id="markdown-toc-coders">Coders</a>            <ul>
+              <li><a href="#providing-default-coders-for-types" id="markdown-toc-providing-default-coders-for-types">Providing default coders for types</a></li>
+              <li><a href="#setting-coders-on-output-collections" id="markdown-toc-setting-coders-on-output-collections">Setting coders on output collections</a></li>
+            </ul>
+          </li>
         </ul>
       </li>
     </ul>
@@ -684,32 +688,56 @@ Strive to make such incompatible behavior changes cause a compile error (e.g. it
 <h4 id="validation">Validation</h4>
 
 <ul>
-  <li>Validate individual parameters in <code class="highlighter-rouge">.withBlah()</code> methods. Error messages should mention the method being called, the actual value and the range of valid values.</li>
-  <li>Validate inter-parameter invariants in the <code class="highlighter-rouge">PTransform</code>’s <code class="highlighter-rouge">.validate()</code> method.</li>
+  <li>Validate individual parameters in <code class="highlighter-rouge">.withBlah()</code> methods using <code class="highlighter-rouge">checkArgument()</code>. Error messages should mention the name of the parameter, the actual value, and the range of valid values.</li>
+  <li>Validate parameter combinations and missing required parameters in the <code class="highlighter-rouge">PTransform</code>’s <code class="highlighter-rouge">.expand()</code> method.</li>
+  <li>Validate parameters that the <code class="highlighter-rouge">PTransform</code> takes from <code class="highlighter-rouge">PipelineOptions</code> in the <code class="highlighter-rouge">PTransform</code>’s <code class="highlighter-rouge">.validate(PipelineOptions)</code> method.
+These validations will be executed when the pipeline is already fully constructed/expanded and is about to be run with a particular <code class="highlighter-rouge">PipelineOptions</code>.
+Most <code class="highlighter-rouge">PTransform</code>s do not use <code class="highlighter-rouge">PipelineOptions</code> and thus don’t need a <code class="highlighter-rouge">validate()</code> method - instead, they should perform their validation via the two other methods above.</li>
 </ul>
 
 <div class="language-java highlighter-rouge"><pre class="highlight"><code><span class="nd">@AutoValue</span>
 <span class="kd">public</span> <span class="kd">abstract</span> <span class="kd">class</span> <span class="nc">TwiddleThumbs</span>
     <span class="kd">extends</span> <span class="n">PTransform</span><span class="o">&lt;</span><span class="n">PCollection</span><span class="o">&lt;</span><span class="n">Foo</span><span class="o">&gt;,</span> <span class="n">PCollection</span><span class="o">&lt;</span><span class="n">Bar</span><span class="o">&gt;&gt;</span> <span class="o">{</span>
   <span class="kd">abstract</span> <span class="kt">int</span> <span class="nf">getMoo</span><span class="o">();</span>
-  <span class="kd">abstract</span> <span class="kt">int</span> <span class="nf">getBoo</span><span class="o">();</span>
+  <span class="kd">abstract</span> <span class="n">String</span> <span class="nf">getBoo</span><span class="o">();</span>
 
   <span class="o">...</span>
   <span class="c1">// Validating individual parameters</span>
   <span class="kd">public</span> <span class="n">TwiddleThumbs</span> <span class="nf">withMoo</span><span class="o">(</span><span class="kt">int</span> <span class="n">moo</span><span class="o">)</span> <span class="o">{</span>
-    <span class="n">checkArgument</span><span class="o">(</span><span class="n">moo</span> <span class="o">&gt;=</span> <span class="mi">0</span> <span class="o">&amp;&amp;</span> <span class="n">moo</span> <span class="o">&lt;</span> <span class="mi">100</span><span class="o">,</span>
-      <span class="s">"TwiddleThumbs.withMoo() called with an invalid moo of %s. "</span>
-              <span class="o">+</span> <span class="s">"Valid values are 0 (exclusive) to 100 (exclusive)"</span><span class="o">,</span>
-              <span class="n">moo</span><span class="o">);</span>
-        <span class="k">return</span> <span class="nf">toBuilder</span><span class="o">().</span><span class="na">setMoo</span><span class="o">(</span><span class="n">moo</span><span class="o">).</span><span class="na">build</span><span class="o">();</span>
+    <span class="n">checkArgument</span><span class="o">(</span>
+        <span class="n">moo</span> <span class="o">&gt;=</span> <span class="mi">0</span> <span class="o">&amp;&amp;</span> <span class="n">moo</span> <span class="o">&lt;</span> <span class="mi">100</span><span class="o">,</span>
+        <span class="s">"Moo must be between 0 (inclusive) and 100 (exclusive), but was: %s"</span><span class="o">,</span>
+        <span class="n">moo</span><span class="o">);</span>
+    <span class="k">return</span> <span class="nf">toBuilder</span><span class="o">().</span><span class="na">setMoo</span><span class="o">(</span><span class="n">moo</span><span class="o">).</span><span class="na">build</span><span class="o">();</span>
+  <span class="o">}</span>
+
+  <span class="kd">public</span> <span class="n">TwiddleThumbs</span> <span class="nf">withBoo</span><span class="o">(</span><span class="n">String</span> <span class="n">boo</span><span class="o">)</span> <span class="o">{</span>
+    <span class="n">checkArgument</span><span class="o">(</span><span class="n">boo</span> <span class="o">!=</span> <span class="kc">null</span><span class="o">,</span> <span class="s">"Boo can not be null"</span><span class="o">);</span>
+    <span class="n">checkArgument</span><span class="o">(!</span><span class="n">boo</span><span class="o">.</span><span class="na">isEmpty</span><span class="o">(),</span> <span class="s">"Boo can not be empty"</span><span class="o">);</span>
+    <span class="k">return</span> <span class="nf">toBuilder</span><span class="o">().</span><span class="na">setBoo</span><span class="o">(</span><span class="n">boo</span><span class="o">).</span><span class="na">build</span><span class="o">();</span>
   <span class="o">}</span>
 
-  <span class="c1">// Validating cross-parameter invariants</span>
-  <span class="kd">public</span> <span class="kt">void</span> <span class="nf">validate</span><span class="o">(</span><span class="n">PCollection</span><span class="o">&lt;</span><span class="n">Foo</span><span class="o">&gt;</span> <span class="n">input</span><span class="o">)</span> <span class="o">{</span>
-    <span class="n">checkArgument</span><span class="o">(</span><span class="n">getMoo</span><span class="o">()</span> <span class="o">==</span> <span class="mi">0</span> <span class="o">||</span> <span class="n">getBoo</span><span class="o">()</span> <span class="o">==</span> <span class="mi">0</span><span class="o">,</span>
-      <span class="s">"TwiddleThumbs created with both .withMoo(%s) and .withBoo(%s). "</span>
-      <span class="o">+</span> <span class="s">"Only one of these must be specified."</span><span class="o">,</span>
-      <span class="n">getMoo</span><span class="o">(),</span> <span class="n">getBoo</span><span class="o">());</span>
+  <span class="nd">@Override</span>
+  <span class="kd">public</span> <span class="kt">void</span> <span class="nf">validate</span><span class="o">(</span><span class="n">PipelineOptions</span> <span class="n">options</span><span class="o">)</span> <span class="o">{</span>
+    <span class="kt">int</span> <span class="n">woo</span> <span class="o">=</span> <span class="n">options</span><span class="o">.</span><span class="na">as</span><span class="o">(</span><span class="n">TwiddleThumbsOptions</span><span class="o">.</span><span class="na">class</span><span class="o">).</span><span class="na">getWoo</span><span class="o">();</span>
+    <span class="n">checkArgument</span><span class="o">(</span>
+       <span class="n">woo</span> <span class="o">&gt;</span> <span class="n">getMoo</span><span class="o">(),</span>
+      <span class="s">"Woo (%s) must be greater than moo (%s)"</span><span class="o">,</span>
+      <span class="n">woo</span><span class="o">,</span> <span class="n">getMoo</span><span class="o">());</span>
+  <span class="o">}</span>
+
+  <span class="nd">@Override</span>
+  <span class="kd">public</span> <span class="n">PCollection</span><span class="o">&lt;</span><span class="n">Bar</span><span class="o">&gt;</span> <span class="nf">expand</span><span class="o">(</span><span class="n">PCollection</span><span class="o">&lt;</span><span class="n">Foo</span><span class="o">&gt;</span> <span class="n">input</span><span class="o">)</span> <span class="o">{</span>
+    <span class="c1">// Validating that a required parameter is present</span>
+    <span class="n">checkArgument</span><span class="o">(</span><span class="n">getBoo</span><span class="o">()</span> <span class="o">!=</span> <span class="kc">null</span><span class="o">,</span> <span class="s">"Must specify boo"</span><span class="o">);</span>
+
+    <span class="c1">// Validating a combination of parameters</span>
+    <span class="n">checkArgument</span><span class="o">(</span>
+        <span class="n">getMoo</span><span class="o">()</span> <span class="o">==</span> <span class="mi">0</span> <span class="o">||</span> <span class="n">getBoo</span><span class="o">()</span> <span class="o">==</span> <span class="kc">null</span><span class="o">,</span>
+        <span class="s">"Must specify at most one of moo or boo, but was: moo = %s, boo = %s"</span><span class="o">,</span>
+        <span class="n">getMoo</span><span class="o">(),</span> <span class="n">getBoo</span><span class="o">());</span>
+
+    <span class="o">...</span>
   <span class="o">}</span>
 <span class="o">}</span>
 </code></pre>
@@ -717,13 +745,26 @@ Strive to make such incompatible behavior changes cause a compile error (e.g. it
 
 <h4 id="coders">Coders</h4>
 
+<p><code class="highlighter-rouge">Coder</code>s are a way for a Beam runner to materialize intermediate data or transmit it between workers when necessary. <code class="highlighter-rouge">Coder</code> should not be used as a general-purpose API for parsing or writing binary formats because the particular binary encoding of a <code class="highlighter-rouge">Coder</code> is intended to be its private implementation detail.</p>
+
+<h5 id="providing-default-coders-for-types">Providing default coders for types</h5>
+
+<p>Provide default <code class="highlighter-rouge">Coder</code>s for all new data types. Use <code class="highlighter-rouge">@DefaultCoder</code> annotations or <code class="highlighter-rouge">CoderProviderRegistrar</code> classes annotated with <code class="highlighter-rouge">@AutoService</code>: see usages of these classes in the SDK for examples. If performance is not important, you can use <code class="highlighter-rouge">SerializableCoder</code> or <code class="highlighter-rouge">Avr [...]
+
+<h5 id="setting-coders-on-output-collections">Setting coders on output collections</h5>
+
+<p>All <code class="highlighter-rouge">PCollection</code>s created by your <code class="highlighter-rouge">PTransform</code> (both output and intermediate collections) must have a <code class="highlighter-rouge">Coder</code> set on them: a user should never need to call <code class="highlighter-rouge">.setCoder()</code> to “fix up” a coder on a <code class="highlighter-rouge">PCollection</code> produced by your <code class="highlighter-rouge">PTransform</code> (in fact, Beam intends to e [...]
+
+<p>If the collection is of a concrete type, that type usually has a corresponding coder. Use the most efficient coder specific to that type (e.g. <code class="highlighter-rouge">StringUtf8Coder.of()</code> for strings, <code class="highlighter-rouge">ByteArrayCoder.of()</code> for byte arrays, etc.), rather than a general-purpose coder like <code class="highlighter-rouge">SerializableCoder</code>.</p>
+
+<p>If the type of the collection involves generic type variables, the situation is more complex:</p>
 <ul>
-  <li>Use <code class="highlighter-rouge">Coder</code>s only for setting the coder on a <code class="highlighter-rouge">PCollection</code> or a mutable state cell.</li>
-  <li>When available, use a specific most efficient coder for the datatype (e.g. <code class="highlighter-rouge">StringUtf8Coder.of()</code> for strings, <code class="highlighter-rouge">ByteArrayCoder.of()</code> for byte arrays, etc.), rather than using a generic coder like <code class="highlighter-rouge">SerializableCoder</code>. Develop efficient coders for types that can be elements of <code class="highlighter-rouge">PCollection</code>s.</li>
-  <li>Do not use coders as a general serialization or parsing mechanism for arbitrary raw byte data. (anti-examples that should be fixed: <code class="highlighter-rouge">TextIO</code>, <code class="highlighter-rouge">KafkaIO</code>).</li>
-  <li>In general, any transform that outputs a user-controlled type (that is not its input type) needs to accept a coder in the transform configuration (example: the <code class="highlighter-rouge">Create.of()</code> transform). This gives the user the ability to control the coder no matter how the transform is structured: e.g., purely letting the user specify the coder on the output <code class="highlighter-rouge">PCollection</code> of the transform is insufficient in case the transform [...]
+  <li>If it coincides with the transform’s input type or is a simple wrapper over it, you can reuse the coder of the input <code class="highlighter-rouge">PCollection</code>, available via <code class="highlighter-rouge">input.getCoder()</code>.</li>
+  <li>Attempt to infer the coder via <code class="highlighter-rouge">input.getPipeline().getCoderRegistry().getCoder(TypeDescriptor)</code>. Use utilities in <code class="highlighter-rouge">TypeDescriptors</code> to obtain the <code class="highlighter-rouge">TypeDescriptor</code> for the generic type. For an example of this approach, see the implementation of <code class="highlighter-rouge">AvroIO.parseGenericRecords()</code>. However, coder inference for generic types is best-effort and [...]
+  <li>Always make it possible for the user to explicitly specify a <code class="highlighter-rouge">Coder</code> for the relevant type variable(s) as a configuration parameter of your <code class="highlighter-rouge">PTransform</code> (e.g. <code class="highlighter-rouge">AvroIO.&lt;T&gt;parseGenericRecords().withCoder(Coder&lt;T&gt;)</code>). Fall back to inference if the coder was not explicitly specified.</li>
 </ul>
 
+
     </div>
     <footer class="footer">
   <div class="footer__contained">
