Posted to commits@beam.apache.org by da...@apache.org on 2017/01/24 18:28:48 UTC

[1/4] beam-site git commit: Add plugin for snippet extraction.

Repository: beam-site
Updated Branches:
  refs/heads/asf-site c9379f5e3 -> 6a85cdf69


Add plugin for snippet extraction.


Project: http://git-wip-us.apache.org/repos/asf/beam-site/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam-site/commit/0b44ef77
Tree: http://git-wip-us.apache.org/repos/asf/beam-site/tree/0b44ef77
Diff: http://git-wip-us.apache.org/repos/asf/beam-site/diff/0b44ef77

Branch: refs/heads/asf-site
Commit: 0b44ef7770f121558d4008290c49dcd3dd4f8bae
Parents: c9379f5
Author: Robert Bradshaw <ro...@gmail.com>
Authored: Fri Jan 20 17:26:41 2017 -0800
Committer: Robert Bradshaw <ro...@gmail.com>
Committed: Fri Jan 20 17:26:41 2017 -0800

----------------------------------------------------------------------
 Gemfile      | 1 +
 Gemfile.lock | 6 +++++-
 2 files changed, 6 insertions(+), 1 deletion(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/beam-site/blob/0b44ef77/Gemfile
----------------------------------------------------------------------
diff --git a/Gemfile b/Gemfile
index 9f15c06..f9431e2 100644
--- a/Gemfile
+++ b/Gemfile
@@ -10,6 +10,7 @@ group :jekyll_plugins do
 	gem 'jekyll-redirect-from'
 	gem 'jekyll-sass-converter'
 	gem 'html-proofer'
+	gem 'jekyll_github_sample'
 end
 
 # Used by Travis tests.

http://git-wip-us.apache.org/repos/asf/beam-site/blob/0b44ef77/Gemfile.lock
----------------------------------------------------------------------
diff --git a/Gemfile.lock b/Gemfile.lock
index 8d64708..1ab575d 100644
--- a/Gemfile.lock
+++ b/Gemfile.lock
@@ -40,6 +40,9 @@ GEM
       sass (~> 3.4)
     jekyll-watch (1.5.0)
       listen (~> 3.0, < 3.1)
+    jekyll_github_sample (0.3.0)
+      activesupport (~> 4.0)
+      jekyll (~> 3.0)
     json (1.8.3)
     kramdown (1.12.0)
     liquid (3.0.6)
@@ -77,7 +80,8 @@ DEPENDENCIES
   jekyll (= 3.2)
   jekyll-redirect-from
   jekyll-sass-converter
+  jekyll_github_sample
   rake
 
 BUNDLED WITH
-   1.13.5
+   1.13.7
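The `jekyll_github_sample` plugin added in this commit is what lets the programming guide pull code from a source file with `{% github_sample <path> tag:<name> %}`. The Python sketch below is a rough model of what the `tag:` argument is assumed to select: only the lines between `[START name]` and `[END name]` marker comments, with their shared indentation stripped. The marker convention and the helper `extract_tagged_snippet` are assumptions for illustration, not the plugin's code. Check the plugin's README for its exact rules.

```python
import textwrap


def extract_tagged_snippet(source: str, tag: str) -> str:
    """Return the lines between '[START tag]' and '[END tag]' markers.

    Hypothetical helper: approximates what a tag-based snippet extractor
    is assumed to do; it is not the jekyll_github_sample implementation.
    """
    start_marker = f"[START {tag}]"
    end_marker = f"[END {tag}]"
    collected = []
    inside = False
    for line in source.splitlines():
        if not inside:
            if start_marker in line:
                inside = True  # skip the marker line itself
            continue
        if end_marker in line:
            # Remove the indentation shared by all extracted lines.
            return textwrap.dedent("\n".join(collected))
        collected.append(line)
    raise ValueError(f"no complete [START {tag}] ... [END {tag}] block")


# A made-up test file in the style of snippets_test.py (assumed layout).
example_test_file = '''
def test_pardo(self):
    words = ['a', 'bb']
    # [START model_pardo_apply]
    word_lengths = words | beam.ParDo(ComputeWordLengthFn())
    # [END model_pardo_apply]
'''

print(extract_tagged_snippet(example_test_file, "model_pardo_apply"))
# word_lengths = words | beam.ParDo(ComputeWordLengthFn())
```

Keeping the snippet inside a tested file is the point of the change: the guide shows exactly the lines that the test suite executes.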

[4/4] beam-site git commit: This closes #129

Posted by da...@apache.org.

This closes #129


Project: http://git-wip-us.apache.org/repos/asf/beam-site/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam-site/commit/6a85cdf6
Tree: http://git-wip-us.apache.org/repos/asf/beam-site/tree/6a85cdf6
Diff: http://git-wip-us.apache.org/repos/asf/beam-site/diff/6a85cdf6

Branch: refs/heads/asf-site
Commit: 6a85cdf69b7f3f72cae6dcbff31ad346c14243e8
Parents: c9379f5 5ffffa2
Author: Davor Bonaci <da...@google.com>
Authored: Tue Jan 24 10:26:05 2017 -0800
Committer: Davor Bonaci <da...@google.com>
Committed: Tue Jan 24 10:26:05 2017 -0800

----------------------------------------------------------------------
 Gemfile                                         |   1 +
 Gemfile.lock                                    |   6 +-
 .../documentation/programming-guide/index.html  |  12 +-
 src/documentation/programming-guide.md          | 115 ++++---------------
 4 files changed, 37 insertions(+), 97 deletions(-)
----------------------------------------------------------------------

[2/4] beam-site git commit: Replace literal code samples with extracted, tested snippets.

Replace literal code samples with extracted, tested snippets.


Project: http://git-wip-us.apache.org/repos/asf/beam-site/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam-site/commit/b8673dde
Tree: http://git-wip-us.apache.org/repos/asf/beam-site/tree/b8673dde
Diff: http://git-wip-us.apache.org/repos/asf/beam-site/diff/b8673dde

Branch: refs/heads/asf-site
Commit: b8673ddee75cbdf92067d5787426c0a68a681d02
Parents: 0b44ef7
Author: Robert Bradshaw <ro...@gmail.com>
Authored: Fri Jan 20 17:55:13 2017 -0800
Committer: Robert Bradshaw <ro...@gmail.com>
Committed: Fri Jan 20 17:55:13 2017 -0800

----------------------------------------------------------------------
 src/documentation/programming-guide.md | 115 ++++++----------------------
 1 file changed, 22 insertions(+), 93 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/beam-site/blob/b8673dde/src/documentation/programming-guide.md
----------------------------------------------------------------------
diff --git a/src/documentation/programming-guide.md b/src/documentation/programming-guide.md
index d743c49..869d9db 100644
--- a/src/documentation/programming-guide.md
+++ b/src/documentation/programming-guide.md
@@ -333,9 +333,8 @@ class ComputeWordLengthFn(beam.DoFn):
     # Use return to emit the output element.
     return [len(word)]
 
-# Apply a ParDo to the PCollection "words" to compute lengths for each word.
-word_lengths = words | beam.ParDo(ComputeWordLengthFn())
-```
+{% github_sample /apache/beam/blob/python-sdk/sdks/python/apache_beam/examples/snippets/snippets_test.py tag:model_pardo_apply
+%}```
 
 In the example, our input `PCollection` contains `String` values. We apply a `ParDo` transform that specifies a function (`ComputeWordLengthFn`) to compute the length of each string, and outputs the result to a new `PCollection` of `Integer` values that stores the length of each word.
 
@@ -420,8 +419,8 @@ words = ...
 
 # Apply a lambda function to the PCollection words.
 # Save the result as the PCollection word_lengths.
-word_lengths = words | beam.FlatMap(lambda x: [len(x)])
-```
+{% github_sample /apache/beam/blob/python-sdk/sdks/python/apache_beam/examples/snippets/snippets_test.py tag:model_pardo_using_flatmap
+%}```
 
 If your `ParDo` performs a one-to-one mapping of input elements to output elements--that is, for each input element, it applies a function that produces *exactly one* output element, you can use the higher-level <span class="language-java">`MapElements`</span><span class="language-py">`Map`</span> transform. <span class="language-java">`MapElements` can accept an anonymous Java 8 lambda function for additional brevity.</span>
 
@@ -444,8 +443,8 @@ words = ...
 
 # Apply a Map with a lambda function to the PCollection words.
 # Save the result as the PCollection word_lengths.
-word_lengths = words | beam.Map(lambda x: len(x))
-```
+{% github_sample /apache/beam/blob/python-sdk/sdks/python/apache_beam/examples/snippets/snippets_test.py tag:model_pardo_using_map
+%}```
 
 {:.language-java}
 > **Note:** You can use Java 8 lambda functions with several other Beam transforms, including `Filter`, `FlatMapElements`, and `Partition`.
@@ -517,10 +516,8 @@ public static class SumInts implements SerializableFunction<Iterable<Integer>, I
 ```
 
 ```py
-# A bounded sum of positive integers.
-def bounded_sum(values, bound=500):
-  return min(sum(values), bound)
-```
+{% github_sample /apache/beam/blob/python-sdk/sdks/python/apache_beam/examples/snippets/snippets_test.py tag:combine_bounded_sum
+%}```
 
 ##### **Advanced Combinations using CombineFn**
 
@@ -574,20 +571,8 @@ public class AverageFn extends CombineFn<Integer, AverageFn.Accum, Double> {
 
 ```py
 pc = ...
-class AverageFn(beam.CombineFn):
-  def create_accumulator(self):
-    return (0.0, 0)
-
-  def add_input(self, (sum, count), input):
-    return sum + input, count + 1
-
-  def merge_accumulators(self, accumulators):
-    sums, counts = zip(*accumulators)
-    return sum(sums), sum(counts)
-
-  def extract_output(self, (sum, count)):
-    return sum / count if count else float('NaN')
-```
+{% github_sample /apache/beam/blob/python-sdk/sdks/python/apache_beam/examples/snippets/snippets_test.py tag:combine_custom_average
+%}```
 
 If you are combining a `PCollection` of key-value pairs, [per-key combining](#transforms-combine-per-key) is often enough. If you need the combining strategy to change based on the key (for example, MIN for some users and MAX for other users), you can define a `KeyedCombineFn` to access the key within the combining strategy.
 
@@ -827,40 +812,14 @@ Side inputs are useful if your `ParDo` needs to inject additional data when proc
 # For example, using pvalue.AsIter(pcoll) at pipeline construction time results in an iterable of the actual elements of pcoll being passed into each process invocation.
 # In this example, side inputs are passed to a FlatMap transform as extra arguments and consumed by filter_using_length.
 
-# Callable takes additional arguments.
-def filter_using_length(word, lower_bound, upper_bound=float('inf')):
-  if lower_bound <= len(word) <= upper_bound:
-    yield word
-
-# Construct a deferred side input.
-avg_word_len = (words
-                | beam.Map(len)
-                | beam.CombineGlobally(beam.combiners.MeanCombineFn()))
-
-# Call with explicit side inputs.
-small_words = words | 'small' >> beam.FlatMap(filter_using_length, 0, 3)
-
-# A single deferred side input.
-larger_than_average = (words | 'large' >> beam.FlatMap(
-    filter_using_length,
-    lower_bound=pvalue.AsSingleton(avg_word_len)))
-
-# Mix and match.
-small_but_nontrivial = words | beam.FlatMap(filter_using_length,
-                                            lower_bound=2,
-                                            upper_bound=pvalue.AsSingleton(
-                                                avg_word_len))
-
+{% github_sample /apache/beam/blob/python-sdk/sdks/python/apache_beam/examples/snippets/snippets_test.py tag:model_pardo_side_input
+%}
 
 # We can also pass side inputs to a ParDo transform, which will get passed to its process method.
 # The only change is that the first arguments are self and a context, rather than the PCollection element itself.
 
-class FilterUsingLength(beam.DoFn):
-  def process(self, context, lower_bound, upper_bound=float('inf')):
-    if lower_bound <= len(context.element) <= upper_bound:
-      yield context.element
-
-small_words = words | beam.ParDo(FilterUsingLength(), 0, 3)
+{% github_sample /apache/beam/blob/python-sdk/sdks/python/apache_beam/examples/snippets/snippets_test.py tag:model_pardo_side_input_dofn
+%}
 ...
 
 ```
@@ -935,22 +894,13 @@ While `ParDo` always produces a main output `PCollection` (as the return value f
 # with_outputs() returns a DoOutputsTuple object. Tags specified in with_outputs are attributes on the returned DoOutputsTuple object.
 # The tags give access to the corresponding output PCollections.
 
-results = (words | beam.ParDo(ProcessWords(), cutoff_length=2, marker='x')
-           .with_outputs('above_cutoff_lengths', 'marked strings',
-                         main='below_cutoff_strings'))
-below = results.below_cutoff_strings
-above = results.above_cutoff_lengths
-marked = results['marked strings']  # indexing works as well
+{% github_sample /apache/beam/blob/python-sdk/sdks/python/apache_beam/examples/snippets/snippets_test.py tag:model_pardo_with_side_outputs
+%}
 
 # The result is also iterable, ordered in the same order that the tags were passed to with_outputs(), the main tag (if specified) first.
 
-below, above, marked = (words
-                        | beam.ParDo(
-                            ProcessWords(), cutoff_length=2, marker='x')
-                        .with_outputs('above_cutoff_lengths',
-                                      'marked strings',
-                                      main='below_cutoff_strings'))
-```
+{% github_sample /apache/beam/blob/python-sdk/sdks/python/apache_beam/examples/snippets/snippets_test.py tag:model_pardo_with_side_outputs_iter
+%}```
 
 ##### Emitting to Side Outputs in your DoFn:
 
@@ -983,35 +933,14 @@ below, above, marked = (words
 # using the pvalue.SideOutputValue wrapper class.
 # Based on the previous example, this shows the DoFn emitting to the main and side outputs.
 
-class ProcessWords(beam.DoFn):
-
-  def process(self, context, cutoff_length, marker):
-    if len(context.element) <= cutoff_length:
-      # Emit this short word to the main output.
-      yield context.element
-    else:
-      # Emit this word's long length to a side output.
-      yield pvalue.SideOutputValue(
-          'above_cutoff_lengths', len(context.element))
-    if context.element.startswith(marker):
-      # Emit this word to a different side output.
-      yield pvalue.SideOutputValue('marked strings', context.element)
-
+{% github_sample /apache/beam/blob/python-sdk/sdks/python/apache_beam/examples/snippets/snippets_test.py tag:model_pardo_emitting_values_on_side_outputs
+%}
 
 # Side outputs are also available in Map and FlatMap.
 # Here is an example that uses FlatMap and shows that the tags do not need to be specified ahead of time.
 
-def even_odd(x):
-  yield pvalue.SideOutputValue('odd' if x % 2 else 'even', x)
-  if x % 10 == 0:
-    yield x
-
-results = numbers | beam.FlatMap(even_odd).with_outputs()
-
-evens = results.even
-odds = results.odd
-tens = results[None]  # the undeclared main output
-```
+{% github_sample /apache/beam/blob/python-sdk/sdks/python/apache_beam/examples/snippets/snippets_test.py tag:model_pardo_with_side_outputs_undeclared
+%}```
 
 <a name="io"></a>
 <a name="running"></a>
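The removed `ProcessWords` and `even_odd` snippets route each element either to the main output or to a named side output. The plain-Python sketch below imitates that routing without Beam so the bucketing is easy to follow. `TaggedOutput` is a hypothetical stand-in for `pvalue.SideOutputValue`, and `run_with_outputs` is a hypothetical single-process substitute for `.with_outputs()`. Untagged values are kept under `None`, mirroring how `results[None]` addresses the undeclared main output in the `even_odd` example. The element is passed directly instead of through `context.element`.

```python
from collections import defaultdict


class TaggedOutput:
    """Hypothetical stand-in for pvalue.SideOutputValue(tag, value)."""

    def __init__(self, tag, value):
        self.tag = tag
        self.value = value


def process_words(word, cutoff_length, marker):
    # Same routing logic as the ProcessWords DoFn in the removed lines.
    if len(word) <= cutoff_length:
        yield word  # short word goes to the main output
    else:
        yield TaggedOutput('above_cutoff_lengths', len(word))
    if word.startswith(marker):
        yield TaggedOutput('marked strings', word)


def run_with_outputs(elements, fn, *args, **kwargs):
    """Collect fn's outputs per tag; untagged values go under None (main)."""
    outputs = defaultdict(list)
    for element in elements:
        for result in fn(element, *args, **kwargs):
            if isinstance(result, TaggedOutput):
                outputs[result.tag].append(result.value)
            else:
                outputs[None].append(result)
    return dict(outputs)


words = ['a', 'xy', 'xyz', 'hello']
results = run_with_outputs(words, process_words, cutoff_length=2, marker='x')
print(results[None])                    # ['a', 'xy']
print(results['above_cutoff_lengths'])  # [3, 5]
print(results['marked strings'])        # ['xy', 'xyz']
```

Note that one input element can land in several outputs: `'xy'` is both a short word and a marked string.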

[3/4] beam-site git commit: Regenerate website

Regenerate website


Project: http://git-wip-us.apache.org/repos/asf/beam-site/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam-site/commit/5ffffa2a
Tree: http://git-wip-us.apache.org/repos/asf/beam-site/tree/5ffffa2a
Diff: http://git-wip-us.apache.org/repos/asf/beam-site/diff/5ffffa2a

Branch: refs/heads/asf-site
Commit: 5ffffa2a8c4d697c935f42209a08050523e2011c
Parents: b8673dd
Author: Davor Bonaci <da...@google.com>
Authored: Tue Jan 24 10:26:05 2017 -0800
Committer: Davor Bonaci <da...@google.com>
Committed: Tue Jan 24 10:26:05 2017 -0800

----------------------------------------------------------------------
 content/documentation/programming-guide/index.html | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/beam-site/blob/5ffffa2a/content/documentation/programming-guide/index.html
----------------------------------------------------------------------
diff --git a/content/documentation/programming-guide/index.html b/content/documentation/programming-guide/index.html
index d0eb962..6b65ab7 100644
--- a/content/documentation/programming-guide/index.html
+++ b/content/documentation/programming-guide/index.html
@@ -579,7 +579,7 @@
 
 <span class="c"># Apply a lambda function to the PCollection words.</span>
 <span class="c"># Save the result as the PCollection word_lengths.</span>
-<span class="n">word_lengths</span> <span class="o">=</span> <span class="n">words</span> <span class="o">|</span> <span class="n">beam</span><span class="o">.</span><span class="n">FlatMap</span><span class="p">(</span><span class="k">lambda</span> <span class="n">x</span><span class="p">:</span> <span class="p">[</span><span class="nb">len</span><span class="p">(</span><span class="n">x</span><span class="p">)])</span>
+<span class="n">word_lengths</span> <span class="o">=</span> <span class="n">words</span> <span class="o">|</span> <span class="n">beam</span><span class="o">.</span><span class="n">FlatMap</span><span class="p">(</span><span class="k">lambda</span> <span class="n">word</span><span class="p">:</span> <span class="p">[</span><span class="nb">len</span><span class="p">(</span><span class="n">word</span><span class="p">)])</span>
 </code></pre>
 </div>
 
@@ -603,7 +603,7 @@
 
 <span class="c"># Apply a Map with a lambda function to the PCollection words.</span>
 <span class="c"># Save the result as the PCollection word_lengths.</span>
-<span class="n">word_lengths</span> <span class="o">=</span> <span class="n">words</span> <span class="o">|</span> <span class="n">beam</span><span class="o">.</span><span class="n">Map</span><span class="p">(</span><span class="k">lambda</span> <span class="n">x</span><span class="p">:</span> <span class="nb">len</span><span class="p">(</span><span class="n">x</span><span class="p">))</span>
+<span class="n">word_lengths</span> <span class="o">=</span> <span class="n">words</span> <span class="o">|</span> <span class="n">beam</span><span class="o">.</span><span class="n">Map</span><span class="p">(</span><span class="nb">len</span><span class="p">)</span>
 </code></pre>
 </div>
 
@@ -678,9 +678,12 @@ tree, [2]
 </code></pre>
 </div>
 
-<div class="language-py highlighter-rouge"><pre class="highlight"><code><span class="c"># A bounded sum of positive integers.</span>
+<div class="language-py highlighter-rouge"><pre class="highlight"><code><span class="n">pc</span> <span class="o">=</span> <span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">10</span><span class="p">,</span> <span class="mi">100</span><span class="p">,</span> <span class="mi">1000</span><span class="p">]</span>
+
 <span class="k">def</span> <span class="nf">bounded_sum</span><span class="p">(</span><span class="n">values</span><span class="p">,</span> <span class="n">bound</span><span class="o">=</span><span class="mi">500</span><span class="p">):</span>
   <span class="k">return</span> <span class="nb">min</span><span class="p">(</span><span class="nb">sum</span><span class="p">(</span><span class="n">values</span><span class="p">),</span> <span class="n">bound</span><span class="p">)</span>
+<span class="n">small_sum</span> <span class="o">=</span> <span class="n">pc</span> <span class="o">|</span> <span class="n">beam</span><span class="o">.</span><span class="n">CombineGlobally</span><span class="p">(</span><span class="n">bounded_sum</span><span class="p">)</span>              <span class="c"># [500]</span>
+<span class="n">large_sum</span> <span class="o">=</span> <span class="n">pc</span> <span class="o">|</span> <span class="n">beam</span><span class="o">.</span><span class="n">CombineGlobally</span><span class="p">(</span><span class="n">bounded_sum</span><span class="p">,</span> <span class="n">bound</span><span class="o">=</span><span class="mi">5000</span><span class="p">)</span>  <span class="c"># [1111]</span>
 </code></pre>
 </div>
 
@@ -755,6 +758,7 @@ tree, [2]
 
   <span class="k">def</span> <span class="nf">extract_output</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="p">(</span><span class="nb">sum</span><span class="p">,</span> <span class="n">count</span><span class="p">)):</span>
     <span class="k">return</span> <span class="nb">sum</span> <span class="o">/</span> <span class="n">count</span> <span class="k">if</span> <span class="n">count</span> <span class="k">else</span> <span class="nb">float</span><span class="p">(</span><span class="s">'NaN'</span><span class="p">)</span>
+<span class="n">average</span> <span class="o">=</span> <span class="n">pc</span> <span class="o">|</span> <span class="n">beam</span><span class="o">.</span><span class="n">CombineGlobally</span><span class="p">(</span><span class="n">AverageFn</span><span class="p">())</span>
 </code></pre>
 </div>
 
@@ -1035,6 +1039,7 @@ tree, [2]
       <span class="k">yield</span> <span class="n">context</span><span class="o">.</span><span class="n">element</span>
 
 <span class="n">small_words</span> <span class="o">=</span> <span class="n">words</span> <span class="o">|</span> <span class="n">beam</span><span class="o">.</span><span class="n">ParDo</span><span class="p">(</span><span class="n">FilterUsingLength</span><span class="p">(),</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">3</span><span class="p">)</span>
+
 <span class="o">...</span>
 
 </code></pre>
@@ -1116,6 +1121,7 @@ tree, [2]
 <span class="n">above</span> <span class="o">=</span> <span class="n">results</span><span class="o">.</span><span class="n">above_cutoff_lengths</span>
 <span class="n">marked</span> <span class="o">=</span> <span class="n">results</span><span class="p">[</span><span class="s">'marked strings'</span><span class="p">]</span>  <span class="c"># indexing works as well</span>
 
+
 <span class="c"># The result is also iterable, ordered in the same order that the tags were passed to with_outputs(), the main tag (if specified) first.</span>
 
 <span class="n">below</span><span class="p">,</span> <span class="n">above</span><span class="p">,</span> <span class="n">marked</span> <span class="o">=</span> <span class="p">(</span><span class="n">words</span>
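The regenerated page shows `bounded_sum` applied to `pc = [1, 10, 100, 1000]` with results `[500]` and `[1111]`. It still renders `AverageFn` with tuple parameters such as `def extract_output(self, (sum, count))`, a Python 2-only form that Python 3 removed (PEP 3113). The sketch below restates both in Python 3 without Beam. The accumulator pair is unpacked in the method body, and a hypothetical `combine_in_shards` driver stands in for a runner that accumulates parts of the input separately and then merges them. Plain Python returns the bare value rather than the one-element collection that `CombineGlobally` produces.

```python
def bounded_sum(values, bound=500):
    # Same function as in the regenerated snippet.
    return min(sum(values), bound)


class AverageFn:
    """Plain-Python mirror of the AverageFn CombineFn shown above.

    The accumulator is a (total, count) pair, unpacked inside each method
    because Python 3 no longer allows tuple parameters.
    """

    def create_accumulator(self):
        return (0.0, 0)

    def add_input(self, accumulator, value):
        total, count = accumulator
        return total + value, count + 1

    def merge_accumulators(self, accumulators):
        totals, counts = zip(*accumulators)
        return sum(totals), sum(counts)

    def extract_output(self, accumulator):
        total, count = accumulator
        return total / count if count else float('NaN')


def combine_in_shards(fn, values, num_shards=2):
    """Hypothetical driver: accumulate each shard, then merge the results."""
    accumulators = []
    for shard in range(num_shards):
        acc = fn.create_accumulator()
        for value in values[shard::num_shards]:
            acc = fn.add_input(acc, value)
        accumulators.append(acc)
    return fn.extract_output(fn.merge_accumulators(accumulators))


pc = [1, 10, 100, 1000]
print(bounded_sum(pc))                     # 500
print(bounded_sum(pc, bound=5000))         # 1111
print(combine_in_shards(AverageFn(), pc))  # 277.75
```

Because the partial (total, count) pairs merge by simple addition, the result does not depend on how the input is split across shards.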