Posted to commits@beam.apache.org by da...@apache.org on 2017/01/24 18:28:48 UTC
[1/4] beam-site git commit: Add plugin for snippet extraction.
Repository: beam-site
Updated Branches:
refs/heads/asf-site c9379f5e3 -> 6a85cdf69
Add plugin for snippet extraction.
Project: http://git-wip-us.apache.org/repos/asf/beam-site/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam-site/commit/0b44ef77
Tree: http://git-wip-us.apache.org/repos/asf/beam-site/tree/0b44ef77
Diff: http://git-wip-us.apache.org/repos/asf/beam-site/diff/0b44ef77
Branch: refs/heads/asf-site
Commit: 0b44ef7770f121558d4008290c49dcd3dd4f8bae
Parents: c9379f5
Author: Robert Bradshaw <ro...@gmail.com>
Authored: Fri Jan 20 17:26:41 2017 -0800
Committer: Robert Bradshaw <ro...@gmail.com>
Committed: Fri Jan 20 17:26:41 2017 -0800
----------------------------------------------------------------------
Gemfile | 1 +
Gemfile.lock | 6 +++++-
2 files changed, 6 insertions(+), 1 deletion(-)
----------------------------------------------------------------------
http://git-wip-us.apache.org/repos/asf/beam-site/blob/0b44ef77/Gemfile
----------------------------------------------------------------------
diff --git a/Gemfile b/Gemfile
index 9f15c06..f9431e2 100644
--- a/Gemfile
+++ b/Gemfile
@@ -10,6 +10,7 @@ group :jekyll_plugins do
gem 'jekyll-redirect-from'
gem 'jekyll-sass-converter'
gem 'html-proofer'
+ gem 'jekyll_github_sample'
end
# Used by Travis tests.
http://git-wip-us.apache.org/repos/asf/beam-site/blob/0b44ef77/Gemfile.lock
----------------------------------------------------------------------
diff --git a/Gemfile.lock b/Gemfile.lock
index 8d64708..1ab575d 100644
--- a/Gemfile.lock
+++ b/Gemfile.lock
@@ -40,6 +40,9 @@ GEM
sass (~> 3.4)
jekyll-watch (1.5.0)
listen (~> 3.0, < 3.1)
+ jekyll_github_sample (0.3.0)
+ activesupport (~> 4.0)
+ jekyll (~> 3.0)
json (1.8.3)
kramdown (1.12.0)
liquid (3.0.6)
@@ -77,7 +80,8 @@ DEPENDENCIES
jekyll (= 3.2)
jekyll-redirect-from
jekyll-sass-converter
+ jekyll_github_sample
rake
BUNDLED WITH
- 1.13.5
+ 1.13.7
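Note on the plugin: the other commits in this series call `github_sample` with a `tag:NAME` argument instead of line numbers. The following is a minimal Python sketch of what tag-based extraction could look like, assuming the referenced source file brackets each snippet with `[START NAME]` and `[END NAME]` comment markers. The helper name `extract_tagged_snippet`, the marker format, and the sample source are assumptions made for illustration; this is not the plugin's actual (Ruby) implementation.

```python
import textwrap


def extract_tagged_snippet(source, tag):
    """Return the text between '[START tag]' and '[END tag]' marker lines.

    Hypothetical helper: the marker format is an assumption, not taken
    from the plugin's source.
    """
    start_marker = '[START %s]' % tag
    end_marker = '[END %s]' % tag
    collected = []
    inside = False
    for line in source.splitlines():
        if not inside:
            # Skip everything up to and including the start marker line.
            if start_marker in line:
                inside = True
            continue
        if end_marker in line:
            # Dedent so a snippet nested in a test method renders flush left.
            return textwrap.dedent('\n'.join(collected))
        collected.append(line)
    raise ValueError('tag %r not found or not terminated' % tag)


# Made-up source file content, shaped like a tagged test method.
EXAMPLE_SOURCE = '''
def test_pardo_using_map(self):
    words = ['a', 'bb', 'ccc']
    # [START model_pardo_using_map]
    word_lengths = [len(word) for word in words]
    # [END model_pardo_using_map]
    self.assertEqual([1, 2, 3], word_lengths)
'''

snippet = extract_tagged_snippet(EXAMPLE_SOURCE, 'model_pardo_using_map')
print(snippet)
```

Because the snippet lives inside a test, the surrounding setup and assertions are exercised with it but stay out of the rendered documentation.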
[4/4] beam-site git commit: This closes #129
Posted by da...@apache.org.
This closes #129
Project: http://git-wip-us.apache.org/repos/asf/beam-site/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam-site/commit/6a85cdf6
Tree: http://git-wip-us.apache.org/repos/asf/beam-site/tree/6a85cdf6
Diff: http://git-wip-us.apache.org/repos/asf/beam-site/diff/6a85cdf6
Branch: refs/heads/asf-site
Commit: 6a85cdf69b7f3f72cae6dcbff31ad346c14243e8
Parents: c9379f5 5ffffa2
Author: Davor Bonaci <da...@google.com>
Authored: Tue Jan 24 10:26:05 2017 -0800
Committer: Davor Bonaci <da...@google.com>
Committed: Tue Jan 24 10:26:05 2017 -0800
----------------------------------------------------------------------
Gemfile | 1 +
Gemfile.lock | 6 +-
.../documentation/programming-guide/index.html | 12 +-
src/documentation/programming-guide.md | 115 ++++---------------
4 files changed, 37 insertions(+), 97 deletions(-)
----------------------------------------------------------------------
[2/4] beam-site git commit: Replace literal code samples with extracted, tested snippets.
Posted by da...@apache.org.
Replace literal code samples with extracted, tested snippets.
Project: http://git-wip-us.apache.org/repos/asf/beam-site/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam-site/commit/b8673dde
Tree: http://git-wip-us.apache.org/repos/asf/beam-site/tree/b8673dde
Diff: http://git-wip-us.apache.org/repos/asf/beam-site/diff/b8673dde
Branch: refs/heads/asf-site
Commit: b8673ddee75cbdf92067d5787426c0a68a681d02
Parents: 0b44ef7
Author: Robert Bradshaw <ro...@gmail.com>
Authored: Fri Jan 20 17:55:13 2017 -0800
Committer: Robert Bradshaw <ro...@gmail.com>
Committed: Fri Jan 20 17:55:13 2017 -0800
----------------------------------------------------------------------
src/documentation/programming-guide.md | 115 ++++++----------------------
1 file changed, 22 insertions(+), 93 deletions(-)
----------------------------------------------------------------------
http://git-wip-us.apache.org/repos/asf/beam-site/blob/b8673dde/src/documentation/programming-guide.md
----------------------------------------------------------------------
diff --git a/src/documentation/programming-guide.md b/src/documentation/programming-guide.md
index d743c49..869d9db 100644
--- a/src/documentation/programming-guide.md
+++ b/src/documentation/programming-guide.md
@@ -333,9 +333,8 @@ class ComputeWordLengthFn(beam.DoFn):
# Use return to emit the output element.
return [len(word)]
-# Apply a ParDo to the PCollection "words" to compute lengths for each word.
-word_lengths = words | beam.ParDo(ComputeWordLengthFn())
-```
+{% github_sample /apache/beam/blob/python-sdk/sdks/python/apache_beam/examples/snippets/snippets_test.py tag:model_pardo_apply
+%}```
In the example, our input `PCollection` contains `String` values. We apply a `ParDo` transform that specifies a function (`ComputeWordLengthFn`) to compute the length of each string, and outputs the result to a new `PCollection` of `Integer` values that stores the length of each word.
@@ -420,8 +419,8 @@ words = ...
# Apply a lambda function to the PCollection words.
# Save the result as the PCollection word_lengths.
-word_lengths = words | beam.FlatMap(lambda x: [len(x)])
-```
+{% github_sample /apache/beam/blob/python-sdk/sdks/python/apache_beam/examples/snippets/snippets_test.py tag:model_pardo_using_flatmap
+%}```
If your `ParDo` performs a one-to-one mapping of input elements to output elements--that is, for each input element, it applies a function that produces *exactly one* output element, you can use the higher-level <span class="language-java">`MapElements`</span><span class="language-py">`Map`</span> transform. <span class="language-java">`MapElements` can accept an anonymous Java 8 lambda function for additional brevity.</span>
@@ -444,8 +443,8 @@ words = ...
# Apply a Map with a lambda function to the PCollection words.
# Save the result as the PCollection word_lengths.
-word_lengths = words | beam.Map(lambda x: len(x))
-```
+{% github_sample /apache/beam/blob/python-sdk/sdks/python/apache_beam/examples/snippets/snippets_test.py tag:model_pardo_using_map
+%}```
{:.language-java}
> **Note:** You can use Java 8 lambda functions with several other Beam transforms, including `Filter`, `FlatMapElements`, and `Partition`.
@@ -517,10 +516,8 @@ public static class SumInts implements SerializableFunction<Iterable<Integer>, I
```
```py
-# A bounded sum of positive integers.
-def bounded_sum(values, bound=500):
- return min(sum(values), bound)
-```
+{% github_sample /apache/beam/blob/python-sdk/sdks/python/apache_beam/examples/snippets/snippets_test.py tag:combine_bounded_sum
+%}```
##### **Advanced Combinations using CombineFn**
@@ -574,20 +571,8 @@ public class AverageFn extends CombineFn<Integer, AverageFn.Accum, Double> {
```py
pc = ...
-class AverageFn(beam.CombineFn):
- def create_accumulator(self):
- return (0.0, 0)
-
- def add_input(self, (sum, count), input):
- return sum + input, count + 1
-
- def merge_accumulators(self, accumulators):
- sums, counts = zip(*accumulators)
- return sum(sums), sum(counts)
-
- def extract_output(self, (sum, count)):
- return sum / count if count else float('NaN')
-```
+{% github_sample /apache/beam/blob/python-sdk/sdks/python/apache_beam/examples/snippets/snippets_test.py tag:combine_custom_average
+%}```
If you are combining a `PCollection` of key-value pairs, [per-key combining](#transforms-combine-per-key) is often enough. If you need the combining strategy to change based on the key (for example, MIN for some users and MAX for other users), you can define a `KeyedCombineFn` to access the key within the combining strategy.
@@ -827,40 +812,14 @@ Side inputs are useful if your `ParDo` needs to inject additional data when proc
# For example, using pvalue.AsIter(pcoll) at pipeline construction time results in an iterable of the actual elements of pcoll being passed into each process invocation.
# In this example, side inputs are passed to a FlatMap transform as extra arguments and consumed by filter_using_length.
-# Callable takes additional arguments.
-def filter_using_length(word, lower_bound, upper_bound=float('inf')):
- if lower_bound <= len(word) <= upper_bound:
- yield word
-
-# Construct a deferred side input.
-avg_word_len = (words
- | beam.Map(len)
- | beam.CombineGlobally(beam.combiners.MeanCombineFn()))
-
-# Call with explicit side inputs.
-small_words = words | 'small' >> beam.FlatMap(filter_using_length, 0, 3)
-
-# A single deferred side input.
-larger_than_average = (words | 'large' >> beam.FlatMap(
- filter_using_length,
- lower_bound=pvalue.AsSingleton(avg_word_len)))
-
-# Mix and match.
-small_but_nontrivial = words | beam.FlatMap(filter_using_length,
- lower_bound=2,
- upper_bound=pvalue.AsSingleton(
- avg_word_len))
-
+{% github_sample /apache/beam/blob/python-sdk/sdks/python/apache_beam/examples/snippets/snippets_test.py tag:model_pardo_side_input
+%}
# We can also pass side inputs to a ParDo transform, which will get passed to its process method.
# The only change is that the first arguments are self and a context, rather than the PCollection element itself.
-class FilterUsingLength(beam.DoFn):
- def process(self, context, lower_bound, upper_bound=float('inf')):
- if lower_bound <= len(context.element) <= upper_bound:
- yield context.element
-
-small_words = words | beam.ParDo(FilterUsingLength(), 0, 3)
+{% github_sample /apache/beam/blob/python-sdk/sdks/python/apache_beam/examples/snippets/snippets_test.py tag:model_pardo_side_input_dofn
+%}
...
```
@@ -935,22 +894,13 @@ While `ParDo` always produces a main output `PCollection` (as the return value f
# with_outputs() returns a DoOutputsTuple object. Tags specified in with_outputs are attributes on the returned DoOutputsTuple object.
# The tags give access to the corresponding output PCollections.
-results = (words | beam.ParDo(ProcessWords(), cutoff_length=2, marker='x')
- .with_outputs('above_cutoff_lengths', 'marked strings',
- main='below_cutoff_strings'))
-below = results.below_cutoff_strings
-above = results.above_cutoff_lengths
-marked = results['marked strings'] # indexing works as well
+{% github_sample /apache/beam/blob/python-sdk/sdks/python/apache_beam/examples/snippets/snippets_test.py tag:model_pardo_with_side_outputs
+%}
# The result is also iterable, ordered in the same order that the tags were passed to with_outputs(), the main tag (if specified) first.
-below, above, marked = (words
- | beam.ParDo(
- ProcessWords(), cutoff_length=2, marker='x')
- .with_outputs('above_cutoff_lengths',
- 'marked strings',
- main='below_cutoff_strings'))
-```
+{% github_sample /apache/beam/blob/python-sdk/sdks/python/apache_beam/examples/snippets/snippets_test.py tag:model_pardo_with_side_outputs_iter
+%}```
##### Emitting to Side Outputs in your DoFn:
@@ -983,35 +933,14 @@ below, above, marked = (words
# using the pvalue.SideOutputValue wrapper class.
# Based on the previous example, this shows the DoFn emitting to the main and side outputs.
-class ProcessWords(beam.DoFn):
-
- def process(self, context, cutoff_length, marker):
- if len(context.element) <= cutoff_length:
- # Emit this short word to the main output.
- yield context.element
- else:
- # Emit this word's long length to a side output.
- yield pvalue.SideOutputValue(
- 'above_cutoff_lengths', len(context.element))
- if context.element.startswith(marker):
- # Emit this word to a different side output.
- yield pvalue.SideOutputValue('marked strings', context.element)
-
+{% github_sample /apache/beam/blob/python-sdk/sdks/python/apache_beam/examples/snippets/snippets_test.py tag:model_pardo_emitting_values_on_side_outputs
+%}
# Side outputs are also available in Map and FlatMap.
# Here is an example that uses FlatMap and shows that the tags do not need to be specified ahead of time.
-def even_odd(x):
- yield pvalue.SideOutputValue('odd' if x % 2 else 'even', x)
- if x % 10 == 0:
- yield x
-
-results = numbers | beam.FlatMap(even_odd).with_outputs()
-
-evens = results.even
-odds = results.odd
-tens = results[None] # the undeclared main output
-```
+{% github_sample /apache/beam/blob/python-sdk/sdks/python/apache_beam/examples/snippets/snippets_test.py tag:model_pardo_with_side_outputs_undeclared
+%}```
<a name="io"></a>
<a name="running"></a>
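Note on the combiner samples touched by this diff: the literal `AverageFn` removed above uses tuple parameters such as `(sum, count)` in the method signature, which is Python 2 only syntax. The sketch below restates the same four-method accumulator protocol, plus `bounded_sum`, in plain Python so it can run without Beam installed. It deliberately does not subclass `beam.CombineFn`, and the `combine` driver is a hypothetical stand-in for what a runner might do (accumulate per shard, then merge), not Beam's actual execution model.

```python
class AverageFn(object):
    """Plain-Python stand-in for the CombineFn shown in the diff.

    Assumption: no Beam base class is used; only the method protocol
    (create, add, merge, extract) is mirrored.
    """

    def create_accumulator(self):
        return (0.0, 0)

    def add_input(self, accumulator, element):
        total, count = accumulator  # explicit unpacking instead of tuple params
        return total + element, count + 1

    def merge_accumulators(self, accumulators):
        totals, counts = zip(*accumulators)
        return sum(totals), sum(counts)

    def extract_output(self, accumulator):
        total, count = accumulator
        return total / count if count else float('NaN')


def combine(combine_fn, shards):
    """Hypothetical driver: one accumulator per shard, then a single merge."""
    partials = []
    for shard in shards:
        accumulator = combine_fn.create_accumulator()
        for element in shard:
            accumulator = combine_fn.add_input(accumulator, element)
        partials.append(accumulator)
    merged = combine_fn.merge_accumulators(partials)
    return combine_fn.extract_output(merged)


def bounded_sum(values, bound=500):
    # A sum capped at `bound`, as in the combine_bounded_sum sample.
    return min(sum(values), bound)


# Same input values as the regenerated page: 1, 10, 100, 1000.
average = combine(AverageFn(), [[1, 10], [100, 1000]])
small_sum = bounded_sum([1, 10, 100, 1000])
large_sum = bounded_sum([1, 10, 100, 1000], bound=5000)
print(average, small_sum, large_sum)
```

Splitting the input into two shards shows why `merge_accumulators` exists: each shard produces its own `(total, count)` pair, and the final average is computed only after the pairs are merged.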
[3/4] beam-site git commit: Regenerate website
Posted by da...@apache.org.
Regenerate website
Project: http://git-wip-us.apache.org/repos/asf/beam-site/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam-site/commit/5ffffa2a
Tree: http://git-wip-us.apache.org/repos/asf/beam-site/tree/5ffffa2a
Diff: http://git-wip-us.apache.org/repos/asf/beam-site/diff/5ffffa2a
Branch: refs/heads/asf-site
Commit: 5ffffa2a8c4d697c935f42209a08050523e2011c
Parents: b8673dd
Author: Davor Bonaci <da...@google.com>
Authored: Tue Jan 24 10:26:05 2017 -0800
Committer: Davor Bonaci <da...@google.com>
Committed: Tue Jan 24 10:26:05 2017 -0800
----------------------------------------------------------------------
content/documentation/programming-guide/index.html | 12 +++++++++---
1 file changed, 9 insertions(+), 3 deletions(-)
----------------------------------------------------------------------
http://git-wip-us.apache.org/repos/asf/beam-site/blob/5ffffa2a/content/documentation/programming-guide/index.html
----------------------------------------------------------------------
diff --git a/content/documentation/programming-guide/index.html b/content/documentation/programming-guide/index.html
index d0eb962..6b65ab7 100644
--- a/content/documentation/programming-guide/index.html
+++ b/content/documentation/programming-guide/index.html
@@ -579,7 +579,7 @@
<span class="c"># Apply a lambda function to the PCollection words.</span>
<span class="c"># Save the result as the PCollection word_lengths.</span>
-<span class="n">word_lengths</span> <span class="o">=</span> <span class="n">words</span> <span class="o">|</span> <span class="n">beam</span><span class="o">.</span><span class="n">FlatMap</span><span class="p">(</span><span class="k">lambda</span> <span class="n">x</span><span class="p">:</span> <span class="p">[</span><span class="nb">len</span><span class="p">(</span><span class="n">x</span><span class="p">)])</span>
+<span class="n">word_lengths</span> <span class="o">=</span> <span class="n">words</span> <span class="o">|</span> <span class="n">beam</span><span class="o">.</span><span class="n">FlatMap</span><span class="p">(</span><span class="k">lambda</span> <span class="n">word</span><span class="p">:</span> <span class="p">[</span><span class="nb">len</span><span class="p">(</span><span class="n">word</span><span class="p">)])</span>
</code></pre>
</div>
@@ -603,7 +603,7 @@
<span class="c"># Apply a Map with a lambda function to the PCollection words.</span>
<span class="c"># Save the result as the PCollection word_lengths.</span>
-<span class="n">word_lengths</span> <span class="o">=</span> <span class="n">words</span> <span class="o">|</span> <span class="n">beam</span><span class="o">.</span><span class="n">Map</span><span class="p">(</span><span class="k">lambda</span> <span class="n">x</span><span class="p">:</span> <span class="nb">len</span><span class="p">(</span><span class="n">x</span><span class="p">))</span>
+<span class="n">word_lengths</span> <span class="o">=</span> <span class="n">words</span> <span class="o">|</span> <span class="n">beam</span><span class="o">.</span><span class="n">Map</span><span class="p">(</span><span class="nb">len</span><span class="p">)</span>
</code></pre>
</div>
@@ -678,9 +678,12 @@ tree, [2]
</code></pre>
</div>
-<div class="language-py highlighter-rouge"><pre class="highlight"><code><span class="c"># A bounded sum of positive integers.</span>
+<div class="language-py highlighter-rouge"><pre class="highlight"><code><span class="n">pc</span> <span class="o">=</span> <span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">10</span><span class="p">,</span> <span class="mi">100</span><span class="p">,</span> <span class="mi">1000</span><span class="p">]</span>
+
<span class="k">def</span> <span class="nf">bounded_sum</span><span class="p">(</span><span class="n">values</span><span class="p">,</span> <span class="n">bound</span><span class="o">=</span><span class="mi">500</span><span class="p">):</span>
<span class="k">return</span> <span class="nb">min</span><span class="p">(</span><span class="nb">sum</span><span class="p">(</span><span class="n">values</span><span class="p">),</span> <span class="n">bound</span><span class="p">)</span>
+<span class="n">small_sum</span> <span class="o">=</span> <span class="n">pc</span> <span class="o">|</span> <span class="n">beam</span><span class="o">.</span><span class="n">CombineGlobally</span><span class="p">(</span><span class="n">bounded_sum</span><span class="p">)</span> <span class="c"># [500]</span>
+<span class="n">large_sum</span> <span class="o">=</span> <span class="n">pc</span> <span class="o">|</span> <span class="n">beam</span><span class="o">.</span><span class="n">CombineGlobally</span><span class="p">(</span><span class="n">bounded_sum</span><span class="p">,</span> <span class="n">bound</span><span class="o">=</span><span class="mi">5000</span><span class="p">)</span> <span class="c"># [1111]</span>
</code></pre>
</div>
@@ -755,6 +758,7 @@ tree, [2]
<span class="k">def</span> <span class="nf">extract_output</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="p">(</span><span class="nb">sum</span><span class="p">,</span> <span class="n">count</span><span class="p">)):</span>
<span class="k">return</span> <span class="nb">sum</span> <span class="o">/</span> <span class="n">count</span> <span class="k">if</span> <span class="n">count</span> <span class="k">else</span> <span class="nb">float</span><span class="p">(</span><span class="s">'NaN'</span><span class="p">)</span>
+<span class="n">average</span> <span class="o">=</span> <span class="n">pc</span> <span class="o">|</span> <span class="n">beam</span><span class="o">.</span><span class="n">CombineGlobally</span><span class="p">(</span><span class="n">AverageFn</span><span class="p">())</span>
</code></pre>
</div>
@@ -1035,6 +1039,7 @@ tree, [2]
<span class="k">yield</span> <span class="n">context</span><span class="o">.</span><span class="n">element</span>
<span class="n">small_words</span> <span class="o">=</span> <span class="n">words</span> <span class="o">|</span> <span class="n">beam</span><span class="o">.</span><span class="n">ParDo</span><span class="p">(</span><span class="n">FilterUsingLength</span><span class="p">(),</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">3</span><span class="p">)</span>
+
<span class="o">...</span>
</code></pre>
@@ -1116,6 +1121,7 @@ tree, [2]
<span class="n">above</span> <span class="o">=</span> <span class="n">results</span><span class="o">.</span><span class="n">above_cutoff_lengths</span>
<span class="n">marked</span> <span class="o">=</span> <span class="n">results</span><span class="p">[</span><span class="s">'marked strings'</span><span class="p">]</span> <span class="c"># indexing works as well</span>
+
<span class="c"># The result is also iterable, ordered in the same order that the tags were passed to with_outputs(), the main tag (if specified) first.</span>
<span class="n">below</span><span class="p">,</span> <span class="n">above</span><span class="p">,</span> <span class="n">marked</span> <span class="o">=</span> <span class="p">(</span><span class="n">words</span>