Posted to commits@beam.apache.org by me...@apache.org on 2017/11/06 18:26:17 UTC
[beam-site] branch mergebot updated (cc8a7a7 -> 75b2d8b)
This is an automated email from the ASF dual-hosted git repository.
mergebot-role pushed a change to branch mergebot
in repository https://gitbox.apache.org/repos/asf/beam-site.git.
discard cc8a7a7 This closes #331
omit a7159d7 Address https://github.com/apache/beam-site/pull/331#discussion_r144906095
omit a4662c7 Address https://github.com/apache/beam-site/pull/331#discussion_r144115767
omit 68c9fef Update IntelliJ Checkstyle instructions
add 627c843 Add RedisIO in the built-in set
add c8d7858 Regenerate website
add 2be51a6 This closes #334: Add RedisIO in the built-in set
add 346b3be Move Elasticsearch v5.x from in-progress to built-in
add aa074e7 This closes #322
add fa66528 [BEAM-664] Update docs: MinimalWordCount in Java is intentionally hardcoded to run only on DirectRunner
add 961cc7c Regenerate website
add 0cf69dd This closes #336
add 6e3e729 [BEAM-3121] Remove broken docker script and documentation
add 887d75f Merge pull request #339 from herohde/portability
add 7c82d51 New top menu plus side nav layout
add 756b59a Regenerate website content
add 0679868 This closes #332: New web site navigation
new 33ac606 [BEAM-1934] Add more CoGroupByKey content/examples
new 90a0460 Update with Java snippet tags
new 75b2d8b This closes #302
This update added new revisions after undoing existing revisions.
That is to say, some revisions that were in the old version of the
branch are not in the new version. This situation occurs
when a user --force pushes a change and generates a repository
containing something like this:
 * -- * -- B -- O -- O -- O   (cc8a7a7)
            \
             N -- N -- N   refs/heads/mergebot (75b2d8b)
You should already have received notification emails for all of the O
revisions, and so the following emails describe only the N revisions
from the common base, B.
Any revisions marked "omit" are not gone; other references still
refer to them. Any revisions marked "discard" are gone forever.
The 3 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails. The revisions
listed as "add" were already present in the repository and have only
been added to this reference.
Summary of changes:
README.md | 10 -
.../capability/2016/03/17/capability-matrix.html | 117 ++-------
.../2016/04/03/presentation-materials.html | 117 ++-------
.../sdk/2016/02/25/python-sdk-now-public.html | 117 ++-------
content/beam/release/2016/06/15/first-release.html | 117 ++-------
.../2016/10/11/strata-hadoop-world-and-beam.html | 117 ++-------
.../update/website/2016/02/22/beam-has-a-logo.html | 117 ++-------
.../blog/2016/05/18/splitAtFraction-method.html | 117 ++-------
.../05/27/where-is-my-pcollection-dot-map.html | 117 ++-------
.../2016/06/13/flink-batch-runner-milestone.html | 117 ++-------
content/blog/2016/08/03/six-months.html | 117 ++-------
content/blog/2016/10/20/test-stream.html | 117 ++-------
content/blog/2017/01/09/added-apex-runner.html | 117 ++-------
content/blog/2017/01/10/beam-graduates.html | 117 ++-------
.../blog/2017/02/01/graduation-media-recap.html | 117 ++-------
content/blog/2017/02/13/stateful-processing.html | 117 ++-------
content/blog/2017/03/16/python-sdk-release.html | 117 ++-------
.../blog/2017/05/17/beam-first-stable-release.html | 117 ++-------
content/blog/2017/08/16/splittable-do-fn.html | 117 ++-------
content/blog/2017/08/28/timely-processing.html | 117 ++-------
content/blog/index.html | 117 ++-------
content/coming-soon.html | 117 ++-------
content/contribute/contribution-guide/index.html | 217 ++++++++-------
content/contribute/design-principles/index.html | 200 +++++++-------
content/contribute/docker-images/index.html | 197 +++++++-------
content/contribute/index.html | 174 ++++++------
content/contribute/logos/index.html | 183 +++++++------
content/contribute/maturity-model/index.html | 252 +++++++++---------
.../contribute/presentation-materials/index.html | 179 ++++++-------
.../contribute/ptransform-style-guide/index.html | 199 +++++++-------
content/contribute/release-guide/index.html | 238 ++++++++++-------
content/contribute/runner-guide/index.html | 256 ++++++++++--------
content/contribute/source-repository/index.html | 180 ++++++-------
content/contribute/team/index.html | 177 ++++++-------
content/contribute/testing/index.html | 278 +++++++++++---------
content/contribute/work-in-progress/index.html | 179 ++++++-------
content/css/site.css | 172 +++++++++++-
content/documentation/dsls/sql/index.html | 217 ++++++++-------
content/documentation/execution-model/index.html | 253 +++++++++++-------
content/documentation/index.html | 247 ++++++++++-------
content/documentation/io/authoring-java/index.html | 239 ++++++++++-------
.../documentation/io/authoring-overview/index.html | 250 +++++++++++-------
.../documentation/io/authoring-python/index.html | 236 ++++++++++-------
.../documentation/io/built-in/hadoop/index.html | 245 ++++++++++-------
content/documentation/io/built-in/index.html | 247 ++++++++++-------
content/documentation/io/contributing/index.html | 236 ++++++++++-------
content/documentation/io/io-toc/index.html | 240 ++++++++++-------
content/documentation/io/testing/index.html | 263 ++++++++++++-------
.../pipelines/create-your-pipeline/index.html | 244 ++++++++++-------
.../pipelines/design-your-pipeline/index.html | 249 +++++++++++-------
.../pipelines/test-your-pipeline/index.html | 258 +++++++++++-------
content/documentation/programming-guide/index.html | 292 ++++++++++++++-------
content/documentation/resources/index.html | 260 +++++++++++-------
content/documentation/runners/apex/index.html | 156 +++++------
.../runners/capability-matrix/index.html | 149 ++++-------
content/documentation/runners/dataflow/index.html | 166 +++++-------
content/documentation/runners/direct/index.html | 159 +++++------
content/documentation/runners/flink/index.html | 165 +++++-------
content/documentation/runners/gearpump/index.html | 156 +++++------
content/documentation/runners/jstorm/index.html | 249 +++++++++++-------
content/documentation/runners/mapreduce/index.html | 154 +++++------
content/documentation/runners/spark/index.html | 168 ++++++------
.../sdks/feature-comparison/index.html} | 163 +++++-------
.../documentation/sdks/java-extensions/index.html | 173 ++++++------
content/documentation/sdks/java/index.html | 165 +++++-------
content/documentation/sdks/nexmark/index.html | 279 +++++++++++++-------
.../documentation/sdks/python-custom-io/index.html | 200 +++++++-------
.../sdks/python-pipeline-dependencies/index.html | 176 ++++++-------
.../sdks/python-type-safety/index.html | 183 +++++++------
content/documentation/sdks/python/index.html | 166 +++++-------
content/get-started/beam-overview/index.html | 167 ++++++------
content/get-started/downloads/index.html | 176 ++++++-------
content/get-started/index.html | 161 +++++-------
.../get-started/mobile-gaming-example/index.html | 185 +++++++------
content/get-started/quickstart-java/index.html | 169 ++++++------
content/get-started/quickstart-py/index.html | 179 ++++++-------
content/get-started/support/index.html | 168 ++++++------
content/get-started/wordcount-example/index.html | 225 +++++++---------
content/index.html | 117 ++-------
content/js/fix-menu.js | 64 +++++
content/js/language-switch.js | 13 +
content/js/page-nav.js | 53 ++++
content/js/section-nav.js | 87 ++++++
content/privacy_policy/index.html | 117 ++-------
run_with_docker.sh | 53 ----
src/_includes/head.html | 3 +
src/_includes/header.html | 114 ++------
src/_includes/page-toc.html | 76 ++++++
src/_includes/section-menu/contribute.html | 33 +++
src/_includes/section-menu/documentation.html | 95 +++++++
src/_includes/section-menu/get-started.html | 20 ++
src/_includes/section-menu/runners.html | 8 +
src/_includes/section-menu/sdks.html | 18 ++
src/_layouts/section.html | 26 ++
src/_sass/_global.sass | 25 +-
src/_sass/_navbar.sass | 59 ++++-
src/_sass/_page-nav.sass | 36 +++
src/_sass/_section-nav.sass | 73 ++++++
src/_sass/_syntax-highlighting.scss | 17 ++
src/_sass/_vars.sass | 3 +
src/contribute/contribution-guide.md | 70 ++---
src/contribute/design-principles.md | 8 +-
src/contribute/docker-images.md | 15 +-
src/contribute/index.md | 3 +-
src/contribute/logos.md | 3 +-
src/contribute/maturity-model.md | 71 ++---
src/contribute/presentation-materials.md | 3 +-
src/contribute/ptransform-style-guide.md | 6 +-
src/contribute/release-guide.md | 5 +-
src/contribute/runner-guide.md | 60 ++---
src/contribute/source-repository.md | 3 +-
src/contribute/team.md | 3 +-
src/contribute/testing.md | 97 +++----
src/contribute/work-in-progress.md | 4 +-
src/css/site.scss | 2 +
src/documentation/dsls/sql.md | 62 +++--
src/documentation/execution-model.md | 4 +-
src/documentation/index.md | 3 +-
src/documentation/io/authoring-java.md | 3 +-
src/documentation/io/authoring-overview.md | 3 +-
src/documentation/io/authoring-python.md | 3 +-
src/documentation/io/built-in-hadoop.md | 13 +-
src/documentation/io/built-in.md | 14 +-
src/documentation/io/contributing.md | 3 +-
src/documentation/io/io-toc.md | 3 +-
src/documentation/io/testing.md | 7 +-
.../pipelines/create-your-pipeline.md | 3 +-
.../pipelines/design-your-pipeline.md | 3 +-
src/documentation/pipelines/test-your-pipeline.md | 9 +-
src/documentation/programming-guide.md | 148 ++++++++---
src/documentation/resources.md | 25 +-
src/documentation/runners/apex.md | 4 +-
src/documentation/runners/capability-matrix.md | 3 +-
src/documentation/runners/dataflow.md | 4 +-
src/documentation/runners/direct.md | 4 +-
src/documentation/runners/flink.md | 4 +-
src/documentation/runners/gearpump.md | 8 +-
src/documentation/runners/jstorm.md | 3 +-
src/documentation/runners/mapreduce.md | 3 +-
src/documentation/runners/spark.md | 5 +-
src/documentation/sdks/feature-comparison.md | 7 +
src/documentation/sdks/java-extensions.md | 7 +-
src/documentation/sdks/java.md | 5 +-
src/documentation/sdks/javadoc/current.md | 4 +-
src/documentation/sdks/javadoc/index.md | 4 +-
src/documentation/sdks/nexmark.md | 4 +-
src/documentation/sdks/pydoc/current.md | 3 +-
src/documentation/sdks/pydoc/index.md | 3 +-
src/documentation/sdks/python-custom-io.md | 16 +-
.../sdks/python-pipeline-dependencies.md | 14 +-
src/documentation/sdks/python-type-safety.md | 4 +-
src/documentation/sdks/python.md | 4 +-
src/get-started/beam-overview.md | 5 +-
src/get-started/downloads.md | 3 +-
src/get-started/index.md | 3 +-
src/get-started/mobile-gaming-example.md | 3 +-
src/get-started/quickstart-java.md | 4 +-
src/get-started/quickstart-py.md | 4 +-
src/get-started/support.md | 5 +-
src/get-started/wordcount-example.md | 40 +--
src/js/fix-menu.js | 64 +++++
src/js/language-switch.js | 13 +
src/js/page-nav.js | 53 ++++
src/js/section-nav.js | 87 ++++++
164 files changed, 8228 insertions(+), 7919 deletions(-)
copy content/{coming-soon.html => documentation/sdks/feature-comparison/index.html} (50%)
create mode 100644 content/js/fix-menu.js
create mode 100644 content/js/page-nav.js
create mode 100644 content/js/section-nav.js
delete mode 100755 run_with_docker.sh
create mode 100644 src/_includes/page-toc.html
create mode 100644 src/_includes/section-menu/contribute.html
create mode 100644 src/_includes/section-menu/documentation.html
create mode 100644 src/_includes/section-menu/get-started.html
create mode 100644 src/_includes/section-menu/runners.html
create mode 100644 src/_includes/section-menu/sdks.html
create mode 100644 src/_layouts/section.html
create mode 100644 src/_sass/_page-nav.sass
create mode 100644 src/_sass/_section-nav.sass
create mode 100644 src/documentation/sdks/feature-comparison.md
create mode 100644 src/js/fix-menu.js
create mode 100644 src/js/page-nav.js
create mode 100644 src/js/section-nav.js
--
To stop receiving notification emails like this one, please contact
"commits@beam.apache.org" <co...@beam.apache.org>.
[beam-site] 01/03: [BEAM-1934] Add more CoGroupByKey content/examples
Posted by me...@apache.org.
This is an automated email from the ASF dual-hosted git repository.
mergebot-role pushed a commit to branch mergebot
in repository https://gitbox.apache.org/repos/asf/beam-site.git
commit 33ac606387fc0d68f678c0e43f521e87ec9b9544
Author: melissa <me...@google.com>
AuthorDate: Wed Aug 23 15:25:47 2017 -0700
[BEAM-1934] Add more CoGroupByKey content/examples
---
src/documentation/programming-guide.md | 193 +++++++++++++++++++++++++++------
1 file changed, 162 insertions(+), 31 deletions(-)
diff --git a/src/documentation/programming-guide.md b/src/documentation/programming-guide.md
index 2ccbd35..c2f95ac 100644
--- a/src/documentation/programming-guide.md
+++ b/src/documentation/programming-guide.md
@@ -785,45 +785,176 @@ tree, [2]
Thus, `GroupByKey` represents a transform from a multimap (multiple keys to
individual values) to a uni-map (unique keys to collections of values).
+##### 4.2.2.1 GroupByKey and unbounded PCollections
+
+If you are using unbounded `PCollection`s, you must use either [non-global
+windowing](#setting-your-pcollections-windowing-function) or an
+[aggregation trigger](#triggers) in order to perform a `GroupByKey` or
+[CoGroupByKey](#cogroupbykey). This is because a bounded `GroupByKey` or
+`CoGroupByKey` must wait for all the data with a certain key to be collected,
+but with unbounded collections, the data is unlimited. Windowing and/or triggers
+allow grouping to operate on logical, finite bundles of data within the
+unbounded data streams.
+
+If you do apply `GroupByKey` or `CoGroupByKey` to a group of unbounded
+`PCollection`s without setting either a non-global windowing strategy, a trigger
+strategy, or both for each collection, Beam generates an IllegalStateException
+error at pipeline construction time.
+
+When using `GroupByKey` or `CoGroupByKey` to group `PCollection`s that have a
+[windowing strategy](#windowing) applied, all of the `PCollection`s you want to
+group *must use the same windowing strategy* and window sizing. For example, all
+of the collections you are merging must use (hypothetically) identical 5-minute
+fixed windows, or 4-minute sliding windows starting every 30 seconds.
+
+If your pipeline attempts to use `GroupByKey` or `CoGroupByKey` to merge
+`PCollection`s with incompatible windows, Beam generates an
+IllegalStateException error at pipeline construction time.
+
#### 4.2.3. CoGroupByKey
-`CoGroupByKey` joins two or more key/value `PCollection`s that have the same key
-type, and then emits a collection of `KV<K, CoGbkResult>` pairs. [Design Your
-Pipeline]({{ site.baseurl }}/documentation/pipelines/design-your-pipeline/#multiple-sources)
+`CoGroupByKey` performs a relational join of two or more key/value
+`PCollection`s that have the same key type.
+[Design Your Pipeline]({{ site.baseurl }}/documentation/pipelines/design-your-pipeline/#multiple-sources)
shows an example pipeline that uses a join.
-Given the input collections below:
+Consider using `CoGroupByKey` if you have multiple data sets that provide
+information about related things. For example, let's say you have two different
+files with user data: one file has names and email addresses; the other file
+has names and phone numbers. You can join those two data sets, using the user
+name as a common key and the other data as the associated values. After the
+join, you have one data set that contains all of the information (email
+addresses and phone numbers) associated with each name.
+
+If you are using unbounded `PCollection`s, you must use either [non-global
+windowing](#setting-your-pcollections-windowing-function) or an
+[aggregation trigger](#triggers) in order to perform a `CoGroupByKey`. See
+[GroupByKey and unbounded PCollections](#groupbykey-and-unbounded-pcollections)
+for more details.
+
+<span class="language-java">
+In the Beam SDK for Java, `CoGroupByKey` accepts a tuple of keyed
+`PCollection`s (`PCollection<KV<K, V>>`) as input. For type safety, the SDK
+requires you to pass each `PCollection` as part of a `KeyedPCollectionTuple`.
+You must declare a `TupleTag` for each input `PCollection` in the
+`KeyedPCollectionTuple` that you want to pass to `CoGroupByKey`. As output,
+`CoGroupByKey` returns a `PCollection<KV<K, CoGbkResult>>`, which groups values
+from all the input `PCollection`s by their common keys. Each key (all of type
+`K`) will have a different `CoGbkResult`, which is a map from `TupleTag<T>` to
+`Iterable<T>`. You can access a specific collection in a `CoGbkResult` object
+by using the `TupleTag` that you supplied with the initial collection.
+</span>
+<span class="language-py">
+In the Beam SDK for Python, `CoGroupByKey` accepts a dictionary of keyed
+`PCollection`s as input. As output, `CoGroupByKey` creates a single output
+`PCollection` that contains one key/value tuple for each key in the input
+`PCollection`s. Each key's value is a dictionary that maps each tag to an
+iterable of the values under the key in the corresponding `PCollection`.
+</span>
+
+The following conceptual examples use two input collections to show the mechanics of
+`CoGroupByKey`.
+
+<span class="language-java">
+The first set of data has a `TupleTag<String>` called `emailTag` and contains names
+and email addresses. The second set of data has a `TupleTag<String>` called
+`phoneTag` and contains names and phone numbers.
+</span>
+<span class="language-py">
+The first set of data contains names and email addresses. The second set of
+data contains names and phone numbers.
+</span>
+
+```java
+// This set of data has a `TupleTag<String>` called `emailTag`.
+ "amy" -> "amy@example.com"
+ "carl" -> "carl@example.com"
+ "julia" -> "julia@example.com"
+ "carl" -> "carl@email.com"
+
+// This set of data has a `TupleTag<String>` called `phoneTag`.
+ "amy" -> "111-222-3333"
+ "james" -> "222-333-4444"
+ "amy" -> "333-444-5555"
+ "carl" -> "444-555-6666"
```
-// collection 1
-user1, address1
-user2, address2
-user3, address3
+```py
+{% github_sample /apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/snippets_test.py tag:model_group_by_key_cogroupbykey_tuple_inputs
+%}```
-// collection 2
-user1, order1
-user1, order2
-user2, order3
-guest, order4
-...
+After `CoGroupByKey`, the resulting data contains all data associated with each
+unique key from any of the input collections.
+
+```java
+ "amy" -> {
+ emailTag -> ["amy@example.com"]
+ phoneTag -> ["111-222-3333", "333-444-5555"]
+ }
+ "carl" -> {
+ emailTag -> ["carl@example.com", "carl@email.com"]
+ phoneTag -> ["444-555-6666"]
+ }
+ "james" -> {
+ emailTag -> [],
+ phoneTag -> ["222-333-4444"]
+ }
+ "julia" -> {
+ emailTag -> ["julia@example.com"],
+ phoneTag -> []
+ }
```
+```py
+{% github_sample /apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/snippets_test.py tag:model_group_by_key_cogroupbykey_tuple_outputs
+%}```
-`CoGroupByKey` gathers up the values with the same key from all `PCollection`s,
-and outputs a new pair consisting of the unique key and an object `CoGbkResult`
-containing all values that were associated with that key. If you apply
-`CoGroupByKey` to the input collections above, the output collection would look
-like this:
+The following code example joins the two `PCollection`s with `CoGroupByKey`,
+followed by a `ParDo` to consume the result. Then, the code uses tags to look up
+and format data from each collection.
+
+```java
+ // Each set of key-value pairs is read into separate PCollections.
+ // Each shares a common key ("K").
+ PCollection<KV<K, V1>> pt1 = ...;
+ PCollection<KV<K, V2>> pt2 = ...;
+
+ // Create tuple tags for the value types in each collection.
+ final TupleTag<V1> t1 = new TupleTag<V1>();
+ final TupleTag<V2> t2 = new TupleTag<V2>();
+
+ // Merge collection values into a CoGbkResult collection
+ PCollection<KV<K, CoGbkResult>> coGbkResultCollection =
+ KeyedPCollectionTuple.of(t1, pt1)
+ .and(t2, pt2)
+ .apply(CoGroupByKey.<K>create());
+
+ // Access results and do something with them.
+ PCollection<T> finalResultCollection =
+ coGbkResultCollection.apply(ParDo.of(
+ new DoFn<KV<K, CoGbkResult>, T>() {
+ @ProcessElement
+ public void processElement(ProcessContext c) {
+ KV<K, CoGbkResult> e = c.element();
+ // Get all collection 1 values
+ Iterable<V1> pt1Vals = e.getValue().getAll(t1);
+ // Get all collection 2 values
+ Iterable<V2> pt2Vals = e.getValue().getAll(t2);
+ // ... Do something ...
+ c.output(...some T...);
+ }
+ }));
```
-user1, [[address1], [order1, order2]]
-user2, [[address2], [order3]]
-user3, [[address3], []]
-guest, [[], [order4]]
-...
-````
+```py
+{% github_sample /apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/snippets.py tag:model_group_by_key_cogroupbykey_tuple
+%}```
-> **A Note on Key/Value Pairs:** Beam represents key/value pairs slightly
-> differently depending on the language and SDK you're using. In the Beam SDK
-> for Java, you represent a key/value pair with an object of type `KV<K, V>`. In
-> Python, you represent key/value pairs with 2-tuples.
+The formatted data looks like this:
+
+```java
+ Sample coming soon.
+```
+```py
+{% github_sample /apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/snippets_test.py tag:model_group_by_key_cogroupbykey_tuple_formatted_outputs
+%}```
#### 4.2.4. Combine
@@ -1078,7 +1209,7 @@ PCollection<String> merged = collections.apply(Flatten.<String>pCollections());
```py
# Flatten takes a tuple of PCollection objects.
-# Returns a single PCollection that contains all of the elements in the
+# Returns a single PCollection that contains all of the elements in the PCollection objects in that tuple.
{%
github_sample /apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/snippets.py tag:model_multiple_pcollections_flatten
%}
@@ -1998,7 +2129,7 @@ windows are not actually used until they're needed for the `GroupByKey`.
Subsequent transforms, however, are applied to the result of the `GroupByKey` --
data is grouped by both key and window.
-#### 7.1.2. Using windowing with bounded PCollections
+#### 7.1.2. Windowing with bounded PCollections
You can use windowing with fixed-size data sets in **bounded** `PCollection`s.
However, note that windowing considers only the implicit timestamps attached to
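The hunks above state that grouping over unbounded data needs windowing (or triggers) so that it operates on finite bundles, and that data is then grouped by both key and window. A minimal sketch of that idea follows, in plain Python rather than the Beam API. The 5-minute window size, the event tuples, and the helper names `window_start` and `group_by_key_and_window` are all hypothetical, chosen only for illustration.

```python
from collections import defaultdict

WINDOW_SECONDS = 5 * 60  # hypothetical 5-minute fixed windows


def window_start(timestamp, size=WINDOW_SECONDS):
    """Return the start (in seconds) of the fixed window containing timestamp."""
    return timestamp - (timestamp % size)


def group_by_key_and_window(events, size=WINDOW_SECONDS):
    """Group (key, value, timestamp) events by (key, window start).

    This only imitates the grouping behavior described in the guide;
    it is not how Beam implements windowing.
    """
    groups = defaultdict(list)
    for key, value, timestamp in events:
        groups[(key, window_start(timestamp, size))].append(value)
    return dict(groups)


# Hypothetical events: (key, value, event time in seconds).
events = [
    ("amy", "a1", 10),
    ("amy", "a2", 290),
    ("amy", "a3", 310),
    ("carl", "c1", 299),
]

grouped = group_by_key_and_window(events)
for (key, start), values in sorted(grouped.items()):
    print(key, start, values)
```

Because every event lands in exactly one finite window, each group can be completed without waiting for the whole (unlimited) stream, which is the reason the guide requires non-global windowing or a trigger.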
[beam-site] 03/03: This closes #302
Posted by me...@apache.org.
This is an automated email from the ASF dual-hosted git repository.
mergebot-role pushed a commit to branch mergebot
in repository https://gitbox.apache.org/repos/asf/beam-site.git
commit 75b2d8b9cce01c7659f2b3c8378041ec6d850ecb
Merge: 0679868 90a0460
Author: Mergebot <me...@apache.org>
AuthorDate: Mon Nov 6 18:26:01 2017 +0000
This closes #302
src/documentation/programming-guide.md | 145 +++++++++++++++++++++++++--------
1 file changed, 111 insertions(+), 34 deletions(-)
[beam-site] 02/03: Update with Java snippet tags
Posted by me...@apache.org.
This is an automated email from the ASF dual-hosted git repository.
mergebot-role pushed a commit to branch mergebot
in repository https://gitbox.apache.org/repos/asf/beam-site.git
commit 90a04606742ef9ff66ae9f2787f8875a560f5d61
Author: melissa <me...@google.com>
AuthorDate: Fri Oct 27 17:49:13 2017 -0700
Update with Java snippet tags
---
src/documentation/programming-guide.md | 74 +++++-----------------------------
1 file changed, 10 insertions(+), 64 deletions(-)
diff --git a/src/documentation/programming-guide.md b/src/documentation/programming-guide.md
index c2f95ac..f746d7d 100644
--- a/src/documentation/programming-guide.md
+++ b/src/documentation/programming-guide.md
@@ -856,9 +856,9 @@ The following conceptual examples use two input collections to show the mechanic
`CoGroupByKey`.
<span class="language-java">
-The first set of data has a `TupleTag<String>` called `emailTag` and contains names
+The first set of data has a `TupleTag<String>` called `emailsTag` and contains names
and email addresses. The second set of data has a `TupleTag<String>` called
-`phoneTag` and contains names and phone numbers.
+`phonesTag` and contains names and phone numbers.
</span>
<span class="language-py">
The first set of data contains names and email addresses. The second set of
@@ -866,18 +866,8 @@ data contains names and phone numbers.
</span>
```java
-// This set of data has a `TupleTag<String>` called `emailTag`.
- "amy" -> "amy@example.com"
- "carl" -> "carl@example.com"
- "julia" -> "julia@example.com"
- "carl" -> "carl@email.com"
-
-// This set of data has a `TupleTag<String>` called `phoneTag`.
- "amy" -> "111-222-3333"
- "james" -> "222-333-4444"
- "amy" -> "333-444-5555"
- "carl" -> "444-555-6666"
-```
+{% github_sample /apache/beam/blob/master/examples/java8/src/test/java/org/apache/beam/examples/website_snippets/SnippetsTest.java tag:CoGroupByKeyTupleInputs
+%}```
```py
{% github_sample /apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/snippets_test.py tag:model_group_by_key_cogroupbykey_tuple_inputs
%}```
@@ -886,23 +876,8 @@ After `CoGroupByKey`, the resulting data contains all data associated with each
unique key from any of the input collections.
```java
- "amy" -> {
- emailTag -> ["amy@example.com"]
- phoneTag -> ["111-222-3333", "333-444-5555"]
- }
- "carl" -> {
- emailTag -> ["carl@example.com", "carl@email.com"]
- phoneTag -> ["444-555-6666"]
- }
- "james" -> {
- emailTag -> [],
- phoneTag -> ["222-333-4444"]
- }
- "julia" -> {
- emailTag -> ["julia@example.com"],
- phoneTag -> []
- }
-```
+{% github_sample /apache/beam/blob/master/examples/java8/src/test/java/org/apache/beam/examples/website_snippets/SnippetsTest.java tag:CoGroupByKeyTupleOutputs
+%}```
```py
{% github_sample /apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/snippets_test.py tag:model_group_by_key_cogroupbykey_tuple_outputs
%}```
@@ -912,37 +887,8 @@ followed by a `ParDo` to consume the result. Then, the code uses tags to look up
and format data from each collection.
```java
- // Each set of key-value pairs is read into separate PCollections.
- // Each shares a common key ("K").
- PCollection<KV<K, V1>> pt1 = ...;
- PCollection<KV<K, V2>> pt2 = ...;
-
- // Create tuple tags for the value types in each collection.
- final TupleTag<V1> t1 = new TupleTag<V1>();
- final TupleTag<V2> t2 = new TupleTag<V2>();
-
- // Merge collection values into a CoGbkResult collection
- PCollection<KV<K, CoGbkResult>> coGbkResultCollection =
- KeyedPCollectionTuple.of(t1, pt1)
- .and(t2, pt2)
- .apply(CoGroupByKey.<K>create());
-
- // Access results and do something with them.
- PCollection<T> finalResultCollection =
- coGbkResultCollection.apply(ParDo.of(
- new DoFn<KV<K, CoGbkResult>, T>() {
- @Override
- public void processElement(ProcessContext c) {
- KV<K, CoGbkResult> e = c.element();
- // Get all collection 1 values
- Iterable<V1> pt1Vals = e.getValue().getAll(t1);
- // Get all collection 2 values
- Iterable<V2> pt2Vals = e.getValue().getAll(t2);
- // ... Do something ...
- c.output(...some T...);
- }
- }));
-```
+{% github_sample /apache/beam/blob/master/examples/java8/src/main/java/org/apache/beam/examples/website_snippets/Snippets.java tag:CoGroupByKeyTuple
+%}```
```py
{% github_sample /apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/snippets.py tag:model_group_by_key_cogroupbykey_tuple
%}```
@@ -950,8 +896,8 @@ and format data from each collection.
The formatted data looks like this:
```java
- Sample coming soon.
-```
+{% github_sample /apache/beam/blob/master/examples/java8/src/test/java/org/apache/beam/examples/website_snippets/SnippetsTest.java tag:CoGroupByKeyTupleFormattedOutputs
+%}```
```py
{% github_sample /apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/snippets_test.py tag:model_group_by_key_cogroupbykey_tuple_formatted_outputs
%}```
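For readers of this diff who no longer see the inline sample data it removes, the join semantics can be imitated in a few lines of plain Python. This is only a sketch of what the guide describes (each key maps to one list of values per tag, with an empty list where a tag has no values); it does not use the Beam SDK. The function name `co_group_by_key` and the tag names `"emails"` and `"phones"` are assumptions made for illustration. Value order within each list follows input order here, which the sketch does not claim Beam guarantees.

```python
from collections import defaultdict


def co_group_by_key(tagged_inputs):
    """Plain-Python imitation of CoGroupByKey semantics (not the Beam API).

    tagged_inputs maps a tag name to a list of (key, value) pairs.
    Returns {key: {tag: [values, ...]}}; a tag with no values under
    a key maps to an empty list.
    """
    tags = list(tagged_inputs)
    result = defaultdict(lambda: {tag: [] for tag in tags})
    for tag, pairs in tagged_inputs.items():
        for key, value in pairs:
            result[key][tag].append(value)
    return dict(result)


# Sample data taken from the lines removed in this diff.
emails = [
    ("amy", "amy@example.com"),
    ("carl", "carl@example.com"),
    ("julia", "julia@example.com"),
    ("carl", "carl@email.com"),
]
phones = [
    ("amy", "111-222-3333"),
    ("james", "222-333-4444"),
    ("amy", "333-444-5555"),
    ("carl", "444-555-6666"),
]

joined = co_group_by_key({"emails": emails, "phones": phones})
for name in sorted(joined):
    print(name, joined[name])
```

The result has one entry per unique key across both inputs, so "james" (phone only) and "julia" (email only) still appear, each with an empty list under the missing tag.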