Posted to commits@beam.apache.org by me...@apache.org on 2018/07/18 20:40:35 UTC

[beam-site] branch mergebot updated (ef8d164 -> d0eb021)

This is an automated email from the ASF dual-hosted git repository.

mergebot-role pushed a change to branch mergebot
in repository https://gitbox.apache.org/repos/asf/beam-site.git.


    from ef8d164  This closes #494
     add d153419  Prepare repository for deployment.
     new f570391  [BEAM-2977] Improve unbounded prose in wordcount example
     new d0eb021  This closes #377

The 2 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 content/contribute/intellij/index.html |  8 ++++----
 src/get-started/wordcount-example.md   | 32 ++++++++++++++++++++------------
 2 files changed, 24 insertions(+), 16 deletions(-)


[beam-site] 01/02: [BEAM-2977] Improve unbounded prose in wordcount example

Posted by me...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

mergebot-role pushed a commit to branch mergebot
in repository https://gitbox.apache.org/repos/asf/beam-site.git

commit f5703918714cb7e7beaa583195c0375507be1a2d
Author: melissa <me...@google.com>
AuthorDate: Thu Jan 18 11:51:36 2018 -0800

    [BEAM-2977] Improve unbounded prose in wordcount example
---
 src/get-started/wordcount-example.md | 32 ++++++++++++++++++++------------
 1 file changed, 20 insertions(+), 12 deletions(-)

diff --git a/src/get-started/wordcount-example.md b/src/get-started/wordcount-example.md
index 7fe0e9b..82d6af9 100644
--- a/src/get-started/wordcount-example.md
+++ b/src/get-started/wordcount-example.md
@@ -168,7 +168,7 @@ nested transforms (which is a [composite transform]({{ site.baseurl }}/documenta
 Each transform takes some kind of input data and produces some output data. The
 input and output data is often represented by the SDK class `PCollection`.
 `PCollection` is a special class, provided by the Beam SDK, that you can use to
-represent a data set of virtually any size, including unbounded data sets.
+represent a dataset of virtually any size, including unbounded datasets.
 
 ![The MinimalWordCount pipeline data flow.](
   {{ "/images/wordcount-pipeline.png" | prepend: site.baseurl }}){: width="800px"}
@@ -920,13 +920,12 @@ or DEBUG significantly increases the amount of logs output.
 <span class="language-java">`PAssert`</span><span class="language-py">`assert_that`</span>
 is a set of convenient PTransforms in the style of Hamcrest's collection
 matchers that can be used when writing pipeline level tests to validate the
-contents of PCollections. Asserts are best used in unit tests with small data
-sets.
+contents of PCollections. Asserts are best used in unit tests with small datasets.
 
 {:.language-go}
 The `passert` package contains convenient PTransforms that can be used when
 writing pipeline level tests to validate the contents of PCollections. Asserts
-are best used in unit tests with small data sets.
+are best used in unit tests with small datasets.
 
 {:.language-java}
 The following example verifies that the set of filtered words matches our
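
A minimal sketch of a pipeline-level test using the Python SDK's assert_that, as described in the hunk above; the input words and expected counts are invented for illustration and are not the WordCount test data.

    import apache_beam as beam
    from apache_beam.testing.test_pipeline import TestPipeline
    from apache_beam.testing.util import assert_that, equal_to

    with TestPipeline() as p:
        # Asserts are best used in unit tests with small, bounded inputs.
        words = p | beam.Create(['hi', 'there', 'hi'])
        counts = (
            words
            | beam.Map(lambda w: (w, 1))
            | beam.CombinePerKey(sum))

        # assert_that validates the contents of the final PCollection when
        # the test pipeline runs; the test fails if they differ.
        assert_that(counts, equal_to([('hi', 2), ('there', 1)]))
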
@@ -975,7 +974,7 @@ examples did, but introduces several advanced concepts.
 
 **New Concepts:**
 
-* Unbounded and bounded pipeline input modes
+* Unbounded and bounded datasets
 * Adding timestamps to data
 * Windowing
 * Reusing PTransforms over windowed PCollections
@@ -1133,12 +1132,21 @@ To view the full code in Go, see
 **[windowed_wordcount.go](https://github.com/apache/beam/blob/master/sdks/go/examples/windowed_wordcount/windowed_wordcount.go).**
 
 
-### Unbounded and bounded pipeline input modes
+### Unbounded and bounded datasets
 
 Beam allows you to create a single pipeline that can handle both bounded and
-unbounded types of input. If your input has a fixed number of elements, it's
-considered a 'bounded' data set. If your input is continuously updating, then
-it's considered 'unbounded' and you must use a runner that supports streaming.
+unbounded datasets. If your dataset has a fixed number of elements, it is a bounded
+dataset and all of the data can be processed together. For bounded datasets,
+the question to ask is "Do I have all of the data?" If data continuously
+arrives (such as an endless stream of game scores in the
+[Mobile gaming example](https://beam.apache.org/get-started/mobile-gaming-example/)),
+it is an unbounded dataset. An unbounded dataset is never fully available for
+processing at any one time, so the data must be processed using a streaming
+pipeline that runs continuously. The dataset will only be complete up to a
+certain point, so the question to ask is "Up until what point do I have all of
+the data?" Beam uses [windowing]({{ site.baseurl }}/documentation/programming-guide/#windowing)
+to divide a continuously updating dataset into logical windows of finite size.
+If your input is unbounded, you must use a runner that supports streaming.
 
 If your pipeline's input is bounded, then all downstream PCollections will also be
 bounded. Similarly, if the input is unbounded, then all downstream PCollections
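
To make the windowing paragraph above more concrete, here is a minimal sketch in the Python SDK. The elements and timestamps are invented, and the timestamps are attached manually so the snippet also runs on a small bounded input; a real unbounded source would normally supply event-time timestamps itself.

    import apache_beam as beam
    from apache_beam.transforms import window

    with beam.Pipeline() as p:
        counts = (
            p
            | beam.Create([('hi', 0), ('there', 5), ('hi', 70)])
            # Attach an event-time timestamp (in seconds) to each element.
            | beam.Map(lambda kv: window.TimestampedValue(kv[0], kv[1]))
            # Divide the data into fixed, one-minute logical windows.
            | beam.WindowInto(window.FixedWindows(60))
            # Downstream aggregations are now computed per window.
            | beam.Map(lambda word: (word, 1))
            | beam.CombinePerKey(sum))
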
@@ -1305,7 +1313,7 @@ frequency count of the words seen in each 15 second window.
 
 **New Concepts:**
 
-* Reading an unbounded data set
+* Reading an unbounded dataset
 * Writing unbounded results
 
 **To run this example in Java:**
@@ -1369,9 +1377,9 @@ To view the full code in Python, see
 ([BEAM-4292](https://issues.apache.org/jira/browse/BEAM-4292)).
 
 
-### Reading an unbounded data set
+### Reading an unbounded dataset
 
-This example uses an unbounded data set as input. The code reads Pub/Sub
+This example uses an unbounded dataset as input. The code reads Pub/Sub
 messages from a Pub/Sub subscription or topic using
 [`beam.io.ReadStringsFromPubSub`]({{ site.baseurl }}/documentation/sdks/pydoc/{{ site.release_latest }}/apache_beam.io.gcp.pubsub.html#apache_beam.io.gcp.pubsub.ReadStringsFromPubSub).
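
A rough sketch of the unbounded read described above, using the transform the docs name, beam.io.ReadStringsFromPubSub. The project and topic paths are placeholders, and streaming mode must be enabled because the input is unbounded.

    import apache_beam as beam
    from apache_beam.options.pipeline_options import (
        PipelineOptions, StandardOptions)

    options = PipelineOptions()
    # The input is unbounded, so the pipeline must run in streaming mode on a
    # runner that supports it.
    options.view_as(StandardOptions).streaming = True

    with beam.Pipeline(options=options) as p:
        # Read each Pub/Sub message as a single string element; the topic
        # path below is a placeholder.
        messages = p | beam.io.ReadStringsFromPubSub(
            topic='projects/your-project/topics/your-topic')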
 


[beam-site] 02/02: This closes #377

Posted by me...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

mergebot-role pushed a commit to branch mergebot
in repository https://gitbox.apache.org/repos/asf/beam-site.git

commit d0eb0215c37cf7fcb002bc72b5cf9e7bcf657876
Merge: d153419 f570391
Author: Mergebot <me...@apache.org>
AuthorDate: Wed Jul 18 20:40:14 2018 +0000

    This closes #377

 src/get-started/wordcount-example.md | 32 ++++++++++++++++++++------------
 1 file changed, 20 insertions(+), 12 deletions(-)