Posted to commits@beam.apache.org by dh...@apache.org on 2017/05/25 21:02:13 UTC

[1/4] beam-site git commit: Add 'how to run' directions to WordCount for all runners.

Repository: beam-site
Updated Branches:
  refs/heads/asf-site ce15747f3 -> 34524776a


Add 'how to run' directions to WordCount for all runners.


Project: http://git-wip-us.apache.org/repos/asf/beam-site/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam-site/commit/cbc3367c
Tree: http://git-wip-us.apache.org/repos/asf/beam-site/tree/cbc3367c
Diff: http://git-wip-us.apache.org/repos/asf/beam-site/diff/cbc3367c

Branch: refs/heads/asf-site
Commit: cbc3367c92702740f67971398738bf8fee3cdc4e
Parents: ce15747
Author: Hadar Hod <ha...@google.com>
Authored: Thu May 25 10:57:07 2017 -0700
Committer: Hadar Hod <ha...@google.com>
Committed: Thu May 25 10:57:07 2017 -0700

----------------------------------------------------------------------
 src/get-started/wordcount-example.md | 305 +++++++++++++++++++++++++++++-
 1 file changed, 301 insertions(+), 4 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/beam-site/blob/cbc3367c/src/get-started/wordcount-example.md
----------------------------------------------------------------------
diff --git a/src/get-started/wordcount-example.md b/src/get-started/wordcount-example.md
index 023086d..73e4da8 100644
--- a/src/get-started/wordcount-example.md
+++ b/src/get-started/wordcount-example.md
@@ -27,11 +27,95 @@ Each WordCount example introduces different concepts in the Beam programming mod
 * **Debugging WordCount** introduces logging and debugging practices.
 * **Windowed WordCount** demonstrates how you can use Beam's programming model to handle both bounded and unbounded datasets.
 
+> Note: The instructions on this page for running the WordCount examples have not yet been verified for all runners. (See the Jira issues for the [direct](https://issues.apache.org/jira/browse/BEAM-2348), [Apex](https://issues.apache.org/jira/browse/BEAM-2349), [Spark](https://issues.apache.org/jira/browse/BEAM-2350), and [Dataflow](https://issues.apache.org/jira/browse/BEAM-2351) runners.)
+
 ## MinimalWordCount
 
 Minimal WordCount demonstrates a simple pipeline that can read from a text file, apply transforms to tokenize and count the words, and write the data to an output text file. This example hard-codes the locations for its input and output files and doesn't perform any error checking; it is intended to only show you the "bare bones" of creating a Beam pipeline. This lack of parameterization makes this particular pipeline less portable across different runners than standard Beam pipelines. In later examples, we will parameterize the pipeline's input and output sources and show other best practices.
 
-To run this example, follow the instructions in the Quickstart for [Java]({{ site.baseurl }}/get-started/quickstart-java) or [Python]({{ site.baseurl }}/get-started/quickstart-py). To view the full code, see **[MinimalWordCount](https://github.com/apache/beam/blob/master/examples/java/src/main/java/org/apache/beam/examples/MinimalWordCount.java).**
+**To run this example in Java:**
+
+{:.runner-direct}
+```
+$ mvn compile exec:java -Dexec.mainClass=org.apache.beam.examples.MinimalWordCount
+```
+
+{:.runner-apex}
+```
+$ mvn compile exec:java -Dexec.mainClass=org.apache.beam.examples.MinimalWordCount \
+     -Dexec.args="--inputFile=pom.xml --output=counts --runner=ApexRunner" -Papex-runner
+```
+
+{:.runner-flink-local}
+```
+$ mvn compile exec:java -Dexec.mainClass=org.apache.beam.examples.MinimalWordCount \
+     -Dexec.args="--runner=FlinkRunner --inputFile=pom.xml --output=counts" -Pflink-runner
+```
+
+{:.runner-flink-cluster}
+```
+$ mvn package exec:java -Dexec.mainClass=org.apache.beam.examples.MinimalWordCount \
+     -Dexec.args="--runner=FlinkRunner --flinkMaster=<flink master> --filesToStage=target/word-count-beam-bundled-0.1.jar \
+                  --inputFile=/path/to/quickstart/pom.xml --output=/tmp/counts" -Pflink-runner
+
+# You can monitor the running job by visiting the Flink dashboard at http://<flink master>:8081
+```
+
+{:.runner-spark}
+```
+$ mvn compile exec:java -Dexec.mainClass=org.apache.beam.examples.MinimalWordCount \
+     -Dexec.args="--runner=SparkRunner --inputFile=pom.xml --output=counts" -Pspark-runner
+```
+
+{:.runner-dataflow}
+```
+$ mvn compile exec:java -Dexec.mainClass=org.apache.beam.examples.MinimalWordCount \
+   -Dexec.args="--runner=DataflowRunner --gcpTempLocation=gs://<your-gcs-bucket>/tmp \
+                --inputFile=gs://apache-beam-samples/shakespeare/* --output=gs://<your-gcs-bucket>/counts" \
+     -Pdataflow-runner
+```
+
+To view the full code in Java, see **[MinimalWordCount](https://github.com/apache/beam/blob/master/examples/java/src/main/java/org/apache/beam/examples/MinimalWordCount.java).**
+
+**To run this example in Python:**
+
+{:.runner-direct}
+```
+python -m apache_beam.examples.wordcount_minimal --input README.md --output counts
+```
+
+{:.runner-apex}
+```
+This runner is not yet available for the Python SDK.
+```
+
+{:.runner-flink-local}
+```
+This runner is not yet available for the Python SDK.
+```
+
+{:.runner-flink-cluster}
+```
+This runner is not yet available for the Python SDK.
+```
+
+{:.runner-spark}
+```
+This runner is not yet available for the Python SDK.
+```
+
+{:.runner-dataflow}
+```
+# As part of the initial setup, install Google Cloud Platform specific extra components.
+pip install apache-beam[gcp]
+python -m apache_beam.examples.wordcount_minimal --input gs://dataflow-samples/shakespeare/kinglear.txt \
+                                                 --output gs://<your-gcs-bucket>/counts \
+                                                 --runner DataflowRunner \
+                                                 --project your-gcp-project \
+                                                 --temp_location gs://<your-gcs-bucket>/tmp/
+```
+
+To view the full code in Python, see **[wordcount_minimal.py](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/wordcount_minimal.py).**
 
 **Key Concepts:**
 
@@ -186,7 +270,90 @@ This WordCount example introduces a few recommended programming practices that c
 
 This section assumes that you have a good understanding of the basic concepts in building a pipeline. If you feel that you aren't at that point yet, read the above section, [Minimal WordCount](#minimalwordcount).
 
-To run this example, follow the instructions in the Quickstart for [Java]({{ site.baseurl }}/get-started/quickstart-java) or [Python]({{ site.baseurl }}/get-started/quickstart-py). To view the full code, see **[WordCount](https://github.com/apache/beam/blob/master/examples/java/src/main/java/org/apache/beam/examples/WordCount.java).**
+**To run this example in Java:**
+
+{:.runner-direct}
+```
+$ mvn compile exec:java -Dexec.mainClass=org.apache.beam.examples.WordCount \
+     -Dexec.args="--inputFile=pom.xml --output=counts" -Pdirect-runner
+```
+
+{:.runner-apex}
+```
+$ mvn compile exec:java -Dexec.mainClass=org.apache.beam.examples.WordCount \
+     -Dexec.args="--inputFile=pom.xml --output=counts --runner=ApexRunner" -Papex-runner
+```
+
+{:.runner-flink-local}
+```
+$ mvn compile exec:java -Dexec.mainClass=org.apache.beam.examples.WordCount \
+     -Dexec.args="--runner=FlinkRunner --inputFile=pom.xml --output=counts" -Pflink-runner
+```
+
+{:.runner-flink-cluster}
+```
+$ mvn package exec:java -Dexec.mainClass=org.apache.beam.examples.WordCount \
+     -Dexec.args="--runner=FlinkRunner --flinkMaster=<flink master> --filesToStage=target/word-count-beam-bundled-0.1.jar \
+                  --inputFile=/path/to/quickstart/pom.xml --output=/tmp/counts" -Pflink-runner
+
+# You can monitor the running job by visiting the Flink dashboard at http://<flink master>:8081
+```
+
+{:.runner-spark}
+```
+$ mvn compile exec:java -Dexec.mainClass=org.apache.beam.examples.WordCount \
+     -Dexec.args="--runner=SparkRunner --inputFile=pom.xml --output=counts" -Pspark-runner
+```
+
+{:.runner-dataflow}
+```
+$ mvn compile exec:java -Dexec.mainClass=org.apache.beam.examples.WordCount \
+     -Dexec.args="--runner=DataflowRunner --gcpTempLocation=gs://<your-gcs-bucket>/tmp \
+                  --inputFile=gs://apache-beam-samples/shakespeare/* --output=gs://<your-gcs-bucket>/counts" \
+     -Pdataflow-runner
+```
+
+To view the full code in Java, see **[WordCount](https://github.com/apache/beam/blob/master/examples/java/src/main/java/org/apache/beam/examples/WordCount.java).**
+
+**To run this example in Python:**
+
+{:.runner-direct}
+```
+python -m apache_beam.examples.wordcount --input README.md --output counts
+```
+
+{:.runner-apex}
+```
+This runner is not yet available for the Python SDK.
+```
+
+{:.runner-flink-local}
+```
+This runner is not yet available for the Python SDK.
+```
+
+{:.runner-flink-cluster}
+```
+This runner is not yet available for the Python SDK.
+```
+
+{:.runner-spark}
+```
+This runner is not yet available for the Python SDK.
+```
+
+{:.runner-dataflow}
+```
+# As part of the initial setup, install Google Cloud Platform specific extra components.
+pip install apache-beam[gcp]
+python -m apache_beam.examples.wordcount --input gs://dataflow-samples/shakespeare/kinglear.txt \
+                                         --output gs://<your-gcs-bucket>/counts \
+                                         --runner DataflowRunner \
+                                         --project your-gcp-project \
+                                         --temp_location gs://<your-gcs-bucket>/tmp/
+```
+
+To view the full code in Python, see **[wordcount.py](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/wordcount.py).**
 
 **New Concepts:**
 
@@ -289,7 +456,90 @@ public static void main(String[] args) {
 
 The Debugging WordCount example demonstrates some best practices for instrumenting your pipeline code.
 
-To run this example, follow the instructions in the Quickstart for [Java]({{ site.baseurl }}/get-started/quickstart-java) or [Python]({{ site.baseurl }}/get-started/quickstart-py). To view the full code, see **[DebuggingWordCount](https://github.com/apache/beam/blob/master/examples/java/src/main/java/org/apache/beam/examples/DebuggingWordCount.java).**
+**To run this example in Java:**
+
+{:.runner-direct}
+```
+$ mvn compile exec:java -Dexec.mainClass=org.apache.beam.examples.DebuggingWordCount \
+     -Dexec.args="--inputFile=pom.xml --output=counts" -Pdirect-runner
+```
+
+{:.runner-apex}
+```
+$ mvn compile exec:java -Dexec.mainClass=org.apache.beam.examples.DebuggingWordCount \
+     -Dexec.args="--inputFile=pom.xml --output=counts --runner=ApexRunner" -Papex-runner
+```
+
+{:.runner-flink-local}
+```
+$ mvn compile exec:java -Dexec.mainClass=org.apache.beam.examples.DebuggingWordCount \
+     -Dexec.args="--runner=FlinkRunner --inputFile=pom.xml --output=counts" -Pflink-runner
+```
+
+{:.runner-flink-cluster}
+```
+$ mvn package exec:java -Dexec.mainClass=org.apache.beam.examples.DebuggingWordCount \
+     -Dexec.args="--runner=FlinkRunner --flinkMaster=<flink master> --filesToStage=target/word-count-beam-bundled-0.1.jar \
+                  --inputFile=/path/to/quickstart/pom.xml --output=/tmp/counts" -Pflink-runner
+
+# You can monitor the running job by visiting the Flink dashboard at http://<flink master>:8081
+```
+
+{:.runner-spark}
+```
+$ mvn compile exec:java -Dexec.mainClass=org.apache.beam.examples.DebuggingWordCount \
+     -Dexec.args="--runner=SparkRunner --inputFile=pom.xml --output=counts" -Pspark-runner
+```
+
+{:.runner-dataflow}
+```
+$ mvn compile exec:java -Dexec.mainClass=org.apache.beam.examples.DebuggingWordCount \
+   -Dexec.args="--runner=DataflowRunner --gcpTempLocation=gs://<your-gcs-bucket>/tmp \
+                --inputFile=gs://apache-beam-samples/shakespeare/* --output=gs://<your-gcs-bucket>/counts" \
+     -Pdataflow-runner
+```
+
+To view the full code in Java, see **[DebuggingWordCount](https://github.com/apache/beam/blob/master/examples/java/src/main/java/org/apache/beam/examples/DebuggingWordCount.java).**
+
+**To run this example in Python:**
+
+{:.runner-direct}
+```
+python -m apache_beam.examples.wordcount_debugging --input README.md --output counts
+```
+
+{:.runner-apex}
+```
+This runner is not yet available for the Python SDK.
+```
+
+{:.runner-flink-local}
+```
+This runner is not yet available for the Python SDK.
+```
+
+{:.runner-flink-cluster}
+```
+This runner is not yet available for the Python SDK.
+```
+
+{:.runner-spark}
+```
+This runner is not yet available for the Python SDK.
+```
+
+{:.runner-dataflow}
+```
+# As part of the initial setup, install Google Cloud Platform specific extra components.
+pip install apache-beam[gcp]
+python -m apache_beam.examples.wordcount_debugging --input gs://dataflow-samples/shakespeare/kinglear.txt \
+                                                   --output gs://<your-gcs-bucket>/counts \
+                                                   --runner DataflowRunner \
+                                                   --project your-gcp-project \
+                                                   --temp_location gs://<your-gcs-bucket>/tmp/
+```
+
+To view the full code in Python, see **[wordcount_debugging.py](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/wordcount_debugging.py).**
 
 **New Concepts:**
 
@@ -386,11 +636,58 @@ This example, `WindowedWordCount`, counts words in text just as the previous exa
 
 The following sections explain these key concepts in detail, and break down the pipeline code into smaller sections.
 
+**To run this example in Java:**
+
+{:.runner-direct}
+```
+$ mvn compile exec:java -Dexec.mainClass=org.apache.beam.examples.WindowedWordCount \
+     -Dexec.args="--inputFile=pom.xml --output=counts" -Pdirect-runner
+```
+
+{:.runner-apex}
+```
+$ mvn compile exec:java -Dexec.mainClass=org.apache.beam.examples.WindowedWordCount \
+     -Dexec.args="--inputFile=pom.xml --output=counts --runner=ApexRunner" -Papex-runner
+```
+
+{:.runner-flink-local}
+```
+$ mvn compile exec:java -Dexec.mainClass=org.apache.beam.examples.WindowedWordCount \
+     -Dexec.args="--runner=FlinkRunner --inputFile=pom.xml --output=counts" -Pflink-runner
+```
+
+{:.runner-flink-cluster}
+```
+$ mvn package exec:java -Dexec.mainClass=org.apache.beam.examples.WindowedWordCount \
+     -Dexec.args="--runner=FlinkRunner --flinkMaster=<flink master> --filesToStage=target/word-count-beam-bundled-0.1.jar \
+                  --inputFile=/path/to/quickstart/pom.xml --output=/tmp/counts" -Pflink-runner
+
+# You can monitor the running job by visiting the Flink dashboard at http://<flink master>:8081
+```
+
+{:.runner-spark}
+```
+$ mvn compile exec:java -Dexec.mainClass=org.apache.beam.examples.WindowedWordCount \
+     -Dexec.args="--runner=SparkRunner --inputFile=pom.xml --output=counts" -Pspark-runner
+```
+
+{:.runner-dataflow}
+```
+$ mvn compile exec:java -Dexec.mainClass=org.apache.beam.examples.WindowedWordCount \
+   -Dexec.args="--runner=DataflowRunner --gcpTempLocation=gs://<your-gcs-bucket>/tmp \
+                --inputFile=gs://apache-beam-samples/shakespeare/* --output=gs://<your-gcs-bucket>/counts" \
+     -Pdataflow-runner
+```
+
+To view the full code in Java, see **[WindowedWordCount](https://github.com/apache/beam/blob/master/examples/java/src/main/java/org/apache/beam/examples/WindowedWordCount.java).**
+
+> **Note:** WindowedWordCount is not yet available for the Python SDK.
+
 ### Unbounded and bounded pipeline input modes
 
 Beam allows you to create a single pipeline that can handle both bounded and unbounded types of input. If the input is unbounded, then all PCollections of the pipeline will be unbounded as well. The same goes for bounded input. If your input has a fixed number of elements, it's considered a 'bounded' data set. If your input is continuously updating, then it's considered 'unbounded'.
 
-Recall that the input for this example is a a set of Shakespeare's texts, finite data. Therefore, this example reads bounded data from a text file:
+Recall that the input for this example is a set of Shakespeare's texts, finite data. Therefore, this example reads bounded data from a text file:
 
 ```java
 public static void main(String[] args) throws IOException {

[4/4] beam-site git commit: Regenerate website after merge

Posted by dh...@apache.org.

Regenerate website after merge


Project: http://git-wip-us.apache.org/repos/asf/beam-site/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam-site/commit/34524776
Tree: http://git-wip-us.apache.org/repos/asf/beam-site/tree/34524776
Diff: http://git-wip-us.apache.org/repos/asf/beam-site/diff/34524776

Branch: refs/heads/asf-site
Commit: 34524776a0570cf59ea164132aa2cc64b41d7d73
Parents: ae9e888
Author: Dan Halperin <dh...@google.com>
Authored: Thu May 25 14:02:09 2017 -0700
Committer: Dan Halperin <dh...@google.com>
Committed: Thu May 25 14:02:09 2017 -0700

----------------------------------------------------------------------
 .../get-started/wordcount-example/index.html    | 263 ++++++++++++++++++-
 1 file changed, 259 insertions(+), 4 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/beam-site/blob/34524776/content/get-started/wordcount-example/index.html
----------------------------------------------------------------------
diff --git a/content/get-started/wordcount-example/index.html b/content/get-started/wordcount-example/index.html
index e4efeb7..3a4adf9 100644
--- a/content/get-started/wordcount-example/index.html
+++ b/content/get-started/wordcount-example/index.html
@@ -196,7 +196,77 @@
 
 <p>Minimal WordCount demonstrates a simple pipeline that can read from a text file, apply transforms to tokenize and count the words, and write the data to an output text file. This example hard-codes the locations for its input and output files and doesn’t perform any error checking; it is intended to only show you the “bare bones” of creating a Beam pipeline. This lack of parameterization makes this particular pipeline less portable across different runners than standard Beam pipelines. In later examples, we will parameterize the pipeline’s input and output sources and show other best practices.</p>
 
-<p>To run this example, follow the instructions in the Quickstart for <a href="/get-started/quickstart-java">Java</a> or <a href="/get-started/quickstart-py">Python</a>. To view the full code, see <strong><a href="https://github.com/apache/beam/blob/master/examples/java/src/main/java/org/apache/beam/examples/MinimalWordCount.java">MinimalWordCount</a>.</strong></p>
+<p><strong>To run this example in Java:</strong></p>
+
+<div class="runner-direct highlighter-rouge"><pre class="highlight"><code>$ mvn compile exec:java -Dexec.mainClass=org.apache.beam.examples.MinimalWordCount
+</code></pre>
+</div>
+
+<div class="runner-apex highlighter-rouge"><pre class="highlight"><code>$ mvn compile exec:java -Dexec.mainClass=org.apache.beam.examples.MinimalWordCount \
+     -Dexec.args="--inputFile=pom.xml --output=counts --runner=ApexRunner" -Papex-runner
+</code></pre>
+</div>
+
+<div class="runner-flink-local highlighter-rouge"><pre class="highlight"><code>$ mvn compile exec:java -Dexec.mainClass=org.apache.beam.examples.MinimalWordCount \
+     -Dexec.args="--runner=FlinkRunner --inputFile=pom.xml --output=counts" -Pflink-runner
+</code></pre>
+</div>
+
+<div class="runner-flink-cluster highlighter-rouge"><pre class="highlight"><code>$ mvn package exec:java -Dexec.mainClass=org.apache.beam.examples.MinimalWordCount \
+     -Dexec.args="--runner=FlinkRunner --flinkMaster=&lt;flink master&gt; --filesToStage=target/word-count-beam-bundled-0.1.jar \
+                  --inputFile=/path/to/quickstart/pom.xml --output=/tmp/counts" -Pflink-runner
+
+# You can monitor the running job by visiting the Flink dashboard at http://&lt;flink master&gt;:8081
+</code></pre>
+</div>
+
+<div class="runner-spark highlighter-rouge"><pre class="highlight"><code>$ mvn compile exec:java -Dexec.mainClass=org.apache.beam.examples.MinimalWordCount \
+     -Dexec.args="--runner=SparkRunner --inputFile=pom.xml --output=counts" -Pspark-runner
+</code></pre>
+</div>
+
+<div class="runner-dataflow highlighter-rouge"><pre class="highlight"><code>$ mvn compile exec:java -Dexec.mainClass=org.apache.beam.examples.MinimalWordCount \
+   -Dexec.args="--runner=DataflowRunner --gcpTempLocation=gs://&lt;your-gcs-bucket&gt;/tmp \
+                --inputFile=gs://apache-beam-samples/shakespeare/* --output=gs://&lt;your-gcs-bucket&gt;/counts" \
+     -Pdataflow-runner
+</code></pre>
+</div>
+
+<p>To view the full code in Java, see <strong><a href="https://github.com/apache/beam/blob/master/examples/java/src/main/java/org/apache/beam/examples/MinimalWordCount.java">MinimalWordCount</a>.</strong></p>
+
+<p><strong>To run this example in Python:</strong></p>
+
+<div class="runner-direct highlighter-rouge"><pre class="highlight"><code>python -m apache_beam.examples.wordcount_minimal --input README.md --output counts
+</code></pre>
+</div>
+
+<div class="runner-apex highlighter-rouge"><pre class="highlight"><code>This runner is not yet available for the Python SDK.
+</code></pre>
+</div>
+
+<div class="runner-flink-local highlighter-rouge"><pre class="highlight"><code>This runner is not yet available for the Python SDK.
+</code></pre>
+</div>
+
+<div class="runner-flink-cluster highlighter-rouge"><pre class="highlight"><code>This runner is not yet available for the Python SDK.
+</code></pre>
+</div>
+
+<div class="runner-spark highlighter-rouge"><pre class="highlight"><code>This runner is not yet available for the Python SDK.
+</code></pre>
+</div>
+
+<div class="runner-dataflow highlighter-rouge"><pre class="highlight"><code># As part of the initial setup, install Google Cloud Platform specific extra components.
+pip install apache-beam[gcp]
+python -m apache_beam.examples.wordcount_minimal --input gs://dataflow-samples/shakespeare/kinglear.txt \
+                                                 --output gs://&lt;your-gcs-bucket&gt;/counts \
+                                                 --runner DataflowRunner \
+                                                 --project your-gcp-project \
+                                                 --temp_location gs://&lt;your-gcs-bucket&gt;/tmp/
+</code></pre>
+</div>
+
+<p>To view the full code in Python, see <strong><a href="https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/wordcount_minimal.py">wordcount_minimal.py</a>.</strong></p>
 
 <p><strong>Key Concepts:</strong></p>
 
@@ -368,7 +438,78 @@ Figure 1: The pipeline data flow.</p>
 
 <p>This section assumes that you have a good understanding of the basic concepts in building a pipeline. If you feel that you aren’t at that point yet, read the above section, <a href="#minimalwordcount">Minimal WordCount</a>.</p>
 
-<p>To run this example, follow the instructions in the Quickstart for <a href="/get-started/quickstart-java">Java</a> or <a href="/get-started/quickstart-py">Python</a>. To view the full code, see <strong><a href="https://github.com/apache/beam/blob/master/examples/java/src/main/java/org/apache/beam/examples/WordCount.java">WordCount</a>.</strong></p>
+<p><strong>To run this example in Java:</strong></p>
+
+<div class="runner-direct highlighter-rouge"><pre class="highlight"><code>$ mvn compile exec:java -Dexec.mainClass=org.apache.beam.examples.WordCount \
+     -Dexec.args="--inputFile=pom.xml --output=counts" -Pdirect-runner
+</code></pre>
+</div>
+
+<div class="runner-apex highlighter-rouge"><pre class="highlight"><code>$ mvn compile exec:java -Dexec.mainClass=org.apache.beam.examples.WordCount \
+     -Dexec.args="--inputFile=pom.xml --output=counts --runner=ApexRunner" -Papex-runner
+</code></pre>
+</div>
+
+<div class="runner-flink-local highlighter-rouge"><pre class="highlight"><code>$ mvn compile exec:java -Dexec.mainClass=org.apache.beam.examples.WordCount \
+     -Dexec.args="--runner=FlinkRunner --inputFile=pom.xml --output=counts" -Pflink-runner
+</code></pre>
+</div>
+
+<div class="runner-flink-cluster highlighter-rouge"><pre class="highlight"><code>$ mvn package exec:java -Dexec.mainClass=org.apache.beam.examples.WordCount \
+     -Dexec.args="--runner=FlinkRunner --flinkMaster=&lt;flink master&gt; --filesToStage=target/word-count-beam-bundled-0.1.jar \
+                  --inputFile=/path/to/quickstart/pom.xml --output=/tmp/counts" -Pflink-runner
+
+# You can monitor the running job by visiting the Flink dashboard at http://&lt;flink master&gt;:8081
+</code></pre>
+</div>
+
+<div class="runner-spark highlighter-rouge"><pre class="highlight"><code>$ mvn compile exec:java -Dexec.mainClass=org.apache.beam.examples.WordCount \
+     -Dexec.args="--runner=SparkRunner --inputFile=pom.xml --output=counts" -Pspark-runner
+</code></pre>
+</div>
+
+<div class="runner-dataflow highlighter-rouge"><pre class="highlight"><code>$ mvn compile exec:java -Dexec.mainClass=org.apache.beam.examples.WordCount \
+     -Dexec.args="--runner=DataflowRunner --gcpTempLocation=gs://&lt;your-gcs-bucket&gt;/tmp \
+                  --inputFile=gs://apache-beam-samples/shakespeare/* --output=gs://&lt;your-gcs-bucket&gt;/counts" \
+     -Pdataflow-runner
+</code></pre>
+</div>
+
+<p>To view the full code in Java, see <strong><a href="https://github.com/apache/beam/blob/master/examples/java/src/main/java/org/apache/beam/examples/WordCount.java">WordCount</a>.</strong></p>
+
+<p><strong>To run this example in Python:</strong></p>
+
+<div class="runner-direct highlighter-rouge"><pre class="highlight"><code>python -m apache_beam.examples.wordcount --input README.md --output counts
+</code></pre>
+</div>
+
+<div class="runner-apex highlighter-rouge"><pre class="highlight"><code>This runner is not yet available for the Python SDK.
+</code></pre>
+</div>
+
+<div class="runner-flink-local highlighter-rouge"><pre class="highlight"><code>This runner is not yet available for the Python SDK.
+</code></pre>
+</div>
+
+<div class="runner-flink-cluster highlighter-rouge"><pre class="highlight"><code>This runner is not yet available for the Python SDK.
+</code></pre>
+</div>
+
+<div class="runner-spark highlighter-rouge"><pre class="highlight"><code>This runner is not yet available for the Python SDK.
+</code></pre>
+</div>
+
+<div class="runner-dataflow highlighter-rouge"><pre class="highlight"><code># As part of the initial setup, install Google Cloud Platform specific extra components.
+pip install apache-beam[gcp]
+python -m apache_beam.examples.wordcount --input gs://dataflow-samples/shakespeare/kinglear.txt \
+                                         --output gs://&lt;your-gcs-bucket&gt;/counts \
+                                         --runner DataflowRunner \
+                                         --project your-gcp-project \
+                                         --temp_location gs://&lt;your-gcs-bucket&gt;/tmp/
+</code></pre>
+</div>
+
+<p>To view the full code in Python, see <strong><a href="https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/wordcount.py">wordcount.py</a>.</strong></p>
 
 <p><strong>New Concepts:</strong></p>
 
@@ -499,7 +640,78 @@ Figure 1: The pipeline data flow.</p>
 
 <p>The Debugging WordCount example demonstrates some best practices for instrumenting your pipeline code.</p>
 
-<p>To run this example, follow the instructions in the Quickstart for <a href="/get-started/quickstart-java">Java</a> or <a href="/get-started/quickstart-py">Python</a>. To view the full code, see <strong><a href="https://github.com/apache/beam/blob/master/examples/java/src/main/java/org/apache/beam/examples/DebuggingWordCount.java">DebuggingWordCount</a>.</strong></p>
+<p><strong>To run this example in Java:</strong></p>
+
+<div class="runner-direct highlighter-rouge"><pre class="highlight"><code>$ mvn compile exec:java -Dexec.mainClass=org.apache.beam.examples.DebuggingWordCount \
+     -Dexec.args="--inputFile=pom.xml --output=counts" -Pdirect-runner
+</code></pre>
+</div>
+
+<div class="runner-apex highlighter-rouge"><pre class="highlight"><code>$ mvn compile exec:java -Dexec.mainClass=org.apache.beam.examples.DebuggingWordCount \
+     -Dexec.args="--inputFile=pom.xml --output=counts --runner=ApexRunner" -Papex-runner
+</code></pre>
+</div>
+
+<div class="runner-flink-local highlighter-rouge"><pre class="highlight"><code>$ mvn compile exec:java -Dexec.mainClass=org.apache.beam.examples.DebuggingWordCount \
+     -Dexec.args="--runner=FlinkRunner --inputFile=pom.xml --output=counts" -Pflink-runner
+</code></pre>
+</div>
+
+<div class="runner-flink-cluster highlighter-rouge"><pre class="highlight"><code>$ mvn package exec:java -Dexec.mainClass=org.apache.beam.examples.DebuggingWordCount \
+     -Dexec.args="--runner=FlinkRunner --flinkMaster=&lt;flink master&gt; --filesToStage=target/word-count-beam-bundled-0.1.jar \
+                  --inputFile=/path/to/quickstart/pom.xml --output=/tmp/counts" -Pflink-runner
+
+# You can monitor the running job by visiting the Flink dashboard at http://&lt;flink master&gt;:8081
+</code></pre>
+</div>
+
+<div class="runner-spark highlighter-rouge"><pre class="highlight"><code>$ mvn compile exec:java -Dexec.mainClass=org.apache.beam.examples.DebuggingWordCount \
+     -Dexec.args="--runner=SparkRunner --inputFile=pom.xml --output=counts" -Pspark-runner
+</code></pre>
+</div>
+
+<div class="runner-dataflow highlighter-rouge"><pre class="highlight"><code>$ mvn compile exec:java -Dexec.mainClass=org.apache.beam.examples.DebuggingWordCount \
+   -Dexec.args="--runner=DataflowRunner --gcpTempLocation=gs://&lt;your-gcs-bucket&gt;/tmp \
+                --inputFile=gs://apache-beam-samples/shakespeare/* --output=gs://&lt;your-gcs-bucket&gt;/counts" \
+     -Pdataflow-runner
+</code></pre>
+</div>
+
+<p>To view the full code in Java, see <strong><a href="https://github.com/apache/beam/blob/master/examples/java/src/main/java/org/apache/beam/examples/DebuggingWordCount.java">DebuggingWordCount</a>.</strong></p>
+
+<p><strong>To run this example in Python:</strong></p>
+
+<div class="runner-direct highlighter-rouge"><pre class="highlight"><code>python -m apache_beam.examples.wordcount_debugging --input README.md --output counts
+</code></pre>
+</div>
+
+<div class="runner-apex highlighter-rouge"><pre class="highlight"><code>This runner is not yet available for the Python SDK.
+</code></pre>
+</div>
+
+<div class="runner-flink-local highlighter-rouge"><pre class="highlight"><code>This runner is not yet available for the Python SDK.
+</code></pre>
+</div>
+
+<div class="runner-flink-cluster highlighter-rouge"><pre class="highlight"><code>This runner is not yet available for the Python SDK.
+</code></pre>
+</div>
+
+<div class="runner-spark highlighter-rouge"><pre class="highlight"><code>This runner is not yet available for the Python SDK.
+</code></pre>
+</div>
+
+<div class="runner-dataflow highlighter-rouge"><pre class="highlight"><code># As part of the initial setup, install Google Cloud Platform specific extra components.
+pip install apache-beam[gcp]
+python -m apache_beam.examples.wordcount_debugging --input gs://dataflow-samples/shakespeare/kinglear.txt \
+                                                   --output gs://&lt;your-gcs-bucket&gt;/counts \
+                                                   --runner DataflowRunner \
+                                                   --project your-gcp-project \
+                                                   --temp_location gs://&lt;your-gcs-bucket&gt;/tmp/
+</code></pre>
+</div>
+
+<p>To view the full code in Python, see <strong><a href="https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/wordcount_debugging.py">wordcount_debugging.py</a>.</strong></p>
 
 <p><strong>New Concepts:</strong></p>
 
@@ -638,11 +850,54 @@ Figure 1: The pipeline data flow.</p>
 
 <p>The following sections explain these key concepts in detail, and break down the pipeline code into smaller sections.</p>
 
+<p><strong>To run this example in Java:</strong></p>
+
+<div class="runner-direct highlighter-rouge"><pre class="highlight"><code>$ mvn compile exec:java -Dexec.mainClass=org.apache.beam.examples.WindowedWordCount \
+     -Dexec.args="--inputFile=pom.xml --output=counts" -Pdirect-runner
+</code></pre>
+</div>
+
+<div class="runner-apex highlighter-rouge"><pre class="highlight"><code>$ mvn compile exec:java -Dexec.mainClass=org.apache.beam.examples.WindowedWordCount \
+     -Dexec.args="--inputFile=pom.xml --output=counts --runner=ApexRunner" -Papex-runner
+</code></pre>
+</div>
+
+<div class="runner-flink-local highlighter-rouge"><pre class="highlight"><code>$ mvn compile exec:java -Dexec.mainClass=org.apache.beam.examples.WindowedWordCount \
+     -Dexec.args="--runner=FlinkRunner --inputFile=pom.xml --output=counts" -Pflink-runner
+</code></pre>
+</div>
+
+<div class="runner-flink-cluster highlighter-rouge"><pre class="highlight"><code>$ mvn package exec:java -Dexec.mainClass=org.apache.beam.examples.WindowedWordCount \
+     -Dexec.args="--runner=FlinkRunner --flinkMaster=&lt;flink master&gt; --filesToStage=target/word-count-beam-bundled-0.1.jar \
+                  --inputFile=/path/to/quickstart/pom.xml --output=/tmp/counts" -Pflink-runner
+
+You can monitor the running job by visiting the Flink dashboard at http://&lt;flink master&gt;:8081
+</code></pre>
+</div>
+
+<div class="runner-spark highlighter-rouge"><pre class="highlight"><code>$ mvn compile exec:java -Dexec.mainClass=org.apache.beam.examples.WindowedWordCount \
+     -Dexec.args="--runner=SparkRunner --inputFile=pom.xml --output=counts" -Pspark-runner
+</code></pre>
+</div>
+
+<div class="runner-dataflow highlighter-rouge"><pre class="highlight"><code>$ mvn compile exec:java -Dexec.mainClass=org.apache.beam.examples.WindowedWordCount \
+   -Dexec.args="--runner=DataflowRunner --gcpTempLocation=gs://&lt;your-gcs-bucket&gt;/tmp \
+                --inputFile=gs://apache-beam-samples/shakespeare/* --output=gs://&lt;your-gcs-bucket&gt;/counts" \
+     -Pdataflow-runner
+</code></pre>
+</div>
+
+<p>To view the full code in Java, see <strong><a href="https://github.com/apache/beam/blob/master/examples/java/src/main/java/org/apache/beam/examples/WindowedWordCount.java">WindowedWordCount</a>.</strong></p>
+
+<blockquote>
+  <p><strong>Note:</strong> WindowedWordCount is not yet available for the Python SDK.</p>
+</blockquote>
+
 <h3 id="unbounded-and-bounded-pipeline-input-modes">Unbounded and bounded pipeline input modes</h3>
 
 <p>Beam allows you to create a single pipeline that can handle both bounded and unbounded types of input. If the input is unbounded, then all PCollections of the pipeline will be unbounded as well. The same goes for bounded input. If your input has a fixed number of elements, it’s considered a ‘bounded’ data set. If your input is continuously updating, then it’s considered ‘unbounded’.</p>
 
-<p>Recall that the input for this example is a a set of Shakespeare’s texts, finite data. Therefore, this example reads bounded data from a text file:</p>
+<p>Recall that the input for this example is a set of Shakespeare’s texts, finite data. Therefore, this example reads bounded data from a text file:</p>
 
 <div class="language-java highlighter-rouge"><pre class="highlight"><code><span class="kd">public</span> <span class="kd">static</span> <span class="kt">void</span> <span class="nf">main</span><span class="o">(</span><span class="n">String</span><span class="o">[]</span> <span class="n">args</span><span class="o">)</span> <span class="kd">throws</span> <span class="n">IOException</span> <span class="o">{</span>
     <span class="n">Options</span> <span class="n">options</span> <span class="o">=</span> <span class="o">...</span>

[2/4] beam-site git commit: Closes #222

Posted by dh...@apache.org.

Closes #222


Project: http://git-wip-us.apache.org/repos/asf/beam-site/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam-site/commit/6c6213a6
Tree: http://git-wip-us.apache.org/repos/asf/beam-site/tree/6c6213a6
Diff: http://git-wip-us.apache.org/repos/asf/beam-site/diff/6c6213a6

Branch: refs/heads/asf-site
Commit: 6c6213a625be425271b31047c648e654f80d0e4b
Parents: ce15747 cbc3367
Author: Dan Halperin <dh...@google.com>
Authored: Thu May 25 13:59:54 2017 -0700
Committer: Dan Halperin <dh...@google.com>
Committed: Thu May 25 13:59:54 2017 -0700

----------------------------------------------------------------------
 src/get-started/wordcount-example.md | 305 +++++++++++++++++++++++++++++-
 1 file changed, 301 insertions(+), 4 deletions(-)
----------------------------------------------------------------------

[3/4] beam-site git commit: remove note about JIRA

Posted by dh...@apache.org.

remove note about JIRA


Project: http://git-wip-us.apache.org/repos/asf/beam-site/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam-site/commit/ae9e8886
Tree: http://git-wip-us.apache.org/repos/asf/beam-site/tree/ae9e8886
Diff: http://git-wip-us.apache.org/repos/asf/beam-site/diff/ae9e8886

Branch: refs/heads/asf-site
Commit: ae9e8886836f2f4da455fe3fe65659cbb8e2198a
Parents: 6c6213a
Author: Dan Halperin <dh...@google.com>
Authored: Thu May 25 14:01:24 2017 -0700
Committer: Dan Halperin <dh...@google.com>
Committed: Thu May 25 14:01:24 2017 -0700

----------------------------------------------------------------------
 src/get-started/wordcount-example.md | 2 --
 1 file changed, 2 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/beam-site/blob/ae9e8886/src/get-started/wordcount-example.md
----------------------------------------------------------------------
diff --git a/src/get-started/wordcount-example.md b/src/get-started/wordcount-example.md
index 73e4da8..cf0ebc0 100644
--- a/src/get-started/wordcount-example.md
+++ b/src/get-started/wordcount-example.md
@@ -27,8 +27,6 @@ Each WordCount example introduces different concepts in the Beam programming mod
 * **Debugging WordCount** introduces logging and debugging practices.
 * **Windowed WordCount** demonstrates how you can use Beam's programming model to handle both bounded and unbounded datasets.
 
-> Note: The instructions on this page, for how to run the WordCount examples, have not yet been verified for all runners. (See the Jira issues for the [direct](https://issues.apache.org/jira/browse/BEAM-2348), [Apex](https://issues.apache.org/jira/browse/BEAM-2349), [Spark](https://issues.apache.org/jira/browse/BEAM-2350), and [Dataflow](https://issues.apache.org/jira/browse/BEAM-2351) runners).
-
 ## MinimalWordCount
 
 Minimal WordCount demonstrates a simple pipeline that can read from a text file, apply transforms to tokenize and count the words, and write the data to an output text file. This example hard-codes the locations for its input and output files and doesn't perform any error checking; it is intended to only show you the "bare bones" of creating a Beam pipeline. This lack of parameterization makes this particular pipeline less portable across different runners than standard Beam pipelines. In later examples, we will parameterize the pipeline's input and output sources and show other best practices.
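
The read, tokenize, count, and write steps described in the MinimalWordCount paragraph above can be sketched in plain Python. This is a minimal illustration and does not use the Beam SDK: the function names (`tokenize`, `count_words`, `format_counts`), the word-splitting regex, and the sample input are all assumptions made for this sketch, not code from the Beam examples. Like the Minimal WordCount it mirrors, it hard-codes its input and does no error checking.

```python
import re
from collections import Counter


def tokenize(line):
    # Split one line of text into words. The regex is an assumption for this
    # sketch; the actual Beam examples may tokenize differently.
    return re.findall(r"[A-Za-z']+", line)


def count_words(lines):
    # Tally every word across all input lines (the "count" step).
    counts = Counter()
    for line in lines:
        counts.update(tokenize(line))
    return counts


def format_counts(counts):
    # Render each (word, count) pair as a printable line, sorted by word
    # (the step that would precede writing to an output file).
    return [f"{word}: {n}" for word, n in sorted(counts.items())]


if __name__ == "__main__":
    # Hard-coded input, in the spirit of the "bare bones" example.
    sample = ["to be or not to be", "that is the question"]
    for row in format_counts(count_words(sample)):
        print(row)
```

In a Beam pipeline each of these functions would correspond roughly to one transform applied to a PCollection, and the runner would decide how to distribute the work; here everything runs in a single local process.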