Posted to commits@beam.apache.org by da...@apache.org on 2017/01/31 01:37:22 UTC

[2/5] beam-site git commit: Added Python Quickstart

Added Python Quickstart


Project: http://git-wip-us.apache.org/repos/asf/beam-site/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam-site/commit/852da20d
Tree: http://git-wip-us.apache.org/repos/asf/beam-site/tree/852da20d
Diff: http://git-wip-us.apache.org/repos/asf/beam-site/diff/852da20d

Branch: refs/heads/asf-site
Commit: 852da20d32f25fb89b55e74f5f1f86040d353cbd
Parents: fb8666e
Author: Hadar Hod <ha...@google.com>
Authored: Wed Jan 25 01:46:37 2017 -0800
Committer: Davor Bonaci <da...@google.com>
Committed: Mon Jan 30 17:35:51 2017 -0800

----------------------------------------------------------------------
 src/_includes/header.html            |   3 +-
 src/get-started/beam-overview.md     |   6 +-
 src/get-started/quickstart-java.md   | 236 ++++++++++++++++++++++++++++++
 src/get-started/quickstart-py.md     |  97 ++++++++++++
 src/get-started/quickstart.md        | 235 -----------------------------
 src/get-started/wordcount-example.md | 119 ++++-----------
 6 files changed, 367 insertions(+), 329 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/beam-site/blob/852da20d/src/_includes/header.html
----------------------------------------------------------------------
diff --git a/src/_includes/header.html b/src/_includes/header.html
index 6ffd1fe..d44e0dc 100644
--- a/src/_includes/header.html
+++ b/src/_includes/header.html
@@ -17,7 +17,8 @@
 		  <a href="#" class="dropdown-toggle" data-toggle="dropdown" role="button" aria-haspopup="true" aria-expanded="false">Get Started <span class="caret"></span></a>
 		  <ul class="dropdown-menu">
 			  <li><a href="{{ site.baseurl }}/get-started/beam-overview/">Beam Overview</a></li>
-              <li><a href="{{ site.baseurl }}/get-started/quickstart/">Quickstart</a></li>
+        <li><a href="{{ site.baseurl }}/get-started/quickstart-java/">Quickstart - Java</a></li>
+        <li><a href="{{ site.baseurl }}/get-started/quickstart-py/">Quickstart - Python</a></li>
 			  <li role="separator" class="divider"></li>
 			  <li class="dropdown-header">Example Walkthroughs</li>
 			  <li><a href="{{ site.baseurl }}/get-started/wordcount-example/">WordCount</a></li>

http://git-wip-us.apache.org/repos/asf/beam-site/blob/852da20d/src/get-started/beam-overview.md
----------------------------------------------------------------------
diff --git a/src/get-started/beam-overview.md b/src/get-started/beam-overview.md
index 79716a9..6796c2e 100644
--- a/src/get-started/beam-overview.md
+++ b/src/get-started/beam-overview.md
@@ -71,4 +71,8 @@ Beam currently supports Runners that work with the following distributed process
 
 ## Getting Started with Apache Beam
 
-Get started using Beam for your data processing tasks by following the [Quickstart]({{ site.baseurl }}/get-started/quickstart) and the [WordCount Examples Walkthrough]({{ site.baseurl }}/get-started/wordcount-example).
+Get started using Beam for your data processing tasks. 
+
+1. Follow the Quickstart for the [Java SDK]({{ site.baseurl }}/get-started/quickstart-java) or the [Python SDK]({{ site.baseurl }}/get-started/quickstart-py).
+
+2. See the [WordCount Examples Walkthrough]({{ site.baseurl }}/get-started/wordcount-example) for examples that introduce various features of the SDKs.

http://git-wip-us.apache.org/repos/asf/beam-site/blob/852da20d/src/get-started/quickstart-java.md
----------------------------------------------------------------------
diff --git a/src/get-started/quickstart-java.md b/src/get-started/quickstart-java.md
new file mode 100644
index 0000000..a97a61f
--- /dev/null
+++ b/src/get-started/quickstart-java.md
@@ -0,0 +1,236 @@
+---
+layout: default
+title: "Beam Quickstart for Java"
+permalink: /get-started/quickstart-java/
+redirect_from:
+  - /get-started/quickstart/
+  - /use/quickstart/
+  - /getting-started/
+---
+
+# Apache Beam Java SDK Quickstart
+
+This Quickstart will walk you through executing your first Beam pipeline to run [WordCount]({{ site.baseurl }}/get-started/wordcount-example), written using Beam's [Java SDK]({{ site.baseurl }}/documentation/sdks/java), on a [runner]({{ site.baseurl }}/documentation#runners) of your choice.
+
+* TOC
+{:toc}
+
+
+## Set up your Development Environment
+
+1. Download and install the [Java Development Kit (JDK)](http://www.oracle.com/technetwork/java/javase/downloads/index.html) version 1.7 or later. Verify that the [JAVA_HOME](https://docs.oracle.com/javase/8/docs/technotes/guides/troubleshoot/envvars001.html) environment variable is set and points to your JDK installation.
+
+1. Download and install [Apache Maven](http://maven.apache.org/download.cgi) by following Maven's [installation guide](http://maven.apache.org/install.html) for your specific operating system.
+
+
+## Get the WordCount Code
+
+The easiest way to get a copy of the WordCount pipeline is to use the following command to generate a simple Maven project that contains Beam's WordCount examples and builds against the most recent Beam release:
+
+```
+$ mvn archetype:generate \
+      -DarchetypeRepository=https://repository.apache.org/content/groups/snapshots \
+      -DarchetypeGroupId=org.apache.beam \
+      -DarchetypeArtifactId=beam-sdks-java-maven-archetypes-examples \
+      -DarchetypeVersion=LATEST \
+      -DgroupId=org.example \
+      -DartifactId=word-count-beam \
+      -Dversion="0.1" \
+      -Dpackage=org.apache.beam.examples \
+      -DinteractiveMode=false
+```
+
+This will create a directory `word-count-beam` that contains a simple `pom.xml` and a series of example pipelines that count words in text files.
+
+```
+$ cd word-count-beam/
+
+$ ls
+pom.xml	src
+
+$ ls src/main/java/org/apache/beam/examples/
+DebuggingWordCount.java	WindowedWordCount.java	common
+MinimalWordCount.java	WordCount.java
+```
+
+For a detailed introduction to the Beam concepts used in these examples, see the [WordCount Example Walkthrough]({{ site.baseurl }}/get-started/wordcount-example). Here, we'll just focus on executing `WordCount.java`.
+
+
+## Run WordCount
+
+A single Beam pipeline can run on multiple Beam [runners]({{ site.baseurl }}/documentation#runners), including the [ApexRunner]({{ site.baseurl }}/documentation/runners/apex), [FlinkRunner]({{ site.baseurl }}/documentation/runners/flink), [SparkRunner]({{ site.baseurl }}/documentation/runners/spark) or [DataflowRunner]({{ site.baseurl }}/documentation/runners/dataflow). The [DirectRunner]({{ site.baseurl }}/documentation/runners/direct) is a common runner for getting started, as it runs locally on your machine and requires no specific setup.
+
+After you've chosen which runner you'd like to use:
+
+1.  Ensure you've done any runner-specific setup.
+1.  Build your command line by:
+    1. Specifying a specific runner with `--runner=<runner>` (defaults to the [DirectRunner]({{ site.baseurl }}/documentation/runners/direct))
+    1. Adding any runner-specific required options
+    1. Choosing input files and an output location that are accessible on the chosen runner. (For example, you can't access a local file if you are running the pipeline on an external cluster.)
+1.  Run your first WordCount pipeline.
+
+{:.runner-direct}
+```
+$ mvn compile exec:java -Dexec.mainClass=org.apache.beam.examples.WordCount \
+     -Dexec.args="--inputFile=pom.xml --output=counts" -Pdirect-runner
+```
+
+{:.runner-apex}
+```
+$ mvn compile exec:java -Dexec.mainClass=org.apache.beam.examples.WordCount \
+     -Dexec.args="--inputFile=pom.xml --output=counts --runner=ApexRunner" -Papex-runner
+```
+
+{:.runner-flink-local}
+```
+$ mvn compile exec:java -Dexec.mainClass=org.apache.beam.examples.WordCount \
+     -Dexec.args="--runner=FlinkRunner --inputFile=pom.xml --output=counts" -Pflink-runner
+```
+
+{:.runner-flink-cluster}
+```
+$ mvn package exec:java -Dexec.mainClass=org.apache.beam.examples.WordCount \
+     -Dexec.args="--runner=FlinkRunner --flinkMaster=<flink master> --filesToStage=target/word-count-beam-bundled-0.1.jar \
+                  --inputFile=/path/to/quickstart/pom.xml --output=/tmp/counts" -Pflink-runner
+
+You can monitor the running job by visiting the Flink dashboard at http://<flink master>:8081
+```
+
+{:.runner-spark}
+```
+$ mvn compile exec:java -Dexec.mainClass=org.apache.beam.examples.WordCount \
+     -Dexec.args="--runner=SparkRunner --inputFile=pom.xml --output=counts" -Pspark-runner
+```
+
+{:.runner-dataflow}
+```
+$ mvn compile exec:java -Dexec.mainClass=org.apache.beam.examples.WordCount \
+	 -Dexec.args="--runner=DataflowRunner --gcpTempLocation=gs://<your-gcs-bucket>/tmp \
+	              --inputFile=gs://apache-beam-samples/shakespeare/* --output=gs://<your-gcs-bucket>/counts" \
+     -Pdataflow-runner
+```
+
+
+## Inspect the results
+
+Once the pipeline has completed, you can view the output. You'll notice that there may be multiple output files prefixed by `counts`. The exact number of these files is decided by the runner, giving it the flexibility to perform efficient, distributed execution.
+
+{:.runner-direct}
+```
+$ ls counts*
+```
+
+{:.runner-apex}
+```
+$ ls counts*
+```
+
+{:.runner-flink-local}
+```
+$ ls counts*
+```
+
+{:.runner-flink-cluster}
+```
+$ ls /tmp/counts*
+```
+
+{:.runner-spark}
+```
+$ ls counts*
+```
+
+
+{:.runner-dataflow}
+```
+$ gsutil ls gs://<your-gcs-bucket>/counts*
+```
+
+When you look into the contents of the files, you'll see that they contain unique words and the number of occurrences of each word. The order of elements within a file may differ because the Beam model does not generally guarantee ordering, again to allow runners to optimize for efficiency.
+
+{:.runner-direct}
+```
+$ more counts*
+api: 9
+bundled: 1
+old: 4
+Apache: 2
+The: 1
+limitations: 1
+Foundation: 1
+...
+```
+
+{:.runner-apex}
+```
+$ cat counts*
+BEAM: 1
+have: 1
+simple: 1
+skip: 4
+PAssert: 1
+...
+```
+
+{:.runner-flink-local}
+```
+$ more counts*
+The: 1
+api: 9
+old: 4
+Apache: 2
+limitations: 1
+bundled: 1
+Foundation: 1
+...
+```
+
+{:.runner-flink-cluster}
+```
+$ more /tmp/counts*
+The: 1
+api: 9
+old: 4
+Apache: 2
+limitations: 1
+bundled: 1
+Foundation: 1
+...
+```
+
+{:.runner-spark}
+```
+$ more counts*
+beam: 27
+SF: 1
+fat: 1
+job: 1
+limitations: 1
+require: 1
+of: 11
+profile: 10
+...
+```
+
+{:.runner-dataflow}
+```
+$ gsutil cat gs://<your-gcs-bucket>/counts*
+feature: 15
+smother'st: 1
+revelry: 1
+bashfulness: 1
+Bashful: 1
+Below: 2
+deserves: 32
+barrenly: 1
+...
+```
+
+## Next Steps
+
+* Learn more about these WordCount examples in the [WordCount Example Walkthrough]({{ site.baseurl }}/get-started/wordcount-example).
+* Dive in to some of our favorite [articles and presentations]({{ site.baseurl }}/documentation/resources).
+* Join the Beam [users@]({{ site.baseurl }}/get-started/support#mailing-lists) mailing list.
+
+Please don't hesitate to [reach out]({{ site.baseurl }}/get-started/support) if you encounter any issues!
+

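The WordCount pipelines added in this quickstart all compute the same thing. Stripped of the Beam API and runner machinery, the core logic — tokenize with the same regex the walkthrough's ExtractWords step uses, count occurrences, and format `word: count` lines like the sample output above — can be sketched in plain Python (illustrative only; this is not Beam code):

```python
import re
from collections import Counter

def count_words(text):
    """Tokenize text the way the WordCount examples do and count occurrences.

    The regex mirrors the one used in the WordCount walkthrough's
    ExtractWords step (r"[A-Za-z']+").
    """
    words = re.findall(r"[A-Za-z']+", text)
    return Counter(words)

def format_counts(counts):
    # One "word: count" line per unique word, matching the format of the
    # counts* output files shown above.
    return ["%s: %d" % (word, n) for word, n in counts.items()]

if __name__ == "__main__":
    sample = "The old api limitations. The api bundled api."
    for line in format_counts(count_words(sample)):
        print(line)
```

The point of the pipeline, of course, is that a runner can distribute exactly this logic across many workers — which is also why the number and ordering of the output files varies by runner.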
http://git-wip-us.apache.org/repos/asf/beam-site/blob/852da20d/src/get-started/quickstart-py.md
----------------------------------------------------------------------
diff --git a/src/get-started/quickstart-py.md b/src/get-started/quickstart-py.md
new file mode 100644
index 0000000..a198eba
--- /dev/null
+++ b/src/get-started/quickstart-py.md
@@ -0,0 +1,97 @@
+---
+layout: default
+title: "Beam Quickstart for Python"
+permalink: /get-started/quickstart-py/
+---
+
+# Apache Beam Python SDK Quickstart
+
+This guide shows you how to set up your Python development environment, get the Apache Beam SDK for Python, and run an example pipeline.
+
+* TOC
+{:toc}
+
+## Set up your environment
+
+### Install pip
+
+Install [pip](https://pip.pypa.io/en/stable/installing/), Python's package manager. Check that you have version 7.0.0 or newer by running:
+
+```
+pip --version
+```
+
+### Install Python virtual environment 
+
+It is recommended that you install a [Python virtual environment](http://docs.python-guide.org/en/latest/dev/virtualenvs/)
+for initial experiments. If you do not have `virtualenv` version 13.1.0 or newer, install it by running:
+
+```
+pip install --upgrade virtualenv
+```
+
+If you do not want to use a Python virtual environment (not recommended), ensure `setuptools` is installed on your machine. If you do not have `setuptools` version 17.1 or newer, install it by running:
+
+```
+pip install --upgrade setuptools
+```
+
+## Get Apache Beam
+
+### Create and activate a virtual environment
+
+A virtual environment is a directory tree containing its own Python distribution. To create a virtual environment, create a directory and run:
+
+```
+virtualenv /path/to/directory
+```
+
+A virtual environment needs to be activated for each shell that is to use it.
+Activating it sets some environment variables that point to the virtual
+environment's directories. 
+
+To activate a virtual environment in Bash, run:
+
+```
+. /path/to/directory/bin/activate
+```
+
+That is, source the script `bin/activate` under the virtual environment directory you created.
+
+For instructions using other shells, see the [virtualenv documentation](https://virtualenv.pypa.io/en/stable/userguide/#activate-script).
+
+### Download and install
+
+1. Clone the Apache Beam repo from GitHub: 
+  `git clone https://github.com/apache/beam.git --branch python-sdk`
+
+2. Navigate to the `python` directory: 
+  `cd beam/sdks/python/`
+
+3. Create the Apache Beam Python SDK installation package: 
+  `python setup.py sdist`
+
+4. Navigate to the `dist` directory:
+  `cd dist/`
+
+5. Install the Apache Beam SDK: 
+  `pip install apache-beam-sdk-*.tar.gz`
+
+## Execute a pipeline locally
+
+The Apache Beam [examples](https://github.com/apache/beam/tree/python-sdk/sdks/python/apache_beam/examples) directory has many examples. All examples can be run locally by passing the required arguments described in the example script.
+
+For example, to run `wordcount.py`, run:
+
+```
+python -m apache_beam.examples.wordcount --input gs://dataflow-samples/shakespeare/kinglear.txt --output output.txt
+```
+
+## Next Steps
+
+* Learn more about these WordCount examples in the [WordCount Example Walkthrough]({{ site.baseurl }}/get-started/wordcount-example).
+* Dive in to some of our favorite [articles and presentations]({{ site.baseurl }}/documentation/resources).
+* Join the Beam [users@]({{ site.baseurl }}/get-started/support#mailing-lists) mailing list.
+
+Please don't hesitate to [reach out]({{ site.baseurl }}/get-started/support) if you encounter any issues!
+

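The Python quickstart above asks you to verify minimum versions of pip (7.0.0), virtualenv (13.1.0), and setuptools (17.1). As a small illustrative helper — hypothetical, not part of the SDK or this commit — dotted version strings can be compared like this:

```python
def parse_version(version):
    """Turn a dotted version string like '13.1.0' into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))

def meets_minimum(installed, required):
    # Pad the shorter tuple with zeros so '17.1' compares against '17.1.0'.
    a, b = parse_version(installed), parse_version(required)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b

if __name__ == "__main__":
    # The minimums named in the quickstart, with example installed versions.
    for tool, minimum, installed in [
        ("pip", "7.0.0", "9.0.1"),
        ("virtualenv", "13.1.0", "15.1.0"),
        ("setuptools", "17.1", "17.1"),
    ]:
        print(tool, meets_minimum(installed, minimum))
```

Note this naive parser only handles purely numeric versions; real-world version strings can include suffixes like `rc1`, for which a dedicated library such as `pkg_resources` is the robust choice.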
http://git-wip-us.apache.org/repos/asf/beam-site/blob/852da20d/src/get-started/quickstart.md
----------------------------------------------------------------------
diff --git a/src/get-started/quickstart.md b/src/get-started/quickstart.md
deleted file mode 100644
index e10f9f3..0000000
--- a/src/get-started/quickstart.md
+++ /dev/null
@@ -1,235 +0,0 @@
----
-layout: default
-title: "Beam Quickstart"
-permalink: /get-started/quickstart/
-redirect_from:
-  - /use/quickstart/
-  - /getting-started/
----
-
-# Apache Beam Java SDK Quickstart
-
-This Quickstart will walk you through executing your first Beam pipeline to run [WordCount]({{ site.baseurl }}/get-started/wordcount-example), written using Beam's [Java SDK]({{ site.baseurl }}/documentation/sdks/java), on a [runner]({{ site.baseurl }}/documentation#runners) of your choice.
-
-* TOC
-{:toc}
-
-
-## Set up your Development Environment
-
-1. Download and install the [Java Development Kit (JDK)](http://www.oracle.com/technetwork/java/javase/downloads/index.html) version 1.7 or later. Verify that the [JAVA_HOME](https://docs.oracle.com/javase/8/docs/technotes/guides/troubleshoot/envvars001.html) environment variable is set and points to your JDK installation.
-
-1. Download and install [Apache Maven](http://maven.apache.org/download.cgi) by following Maven's [installation guide](http://maven.apache.org/install.html) for your specific operating system.
-
-
-## Get the WordCount Code
-
-The easiest way to get a copy of the WordCount pipeline is to use the following command to generate a simple Maven project that contains Beam's WordCount examples and builds against the most recent Beam release:
-
-```
-$ mvn archetype:generate \
-      -DarchetypeRepository=https://repository.apache.org/content/groups/snapshots \
-      -DarchetypeGroupId=org.apache.beam \
-      -DarchetypeArtifactId=beam-sdks-java-maven-archetypes-examples \
-      -DarchetypeVersion=LATEST \
-      -DgroupId=org.example \
-      -DartifactId=word-count-beam \
-      -Dversion="0.1" \
-      -Dpackage=org.apache.beam.examples \
-      -DinteractiveMode=false
-```
-
-This will create a directory `word-count-beam` that contains a simple `pom.xml` and a series of example pipelines that count words in text files.
-
-```
-$ cd word-count-beam/
-
-$ ls
-pom.xml	src
-
-$ ls src/main/java/org/apache/beam/examples/
-DebuggingWordCount.java	WindowedWordCount.java	common
-MinimalWordCount.java	WordCount.java
-```
-
-For a detailed introduction to the Beam concepts used in these examples, see the [WordCount Example Walkthrough]({{ site.baseurl }}/get-started/wordcount-example). Here, we'll just focus on executing `WordCount.java`.
-
-
-## Run WordCount
-
-A single Beam pipeline can run on multiple Beam [runners]({{ site.baseurl }}/documentation#runners), including the [ApexRunner]({{ site.baseurl }}/documentation/runners/apex), [FlinkRunner]({{ site.baseurl }}/documentation/runners/flink), [SparkRunner]({{ site.baseurl }}/documentation/runners/spark) or [DataflowRunner]({{ site.baseurl }}/documentation/runners/dataflow). The [DirectRunner]({{ site.baseurl }}/documentation/runners/direct) is a common runner for getting started, as it runs locally on your machine and requires no specific setup.
-
-After you've chosen which runner you'd like to use:
-
-1.  Ensure you've done any runner-specific setup.
-1.  Build your commandline by:
-    1. Specifying a specific runner with `--runner=<runner>` (defaults to the [DirectRunner]({{ site.baseurl }}/documentation/runners/direct))
-    1. Adding any runner-specific required options
-    1. Choosing input files and an output location are accessible on the chosen runner. (For example, you can't access a local file if you are running the pipeline on an external cluster.)
-1.  Run your first WordCount pipeline.
-
-{:.runner-direct}
-```
-$ mvn compile exec:java -Dexec.mainClass=org.apache.beam.examples.WordCount \
-     -Dexec.args="--inputFile=pom.xml --output=counts" -Pdirect-runner
-```
-
-{:.runner-apex}
-```
-$ mvn compile exec:java -Dexec.mainClass=org.apache.beam.examples.WordCount \
-     -Dexec.args="--inputFile=pom.xml --output=counts --runner=ApexRunner" -Papex-runner
-```
-
-{:.runner-flink-local}
-```
-$ mvn compile exec:java -Dexec.mainClass=org.apache.beam.examples.WordCount \
-     -Dexec.args="--runner=FlinkRunner --inputFile=pom.xml --output=counts" -Pflink-runner
-```
-
-{:.runner-flink-cluster}
-```
-$ mvn package exec:java -Dexec.mainClass=org.apache.beam.examples.WordCount \
-     -Dexec.args="--runner=FlinkRunner --flinkMaster=<flink master> --filesToStage=target/word-count-beam-bundled-0.1.jar \
-                  --inputFile=/path/to/quickstart/pom.xml --output=/tmp/counts" -Pflink-runner
-
-You can monitor the running job by visiting the Flink dashboard at http://<flink master>:8081
-```
-
-{:.runner-spark}
-```
-$ mvn compile exec:java -Dexec.mainClass=org.apache.beam.examples.WordCount \
-     -Dexec.args="--runner=SparkRunner --inputFile=pom.xml --output=counts" -Pspark-runner
-```
-
-{:.runner-dataflow}
-```
-$ mvn compile exec:java -Dexec.mainClass=org.apache.beam.examples.WordCount \
-	 -Dexec.args="--runner=DataflowRunner --gcpTempLocation=gs://<your-gcs-bucket>/tmp \
-	              --inputFile=gs://apache-beam-samples/shakespeare/* --output=gs://<your-gcs-bucket>/counts" \
-     -Pdataflow-runner
-```
-
-
-## Inspect the results
-
-Once the pipeline has completed, you can view the output. You'll notice that there may be multiple output files prefixed by `count`. The exact number of these files is decided by the runner, giving it the flexibility to do efficient, distributed execution.
-
-{:.runner-direct}
-```
-$ ls counts*
-```
-
-{:.runner-apex}
-```
-$ ls counts*
-```
-
-{:.runner-flink-local}
-```
-$ ls counts*
-```
-
-{:.runner-flink-cluster}
-```
-$ ls /tmp/counts*
-```
-
-{:.runner-spark}
-```
-$ ls counts*
-```
-
-
-{:.runner-dataflow}
-```
-$ gsutil ls gs://<your-gcs-bucket>/counts*
-```
-
-When you look into the contents of the file, you'll see that they contain unique words and the number of occurrences of each word. The order of elements within the file may differ because the Beam model does not generally guarantee ordering, again to allow runners to optimize for efficiency.
-
-{:.runner-direct}
-```
-$ more counts*
-api: 9
-bundled: 1
-old: 4
-Apache: 2
-The: 1
-limitations: 1
-Foundation: 1
-...
-```
-
-{:.runner-apex}
-```
-$ cat counts*
-BEAM: 1
-have: 1
-simple: 1
-skip: 4
-PAssert: 1
-...
-```
-
-{:.runner-flink-local}
-```
-$ more counts*
-The: 1
-api: 9
-old: 4
-Apache: 2
-limitations: 1
-bundled: 1
-Foundation: 1
-...
-```
-
-{:.runner-flink-cluster}
-```
-$ more /tmp/counts*
-The: 1
-api: 9
-old: 4
-Apache: 2
-limitations: 1
-bundled: 1
-Foundation: 1
-...
-```
-
-{:.runner-spark}
-```
-$ more counts*
-beam: 27
-SF: 1
-fat: 1
-job: 1
-limitations: 1
-require: 1
-of: 11
-profile: 10
-...
-```
-
-{:.runner-dataflow}
-```
-$ gsutil cat gs://<your-gcs-bucket>/counts*
-feature: 15
-smother'st: 1
-revelry: 1
-bashfulness: 1
-Bashful: 1
-Below: 2
-deserves: 32
-barrenly: 1
-...
-```
-
-## Next Steps
-
-* Learn more about these WordCount examples in the [WordCount Example Walkthrough]({{ site.baseurl }}/get-started/wordcount-example).
-* Dive in to some of our favorite [articles and presentations]({{ site.baseurl }}/documentation/resources).
-* Join the Beam [users@]({{ site.baseurl }}/get-started/support#mailing-lists) mailing list.
-
-Please don't hesitate to [reach out]({{ site.baseurl }}/get-started/support) if you encounter any issues!
-

http://git-wip-us.apache.org/repos/asf/beam-site/blob/852da20d/src/get-started/wordcount-example.md
----------------------------------------------------------------------
diff --git a/src/get-started/wordcount-example.md b/src/get-started/wordcount-example.md
index a02b327..bf484b2 100644
--- a/src/get-started/wordcount-example.md
+++ b/src/get-started/wordcount-example.md
@@ -69,14 +69,8 @@ You can specify a runner for executing your pipeline, such as the `DataflowRunne
 ```
 
 ```py
-options = PipelineOptions()
-google_cloud_options = options.view_as(GoogleCloudOptions)
-google_cloud_options.project = 'my-project-id'
-google_cloud_options.job_name = 'myjob'
-google_cloud_options.staging_location = 'gs://your-bucket-name-here/staging'
-google_cloud_options.temp_location = 'gs://your-bucket-name-here/temp'
-options.view_as(StandardOptions).runner = 'BlockingDataflowPipelineRunner'
-```
+{% github_sample /apache/beam/blob/python-sdk/sdks/python/apache_beam/examples/snippets/snippets.py tag:examples_wordcount_minimal_options
+%}```
 
 The next step is to create a Pipeline object with the options we've just constructed. The Pipeline object builds up the graph of transformations to be executed, associated with that particular pipeline.
 
@@ -85,8 +79,8 @@ Pipeline p = Pipeline.create(options);
 ```
 
 ```py
-p = beam.Pipeline(options=options)
-```
+{% github_sample /apache/beam/blob/python-sdk/sdks/python/apache_beam/examples/snippets/snippets.py tag:examples_wordcount_minimal_create
+%}```
 
 ### Applying Pipeline Transforms
 
@@ -106,8 +100,8 @@ The Minimal WordCount pipeline contains five transforms:
     ```
 
     ```py
-    p | beam.io.Read(beam.io.TextFileSource('gs://dataflow-samples/shakespeare/kinglear.txt'))
-    ```
+    {% github_sample /apache/beam/blob/python-sdk/sdks/python/apache_beam/examples/snippets/snippets.py tag:examples_wordcount_minimal_read
+    %}```
 
 2.  A [ParDo]({{ site.baseurl }}/documentation/programming-guide/#transforms-pardo) transform that invokes a `DoFn` (defined in-line as an anonymous class) on each element that tokenizes the text lines into individual words. The input for this transform is the `PCollection` of text lines generated by the previous `TextIO.Read` transform. The `ParDo` transform outputs a new `PCollection`, where each element represents an individual word in the text.
 
@@ -126,8 +120,8 @@ The Minimal WordCount pipeline contains five transforms:
 
     ```py
     # The Flatmap transform is a simplified version of ParDo.
-    | 'ExtractWords' >> beam.FlatMap(lambda x: re.findall(r'[A-Za-z\']+', x))
-    ```
+    {% github_sample /apache/beam/blob/python-sdk/sdks/python/apache_beam/examples/snippets/snippets.py tag:examples_wordcount_minimal_pardo
+    %}```
 
 3.  The SDK-provided `Count` transform is a generic transform that takes a `PCollection` of any type, and returns a `PCollection` of key/value pairs. Each key represents a unique element from the input collection, and each value represents the number of times that key appeared in the input collection.
 
@@ -138,12 +132,12 @@ The Minimal WordCount pipeline contains five transforms:
     ```
 
     ```py
-    | beam.combiners.Count.PerElement()
-    ```
+    {% github_sample /apache/beam/blob/python-sdk/sdks/python/apache_beam/examples/snippets/snippets.py tag:examples_wordcount_minimal_count
+    %}```
 
 4.  The next transform formats each of the key/value pairs of unique words and occurrence counts into a printable string suitable for writing to an output file.
 
-	`MapElements` is a higher-level composite transform that encapsulates a simple `ParDo`; for each element in the input `PCollection`, `MapElements` applies a function that produces exactly one output element. In this example, `MapElements` invokes a `SimpleFunction` (defined in-line as an anonymous class) that does the formatting. As input, `MapElements` takes a `PCollection` of key/value pairs generated by `Count`, and produces a new `PCollection` of printable strings.
+	The map transform is a higher-level composite transform that encapsulates a simple `ParDo`; for each element in the input `PCollection`, the map transform applies a function that produces exactly one output element.
 
     ```java
     .apply("FormatResults", MapElements.via(new SimpleFunction<KV<String, Long>, String>() {
@@ -155,19 +149,19 @@ The Minimal WordCount pipeline contains five transforms:
     ```
 
     ```py
-    | beam.Map(lambda (word, count): '%s: %s' % (word, count))
-    ```
+    {% github_sample /apache/beam/blob/python-sdk/sdks/python/apache_beam/examples/snippets/snippets.py tag:examples_wordcount_minimal_map
+    %}```
 
-5.  A text file `Write`. This transform takes the final `PCollection` of formatted Strings as input and writes each element to an output text file. Each element in the input `PCollection` represents one line of text in the resulting output file.
+5.  A text file write transform. This transform takes the final `PCollection` of formatted Strings as input and writes each element to an output text file. Each element in the input `PCollection` represents one line of text in the resulting output file.
 
     ```java
     .apply(TextIO.Write.to("wordcounts"));
     ```
 
     ```py
-    | beam.io.Write(beam.io.TextFileSink('gs://my-bucket/counts.txt'))
-    ```
-
+    {% github_sample /apache/beam/blob/python-sdk/sdks/python/apache_beam/examples/snippets/snippets.py tag:examples_wordcount_minimal_write
+    %}```
+    
 Note that the `Write` transform produces a trivial result value of type `PDone`, which in this case is ignored.
 
 ### Running the Pipeline
@@ -179,8 +173,8 @@ p.run().waitUntilFinish();
 ```
 
 ```py
-p.run()
-```
+{% github_sample /apache/beam/blob/python-sdk/sdks/python/apache_beam/examples/snippets/snippets.py tag:examples_wordcount_minimal_run
+%}```
 
 Note that the `run` method is asynchronous. For a blocking execution instead, run your pipeline appending the `waitUntilFinish` method.
 
@@ -220,14 +214,8 @@ static class ExtractWordsFn extends DoFn<String, String> {
 ```py
 # In this example, the DoFns are defined as classes:
 
-class FormatAsTextFn(beam.DoFn):
-
-  def process(self, context):
-    word, count = context.element
-    yield '%s: %s' % (word, count)
-
-formatted = counts | beam.ParDo(FormatAsTextFn())
-```
+{% github_sample /apache/beam/blob/python-sdk/sdks/python/apache_beam/examples/snippets/snippets.py tag:examples_wordcount_wordcount_dofn
+%}```
 
 ### Creating Composite Transforms
 
@@ -265,19 +253,8 @@ public static void main(String[] args) throws IOException {
 ```
 
 ```py
-class CountWords(beam.PTransform):
-
-  def apply(self, pcoll):
-    return (pcoll
-            # Convert lines of text into individual words.
-            | beam.FlatMap(
-                'ExtractWords', lambda x: re.findall(r'[A-Za-z\']+', x))
-
-            # Count the number of times each word occurs.
-            | beam.combiners.Count.PerElement())
-
-counts = lines | CountWords()
-```
+{% github_sample /apache/beam/blob/python-sdk/sdks/python/apache_beam/examples/snippets/snippets.py tag:examples_wordcount_wordcount_composite
+%}```
 
 ### Using Parameterizable PipelineOptions
 
@@ -303,17 +280,8 @@ public static void main(String[] args) {
 ```
 
 ```py
-class WordCountOptions(PipelineOptions):
-
-  @classmethod
-  def _add_argparse_args(cls, parser):
-    parser.add_argument('--input',
-                        help='Input for the dataflow pipeline',
-                        default='gs://my-bucket/input')
-
-options = PipelineOptions(argv)
-p = beam.Pipeline(options=options)
-```
+{% github_sample /apache/beam/blob/python-sdk/sdks/python/apache_beam/examples/snippets/snippets.py tag:examples_wordcount_wordcount_options
+%}```
 
 ## Debugging WordCount Example
 
@@ -362,41 +330,8 @@ public class DebuggingWordCount {
 ```
 
 ```py
-import logging
-
-class FilterTextFn(beam.DoFn):
-  """A DoFn that filters for a specific key based on a regular expression."""
-
-  # A custom aggregator can track values in your pipeline as it runs. Create
-  # custom aggregators matched_word and unmatched_words.
-  matched_words = beam.Aggregator('matched_words')
-  umatched_words = beam.Aggregator('umatched_words')
-
-  def __init__(self, pattern):
-    self.pattern = pattern
-
-  def process(self, context):
-    word, _ = context.element
-    if re.match(self.pattern, word):
-      # Log at INFO level each element we match. When executing this pipeline
-      # using the Dataflow service, these log lines will appear in the Cloud
-      # Logging UI.
-      logging.info('Matched %s', word)
-
-      # Add 1 to the custom aggregator matched_words
-      context.aggregate_to(self.matched_words, 1)
-      yield context.element
-    else:
-      # Log at the "DEBUG" level each element that is not matched. Different
-      # log levels can be used to control the verbosity of logging providing
-      # an effective mechanism to filter less important information. Note
-      # currently only "INFO" and higher level logs are emitted to the Cloud
-      # Logger. This log message will not be visible in the Cloud Logger.
-      logging.debug('Did not match %s', word)
-
-      # Add 1 to the custom aggregator umatched_words
-      context.aggregate_to(self.umatched_words, 1)
-```
+{% github_sample /apache/beam/blob/python-sdk/sdks/python/apache_beam/examples/snippets/snippets.py tag:example_wordcount_debugging_logging
+%}```
 
 If you execute your pipeline using `DataflowRunner`, you can control the worker log levels. Dataflow workers that execute user code are configured to log to Cloud Logging by default at "INFO" log level and higher. You can override log levels for specific logging namespaces by specifying: `--workerLogLevelOverrides={"Name1":"Level1","Name2":"Level2",...}`. For example, by specifying `--workerLogLevelOverrides={"org.apache.beam.examples":"DEBUG"}` when executing this pipeline using the Dataflow service, Cloud Logging would contain only "DEBUG" or higher level logs for the package in addition to the default "INFO" or higher level logs.
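The debugging example discussed above filters word/count pairs by a regular expression (the walkthrough uses `Flourish|stomach`), logging matches at INFO and non-matches at DEBUG while tallying both. Stripped of the Beam `DoFn` and aggregator machinery, the control flow is roughly the following plain-Python sketch (illustrative only; simple counters stand in for the custom aggregators):

```python
import logging
import re

def filter_words(pairs, pattern):
    """Yield (word, count) pairs whose word matches pattern.

    Matches are logged at INFO, non-matches at DEBUG -- the same level
    split the debugging example uses to control log verbosity.
    """
    matched = unmatched = 0
    for word, count in pairs:
        if re.match(pattern, word):
            logging.info("Matched %s", word)
            matched += 1
            yield word, count
        else:
            logging.debug("Did not match %s", word)
            unmatched += 1
    # Counters standing in for the matched_words/unmatched_words aggregators.
    logging.info("matched_words=%d unmatched_words=%d", matched, unmatched)

if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    pairs = [("Flourish", 3), ("stomach", 1), ("kiss", 2)]
    print(list(filter_words(pairs, r"Flourish|stomach")))
```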