Posted to commits@beam.apache.org by dh...@apache.org on 2017/01/27 21:57:46 UTC

[1/5] beam-site git commit: [BEAM-278] Add I/O section to programming guide

Repository: beam-site
Updated Branches:
  refs/heads/asf-site 6a85cdf69 -> 6df417f99


[BEAM-278] Add I/O section to programming guide


Project: http://git-wip-us.apache.org/repos/asf/beam-site/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam-site/commit/884971d0
Tree: http://git-wip-us.apache.org/repos/asf/beam-site/tree/884971d0
Diff: http://git-wip-us.apache.org/repos/asf/beam-site/diff/884971d0

Branch: refs/heads/asf-site
Commit: 884971d0e0c0b365146bdda2260a55580442eb96
Parents: 6a85cdf
Author: melissa <me...@google.com>
Authored: Mon Nov 28 17:05:32 2016 -0800
Committer: Dan Halperin <dh...@google.com>
Committed: Fri Jan 27 13:39:30 2017 -0800

----------------------------------------------------------------------
 src/documentation/programming-guide.md | 188 +++++++++++++++++++++++++++-
 1 file changed, 183 insertions(+), 5 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/beam-site/blob/884971d0/src/documentation/programming-guide.md
----------------------------------------------------------------------
diff --git a/src/documentation/programming-guide.md b/src/documentation/programming-guide.md
index 869d9db..3851bb5 100644
--- a/src/documentation/programming-guide.md
+++ b/src/documentation/programming-guide.md
@@ -9,7 +9,7 @@ redirect_from:
 
 # Apache Beam Programming Guide
 
-The **Beam Programming Guide** is intended for Beam users who want to use the Beam SDKs to create data processing pipelines. It provides guidance for using the Beam SDK classes to build and test your pipeline. It is not intended as an exhaustive reference, but as a language-agnostic, high-level guide to programmatically building your Beam pipeline. As the programming guide is filled out, the text will include code samples in multiple languages to help illustrate how to implement Beam concepts in your programs.
+The **Beam Programming Guide** is intended for Beam users who want to use the Beam SDKs to create data processing pipelines. It provides guidance for using the Beam SDK classes to build and test your pipeline. It is not intended as an exhaustive reference, but as a language-agnostic, high-level guide to programmatically building your Beam pipeline. As the programming guide is filled out, the text will include code samples in multiple languages to help illustrate how to implement Beam concepts in your pipelines.
 
 
 <nav class="language-switcher">
@@ -39,7 +39,7 @@ The **Beam Programming Guide** is intended for Beam users who want to use the Be
   * [Using Flatten and Partition](#transforms-flatten-partition)
   * [General Requirements for Writing User Code for Beam Transforms](#transforms-usercodereqs)
   * [Side Inputs and Side Outputs](#transforms-sideio)
-* [I/O](#io)
+* [Pipeline I/O](#io)
 * [Running the Pipeline](#running)
 * [Data Encoding and Type Safety](#coders)
 * [Working with Windowing](#windowing)
@@ -124,7 +124,7 @@ public static void main(String[] args) {
     Pipeline p = Pipeline.create(options);
 
     PCollection<String> lines = p.apply(
-      TextIO.Read.named("ReadMyFile").from("protocol://path/to/some/inputData.txt"));
+      "ReadMyFile", TextIO.Read.from("protocol://path/to/some/inputData.txt"));
 }
 ```
 
@@ -942,8 +942,186 @@ While `ParDo` always produces a main output `PCollection` (as the return value f
 {% github_sample /apache/beam/blob/python-sdk/sdks/python/apache_beam/examples/snippets/snippets_test.py tag:model_pardo_with_side_outputs_undeclared
 %}```
 
-<a name="io"></a>
-<a name="running"></a>
+## <a name="io"></a>Pipeline I/O
+
+When you create a pipeline, you often need to read data from some external source, such as a file or a database. Likewise, you may want your pipeline to output its result data to an external data sink. Beam provides read and write transforms for a number of common data storage types. If you want your pipeline to read from or write to a data storage format that isn't supported by the built-in transforms, you can implement your own read and write transforms.
+
+> A guide that covers how to implement your own Beam IO transforms is in progress ([BEAM-1025](https://issues.apache.org/jira/browse/BEAM-1025)).
+
+### Reading Input Data
+
+Read transforms read data from an external source and return a `PCollection` representation of the data for use by your pipeline. You can use a read transform at any point while constructing your pipeline to create a new `PCollection`, though it will be most common at the start of your pipeline. Here are examples of two common ways to read data.
+
+#### Reading from a `Source`:
+
+```java
+// A fully-specified Read from a GCS file:
+PCollection<Integer> numbers =
+  p.apply("ReadNumbers", TextIO.Read
+   .from("gs://my_bucket/path/to/numbers-*.txt")
+   .withCoder(TextualIntegerCoder.of()));
+```
+
+```python
+pipeline | beam.io.ReadFromText('protocol://path/to/some/inputData.txt')
+```
+
+Note that many sources use the builder pattern for setting options. For additional examples, see the language-specific documentation (such as Javadoc) for each of the sources.
+
+#### Using a read transform:
+
+```java
+// This example uses JmsIO.
+PCollection<JmsRecord> output =
+    pipeline.apply(JmsIO.read()
+        .withConnectionFactory(myConnectionFactory)
+        .withQueue("my-queue"))
+```
+
+```python
+pipeline | beam.io.textio.ReadFromText('my_file_name')
+```
+
+### Writing Output Data
+
+Write transforms write the data in a `PCollection` to an external data sink. You will most often use write transforms at the end of your pipeline to output your pipeline's final results. However, you can use a write transform to output a `PCollection`'s data at any point in your pipeline. Here are examples of two common ways to write data.
+
+#### Writing to a `Sink`:
+
+```java
+// This example uses XmlSink.
+pipeline.apply(Write.to(
+          XmlSink.ofRecordClass(Type.class)
+              .withRootElementName(root_element)
+              .toFilenamePrefix(output_filename)));
+```
+
+```python
+output | beam.io.WriteToText('my_file_name')
+```
+
+#### Using a write transform:
+
+```java
+// This example uses JmsIO.
+pipeline.apply(...) // returns PCollection<String>
+        .apply(JmsIO.write()
+            .withConnectionFactory(myConnectionFactory)
+            .withQueue("my-queue")
+```
+
+```python
+output | beam.io.textio.WriteToText('my_file_name')
+```
+
+### File-based input and output data
+
+#### Reading From Multiple Locations:
+
+Many read transforms support reading from multiple input files matching a glob operator you provide. The following TextIO example uses a glob operator (\*) to read all matching input files that have the prefix "input-" and the suffix ".csv" in the given location:
+
+```java
+p.apply("ReadFromText",
+    TextIO.Read.from("protocol://my_bucket/path/to/input-*.csv"));
+```
+
+```python
+lines = p | beam.io.Read(
+    'ReadFromText',
+    beam.io.TextFileSource('protocol://my_bucket/path/to/input-*.csv'))
+```
+
+To read data from disparate sources into a single `PCollection`, read each one independently and then use the `Flatten` transform to create a single `PCollection`.
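+
+As a concrete illustration, here is a minimal sketch of that pattern (the two bucket paths are hypothetical):
+
+```java
+// Read each source independently; the paths here are hypothetical.
+PCollection<String> logs2016 = p.apply("Read2016",
+    TextIO.Read.from("protocol://my_bucket/2016/*.log"));
+PCollection<String> logs2017 = p.apply("Read2017",
+    TextIO.Read.from("protocol://my_bucket/2017/*.log"));
+
+// Merge both PCollections into one with the Flatten transform.
+PCollection<String> allLogs =
+    PCollectionList.of(logs2016).and(logs2017)
+        .apply(Flatten.<String>pCollections());
+```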
+
+#### Writing To Multiple Output Files:
+
+For file-based output data, write transforms write to multiple output files by default. When you pass an output file name to a write transform, the file name is used as the prefix for all output files that the write transform produces. You can also specify a suffix to be appended to each output file name.
+
+The following write transform example writes multiple output files to a location. Each file has the prefix "numbers", a numeric tag, and the suffix ".csv".
+
+```java
+records.apply("WriteToText",
+    TextIO.Write.to("protocol://my_bucket/path/to/numbers")
+                .withSuffix(".csv"));
+```
+
+```python
+filtered_words | beam.io.WriteToText(
+'protocol://my_bucket/path/to/numbers', file_name_suffix='.csv')
+```
+
+### Beam-provided I/O APIs
+
+See the language-specific source code directories for the Beam-supported I/O APIs. Specific documentation for each of these I/O sources will be added in the future. ([BEAM-1054](https://issues.apache.org/jira/browse/BEAM-1054))
+
+<table class="table table-bordered">
+<tr>
+  <th>Language</th>
+  <th>File-based</th>
+  <th>Messaging</th>
+  <th>Database</th>
+</tr>
+<tr>
+  <td>Java</td>
+  <td>
+    <p><a href="https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/io/AvroIO.java">AvroIO</a></p>
+    <p><a href="https://github.com/apache/beam/tree/master/sdks/java/io/hdfs">HDFS</a></p>
+    <p><a href="https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/io/TextIO.java">TextIO</a></p>
+    <p><a href="https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/io/">XML</a></p>
+  </td>
+  <td>
+    <p><a href="https://github.com/apache/beam/tree/master/sdks/java/io/jms">JMS</a></p>
+    <p><a href="https://github.com/apache/beam/tree/master/sdks/java/io/kafka">Kafka</a></p>
+    <p><a href="https://github.com/apache/beam/tree/master/sdks/java/io/kinesis">Kinesis</a></p>
+    <p><a href="https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/io">Google Cloud PubSub</a></p>
+  </td>
+  <td>
+    <p><a href="https://github.com/apache/beam/tree/master/sdks/java/io/mongodb">MongoDB</a></p>
+    <p><a href="https://github.com/apache/beam/tree/master/sdks/java/io/jdbc">JDBC</a></p>
+    <p><a href="https://github.com/apache/beam/tree/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery">Google BigQuery</a></p>
+    <p><a href="https://github.com/apache/beam/tree/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigtable">Google Cloud Bigtable</a></p>
+    <p><a href="https://github.com/apache/beam/tree/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/datastore">Google Cloud Datastore</a></p>
+  </td>
+</tr>
+<tr>
+  <td>Python</td>
+  <td>
+    <p><a href="https://github.com/apache/beam/blob/python-sdk/sdks/python/apache_beam/io/avroio.py">avroio</a></p>
+    <p><a href="https://github.com/apache/beam/blob/python-sdk/sdks/python/apache_beam/io/textio.py">textio</a></p>
+  </td>
+  <td>
+  </td>
+  <td>
+    <p><a href="https://github.com/apache/beam/blob/python-sdk/sdks/python/apache_beam/io/bigquery.py">Google BigQuery</a></p>
+    <p><a href="https://github.com/apache/beam/tree/python-sdk/sdks/python/apache_beam/io/datastore">Google Cloud Datastore</a></p>
+  </td>
+
+</tr>
+</table>
+
+
+## <a name="running"></a>Running the Pipeline
+
+To run your pipeline, use the `run` method. The program you create sends a specification for your pipeline to a pipeline runner, which then constructs and runs the actual series of pipeline operations. Pipelines are executed asynchronously by default.
+
+```java
+pipeline.run();
+```
+
+```python
+pipeline.run()
+```
+
+For blocking execution, append the `waitUntilFinish` method:
+
+```java
+pipeline.run().waitUntilFinish();
+```
+
+```python
+# Not currently supported.
+```
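+
+Because execution is asynchronous, `run` returns immediately with a result object that you can use to check on the pipeline afterwards. The following is a rough sketch, assuming the Java SDK's `PipelineResult` interface:
+
+```java
+// Launch the pipeline without blocking.
+PipelineResult result = pipeline.run();
+
+// Poll the job's current state (e.g. RUNNING, DONE, FAILED).
+PipelineResult.State state = result.getState();
+System.out.println("Pipeline state: " + state);
+```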
+
 <a name="transforms-composite"></a>
 <a name="coders"></a>
 <a name="windowing"></a>


[3/5] beam-site git commit: Closes #121

Posted by dh...@apache.org.
Closes #121


Project: http://git-wip-us.apache.org/repos/asf/beam-site/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam-site/commit/9f22ab68
Tree: http://git-wip-us.apache.org/repos/asf/beam-site/tree/9f22ab68
Diff: http://git-wip-us.apache.org/repos/asf/beam-site/diff/9f22ab68

Branch: refs/heads/asf-site
Commit: 9f22ab686ab371fd50c5f50d563ef1a6bf2b87a3
Parents: 6a85cdf bc4d494
Author: Dan Halperin <dh...@google.com>
Authored: Fri Jan 27 13:42:18 2017 -0800
Committer: Dan Halperin <dh...@google.com>
Committed: Fri Jan 27 13:42:18 2017 -0800

----------------------------------------------------------------------
 src/documentation/programming-guide.md | 209 +++++++++++++++++++++++-----
 1 file changed, 174 insertions(+), 35 deletions(-)
----------------------------------------------------------------------



[4/5] beam-site git commit: minor fixes

Posted by dh...@apache.org.
minor fixes


Project: http://git-wip-us.apache.org/repos/asf/beam-site/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam-site/commit/f56cfa4b
Tree: http://git-wip-us.apache.org/repos/asf/beam-site/tree/f56cfa4b
Diff: http://git-wip-us.apache.org/repos/asf/beam-site/diff/f56cfa4b

Branch: refs/heads/asf-site
Commit: f56cfa4b826df6177c2163dab8bb3e620f939ad1
Parents: 9f22ab6
Author: Dan Halperin <dh...@google.com>
Authored: Fri Jan 27 13:54:35 2017 -0800
Committer: Dan Halperin <dh...@google.com>
Committed: Fri Jan 27 13:54:35 2017 -0800

----------------------------------------------------------------------
 src/documentation/programming-guide.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/beam-site/blob/f56cfa4b/src/documentation/programming-guide.md
----------------------------------------------------------------------
diff --git a/src/documentation/programming-guide.md b/src/documentation/programming-guide.md
index c0e947f..71ee487 100644
--- a/src/documentation/programming-guide.md
+++ b/src/documentation/programming-guide.md
@@ -15,7 +15,7 @@ The **Beam Programming Guide** is intended for Beam users who want to use the Be
 <nav class="language-switcher">
   <strong>Adapt for:</strong> 
   <ul>
-    <li data-type="language-java">Java SDK</li>
+    <li data-type="language-java" class="active">Java SDK</li>
     <li data-type="language-py">Python SDK</li>
   </ul>
 </nav>
@@ -1080,7 +1080,7 @@ pipeline.run().waitUntilFinish();
 ```
 
 ```python
-pipeline.run().wait_until_finish();
+pipeline.run().wait_until_finish()
 ```
 
 <a name="transforms-composite"></a>


[5/5] beam-site git commit: Regenerate website

Posted by dh...@apache.org.
Regenerate website


Project: http://git-wip-us.apache.org/repos/asf/beam-site/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam-site/commit/6df417f9
Tree: http://git-wip-us.apache.org/repos/asf/beam-site/tree/6df417f9
Diff: http://git-wip-us.apache.org/repos/asf/beam-site/diff/6df417f9

Branch: refs/heads/asf-site
Commit: 6df417f997f803db6139f2d2ec0a8b68a6a5517d
Parents: f56cfa4
Author: Dan Halperin <dh...@google.com>
Authored: Fri Jan 27 13:55:11 2017 -0800
Committer: Dan Halperin <dh...@google.com>
Committed: Fri Jan 27 13:55:11 2017 -0800

----------------------------------------------------------------------
 .../documentation/programming-guide/index.html  | 214 +++++++++++++++----
 1 file changed, 177 insertions(+), 37 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/beam-site/blob/6df417f9/content/documentation/programming-guide/index.html
----------------------------------------------------------------------
diff --git a/content/documentation/programming-guide/index.html b/content/documentation/programming-guide/index.html
index 6b65ab7..83c9664 100644
--- a/content/documentation/programming-guide/index.html
+++ b/content/documentation/programming-guide/index.html
@@ -146,12 +146,12 @@
       <div class="row">
         <h1 id="apache-beam-programming-guide">Apache Beam Programming Guide</h1>
 
-<p>The <strong>Beam Programming Guide</strong> is intended for Beam users who want to use the Beam SDKs to create data processing pipelines. It provides guidance for using the Beam SDK classes to build and test your pipeline. It is not intended as an exhaustive reference, but as a language-agnostic, high-level guide to programmatically building your Beam pipeline. As the programming guide is filled out, the text will include code samples in multiple languages to help illustrate how to implement Beam concepts in your programs.</p>
+<p>The <strong>Beam Programming Guide</strong> is intended for Beam users who want to use the Beam SDKs to create data processing pipelines. It provides guidance for using the Beam SDK classes to build and test your pipeline. It is not intended as an exhaustive reference, but as a language-agnostic, high-level guide to programmatically building your Beam pipeline. As the programming guide is filled out, the text will include code samples in multiple languages to help illustrate how to implement Beam concepts in your pipelines.</p>
 
 <nav class="language-switcher">
   <strong>Adapt for:</strong> 
   <ul>
-    <li data-type="language-java">Java SDK</li>
+    <li data-type="language-java" class="active">Java SDK</li>
     <li data-type="language-py">Python SDK</li>
   </ul>
 </nav>
@@ -185,7 +185,7 @@
       <li><a href="#transforms-sideio">Side Inputs and Side Outputs</a></li>
     </ul>
   </li>
-  <li><a href="#io">I/O</a></li>
+  <li><a href="#io">Pipeline I/O</a></li>
   <li><a href="#running">Running the Pipeline</a></li>
   <li><a href="#coders">Data Encoding and Type Safety</a></li>
   <li><a href="#windowing">Working with Windowing</a></li>
@@ -225,7 +225,7 @@
 
 <p>When you run your Beam driver program, the Pipeline Runner that you designate constructs a <strong>workflow graph</strong> of your pipeline based on the <code class="highlighter-rouge">PCollection</code> objects you’ve created and transforms that you’ve applied. That graph is then executed using the appropriate distributed processing back-end, becoming an asynchronous “job” (or equivalent) on that back-end.</p>
 
-<h2 id="a-namepipelineacreating-the-pipeline"><a name="pipeline"></a>Creating the Pipeline</h2>
+<h2 id="a-namepipelineacreating-the-pipeline"><a name="pipeline"></a>Creating the pipeline</h2>
 
 <p>The <code class="highlighter-rouge">Pipeline</code> abstraction encapsulates all the data and steps in your data processing task. Your Beam driver program typically starts by constructing a <span class="language-java"><a href="/documentation/sdks/javadoc/0.4.0/index.html?org/apache/beam/sdk/Pipeline.html">Pipeline</a></span><span class="language-py"><a href="https://github.com/apache/beam/blob/python-sdk/sdks/python/apache_beam/pipeline.py">Pipeline</a></span> object, and then using that object as the basis for creating the pipeline\u2019s data sets as <code class="highlighter-rouge">PCollection</code>s and its operations as <code class="highlighter-rouge">Transform</code>s.</p>
 
@@ -266,7 +266,7 @@
 
 <p>You create a <code class="highlighter-rouge">PCollection</code> by either reading data from an external source using Beam\u2019s <a href="#io">Source API</a>, or you can create a <code class="highlighter-rouge">PCollection</code> of data stored in an in-memory collection class in your driver program. The former is typically how a production pipeline would ingest data; Beam\u2019s Source APIs contain adapters to help you read from external sources like large cloud-based files, databases, or subscription services. The latter is primarily useful for testing and debugging purposes.</p>
 
-<h4 id="reading-from-an-external-source">Reading from an External Source</h4>
+<h4 id="reading-from-an-external-source">Reading from an external source</h4>
 
 <p>To read from an external source, you use one of the <a href="#io">Beam-provided I/O adapters</a>. The adapters vary in their exact usage, but all of them read from some external data source and return a <code class="highlighter-rouge">PCollection</code> whose elements represent the data records in that source.</p>
 
@@ -279,7 +279,7 @@
     <span class="n">Pipeline</span> <span class="n">p</span> <span class="o">=</span> <span class="n">Pipeline</span><span class="o">.</span><span class="na">create</span><span class="o">(</span><span class="n">options</span><span class="o">);</span>
 
     <span class="n">PCollection</span><span class="o">&lt;</span><span class="n">String</span><span class="o">&gt;</span> <span class="n">lines</span> <span class="o">=</span> <span class="n">p</span><span class="o">.</span><span class="na">apply</span><span class="o">(</span>
-      <span class="n">TextIO</span><span class="o">.</span><span class="na">Read</span><span class="o">.</span><span class="na">named</span><span class="o">(</span><span class="s">"ReadMyFile"</span><span class="o">).</span><span class="na">from</span><span class="o">(</span><span class="s">"protocol://path/to/some/inputData.txt"</span><span class="o">));</span>
+      <span class="s">"ReadMyFile"</span><span class="o">,</span> <span class="n">TextIO</span><span class="o">.</span><span class="na">Read</span><span class="o">.</span><span class="na">from</span><span class="o">(</span><span class="s">"protocol://path/to/some/inputData.txt"</span><span class="o">));</span>
 <span class="o">}</span>
 </code></pre>
 </div>
@@ -296,7 +296,7 @@
 
 <p>See the <a href="#io">section on I/O</a> to learn more about how to read from the various data sources supported by the Beam SDK.</p>
 
-<h4 id="creating-a-pcollection-from-in-memory-data">Creating a PCollection from In-Memory Data</h4>
+<h4 id="creating-a-pcollection-from-in-memory-data">Creating a PCollection from in-memory data</h4>
 
 <p class="language-java">To create a <code class="highlighter-rouge">PCollection</code> from an in-memory Java <code class="highlighter-rouge">Collection</code>, you use the Beam-provided <code class="highlighter-rouge">Create</code> transform. Much like a data adapter\u2019s <code class="highlighter-rouge">Read</code>, you apply <code class="highlighter-rouge">Create</code> directly to your <code class="highlighter-rouge">Pipeline</code> object itself.</p>
 
@@ -342,11 +342,11 @@
 </code></pre>
 </div>
 
-<h3 id="a-namepccharacteristicsapcollection-characteristics"><a name="pccharacteristics"></a>PCollection Characteristics</h3>
+<h3 id="a-namepccharacteristicsapcollection-characteristics"><a name="pccharacteristics"></a>PCollection characteristics</h3>
 
 <p>A <code class="highlighter-rouge">PCollection</code> is owned by the specific <code class="highlighter-rouge">Pipeline</code> object for which it is created; multiple pipelines cannot share a <code class="highlighter-rouge">PCollection</code>. In some respects, a <code class="highlighter-rouge">PCollection</code> functions like a collection class. However, a <code class="highlighter-rouge">PCollection</code> can differ in a few key ways:</p>
 
-<h4 id="a-namepcelementtypeaelement-type"><a name="pcelementtype"></a>Element Type</h4>
+<h4 id="a-namepcelementtypeaelement-type"><a name="pcelementtype"></a>Element type</h4>
 
 <p>The elements of a <code class="highlighter-rouge">PCollection</code> may be of any type, but must all be of the same type. However, to support distributed processing, Beam needs to be able to encode each individual element as a byte string (so elements can be passed around to distributed workers). The Beam SDKs provide a data encoding mechanism that includes built-in encoding for commonly-used types as well as support for specifying custom encodings as needed.</p>
 
@@ -354,11 +354,11 @@
 
 <p>A <code class="highlighter-rouge">PCollection</code> is immutable. Once created, you cannot add, remove, or change individual elements. A Beam Transform might process each element of a <code class="highlighter-rouge">PCollection</code> and generate new pipeline data (as a new <code class="highlighter-rouge">PCollection</code>), <em>but it does not consume or modify the original input collection</em>.</p>
 
-<h4 id="a-namepcrandomaccessarandom-access"><a name="pcrandomaccess"></a>Random Access</h4>
+<h4 id="a-namepcrandomaccessarandom-access"><a name="pcrandomaccess"></a>Random access</h4>
 
 <p>A <code class="highlighter-rouge">PCollection</code> does not support random access to individual elements. Instead, Beam Transforms consider every element in a <code class="highlighter-rouge">PCollection</code> individually.</p>
 
-<h4 id="a-namepcsizeboundasize-and-boundedness"><a name="pcsizebound"></a>Size and Boundedness</h4>
+<h4 id="a-namepcsizeboundasize-and-boundedness"><a name="pcsizebound"></a>Size and boundedness</h4>
 
 <p>A <code class="highlighter-rouge">PCollection</code> is a large, immutable \u201cbag\u201d of elements. There is no upper limit on how many elements a <code class="highlighter-rouge">PCollection</code> can contain; any given <code class="highlighter-rouge">PCollection</code> might fit in memory on a single machine, or it might represent a very large distributed data set backed by a persistent data store.</p>
 
@@ -368,7 +368,7 @@
 
 <p>When performing an operation that groups elements in an unbounded <code class="highlighter-rouge">PCollection</code>, Beam requires a concept called <strong>Windowing</strong> to divide a continuously updating data set into logical windows of finite size.  Beam processes each window as a bundle, and processing continues as the data set is generated. These logical windows are determined by some characteristic associated with a data element, such as a <strong>timestamp</strong>.</p>
 
-<h4 id="a-namepctimestampsaelement-timestamps"><a name="pctimestamps"></a>Element Timestamps</h4>
+<h4 id="a-namepctimestampsaelement-timestamps"><a name="pctimestamps"></a>Element timestamps</h4>
 
 <p>Each element in a <code class="highlighter-rouge">PCollection</code> has an associated intrinsic <strong>timestamp</strong>. The timestamp for each element is initially assigned by the <a href="#io">Source</a> that creates the <code class="highlighter-rouge">PCollection</code>. Sources that create an unbounded <code class="highlighter-rouge">PCollection</code> often assign each new element a timestamp that corresponds to when the element was read or added.</p>
 
@@ -380,7 +380,7 @@
 
 <p>You can manually assign timestamps to the elements of a <code class="highlighter-rouge">PCollection</code> if the source doesn’t do it for you. You’ll want to do this if the elements have an inherent timestamp, but the timestamp is somewhere in the structure of the element itself (such as a “time” field in a server log entry). Beam has <a href="#transforms">Transforms</a> that take a <code class="highlighter-rouge">PCollection</code> as input and output an identical <code class="highlighter-rouge">PCollection</code> with timestamps attached; see <a href="#windowing">Assigning Timestamps</a> for more information on how to do so.</p>
 
-<h2 id="a-nametransformsaapplying-transforms"><a name="transforms"></a>Applying Transforms</h2>
+<h2 id="a-nametransformsaapplying-transforms"><a name="transforms"></a>Applying transforms</h2>
 
 <p>In the Beam SDKs, <strong>transforms</strong> are the operations in your pipeline. A transform takes a <code class="highlighter-rouge">PCollection</code> (or more than one <code class="highlighter-rouge">PCollection</code>) as input, performs an operation that you specify on each element in that collection, and produces a new output <code class="highlighter-rouge">PCollection</code>. To invoke a transform, you must <strong>apply</strong> it to the input <code class="highlighter-rouge">PCollection</code>.</p>
 
@@ -431,7 +431,7 @@
 
 <p>The transforms in the Beam SDKs provide a generic <strong>processing framework</strong>, where you provide processing logic in the form of a function object (colloquially referred to as “user code”). The user code gets applied to the elements of the input <code class="highlighter-rouge">PCollection</code>. Instances of your user code might then be executed in parallel by many different workers across a cluster, depending on the pipeline runner and back-end that you choose to execute your Beam pipeline. The user code running on each worker generates the output elements that are ultimately added to the final output <code class="highlighter-rouge">PCollection</code> that the transform produces.</p>
 
-<h3 id="core-beam-transforms">Core Beam Transforms</h3>
+<h3 id="core-beam-transforms">Core Beam transforms</h3>
 
 <p>Beam provides the following transforms, each of which represents a different processing paradigm:</p>
 
@@ -551,7 +551,7 @@
   <li>Once you output a value using <code class="highlighter-rouge">ProcessContext.output()</code> or <code class="highlighter-rouge">ProcessContext.sideOutput()</code>, you should not modify that value in any way.</li>
 </ul>
 
-<h5 id="lightweight-dofns-and-other-abstractions">Lightweight DoFns and Other Abstractions</h5>
+<h5 id="lightweight-dofns-and-other-abstractions">Lightweight DoFns and other abstractions</h5>
 
 <p>If your function is relatively straightforward, you can simplify your use of <code class="highlighter-rouge">ParDo</code> by providing a lightweight <code class="highlighter-rouge">DoFn</code> in-line, as <span class="language-java">an anonymous inner class instance</span><span class="language-py">a lambda function</span>.</p>
 
@@ -563,9 +563,8 @@
 <span class="c1">// Apply a ParDo with an anonymous DoFn to the PCollection words.</span>
 <span class="c1">// Save the result as the PCollection wordLengths.</span>
 <span class="n">PCollection</span><span class="o">&lt;</span><span class="n">Integer</span><span class="o">&gt;</span> <span class="n">wordLengths</span> <span class="o">=</span> <span class="n">words</span><span class="o">.</span><span class="na">apply</span><span class="o">(</span>
-  <span class="n">ParDo</span>
-    <span class="o">.</span><span class="na">named</span><span class="o">(</span><span class="s">"ComputeWordLengths"</span><span class="o">)</span>            <span class="c1">// the transform name</span>
-    <span class="o">.</span><span class="na">of</span><span class="o">(</span><span class="k">new</span> <span class="n">DoFn</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">Integer</span><span class="o">&gt;()</span> <span class="o">{</span>       <span class="c1">// a DoFn as an anonymous inner class instance</span>
+  <span class="s">"ComputeWordLengths"</span><span class="o">,</span>                     <span class="c1">// the transform name</span>
+  <span class="n">ParDo</span><span class="o">.</span><span class="na">of</span><span class="o">(</span><span class="k">new</span> <span class="n">DoFn</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">Integer</span><span class="o">&gt;()</span> <span class="o">{</span>    <span class="c1">// a DoFn as an anonymous inner class instance</span>
       <span class="nd">@ProcessElement</span>
       <span class="kd">public</span> <span class="kt">void</span> <span class="nf">processElement</span><span class="o">(</span><span class="n">ProcessContext</span> <span class="n">c</span><span class="o">)</span> <span class="o">{</span>
         <span class="n">c</span><span class="o">.</span><span class="na">output</span><span class="o">(</span><span class="n">c</span><span class="o">.</span><span class="na">element</span><span class="o">().</span><span class="na">length</span><span class="o">());</span>
@@ -660,7 +659,7 @@ tree, [2]
 
 <p>Simple combine operations, such as sums, can usually be implemented as a simple function. More complex combination operations might require you to create a subclass of <code class="highlighter-rouge">CombineFn</code> that has an accumulation type distinct from the input/output type.</p>
 
-<h5 id="simple-combinations-using-simple-functions"><strong>Simple Combinations Using Simple Functions</strong></h5>
+<h5 id="simple-combinations-using-simple-functions"><strong>Simple combinations using simple functions</strong></h5>
 
 <p>The following example code shows a simple combine function.</p>
 
@@ -687,7 +686,7 @@ tree, [2]
 </code></pre>
 </div>
 
-<h5 id="advanced-combinations-using-combinefn"><strong>Advanced Combinations using CombineFn</strong></h5>
+<h5 id="advanced-combinations-using-combinefn"><strong>Advanced combinations using CombineFn</strong></h5>
 
 <p>For more complex combine functions, you can define a subclass of <code class="highlighter-rouge">CombineFn</code>. You should use <code class="highlighter-rouge">CombineFn</code> if the combine function requires a more sophisticated accumulator, must perform additional pre- or post-processing, might change the output type, or takes the key into account.</p>
 
@@ -764,7 +763,7 @@ tree, [2]
 
 <p>If you are combining a <code class="highlighter-rouge">PCollection</code> of key-value pairs, <a href="#transforms-combine-per-key">per-key combining</a> is often enough. If you need the combining strategy to change based on the key (for example, MIN for some users and MAX for other users), you can define a <code class="highlighter-rouge">KeyedCombineFn</code> to access the key within the combining strategy.</p>
 
-<h5 id="combining-a-pcollection-into-a-single-value"><strong>Combining a PCollection into a Single Value</strong></h5>
+<h5 id="combining-a-pcollection-into-a-single-value"><strong>Combining a PCollection into a single value</strong></h5>
 
 <p>Use the global combine to transform all of the elements in a given <code class="highlighter-rouge">PCollection</code> into a single value, represented in your pipeline as a new <code class="highlighter-rouge">PCollection</code> containing one element. The following example code shows how to apply the Beam provided sum combine function to produce a single sum value for a <code class="highlighter-rouge">PCollection</code> of integers.</p>
 
@@ -783,7 +782,7 @@ tree, [2]
 </code></pre>
 </div>
 
-<h5 id="global-windowing">Global Windowing:</h5>
+<h5 id="global-windowing">Global windowing:</h5>
 
 <p>If your input <code class="highlighter-rouge">PCollection</code> uses the default global windowing, the default behavior is to return a <code class="highlighter-rouge">PCollection</code> containing one item. That item\u2019s value comes from the accumulator in the combine function that you specified when applying <code class="highlighter-rouge">Combine</code>. For example, the Beam provided sum combine function returns a zero value (the sum of an empty input), while the min combine function returns a maximal or infinite value.</p>
 
@@ -801,7 +800,7 @@ tree, [2]
 </code></pre>
 </div>
 
-<h5 id="non-global-windowing">Non-Global Windowing:</h5>
+<h5 id="non-global-windowing">Non-global windowing:</h5>
 
 <p>If your <code class="highlighter-rouge">PCollection</code> uses any non-global windowing function, Beam does not provide the default behavior. You must specify one of the following options when applying <code class="highlighter-rouge">Combine</code>:</p>
 
@@ -810,7 +809,7 @@ tree, [2]
   <li>Specify <code class="highlighter-rouge">.asSingletonView</code>, in which the output is immediately converted to a <code class="highlighter-rouge">PCollectionView</code>, which will provide a default value for each empty window when used as a side input. You\u2019ll generally only need to use this option if the result of your pipeline\u2019s <code class="highlighter-rouge">Combine</code> is to be used as a side input later in the pipeline.</li>
 </ul>
 
-<h5 id="a-nametransforms-combine-per-keyacombining-values-in-a-key-grouped-collection"><a name="transforms-combine-per-key"></a><strong>Combining Values in a Key-Grouped Collection</strong></h5>
+<h5 id="a-nametransforms-combine-per-keyacombining-values-in-a-key-grouped-collection"><a name="transforms-combine-per-key"></a><strong>Combining values in a key-grouped collection</strong></h5>
 
 <p>After creating a key-grouped collection (for example, by using a <code class="highlighter-rouge">GroupByKey</code> transform) a common pattern is to combine the collection of values associated with each key into a single, merged value. Drawing on the previous example from <code class="highlighter-rouge">GroupByKey</code>, a key-grouped <code class="highlighter-rouge">PCollection</code> called <code class="highlighter-rouge">groupedWords</code> looks like this:</p>
 
@@ -877,11 +876,11 @@ tree, [2]
 </code></pre>
 </div>
 
-<h5 id="data-encoding-in-merged-collections">Data Encoding in Merged Collections:</h5>
+<h5 id="data-encoding-in-merged-collections">Data encoding in merged collections:</h5>
 
 <p>By default, the coder for the output <code class="highlighter-rouge">PCollection</code> is the same as the coder for the first <code class="highlighter-rouge">PCollection</code> in the input <code class="highlighter-rouge">PCollectionList</code>. However, the input <code class="highlighter-rouge">PCollection</code> objects can each use different coders, as long as they all contain the same data type in your chosen language.</p>
 
-<h5 id="merging-windowed-collections">Merging Windowed Collections:</h5>
+<h5 id="merging-windowed-collections">Merging windowed collections:</h5>
 
 <p>When using <code class="highlighter-rouge">Flatten</code> to merge <code class="highlighter-rouge">PCollection</code> objects that have a windowing strategy applied, all of the <code class="highlighter-rouge">PCollection</code> objects you want to merge must use a compatible windowing strategy and window sizing. For example, all the collections you\u2019re merging must all use (hypothetically) identical 5-minute fixed windows or 4-minute sliding windows starting every 30 seconds.</p>
 
@@ -922,7 +921,7 @@ tree, [2]
 </code></pre>
 </div>
 
-<h4 id="a-nametransforms-usercodereqsageneral-requirements-for-writing-user-code-for-beam-transforms"><a name="transforms-usercodereqs"></a>General Requirements for Writing User Code for Beam Transforms</h4>
+<h4 id="a-nametransforms-usercodereqsageneral-requirements-for-writing-user-code-for-beam-transforms"><a name="transforms-usercodereqs"></a>General Requirements for writing user code for Beam transforms</h4>
 
 <p>When you build user code for a Beam transform, you should keep in mind the distributed nature of execution. For example, there might be many copies of your function running on a lot of different machines in parallel, and those copies function independently, without communicating or sharing state with any of the other copies. Depending on the Pipeline Runner and processing back-end you choose for your pipeline, each copy of your user code function may be retried or run multiple times. As such, you should be cautious about including things like state dependency in your user code.</p>
 
@@ -953,7 +952,7 @@ tree, [2]
  <li>Take care when declaring your function object inline by using an anonymous inner class instance. In a non-static context, your inner class instance will implicitly contain a pointer to the enclosing class and that class’ state. That enclosing class will also be serialized, and thus the same considerations that apply to the function object itself also apply to this outer class.</li>
 </ul>
 
-<h5 id="thread-compatibility">Thread-Compatibility</h5>
+<h5 id="thread-compatibility">Thread-compatibility</h5>
 
 <p>Your function object should be thread-compatible. Each instance of your function object is accessed by a single thread on a worker instance, unless you explicitly create your own threads. Note, however, that <strong>the Beam SDKs are not thread-safe</strong>. If you create your own threads in your user code, you must provide your own synchronization. Note that static members in your function object are not passed to worker instances and that multiple instances of your function may be accessed from different threads.</p>
 
@@ -963,13 +962,13 @@ tree, [2]
 
 <h4 id="a-nametransforms-sideioaside-inputs-and-side-outputs"><a name="transforms-sideio"></a>Side Inputs and Side Outputs</h4>
 
-<h5 id="side-inputs"><strong>Side Inputs</strong></h5>
+<h5 id="side-inputs"><strong>Side inputs</strong></h5>
 
 <p>In addition to the main input <code class="highlighter-rouge">PCollection</code>, you can provide additional inputs to a <code class="highlighter-rouge">ParDo</code> transform in the form of side inputs. A side input is an additional input that your <code class="highlighter-rouge">DoFn</code> can access each time it processes an element in the input <code class="highlighter-rouge">PCollection</code>. When you specify a side input, you create a view of some other data that can be read from within the <code class="highlighter-rouge">ParDo</code> transform’s <code class="highlighter-rouge">DoFn</code> while processing each element.</p>
 
 <p>Side inputs are useful if your <code class="highlighter-rouge">ParDo</code> needs to inject additional data when processing each element in the input <code class="highlighter-rouge">PCollection</code>, but the additional data needs to be determined at runtime (and not hard-coded). Such values might be determined by the input data, or depend on a different branch of your pipeline.</p>
 
-<h5 id="passing-side-inputs-to-pardo">Passing Side Inputs to ParDo:</h5>
+<h5 id="passing-side-inputs-to-pardo">Passing side inputs to ParDo:</h5>
 
 <div class="language-java highlighter-rouge"><pre class="highlight"><code>  <span class="c1">// Pass side inputs to your ParDo transform by invoking .withSideInputs.</span>
   <span class="c1">// Inside your DoFn, access the side input by using the method DoFn.ProcessContext.sideInput.</span>
@@ -1045,7 +1044,7 @@ tree, [2]
 </code></pre>
 </div>
 
-<h5 id="side-inputs-and-windowing">Side Inputs and Windowing:</h5>
+<h5 id="side-inputs-and-windowing">Side inputs and windowing:</h5>
 
 <p>A windowed <code class="highlighter-rouge">PCollection</code> may be infinite and thus cannot be compressed into a single value (or single collection class). When you create a <code class="highlighter-rouge">PCollectionView</code> of a windowed <code class="highlighter-rouge">PCollection</code>, the <code class="highlighter-rouge">PCollectionView</code> represents a single entity per window (one singleton per window, one list per window, etc.).</p>
 
@@ -1057,11 +1056,11 @@ tree, [2]
 
 <p>If the side input has multiple trigger firings, Beam uses the value from the latest trigger firing. This is particularly useful if you use a side input with a single global window and specify a trigger.</p>
 
-<h5 id="side-outputs"><strong>Side Outputs</strong></h5>
+<h5 id="side-outputs"><strong>Side outputs</strong></h5>
 
 <p>While <code class="highlighter-rouge">ParDo</code> always produces a main output <code class="highlighter-rouge">PCollection</code> (as the return value from apply), you can also have your <code class="highlighter-rouge">ParDo</code> produce any number of additional output <code class="highlighter-rouge">PCollection</code>s. If you choose to have multiple outputs, your <code class="highlighter-rouge">ParDo</code> returns all of the output <code class="highlighter-rouge">PCollection</code>s (including the main output) bundled together.</p>
 
-<h5 id="tags-for-side-outputs">Tags for Side Outputs:</h5>
+<h5 id="tags-for-side-outputs">Tags for side outputs:</h5>
 
 <div class="language-java highlighter-rouge"><pre class="highlight"><code><span class="c1">// To emit elements to a side output PCollection, create a TupleTag object to identify each collection that your ParDo produces.</span>
 <span class="c1">// For example, if your ParDo produces three output PCollections (the main output and two side outputs), you must create three TupleTags.</span>
@@ -1133,7 +1132,7 @@ tree, [2]
 </code></pre>
 </div>
 
-<h5 id="emitting-to-side-outputs-in-your-dofn">Emitting to Side Outputs in your DoFn:</h5>
+<h5 id="emitting-to-side-outputs-in-your-dofn">Emitting to side outputs in your DoFn:</h5>
 
 <div class="language-java highlighter-rouge"><pre class="highlight"><code><span class="c1">// Inside your ParDo's DoFn, you can emit an element to a side output by using the method ProcessContext.sideOutput.</span>
 <span class="c1">// Pass the appropriate TupleTag for the target side output collection when you call ProcessContext.sideOutput.</span>
@@ -1194,9 +1193,150 @@ tree, [2]
 </code></pre>
 </div>
 
-<p><a name="io"></a>
-<a name="running"></a>
-<a name="transforms-composite"></a>
+<h2 id="a-nameioapipeline-io"><a name="io"></a>Pipeline I/O</h2>
+
+<p>When you create a pipeline, you often need to read data from some external source, such as a file or a database. Likewise, you may want your pipeline to output its result data to an external data sink. Beam provides read and write transforms for a number of common data storage types. If you want your pipeline to read from or write to a data storage format that isn’t supported by the built-in transforms, you can implement your own read and write transforms.</p>
+
+<blockquote>
+  <p>A guide that covers how to implement your own Beam IO transforms is in progress (<a href="https://issues.apache.org/jira/browse/BEAM-1025">BEAM-1025</a>).</p>
+</blockquote>
+
+<h3 id="reading-input-data">Reading input data</h3>
+
+<p>Read transforms read data from an external source and return a <code class="highlighter-rouge">PCollection</code> representation of the data for use by your pipeline. You can use a read transform at any point while constructing your pipeline to create a new <code class="highlighter-rouge">PCollection</code>, though it will be most common at the start of your pipeline.</p>
+
+<h4 id="using-a-read-transform">Using a read transform:</h4>
+
+<div class="language-java highlighter-rouge"><pre class="highlight"><code><span class="n">PCollection</span><span class="o">&lt;</span><span class="n">String</span><span class="o">&gt;</span> <span class="n">lines</span> <span class="o">=</span> <span class="n">p</span><span class="o">.</span><span class="na">apply</span><span class="o">(</span><span class="n">TextIO</span><span class="o">.</span><span class="na">Read</span><span class="o">.</span><span class="na">from</span><span class="o">(</span><span class="s">"gs://some/inputData.txt"</span><span class="o">));</span>   
+</code></pre>
+</div>
+
+<div class="language-python highlighter-rouge"><pre class="highlight"><code><span class="n">lines</span> <span class="o">=</span> <span class="n">pipeline</span> <span class="o">|</span> <span class="n">beam</span><span class="o">.</span><span class="n">io</span><span class="o">.</span><span class="n">ReadFromText</span><span class="p">(</span><span class="s">'gs://some/inputData.txt'</span><span class="p">)</span>
+</code></pre>
+</div>
+
+<h3 id="writing-output-data">Writing output data</h3>
+
+<p>Write transforms write the data in a <code class="highlighter-rouge">PCollection</code> to an external data sink. You will most often use write transforms at the end of your pipeline to output your pipeline’s final results. However, you can use a write transform to output a <code class="highlighter-rouge">PCollection</code>’s data at any point in your pipeline.</p>
+
+<h4 id="using-a-write-transform">Using a Write transform:</h4>
+
+<div class="language-java highlighter-rouge"><pre class="highlight"><code><span class="n">output</span><span class="o">.</span><span class="na">apply</span><span class="o">(</span><span class="n">TextIO</span><span class="o">.</span><span class="na">Write</span><span class="o">.</span><span class="na">to</span><span class="o">(</span><span class="s">"gs://some/outputData"</span><span class="o">));</span>
+</code></pre>
+</div>
+
+<div class="language-python highlighter-rouge"><pre class="highlight"><code><span class="n">output</span> <span class="o">|</span> <span class="n">beam</span><span class="o">.</span><span class="n">io</span><span class="o">.</span><span class="n">WriteToText</span><span class="p">(</span><span class="s">'gs://some/outputData'</span><span class="p">)</span>
+</code></pre>
+</div>
+
+<h3 id="file-based-input-and-output-data">File-based input and output data</h3>
+
+<h4 id="reading-from-multiple-locations">Reading from multiple locations:</h4>
+
+<p>Many read transforms support reading from multiple input files matching a glob operator you provide. Note that glob operators are filesystem-specific and obey filesystem-specific consistency models. The following TextIO example uses a glob operator (*) to read all matching input files that have the prefix “input-” and the suffix “.csv” in the given location:</p>
+
+<div class="language-java highlighter-rouge"><pre class="highlight"><code><span class="n">p</span><span class="o">.</span><span class="na">apply</span><span class="o">(</span><span class="err">\u201c</span><span class="n">ReadFromText</span><span class="err">\u201d</span><span class="o">,</span>
+    <span class="n">TextIO</span><span class="o">.</span><span class="na">Read</span><span class="o">.</span><span class="na">from</span><span class="o">(</span><span class="s">"protocol://my_bucket/path/to/input-*.csv"</span><span class="o">);</span>
+</code></pre>
+</div>
+
+<div class="language-python highlighter-rouge"><pre class="highlight"><code><span class="n">lines</span> <span class="o">=</span> <span class="n">p</span> <span class="o">|</span> <span class="n">beam</span><span class="o">.</span><span class="n">io</span><span class="o">.</span><span class="n">Read</span><span class="p">(</span>
+    <span class="s">'ReadFromText'</span><span class="p">,</span>
+    <span class="n">beam</span><span class="o">.</span><span class="n">io</span><span class="o">.</span><span class="n">TextFileSource</span><span class="p">(</span><span class="s">'protocol://my_bucket/path/to/input-*.csv'</span><span class="p">))</span>
+</code></pre>
+</div>
+
+<p>To read data from disparate sources into a single <code class="highlighter-rouge">PCollection</code>, read each one independently and then use the <a href="#transforms-flatten-partition">Flatten</a> transform to create a single <code class="highlighter-rouge">PCollection</code>.</p>
+
+<h4 id="writing-to-multiple-output-files">Writing to multiple output files:</h4>
+
+<p>For file-based output data, write transforms write to multiple output files by default. When you pass an output file name to a write transform, the file name is used as the prefix for all output files that the write transform produces. You can also specify a suffix to be appended to each output file name.</p>
+
+<p>The following write transform example writes multiple output files to a location. Each file has the prefix “numbers”, a numeric tag, and the suffix “.csv”.</p>
+
+<div class="language-java highlighter-rouge"><pre class="highlight"><code><span class="n">records</span><span class="o">.</span><span class="na">apply</span><span class="o">(</span><span class="s">"WriteToText"</span><span class="o">,</span>
+    <span class="n">TextIO</span><span class="o">.</span><span class="na">Write</span><span class="o">.</span><span class="na">to</span><span class="o">(</span><span class="s">"protocol://my_bucket/path/to/numbers"</span><span class="o">)</span>
+                <span class="o">.</span><span class="na">withSuffix</span><span class="o">(</span><span class="s">".csv"</span><span class="o">));</span>
+</code></pre>
+</div>
+
+<div class="language-python highlighter-rouge"><pre class="highlight"><code><span class="n">filtered_words</span> <span class="o">|</span> <span class="n">beam</span><span class="o">.</span><span class="n">io</span><span class="o">.</span><span class="n">WriteToText</span><span class="p">(</span>
+<span class="s">'protocol://my_bucket/path/to/numbers'</span><span class="p">,</span> <span class="n">file_name_suffix</span><span class="o">=</span><span class="s">'.csv'</span><span class="p">)</span>
+</code></pre>
+</div>
+
+<h3 id="beam-provided-io-apis">Beam-provided I/O APIs</h3>
+
+<p>See the language-specific source code directories for the Beam-supported I/O APIs. Specific documentation for each of these I/O sources will be added in the future. (<a href="https://issues.apache.org/jira/browse/BEAM-1054">BEAM-1054</a>)</p>
+
+<table class="table table-bordered">
+<tr>
+  <th>Language</th>
+  <th>File-based</th>
+  <th>Messaging</th>
+  <th>Database</th>
+</tr>
+<tr>
+  <td>Java</td>
+  <td>
+    <p><a href="https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/io/AvroIO.java">AvroIO</a></p>
+    <p><a href="https://github.com/apache/beam/tree/master/sdks/java/io/hdfs">HDFS</a></p>
+    <p><a href="https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/io/TextIO.java">TextIO</a></p>
+    <p><a href="https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/io/">XML</a></p>
+  </td>
+  <td>
+    <p><a href="https://github.com/apache/beam/tree/master/sdks/java/io/jms">JMS</a></p>
+    <p><a href="https://github.com/apache/beam/tree/master/sdks/java/io/kafka">Kafka</a></p>
+    <p><a href="https://github.com/apache/beam/tree/master/sdks/java/io/kinesis">Kinesis</a></p>
+    <p><a href="https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/io">Google Cloud PubSub</a></p>
+  </td>
+  <td>
+    <p><a href="https://github.com/apache/beam/tree/master/sdks/java/io/mongodb">MongoDB</a></p>
+    <p><a href="https://github.com/apache/beam/tree/master/sdks/java/io/jdbc">JDBC</a></p>
+    <p><a href="https://github.com/apache/beam/tree/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery">Google BigQuery</a></p>
+    <p><a href="https://github.com/apache/beam/tree/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigtable">Google Cloud Bigtable</a></p>
+    <p><a href="https://github.com/apache/beam/tree/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/datastore">Google Cloud Datastore</a></p>
+  </td>
+</tr>
+<tr>
+  <td>Python</td>
+  <td>
+    <p><a href="https://github.com/apache/beam/blob/python-sdk/sdks/python/apache_beam/io/avroio.py">avroio</a></p>
+    <p><a href="https://github.com/apache/beam/blob/python-sdk/sdks/python/apache_beam/io/textio.py">textio</a></p>
+  </td>
+  <td>
+  </td>
+  <td>
+    <p><a href="https://github.com/apache/beam/blob/python-sdk/sdks/python/apache_beam/io/bigquery.py">Google BigQuery</a></p>
+    <p><a href="https://github.com/apache/beam/tree/python-sdk/sdks/python/apache_beam/io/datastore">Google Cloud Datastore</a></p>
+  </td>
+
+</tr>
+</table>
+
+<h2 id="a-namerunningarunning-the-pipeline"><a name="running"></a>Running the pipeline</h2>
+
+<p>To run your pipeline, use the <code class="highlighter-rouge">run</code> method. The program you create sends a specification for your pipeline to a pipeline runner, which then constructs and runs the actual series of pipeline operations. Pipelines are executed asynchronously by default.</p>
+
+<div class="language-java highlighter-rouge"><pre class="highlight"><code><span class="n">pipeline</span><span class="o">.</span><span class="na">run</span><span class="o">();</span>
+</code></pre>
+</div>
+
+<div class="language-python highlighter-rouge"><pre class="highlight"><code><span class="n">pipeline</span><span class="o">.</span><span class="n">run</span><span class="p">()</span>
+</code></pre>
+</div>
+
+<p>For blocking execution, append the <code class="highlighter-rouge">waitUntilFinish</code> method:</p>
+
+<div class="language-java highlighter-rouge"><pre class="highlight"><code><span class="n">pipeline</span><span class="o">.</span><span class="na">run</span><span class="o">().</span><span class="na">waitUntilFinish</span><span class="o">();</span>
+</code></pre>
+</div>
+
+<div class="language-python highlighter-rouge"><pre class="highlight"><code><span class="n">pipeline</span><span class="o">.</span><span class="n">run</span><span class="p">()</span><span class="o">.</span><span class="n">wait_until_finish</span><span class="p">()</span>
+</code></pre>
+</div>
+
+<p><a name="transforms-composite"></a>
 <a name="coders"></a>
 <a name="windowing"></a>
 <a name="triggers"></a></p>


[2/5] beam-site git commit: Update according to reviewer comments

Posted by dh...@apache.org.
Update according to reviewer comments


Project: http://git-wip-us.apache.org/repos/asf/beam-site/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam-site/commit/bc4d4947
Tree: http://git-wip-us.apache.org/repos/asf/beam-site/tree/bc4d4947
Diff: http://git-wip-us.apache.org/repos/asf/beam-site/diff/bc4d4947

Branch: refs/heads/asf-site
Commit: bc4d4947461244eee83d7c0f31e76cda9971b5cb
Parents: 884971d
Author: Hadar Hod <ha...@google.com>
Authored: Tue Jan 24 23:07:48 2017 -0800
Committer: Dan Halperin <dh...@google.com>
Committed: Fri Jan 27 13:42:01 2017 -0800

----------------------------------------------------------------------
 src/documentation/programming-guide.md | 135 ++++++++++------------------
 1 file changed, 48 insertions(+), 87 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/beam-site/blob/bc4d4947/src/documentation/programming-guide.md
----------------------------------------------------------------------
diff --git a/src/documentation/programming-guide.md b/src/documentation/programming-guide.md
index 3851bb5..c0e947f 100644
--- a/src/documentation/programming-guide.md
+++ b/src/documentation/programming-guide.md
@@ -69,7 +69,7 @@ A typical Beam driver program works as follows:
 
 When you run your Beam driver program, the Pipeline Runner that you designate constructs a **workflow graph** of your pipeline based on the `PCollection` objects you've created and transforms that you've applied. That graph is then executed using the appropriate distributed processing back-end, becoming an asynchronous "job" (or equivalent) on that back-end.
 
-## <a name="pipeline"></a>Creating the Pipeline
+## <a name="pipeline"></a>Creating the pipeline
 
 The `Pipeline` abstraction encapsulates all the data and steps in your data processing task. Your Beam driver program typically starts by constructing a <span class="language-java">[Pipeline]({{ site.baseurl }}/documentation/sdks/javadoc/{{ site.release_latest }}/index.html?org/apache/beam/sdk/Pipeline.html)</span><span class="language-py">[Pipeline](https://github.com/apache/beam/blob/python-sdk/sdks/python/apache_beam/pipeline.py)</span> object, and then using that object as the basis for creating the pipeline's data sets as `PCollection`s and its operations as `Transform`s.
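 
 For illustration, a minimal sketch of this first step (assuming the options are parsed from command-line arguments):
 
 ```java
 // Parse command-line arguments into PipelineOptions, then construct the Pipeline.
 PipelineOptions options = PipelineOptionsFactory.fromArgs(args).create();
 Pipeline p = Pipeline.create(options);
 ```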
 
@@ -110,7 +110,7 @@ After you've created your `Pipeline`, you'll need to begin by creating at least
 
 You create a `PCollection` by either reading data from an external source using Beam's [Source API](#io), or you can create a `PCollection` of data stored in an in-memory collection class in your driver program. The former is typically how a production pipeline would ingest data; Beam's Source APIs contain adapters to help you read from external sources like large cloud-based files, databases, or subscription services. The latter is primarily useful for testing and debugging purposes.
 
-#### Reading from an External Source
+#### Reading from an external source
 
 To read from an external source, you use one of the [Beam-provided I/O adapters](#io). The adapters vary in their exact usage, but all of them read from some external data source and return a `PCollection` whose elements represent the data records in that source. 
 
@@ -141,7 +141,7 @@ lines = p | 'ReadMyFile' >> beam.io.Read(beam.io.TextFileSource("protocol://path
 
 See the [section on I/O](#io) to learn more about how to read from the various data sources supported by the Beam SDK.
 
-#### Creating a PCollection from In-Memory Data
+#### Creating a PCollection from in-memory data
 
 {:.language-java}
 To create a `PCollection` from an in-memory Java `Collection`, you use the Beam-provided `Create` transform. Much like a data adapter's `Read`, you apply `Create` directly to your `Pipeline` object itself.
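 
 {:.language-java}
 For example, a minimal sketch (the list contents are placeholders):
 
 ```java
 // Create a PCollection from an in-memory list of strings.
 List<String> lines = Arrays.asList("To be, or not to be:", "that is the question:");
 PCollection<String> collection = p.apply(Create.of(lines)).setCoder(StringUtf8Coder.of());
 ```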
@@ -190,11 +190,11 @@ p = beam.Pipeline()
 collection = p | 'ReadMyLines' >> beam.Create(lines)
 ```
 
-### <a name="pccharacteristics"></a>PCollection Characteristics
+### <a name="pccharacteristics"></a>PCollection characteristics
 
 A `PCollection` is owned by the specific `Pipeline` object for which it is created; multiple pipelines cannot share a `PCollection`. In some respects, a `PCollection` functions like a collection class. However, a `PCollection` can differ in a few key ways:
 
-#### <a name="pcelementtype"></a>Element Type
+#### <a name="pcelementtype"></a>Element type
 
 The elements of a `PCollection` may be of any type, but must all be of the same type. However, to support distributed processing, Beam needs to be able to encode each individual element as a byte string (so elements can be passed around to distributed workers). The Beam SDKs provide a data encoding mechanism that includes built-in encoding for commonly-used types as well as support for specifying custom encodings as needed.
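 
 For example, a small sketch of pinning down the encoding explicitly (`StringUtf8Coder` is one of the built-in coders):
 
 ```java
 // Explicitly set the coder used to encode each String element as a byte string.
 PCollection<String> lines = ...;
 lines.setCoder(StringUtf8Coder.of());
 ```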
 
@@ -202,11 +202,11 @@ The elements of a `PCollection` may be of any type, but must all be of the same
 
 A `PCollection` is immutable. Once created, you cannot add, remove, or change individual elements. A Beam Transform might process each element of a `PCollection` and generate new pipeline data (as a new `PCollection`), *but it does not consume or modify the original input collection*.
 
-#### <a name="pcrandomaccess"></a>Random Access
+#### <a name="pcrandomaccess"></a>Random access
 
 A `PCollection` does not support random access to individual elements. Instead, Beam Transforms consider every element in a `PCollection` individually.
 
-#### <a name="pcsizebound"></a>Size and Boundedness
+#### <a name="pcsizebound"></a>Size and boundedness
 
 A `PCollection` is a large, immutable "bag" of elements. There is no upper limit on how many elements a `PCollection` can contain; any given `PCollection` might fit in memory on a single machine, or it might represent a very large distributed data set backed by a persistent data store.
 
@@ -216,7 +216,7 @@ The bounded (or unbounded) nature of your `PCollection` affects how Beam process
 
 When performing an operation that groups elements in an unbounded `PCollection`, Beam requires a concept called **Windowing** to divide a continuously updating data set into logical windows of finite size.  Beam processes each window as a bundle, and processing continues as the data set is generated. These logical windows are determined by some characteristic associated with a data element, such as a **timestamp**.
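 
 For example, a hedged sketch of dividing an `events` collection (the name is a placeholder) into fixed one-minute windows:
 
 ```java
 // Subdivide the PCollection into fixed windows, each 1 minute in duration.
 PCollection<String> windowed = events.apply(
     Window.<String>into(FixedWindows.of(Duration.standardMinutes(1))));
 ```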
 
-#### <a name="pctimestamps"></a>Element Timestamps
+#### <a name="pctimestamps"></a>Element timestamps
 
 Each element in a `PCollection` has an associated intrinsic **timestamp**. The timestamp for each element is initially assigned by the [Source](#io) that creates the `PCollection`. Sources that create an unbounded `PCollection` often assign each new element a timestamp that corresponds to when the element was read or added.
 
@@ -226,7 +226,7 @@ Timestamps are useful for a `PCollection` that contains elements with an inheren
 
 You can manually assign timestamps to the elements of a `PCollection` if the source doesn't do it for you. You'll want to do this if the elements have an inherent timestamp, but the timestamp is somewhere in the structure of the element itself (such as a "time" field in a server log entry). Beam has [Transforms](#transforms) that take a `PCollection` as input and output an identical `PCollection` with timestamps attached; see [Assigning Timestamps](#windowing) for more information on how to do so.
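 
 For example, a sketch in which the element type and its timestamp accessor are hypothetical:
 
 ```java
 // Attach a timestamp to each element, pulled from a field of the element itself.
 PCollection<LogEntry> stamped = entries.apply(
     WithTimestamps.of(new SerializableFunction<LogEntry, Instant>() {
       @Override
       public Instant apply(LogEntry entry) {
         return new Instant(entry.getTimestampMillis());
       }
     }));
 ```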
 
-## <a name="transforms"></a>Applying Transforms
+## <a name="transforms"></a>Applying transforms
 
 In the Beam SDKs, **transforms** are the operations in your pipeline. A transform takes a `PCollection` (or more than one `PCollection`) as input, performs an operation that you specify on each element in that collection, and produces a new output `PCollection`. To invoke a transform, you must **apply** it to the input `PCollection`.
 
@@ -277,7 +277,7 @@ You can also build your own [composite transforms](#transforms-composite) that n
 
 The transforms in the Beam SDKs provide a generic **processing framework**, where you provide processing logic in the form of a function object (colloquially referred to as "user code"). The user code gets applied to the elements of the input `PCollection`. Instances of your user code might then be executed in parallel by many different workers across a cluster, depending on the pipeline runner and back-end that you choose to execute your Beam pipeline. The user code running on each worker generates the output elements that are ultimately added to the final output `PCollection` that the transform produces.
 
-### Core Beam Transforms
+### Core Beam transforms
 
 Beam provides the following transforms, each of which represents a different processing paradigm:
 
@@ -390,7 +390,7 @@ In your processing method, you'll also need to meet some immutability requiremen
 * Once you output a value using `ProcessContext.output()` or `ProcessContext.sideOutput()`, you should not modify that value in any way.
 
 
-##### Lightweight DoFns and Other Abstractions
+##### Lightweight DoFns and other abstractions
 
 If your function is relatively straightforward, you can simplify your use of `ParDo` by providing a lightweight `DoFn` in-line, as <span class="language-java">an anonymous inner class instance</span><span class="language-py">a lambda function</span>.
 
@@ -403,9 +403,8 @@ PCollection<String> words = ...;
 // Apply a ParDo with an anonymous DoFn to the PCollection words.
 // Save the result as the PCollection wordLengths.
 PCollection<Integer> wordLengths = words.apply(
-  ParDo
-    .named("ComputeWordLengths")            // the transform name
-    .of(new DoFn<String, Integer>() {       // a DoFn as an anonymous inner class instance
+  "ComputeWordLengths",                     // the transform name
+  ParDo.of(new DoFn<String, Integer>() {    // a DoFn as an anonymous inner class instance
       @ProcessElement
       public void processElement(ProcessContext c) {
         c.output(c.element().length());
@@ -497,7 +496,7 @@ When you apply a `Combine` transform, you must provide the function that contain
 
 Simple combine operations, such as sums, can usually be implemented as a simple function. More complex combination operations might require you to create a subclass of `CombineFn` that has an accumulation type distinct from the input/output type.
 
-##### **Simple Combinations Using Simple Functions**
+##### **Simple combinations using simple functions**
 
 The following example code shows a simple combine function.
 
@@ -519,7 +518,7 @@ public static class SumInts implements SerializableFunction<Iterable<Integer>, I
 {% github_sample /apache/beam/blob/python-sdk/sdks/python/apache_beam/examples/snippets/snippets_test.py tag:combine_bounded_sum
 %}```
 
-##### **Advanced Combinations using CombineFn**
+##### **Advanced combinations using CombineFn**
 
 For more complex combine functions, you can define a subclass of `CombineFn`. You should use `CombineFn` if the combine function requires a more sophisticated accumulator, must perform additional pre- or post-processing, might change the output type, or takes the key into account.
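 
 For illustration, a sketch along the lines of the canonical averaging example, where the accumulator (a sum and a count) is distinct from both the input type and the output type:
 
 ```java
 public class AverageFn extends CombineFn<Integer, AverageFn.Accum, Double> {
   // The accumulator: a running sum and a count of inputs seen so far.
   public static class Accum {
     int sum = 0;
     int count = 0;
   }
 
   @Override
   public Accum createAccumulator() { return new Accum(); }
 
   @Override
   public Accum addInput(Accum accum, Integer input) {
     accum.sum += input;
     accum.count++;
     return accum;
   }
 
   @Override
   public Accum mergeAccumulators(Iterable<Accum> accums) {
     // Accumulators computed on different workers are merged pairwise.
     Accum merged = createAccumulator();
     for (Accum accum : accums) {
       merged.sum += accum.sum;
       merged.count += accum.count;
     }
     return merged;
   }
 
   @Override
   public Double extractOutput(Accum accum) {
     return ((double) accum.sum) / accum.count;
   }
 }
 ```
 
 Applied with `Combine.globally(new AverageFn())`, this produces a `PCollection<Double>` containing the mean of the input integers.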
 
@@ -576,7 +575,7 @@ pc = ...
 
 If you are combining a `PCollection` of key-value pairs, [per-key combining](#transforms-combine-per-key) is often enough. If you need the combining strategy to change based on the key (for example, MIN for some users and MAX for other users), you can define a `KeyedCombineFn` to access the key within the combining strategy.
 
-##### **Combining a PCollection into a Single Value**
+##### **Combining a PCollection into a single value**
 
 Use the global combine to transform all of the elements in a given `PCollection` into a single value, represented in your pipeline as a new `PCollection` containing one element. The following example code shows how to apply the Beam-provided sum combine function to produce a single sum value for a `PCollection` of integers.
 
@@ -595,7 +594,7 @@ pc = ...
 result = pc | beam.CombineGlobally(sum)
 ```
 
-##### Global Windowing:
+##### Global windowing:
 
 If your input `PCollection` uses the default global windowing, the default behavior is to return a `PCollection` containing one item. That item's value comes from the accumulator in the combine function that you specified when applying `Combine`. For example, the Beam-provided sum combine function returns a zero value (the sum of an empty input), while the min combine function returns a maximal or infinite value.
 
@@ -613,7 +612,7 @@ sum = pc | beam.CombineGlobally(sum).without_defaults()
 
 ```
 
-##### Non-Global Windowing:
+##### Non-global windowing:
 
 If your `PCollection` uses any non-global windowing function, Beam does not provide the default behavior. You must specify one of the following options when applying `Combine`:
 
@@ -621,7 +620,7 @@ If your `PCollection` uses any non-global windowing function, Beam does not prov
 * Specify `.asSingletonView`, in which the output is immediately converted to a `PCollectionView`, which will provide a default value for each empty window when used as a side input. You'll generally only need to use this option if the result of your pipeline's `Combine` is to be used as a side input later in the pipeline.
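 
 For example, a sketch of the second option, reusing the `SumInts` function shown earlier:
 
 ```java
 // Produce a PCollectionView that supplies a default value per empty window
 // when the combined result is consumed as a side input.
 PCollectionView<Integer> sumView = pc.apply(
     Combine.globally(new SumInts()).asSingletonView());
 ```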
 
 
-##### <a name="transforms-combine-per-key"></a>**Combining Values in a Key-Grouped Collection**
+##### <a name="transforms-combine-per-key"></a>**Combining values in a key-grouped collection**
 
 After creating a key-grouped collection (for example, by using a `GroupByKey` transform) a common pattern is to combine the collection of values associated with each key into a single, merged value. Drawing on the previous example from `GroupByKey`, a key-grouped `PCollection` called `groupedWords` looks like this:
 
@@ -688,11 +687,11 @@ merged = (
     | beam.Flatten())
 ```
 
-##### Data Encoding in Merged Collections:
+##### Data encoding in merged collections:
 
 By default, the coder for the output `PCollection` is the same as the coder for the first `PCollection` in the input `PCollectionList`. However, the input `PCollection` objects can each use different coders, as long as they all contain the same data type in your chosen language.
 
-##### Merging Windowed Collections:
+##### Merging windowed collections:
 
 When using `Flatten` to merge `PCollection` objects that have a windowing strategy applied, all of the `PCollection` objects you want to merge must use a compatible windowing strategy and window sizing. For example, all the collections you're merging must all use (hypothetically) identical 5-minute fixed windows or 4-minute sliding windows starting every 30 seconds.
 
@@ -733,7 +732,7 @@ by_decile = students | beam.Partition(partition_fn, 10)
 fortieth_percentile = by_decile[4]
 ```
 
-#### <a name="transforms-usercodereqs"></a>General Requirements for Writing User Code for Beam Transforms
+#### <a name="transforms-usercodereqs"></a>General Requirements for writing user code for Beam transforms
 
 When you build user code for a Beam transform, you should keep in mind the distributed nature of execution. For example, there might be many copies of your function running on a lot of different machines in parallel, and those copies function independently, without communicating or sharing state with any of the other copies. Depending on the Pipeline Runner and processing back-end you choose for your pipeline, each copy of your user code function may be retried or run multiple times. As such, you should be cautious about including things like state dependency in your user code.
 
@@ -758,7 +757,7 @@ Some other serializability factors you should keep in mind are:
 * Mutating a function object after it gets applied will have no effect.
 * Take care when declaring your function object inline by using an anonymous inner class instance. In a non-static context, your inner class instance will implicitly contain a pointer to the enclosing class and that class' state. That enclosing class will also be serialized, and thus the same considerations that apply to the function object itself also apply to this outer class.
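 
 One common way to avoid the enclosing-class pitfall is to declare the function as a static nested class; a minimal sketch:
 
 ```java
 // A static nested DoFn holds no implicit reference to the enclosing class,
 // so only the DoFn itself needs to be serialized.
 static class ComputeWordLengthFn extends DoFn<String, Integer> {
   @ProcessElement
   public void processElement(ProcessContext c) {
     c.output(c.element().length());
   }
 }
 ```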
 
-##### Thread-Compatibility
+##### Thread-compatibility
 
 Your function object should be thread-compatible. Each instance of your function object is accessed by a single thread on a worker instance, unless you explicitly create your own threads. Note, however, that **the Beam SDKs are not thread-safe**. If you create your own threads in your user code, you must provide your own synchronization. Note that static members in your function object are not passed to worker instances and that multiple instances of your function may be accessed from different threads.
 
 It's recommended that you make your function object idempotent--that is, that it can be repeated or retried as often as necessary without causing unintended side effects.
 
 #### <a name="transforms-sideio"></a>Side Inputs and Side Outputs
 
-##### **Side Inputs**
+##### **Side inputs**
 
 In addition to the main input `PCollection`, you can provide additional inputs to a `ParDo` transform in the form of side inputs. A side input is an additional input that your `DoFn` can access each time it processes an element in the input `PCollection`. When you specify a side input, you create a view of some other data that can be read from within the `ParDo` transform's `DoFn` while processing each element.
 
 Side inputs are useful if your `ParDo` needs to inject additional data when processing each element in the input `PCollection`, but the additional data needs to be determined at runtime (and not hard-coded). Such values might be determined by the input data, or depend on a different branch of your pipeline.
 
 
-##### Passing Side Inputs to ParDo:
+##### Passing side inputs to ParDo:
 
 ```java
   // Pass side inputs to your ParDo transform by invoking .withSideInputs.
@@ -824,7 +823,7 @@ Side inputs are useful if your `ParDo` needs to inject additional data when proc
 
 ```
 
-##### Side Inputs and Windowing:
+##### Side inputs and windowing:
 
 A windowed `PCollection` may be infinite and thus cannot be compressed into a single value (or single collection class). When you create a `PCollectionView` of a windowed `PCollection`, the `PCollectionView` represents a single entity per window (one singleton per window, one list per window, etc.).
 
@@ -836,11 +835,11 @@ If the main input element exists in more than one window, then `processElement`
 
 If the side input has multiple trigger firings, Beam uses the value from the latest trigger firing. This is particularly useful if you use a side input with a single global window and specify a trigger.
 
-##### **Side Outputs**
+##### **Side outputs**
 
 While `ParDo` always produces a main output `PCollection` (as the return value from apply), you can also have your `ParDo` produce any number of additional output `PCollection`s. If you choose to have multiple outputs, your `ParDo` returns all of the output `PCollection`s (including the main output) bundled together.
 
-##### Tags for Side Outputs:
+##### Tags for side outputs:
 
 ```java
 // To emit elements to a side output PCollection, create a TupleTag object to identify each collection that your ParDo produces.
@@ -902,7 +901,7 @@ While `ParDo` always produces a main output `PCollection` (as the return value f
 {% github_sample /apache/beam/blob/python-sdk/sdks/python/apache_beam/examples/snippets/snippets_test.py tag:model_pardo_with_side_outputs_iter
 %}```
 
-##### Emitting to Side Outputs in your DoFn:
+##### Emitting to side outputs in your DoFn:
 
 ```java
 // Inside your ParDo's DoFn, you can emit an element to a side output by using the method ProcessContext.sideOutput.
@@ -948,77 +947,39 @@ When you create a pipeline, you often need to read data from some external sourc
 
 > A guide that covers how to implement your own Beam IO transforms is in progress ([BEAM-1025](https://issues.apache.org/jira/browse/BEAM-1025)).
 
-### Reading Input Data
+### Reading input data
 
-Read transforms read data from an external source and return a `PCollection` representation of the data for use by your pipeline. You can use a read transform at any point while constructing your pipeline to create a new `PCollection`, though it will be most common at the start of your pipeline. Here are examples of two common ways to read data.
-
-#### Reading from a `Source`:
-
-```java
-// A fully-specified Read from a GCS file:
-PCollection<Integer> numbers =
-  p.apply("ReadNumbers", TextIO.Read
-   .from("gs://my_bucket/path/to/numbers-*.txt")
-   .withCoder(TextualIntegerCoder.of()));
-```
-
-```python
-pipeline | beam.io.ReadFromText('protocol://path/to/some/inputData.txt')
-```
-
-Note that many sources use the builder pattern for setting options. For additional examples, see the language-specific documentation (such as Javadoc) for each of the sources.
+Read transforms read data from an external source and return a `PCollection` representation of the data for use by your pipeline. You can use a read transform at any point while constructing your pipeline to create a new `PCollection`, though it will be most common at the start of your pipeline.
 
 #### Using a read transform:
 
 ```java
-// This example uses JmsIO.
-PCollection<JmsRecord> output =
-    pipeline.apply(JmsIO.read()
-        .withConnectionFactory(myConnectionFactory)
-        .withQueue("my-queue"))
+PCollection<String> lines = p.apply(TextIO.Read.from("gs://some/inputData.txt"));   
 ```
 
 ```python
-pipeline | beam.io.textio.ReadFromText('my_file_name')
+lines = pipeline | beam.io.ReadFromText('gs://some/inputData.txt')
 ```
 
-### Writing Output Data
+### Writing output data
 
-Write transforms write the data in a `PCollection` to an external data source. You will most often use write transforms at the end of your pipeline to output your pipeline's final results. However, you can use a write transform to output a `PCollection`'s data at any point in your pipeline. Here are examples of two common ways to write data.
-
-#### Writing to a `Sink`:
-
-```java
-// This example uses XmlSink.
-pipeline.apply(Write.to(
-          XmlSink.ofRecordClass(Type.class)
-              .withRootElementName(root_element)
-              .toFilenamePrefix(output_filename)));
-```
-
-```python
-output | beam.io.WriteToText('my_file_name')
-```
+Write transforms write the data in a `PCollection` to an external data source. You will most often use write transforms at the end of your pipeline to output your pipeline's final results. However, you can use a write transform to output a `PCollection`'s data at any point in your pipeline. 
 
-#### Using a write transform:
+#### Using a write transform:
 
 ```java
-// This example uses JmsIO.
-pipeline.apply(...) // returns PCollection<String>
-        .apply(JmsIO.write()
-            .withConnectionFactory(myConnectionFactory)
-            .withQueue("my-queue")
+output.apply(TextIO.Write.to("gs://some/outputData"));
 ```
 
 ```python
-output | beam.io.textio.WriteToText('my_file_name')
+output | beam.io.WriteToText('gs://some/outputData')
 ```
 
 ### File-based input and output data
 
-#### Reading From Multiple Locations:
+#### Reading from multiple locations:
 
-Many read transforms support reading from multiple input files matching a glob operator you provide. The following TextIO example uses a glob operator (\*) to read all matching input files that have prefix "input-" and the suffix ".csv" in the given location:
+Many read transforms support reading from multiple input files matching a glob operator you provide. Note that glob operators are filesystem-specific and obey filesystem-specific consistency models. The following TextIO example uses a glob operator (\*) to read all matching input files that have prefix "input-" and the suffix ".csv" in the given location:
 
 ```java
 p.apply("ReadFromText",
@@ -1031,18 +992,18 @@ lines = p | beam.io.Read(
     beam.io.TextFileSource('protocol://my_bucket/path/to/input-*.csv'))
 ```
 
-To read data from disparate sources into a single `PCollection`, read each one independently and then use the `Flatten` transform to create a single `PCollection`.
+To read data from disparate sources into a single `PCollection`, read each one independently and then use the [Flatten](#transforms-flatten-partition) transform to create a single `PCollection`.
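+
+For example, a hedged sketch (the bucket paths are placeholders):
+
+```java
+PCollection<String> logs1 = p.apply(TextIO.Read.from("gs://bucket-one/*.log"));
+PCollection<String> logs2 = p.apply(TextIO.Read.from("gs://bucket-two/*.log"));
+// Merge the independently-read collections into one.
+PCollection<String> allLogs = PCollectionList.of(logs1).and(logs2)
+    .apply(Flatten.<String>pCollections());
+```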
 
-#### Writing To Multiple Output Files:
+#### Writing to multiple output files:
 
 For file-based output data, write transforms write to multiple output files by default. When you pass an output file name to a write transform, the file name is used as the prefix for all output files that the write transform produces. You can also append a suffix to each output file name by specifying one.
 
 The following write transform example writes multiple output files to a location. Each file has the prefix "numbers", a numeric tag, and the suffix ".csv".
 
 ```java
-records.apply(TextIO.Write.named("WriteToText")
-                          .to("protocol://my_bucket/path/to/numbers")
-                          .withSuffix(".csv"));
+records.apply("WriteToText",
+    TextIO.Write.to("protocol://my_bucket/path/to/numbers")
+                .withSuffix(".csv"));
 ```
 
 ```python
@@ -1050,7 +1011,7 @@ filtered_words | beam.io.WriteToText(
 'protocol://my_bucket/path/to/numbers', file_name_suffix='.csv')
 ```
 
-### Beam provided I/O APIs
+### Beam-provided I/O APIs
 
 See the language-specific source code directories for the Beam-supported I/O APIs. Specific documentation for each of these I/O sources will be added in the future. ([BEAM-1054](https://issues.apache.org/jira/browse/BEAM-1054))
 
@@ -1100,7 +1061,7 @@ See the language specific source code directories for the Beam supported I/O API
 </table>
 
 
-## <a name="running"></a>Running the Pipeline
+## <a name="running"></a>Running the pipeline
 
 To run your pipeline, use the `run` method. The program you create sends a specification for your pipeline to a pipeline runner, which then constructs and runs the actual series of pipeline operations. Pipelines are executed asynchronously by default.
 
@@ -1119,7 +1080,7 @@ pipeline.run().waitUntilFinish();
 ```
 
 ```python
-# Not currently supported.
+pipeline.run().wait_until_finish()
 ```
 
 <a name="transforms-composite"></a>