Posted to commits@beam.apache.org by me...@apache.org on 2018/08/20 21:40:37 UTC

[beam-site] branch mergebot updated (0ea9291 -> 67d7fba)

This is an automated email from the ASF dual-hosted git repository.

mergebot-role pushed a change to branch mergebot
in repository https://gitbox.apache.org/repos/asf/beam-site.git.


    from 0ea9291  This closes #458
     add 2bf7d7d  Prepare repository for deployment.
     new 97c2ac5  Add blog post "A review of input streaming connectors"
     new bf7240b  Add authors for blog post in #521
     new 11c9c29  Fix typo for author's name in blog post #521
     new d2cf4a7  Fix other typo in author's name for blog post #521
     new c5037a2  Blog post updates based on @iemejia's feedback
     new 3cd63cd  Updates to streaming connectors blog post
     new cc68b49  Set publication date for streaming connectors blog post
     new 645574c  Update doc links in blog post to point to latest release
     new d23c996  Fix extraneous p tag and add table borders
     new 15c765f  Update streaming connectors blog post's publication date
     new 67d7fba  This closes #521

The 11 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 content/documentation/programming-guide/index.html | 125 ++++++++----
 src/_data/authors.yml                              |   8 +
 ...2018-08-20-review-input-streaming-connectors.md | 225 +++++++++++++++++++++
 3 files changed, 321 insertions(+), 37 deletions(-)
 create mode 100644 src/_posts/2018-08-20-review-input-streaming-connectors.md


[beam-site] 01/11: Add blog post "A review of input streaming connectors"

Posted by me...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

mergebot-role pushed a commit to branch mergebot
in repository https://gitbox.apache.org/repos/asf/beam-site.git

commit 97c2ac51e3eece847dab1323b144386f1d0c89ab
Author: Julien Phalip <jp...@gmail.com>
AuthorDate: Thu Aug 2 21:04:18 2018 -0700

    Add blog post "A review of input streaming connectors"
---
 ...2018-08-XX-review-input-streaming-connectors.md | 224 +++++++++++++++++++++
 1 file changed, 224 insertions(+)

diff --git a/src/_posts/2018-08-XX-review-input-streaming-connectors.md b/src/_posts/2018-08-XX-review-input-streaming-connectors.md
new file mode 100644
index 0000000..7591ba2
--- /dev/null
+++ b/src/_posts/2018-08-XX-review-input-streaming-connectors.md
@@ -0,0 +1,224 @@
+---
+layout: post
+title:  "A review of input streaming connectors"
+date:   2018-08-XX 00:00:01 -0800
+excerpt_separator: <!--more-->
+categories: blog
+authors:
+  - lkulighin
+  - julienphalip
+---
+
+In this post, you'll learn about the current state of support for input streaming connectors in [Apache Beam](https://beam.apache.org/). For more context, you'll also learn about the corresponding state of support in [Apache Spark](https://spark.apache.org/).<!--more-->
+
+With batch processing, you might load data from any source, including a database system. Even if there are no specific SDKs available for those database systems, you can often resort to using a [JDBC](https://en.wikipedia.org/wiki/Java_Database_Connectivity) driver. With streaming, implementing a proper data pipeline is arguably more challenging as generally fewer source types are available. For that reason, this article particularly focuses on the streaming use case.
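+
+For illustration, here is a minimal sketch of such a JDBC batch read using Beam's `JdbcIO` connector; the driver class, connection string, and `users` table are hypothetical placeholders:
+
+```java
+import org.apache.beam.sdk.Pipeline;
+import org.apache.beam.sdk.coders.KvCoder;
+import org.apache.beam.sdk.coders.StringUtf8Coder;
+import org.apache.beam.sdk.coders.VarIntCoder;
+import org.apache.beam.sdk.io.jdbc.JdbcIO;
+import org.apache.beam.sdk.options.PipelineOptionsFactory;
+import org.apache.beam.sdk.values.KV;
+
+public class JdbcBatchRead {
+  public static void main(String[] args) {
+    Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());
+    // Query a relational table through a plain JDBC driver (batch, not streaming).
+    p.apply(JdbcIO.<KV<Integer, String>>read()
+        .withDataSourceConfiguration(JdbcIO.DataSourceConfiguration.create(
+            "org.postgresql.Driver", "jdbc:postgresql://localhost:5432/mydb"))
+        .withQuery("SELECT id, name FROM users")
+        .withCoder(KvCoder.of(VarIntCoder.of(), StringUtf8Coder.of()))
+        .withRowMapper(rs -> KV.of(rs.getInt("id"), rs.getString("name"))));
+    p.run().waitUntilFinish();
+  }
+}
+```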
+
+## Connectors for Java
+
+Beam has an official [Java SDK](https://beam.apache.org/documentation/sdks/java/) and has several execution engines, called [runners](https://beam.apache.org/documentation/runners/capability-matrix/). In most cases it is fairly easy to transfer existing Beam pipelines written in Java or Scala to a Spark environment by using the [Spark Runner](https://beam.apache.org/documentation/runners/spark/).
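+
+As a minimal sketch of that portability (the master URL below is an assumption; any Spark master works), switching a Beam pipeline onto Spark is mostly a matter of pipeline options:
+
+```java
+import org.apache.beam.runners.spark.SparkPipelineOptions;
+import org.apache.beam.runners.spark.SparkRunner;
+import org.apache.beam.sdk.Pipeline;
+import org.apache.beam.sdk.options.PipelineOptionsFactory;
+
+public class SparkRunnerExample {
+  public static void main(String[] args) {
+    SparkPipelineOptions options =
+        PipelineOptionsFactory.fromArgs(args).as(SparkPipelineOptions.class);
+    options.setRunner(SparkRunner.class);  // execute on Spark instead of another runner
+    options.setSparkMaster("local[4]");    // hypothetical master; could be a standalone or YARN URL
+    Pipeline p = Pipeline.create(options);
+    // ... apply the same transforms the pipeline would use on any other runner ...
+    p.run().waitUntilFinish();
+  }
+}
+```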
+
+Spark is written in Scala and has a [Java API](https://spark.apache.org/docs/latest/api/java/). Spark's source code compiles to [Java bytecode](https://en.wikipedia.org/wiki/Java_(programming_language)#Java_JVM_and_Bytecode) and the binaries are run by a [Java Virtual Machine](https://en.wikipedia.org/wiki/Java_virtual_machine). Scala code is interoperable with Java and therefore has native compatibility with Java libraries (and vice versa).
+
+Spark offers two approaches to streaming: [Discretized Streaming](https://spark.apache.org/docs/latest/streaming-programming-guide.html) (or DStreams) and [Structured Streaming](https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html). DStreams are a basic abstraction that represents a continuous series of [Resilient Distributed Datasets](https://spark.apache.org/docs/latest/rdd-programming-guide.html) (or RDDs). Structured Streaming was introduced more recently  [...]
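+
+A minimal DStreams sketch, assuming a hypothetical directory to watch — each batch interval materializes as one RDD in the continuous series described above:
+
+```java
+import org.apache.spark.SparkConf;
+import org.apache.spark.streaming.Durations;
+import org.apache.spark.streaming.api.java.JavaDStream;
+import org.apache.spark.streaming.api.java.JavaStreamingContext;
+
+public class DStreamCount {
+  public static void main(String[] args) throws InterruptedException {
+    SparkConf conf = new SparkConf().setAppName("DStreamCount").setMaster("local[2]");
+    // Micro-batch every 10 seconds; each batch becomes one RDD in the stream.
+    JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(10));
+    JavaDStream<String> lines = jssc.textFileStream("/tmp/incoming"); // hypothetical path
+    lines.count().print();
+    jssc.start();
+    jssc.awaitTermination();
+  }
+}
+```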
+
+Spark Structured Streaming supports [file sources](https://spark.apache.org/docs/latest/api/java/org/apache/spark/sql/streaming/DataStreamReader.html) (local filesystems and HDFS-compatible systems like Cloud Storage or S3) and [Kafka](https://spark.apache.org/docs/latest/structured-streaming-kafka-integration.html) as streaming inputs. Spark maintains built-in connectors for DStreams aimed at third-party services, such as Kafka or Flume, while other connectors are available through link [...]
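+
+A minimal Structured Streaming sketch of the Kafka input (the broker and topic names are hypothetical, and the external `spark-sql-kafka-0-10` artifact must be on the classpath):
+
+```java
+import org.apache.spark.sql.Dataset;
+import org.apache.spark.sql.Row;
+import org.apache.spark.sql.SparkSession;
+
+public class StructuredKafkaRead {
+  public static void main(String[] args) throws Exception {
+    SparkSession spark = SparkSession.builder().appName("StructuredKafkaRead").getOrCreate();
+    // Kafka records arrive as binary key/value columns on an unbounded DataFrame.
+    Dataset<Row> records = spark.readStream()
+        .format("kafka")
+        .option("kafka.bootstrap.servers", "broker1:9092") // hypothetical brokers
+        .option("subscribe", "events")                     // hypothetical topic
+        .load();
+    records.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
+        .writeStream()
+        .format("console")
+        .start()
+        .awaitTermination();
+  }
+}
+```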
+
+Below are the main streaming input connectors available for Beam and Spark DStreams in Java:
+
+<table>
+  <tr>
+   <td>
+   </td>
+   <td>
+   </td>
+   <td><strong>Apache Beam</strong>
+   </td>
+   <td><strong>Apache Spark DStreams</strong>
+   </td>
+  </tr>
+  <tr>
+   <td rowspan="2" >File Systems
+   </td>
+   <td>Local<br>(Using the <code>file://</code> URI)
+   </td>
+   <td><a href="https://beam.apache.org/documentation/sdks/javadoc/2.5.0/org/apache/beam/sdk/io/TextIO.html">TextIO</a>
+   </td>
+   <td><a href="https://spark.apache.org/docs/latest/api/java/org/apache/spark/streaming/StreamingContext.html#textFileStream-java.lang.String-">textFileStream</a><br>(Spark treats most Unix systems as HDFS-compatible, but the location should be accessible from all nodes)
+   </td>
+  </tr>
+  <tr>
+   <td>HDFS<br>(Using the <code>hdfs://</code> URI)
+   </td>
+   <td><a href="https://beam.apache.org/documentation/sdks/javadoc/2.5.0/org/apache/beam/sdk/io/hdfs/HadoopFileSystemOptions.html">HadoopFileSystemOptions</a>
+   </td>
+   <td><a href="https://spark.apache.org/docs/latest/api/java/org/apache/spark/streaming/util/HdfsUtils.html">HdfsUtils</a>
+   </td>
+  </tr>
+  <tr>
+   <td rowspan="2" >Object Stores
+   </td>
+   <td>Cloud Storage<br>(Using the <code>gs://</code> URI)
+   </td>
+   <td rowspan="2" ><a href="https://beam.apache.org/documentation/sdks/javadoc/2.5.0/org/apache/beam/sdk/io/hdfs/HadoopFileSystemOptions.html">HadoopFileSystemOptions</a>
+   </td>
+   <td rowspan="2" ><a href="https://spark.apache.org/docs/latest/api/java/org/apache/spark/SparkContext.html#hadoopConfiguration--">hadoopConfiguration</a>
+<p>
+and <a href="https://spark.apache.org/docs/latest/api/java/org/apache/spark/streaming/StreamingContext.html#textFileStream-java.lang.String-">textFileStream</a>
+   </td>
+  </tr>
+  <tr>
+   <td>S3<br>(Using the <code>s3://</code> URI)
+   </td>
+  </tr>
+  <tr>
+   <td rowspan="3" >Messaging Queues
+   </td>
+   <td>Kafka
+   </td>
+   <td><a href="https://beam.apache.org/documentation/sdks/javadoc/2.5.0/org/apache/beam/sdk/io/kafka/KafkaIO.html">KafkaIO</a>
+   </td>
+   <td><a href="https://spark.apache.org/docs/latest/streaming-kafka-0-10-integration.html">spark-streaming-kafka</a>
+   </td>
+  </tr>
+  <tr>
+   <td>Kinesis
+   </td>
+   <td><a href="https://beam.apache.org/documentation/sdks/javadoc/2.5.0/org/apache/beam/sdk/io/kinesis/KinesisIO.html">KinesisIO</a>
+   </td>
+   <td><a href="https://spark.apache.org/docs/latest/streaming-kinesis-integration.html">spark-streaming-kinesis</a>
+   </td>
+  </tr>
+  <tr>
+   <td>Cloud Pub/Sub
+   </td>
+   <td><a href="https://beam.apache.org/documentation/sdks/javadoc/2.5.0/org/apache/beam/sdk/io/gcp/pubsub/PubsubIO.html">PubsubIO</a>
+   </td>
+   <td><a href="https://github.com/apache/bahir/tree/master/streaming-pubsub">Spark-streaming-pubsub</a> from <a href="http://bahir.apache.org">Apache Bahir</a>
+   </td>
+  </tr>
+  <tr>
+   <td>Other
+   </td>
+   <td>Custom receivers
+   </td>
+   <td><a href="https://beam.apache.org/documentation/io/authoring-overview/#read-transforms">Read Transforms</a>
+   </td>
+   <td><a href="https://spark.apache.org/docs/latest/streaming-custom-receivers.html">receiverStream</a>
+   </td>
+  </tr>
+</table>
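+
+As a usage sketch for one row of this table — reading Kafka into a Beam pipeline with `KafkaIO` — assuming hypothetical broker and topic names:
+
+```java
+import org.apache.beam.sdk.Pipeline;
+import org.apache.beam.sdk.io.kafka.KafkaIO;
+import org.apache.beam.sdk.options.PipelineOptionsFactory;
+import org.apache.kafka.common.serialization.StringDeserializer;
+
+public class KafkaReadExample {
+  public static void main(String[] args) {
+    Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());
+    p.apply(KafkaIO.<String, String>read()
+        .withBootstrapServers("broker1:9092")            // hypothetical brokers
+        .withTopic("events")                             // hypothetical topic
+        .withKeyDeserializer(StringDeserializer.class)
+        .withValueDeserializer(StringDeserializer.class)
+        .withoutMetadata());                             // yields PCollection<KV<String, String>>
+    p.run().waitUntilFinish();
+  }
+}
+```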
+
+## Connectors for Python
+
+Beam has an official [Python SDK](https://beam.apache.org/documentation/sdks/python/) that currently supports a subset of the streaming features available in the Java SDK. Active development is underway to bridge the gap between the featuresets in the two SDKs. Currently for Python, the [Direct Runner](https://beam.apache.org/documentation/runners/direct/) and [Dataflow Runner](https://beam.apache.org/documentation/runners/dataflow/) are supported, and [several streaming options](https:/ [...]
+
+Spark also has a Python SDK called [PySpark](http://spark.apache.org/docs/latest/api/python/pyspark.html). As mentioned earlier, Scala code compiles to a bytecode that is executed by the JVM. PySpark uses [Py4J](https://www.py4j.org/), a library that enables Python programs to interact with the JVM and therefore access Java libraries, interact with Java objects, and register callbacks from Java. This allows PySpark to access native Spark objects like RDDs. Spark Structured Streaming supp [...]
+
+Below are the main streaming input connectors available for Beam and Spark DStreams in Python:
+
+<table>
+  <tr>
+   <td>
+   </td>
+   <td>
+   </td>
+   <td><strong>Apache Beam</strong>
+   </td>
+   <td><strong>Apache Spark DStreams</strong>
+   </td>
+  </tr>
+  <tr>
+   <td rowspan="2" >File Systems
+   </td>
+   <td>Local
+   </td>
+   <td><a href="https://beam.apache.org/documentation/sdks/pydoc/2.5.0/apache_beam.io.textio.html">io.textio</a>
+   </td>
+   <td><a href="http://spark.apache.org/docs/latest/api/python/pyspark.streaming.html#pyspark.streaming.StreamingContext.textFileStream">textFileStream</a>
+   </td>
+  </tr>
+  <tr>
+   <td>HDFS
+   </td>
+   <td><a href="https://beam.apache.org/documentation/sdks/pydoc/2.5.0/apache_beam.io.hadoopfilesystem.html">io.hadoopfilesystem</a>
+   </td>
+   <td><a href="https://spark.apache.org/docs/latest/api/java/org/apache/spark/SparkContext.html#hadoopConfiguration--">hadoopConfiguration</a> (Access through <code>sc._jsc</code> with Py4J)
+and <a href="http://spark.apache.org/docs/latest/api/python/pyspark.streaming.html#pyspark.streaming.StreamingContext.textFileStream">textFileStream</a>
+   </td>
+  </tr>
+  <tr>
+   <td rowspan="2" >Object stores
+   </td>
+   <td>Google Cloud Storage
+   </td>
+   <td><a href="https://beam.apache.org/documentation/sdks/pydoc/2.5.0/apache_beam.io.gcp.gcsio.html">io.gcp.gcsio</a>
+   </td>
+   <td rowspan="2" ><a href="http://spark.apache.org/docs/latest/api/python/pyspark.streaming.html#pyspark.streaming.StreamingContext.textFileStream">textFileStream</a>
+   </td>
+  </tr>
+  <tr>
+   <td>S3
+   </td>
+   <td>N/A
+   </td>
+  </tr>
+  <tr>
+   <td rowspan="3" >Messaging Queues
+   </td>
+   <td>Kafka
+   </td>
+   <td>N/A
+   </td>
+   <td><a href="http://spark.apache.org/docs/latest/api/python/pyspark.streaming.html#pyspark.streaming.kafka.KafkaUtils">KafkaUtils</a>
+   </td>
+  </tr>
+  <tr>
+   <td>Kinesis
+   </td>
+   <td>N/A
+   </td>
+   <td><a href="http://spark.apache.org/docs/latest/api/python/pyspark.streaming.html#module-pyspark.streaming.kinesis">KinesisUtils</a>
+   </td>
+  </tr>
+  <tr>
+   <td>Cloud Pub/Sub
+   </td>
+   <td><a href="https://beam.apache.org/documentation/sdks/pydoc/2.5.0/apache_beam.io.gcp.pubsub.html">io.gcp.pubsub</a>
+   </td>
+   <td>N/A
+   </td>
+  </tr>
+  <tr>
+   <td>Other
+   </td>
+   <td>Custom receivers
+   </td>
+   <td><a href="https://beam.apache.org/documentation/sdks/python-custom-io/">BoundedSource and RangeTracker</a>
+   </td>
+   <td>N/A
+   </td>
+  </tr>
+</table>
+
+## Connectors for other languages
+
+### **Scala**
+
+Since Scala code is interoperable with Java and therefore has native compatibility with Java libraries (and vice versa), you can use the same Java connectors described above in your Scala programs. Apache Beam also has a [Scala SDK](https://github.com/spotify/scio) open-sourced [by Spotify](https://labs.spotify.com/2017/10/16/big-data-processing-at-spotify-the-road-to-scio-part-1/).
+
+### **Go**
+
+A [Go SDK](https://beam.apache.org/documentation/sdks/go/) for Apache Beam is under active development. It is currently experimental and is not recommended for production.
+
+### **R**
+
+Apache Beam does not have an official R SDK. Spark Structured Streaming is supported by an [R SDK](https://spark.apache.org/docs/latest/sparkr.html#structured-streaming), but only for [file sources](https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#input-sources) as a streaming input.
+
+## Next steps
+
+We hope this article inspired you to try new and interesting ways of connecting streaming sources to your Beam pipelines!
+
+Check out the following links for further information:
+
+*   See a full list of all built-in and in-progress [I/O Transforms](https://beam.apache.org/documentation/io/built-in/) for Apache Beam.
+*   Learn about some Apache Beam mobile gaming pipeline [examples](https://beam.apache.org/get-started/mobile-gaming-example/).


[beam-site] 03/11: Fix typo for author's name in blog post #521

Posted by me...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

mergebot-role pushed a commit to branch mergebot
in repository https://gitbox.apache.org/repos/asf/beam-site.git

commit 11c9c29ed30b331674afc41f2289ca8619c0ca8e
Author: Julien Phalip <jp...@gmail.com>
AuthorDate: Mon Aug 6 00:38:10 2018 -0700

    Fix typo for author's name in blog post #521
---
 src/_posts/2018-08-XX-review-input-streaming-connectors.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/_posts/2018-08-XX-review-input-streaming-connectors.md b/src/_posts/2018-08-XX-review-input-streaming-connectors.md
index 7591ba2..c324d80 100644
--- a/src/_posts/2018-08-XX-review-input-streaming-connectors.md
+++ b/src/_posts/2018-08-XX-review-input-streaming-connectors.md
@@ -5,7 +5,7 @@ date:   2018-08-XX 00:00:01 -0800
 excerpt_separator: <!--more-->
 categories: blog
 authors:
-  - lkulighin
+  - lkuligin
   - julienphalip
 ---
 


[beam-site] 05/11: Blog post updates based on @iemejia's feedback

Posted by me...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

mergebot-role pushed a commit to branch mergebot
in repository https://gitbox.apache.org/repos/asf/beam-site.git

commit c5037a277bc347971635bc04d5d05e65e2acbd68
Author: Julien Phalip <jp...@gmail.com>
AuthorDate: Mon Aug 13 13:17:54 2018 -0400

    Blog post updates based on @iemejia's feedback
---
 src/_posts/2018-08-XX-review-input-streaming-connectors.md | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/src/_posts/2018-08-XX-review-input-streaming-connectors.md b/src/_posts/2018-08-XX-review-input-streaming-connectors.md
index aa19675..fded813 100644
--- a/src/_posts/2018-08-XX-review-input-streaming-connectors.md
+++ b/src/_posts/2018-08-XX-review-input-streaming-connectors.md
@@ -21,7 +21,7 @@ Spark is written in Scala and has a [Java API](https://spark.apache.org/docs/lat
 
 Spark offers two approaches to streaming: [Discretized Streaming](https://spark.apache.org/docs/latest/streaming-programming-guide.html) (or DStreams) and [Structured Streaming](https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html). DStreams are a basic abstraction that represents a continuous series of [Resilient Distributed Datasets](https://spark.apache.org/docs/latest/rdd-programming-guide.html) (or RDDs). Structured Streaming was introduced more recently  [...]
 
-Spark Structured Streaming supports [file sources](https://spark.apache.org/docs/latest/api/java/org/apache/spark/sql/streaming/DataStreamReader.html) (local filesystems and HDFS-compatible systems like Cloud Storage or S3) and [Kafka](https://spark.apache.org/docs/latest/structured-streaming-kafka-integration.html) as streaming inputs. Spark maintains built-in connectors for DStreams aimed at third-party services, such as Kafka or Flume, while other connectors are available through link [...]
+Spark Structured Streaming supports [file sources](https://spark.apache.org/docs/latest/api/java/org/apache/spark/sql/streaming/DataStreamReader.html) (local filesystems and HDFS-compatible systems like Cloud Storage or S3) and [Kafka](https://spark.apache.org/docs/latest/structured-streaming-kafka-integration.html) as streaming [inputs](https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#input-sources). Spark maintains built-in connectors for DStreams aimed  [...]
 
 Below are the main streaming input connectors available for Beam and Spark DStreams in Java:
 
@@ -49,7 +49,7 @@ Below are the main streaming input connectors for available for Beam and Spark D
   <tr>
    <td>HDFS<br>(Using the <code>hdfs://</code> URI)
    </td>
-   <td><a href="https://beam.apache.org/documentation/sdks/javadoc/2.5.0/org/apache/beam/sdk/io/hdfs/HadoopFileSystemOptions.html">HadoopFileSystemOptions</a>
+    <td><a href="https://beam.apache.org/documentation/sdks/javadoc/2.5.0/org/apache/beam/sdk/io/FileIO.html">FileIO</a> + <a href="https://beam.apache.org/documentation/sdks/javadoc/2.5.0/org/apache/beam/sdk/io/hdfs/HadoopFileSystemOptions.html">HadoopFileSystemOptions</a>
    </td>
    <td><a href="https://spark.apache.org/docs/latest/api/java/org/apache/spark/streaming/util/HdfsUtils.html">HdfsUtils</a>
    </td>
@@ -93,7 +93,7 @@ and <a href="https://spark.apache.org/docs/latest/api/java/org/apache/spark/stre
    </td>
    <td><a href="https://beam.apache.org/documentation/sdks/javadoc/2.5.0/org/apache/beam/sdk/io/gcp/pubsub/PubsubIO.html">PubsubIO</a>
    </td>
-   <td><a href="https://github.com/apache/bahir/tree/master/streaming-pubsub">Spark-streaming-pubsub</a> from <a href="http://bahir.apache.org">Apache Bahir</a>
+   <td><a href="https://github.com/apache/bahir/tree/master/streaming-pubsub">spark-streaming-pubsub</a> from <a href="http://bahir.apache.org">Apache Bahir</a>
    </td>
   </tr>
   <tr>
@@ -204,11 +204,11 @@ and <a href="http://spark.apache.org/docs/latest/api/python/pyspark.streaming.ht
 
 ### **Scala**
 
-Since Scala code is interoperable with Java and therefore has native compatibility with Java libraries (and vice versa), you can use the same Java connectors described above in your Scala programs. Apache Beam also has a [Scala SDK](https://github.com/spotify/scio) open-sourced [by Spotify](https://labs.spotify.com/2017/10/16/big-data-processing-at-spotify-the-road-to-scio-part-1/).
+Since Scala code is interoperable with Java and therefore has native compatibility with Java libraries (and vice versa), you can use the same Java connectors described above in your Scala programs. Apache Beam also has a [Scala API](https://github.com/spotify/scio) open-sourced [by Spotify](https://labs.spotify.com/2017/10/16/big-data-processing-at-spotify-the-road-to-scio-part-1/).
 
 ### **Go**
 
-A [Go SDK](https://beam.apache.org/documentation/sdks/go/) for Apache Beam is under active development. It is currently experimental and is not recommended for production.
+A [Go SDK](https://beam.apache.org/documentation/sdks/go/) for Apache Beam is under active development. It is currently experimental and is not recommended for production. Spark does not have an official Go SDK.
 
 ### **R**
 


[beam-site] 07/11: Set publication date for streaming connectors blog post

Posted by me...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

mergebot-role pushed a commit to branch mergebot
in repository https://gitbox.apache.org/repos/asf/beam-site.git

commit cc68b49a8dbba424274012421e2fe9ee9db09a00
Author: Julien Phalip <jp...@gmail.com>
AuthorDate: Tue Aug 14 11:23:55 2018 -0400

    Set publication date for streaming connectors blog post
---
 ...ng-connectors.md => 2018-08-16-review-input-streaming-connectors.md} | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/_posts/2018-08-XX-review-input-streaming-connectors.md b/src/_posts/2018-08-16-review-input-streaming-connectors.md
similarity index 99%
rename from src/_posts/2018-08-XX-review-input-streaming-connectors.md
rename to src/_posts/2018-08-16-review-input-streaming-connectors.md
index 5816292..2b69a41 100644
--- a/src/_posts/2018-08-XX-review-input-streaming-connectors.md
+++ b/src/_posts/2018-08-16-review-input-streaming-connectors.md
@@ -1,7 +1,7 @@
 ---
 layout: post
 title:  "A review of input streaming connectors"
-date:   2018-08-XX 00:00:01 -0800
+date:   2018-08-16 00:00:01 -0800
 excerpt_separator: <!--more-->
 categories: blog
 authors:


[beam-site] 10/11: Update streaming connectors blog post's publication date

Posted by me...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

mergebot-role pushed a commit to branch mergebot
in repository https://gitbox.apache.org/repos/asf/beam-site.git

commit 15c765f1bddebce303588f46a7f9fa59cf9b7e03
Author: Julien Phalip <jp...@gmail.com>
AuthorDate: Mon Aug 20 14:18:37 2018 -0700

    Update streaming connectors blog post's publication date
---
 ...ng-connectors.md => 2018-08-20-review-input-streaming-connectors.md} | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/_posts/2018-08-16-review-input-streaming-connectors.md b/src/_posts/2018-08-20-review-input-streaming-connectors.md
similarity index 99%
rename from src/_posts/2018-08-16-review-input-streaming-connectors.md
rename to src/_posts/2018-08-20-review-input-streaming-connectors.md
index 1edbc9a..4d6f104 100644
--- a/src/_posts/2018-08-16-review-input-streaming-connectors.md
+++ b/src/_posts/2018-08-20-review-input-streaming-connectors.md
@@ -1,7 +1,7 @@
 ---
 layout: post
 title:  "A review of input streaming connectors"
-date:   2018-08-16 00:00:01 -0800
+date:   2018-08-20 00:00:01 -0800
 excerpt_separator: <!--more-->
 categories: blog
 authors:


[beam-site] 02/11: Add authors for blog post in #521

Posted by me...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

mergebot-role pushed a commit to branch mergebot
in repository https://gitbox.apache.org/repos/asf/beam-site.git

commit bf7240be8cc46dbc5529ae341d851e26163976da
Author: Julien Phalip <jp...@gmail.com>
AuthorDate: Mon Aug 6 00:37:17 2018 -0700

    Add authors for blog post in #521
---
 src/_data/authors.yml | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/src/_data/authors.yml b/src/_data/authors.yml
index aa22d5d..4ba47be 100644
--- a/src/_data/authors.yml
+++ b/src/_data/authors.yml
@@ -41,10 +41,18 @@ jamesmalone:
 jesseanderson:
     name: Jesse Anderson
     twitter: jessetanderson
+jphalip:
+    name: Julien Phalip
+    email: jphalip@google.com
+    twitter: julienphalip
 klk:
     name: Kenneth Knowles
     email: klk@apache.org
     twitter: KennKnowles
+lkuligin:
+    name: Leonid Kuligin
+    email: kuligin@google.com
+    twitter: lkulighin
 robertwb:
     name: Robert Bradshaw
     email: robertwb@apache.org


[beam-site] 08/11: Update doc links in blog post to point to latest release

Posted by me...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

mergebot-role pushed a commit to branch mergebot
in repository https://gitbox.apache.org/repos/asf/beam-site.git

commit 645574c9dcb59afe503702381781d6afdfc2b673
Author: Julien Phalip <jp...@gmail.com>
AuthorDate: Wed Aug 15 10:17:50 2018 -0700

    Update doc links in blog post to point to latest release
---
 ...2018-08-16-review-input-streaming-connectors.md | 38 +++++++++++-----------
 1 file changed, 19 insertions(+), 19 deletions(-)

diff --git a/src/_posts/2018-08-16-review-input-streaming-connectors.md b/src/_posts/2018-08-16-review-input-streaming-connectors.md
index 2b69a41..72983b8 100644
--- a/src/_posts/2018-08-16-review-input-streaming-connectors.md
+++ b/src/_posts/2018-08-16-review-input-streaming-connectors.md
@@ -9,13 +9,13 @@ authors:
   - jphalip
 ---
 
-In this post, you'll learn about the current state of support for input streaming connectors in [Apache Beam](https://beam.apache.org/). For more context, you'll also learn about the corresponding state of support in [Apache Spark](https://spark.apache.org/).<!--more-->
+In this post, you'll learn about the current state of support for input streaming connectors in [Apache Beam]({{ site.baseurl }}/). For more context, you'll also learn about the corresponding state of support in [Apache Spark](https://spark.apache.org/).<!--more-->
 
 With batch processing, you might load data from any source, including a database system. Even if there are no specific SDKs available for those database systems, you can often resort to using a [JDBC](https://en.wikipedia.org/wiki/Java_Database_Connectivity) driver. With streaming, implementing a proper data pipeline is arguably more challenging as generally fewer source types are available. For that reason, this article particularly focuses on the streaming use case.
 
 ## Connectors for Java
 
-Beam has an official [Java SDK](https://beam.apache.org/documentation/sdks/java/) and has several execution engines, called [runners](https://beam.apache.org/documentation/runners/capability-matrix/). In most cases it is fairly easy to transfer existing Beam pipelines written in Java or Scala to a Spark environment by using the [Spark Runner](https://beam.apache.org/documentation/runners/spark/).
+Beam has an official [Java SDK]({{ site.baseurl }}/documentation/sdks/java/) and has several execution engines, called [runners]({{ site.baseurl }}/documentation/runners/capability-matrix/). In most cases it is fairly easy to transfer existing Beam pipelines written in Java or Scala to a Spark environment by using the [Spark Runner]({{ site.baseurl }}/documentation/runners/spark/).
 
 Spark is written in Scala and has a [Java API](https://spark.apache.org/docs/latest/api/java/). Spark's source code compiles to [Java bytecode](https://en.wikipedia.org/wiki/Java_(programming_language)#Java_JVM_and_Bytecode) and the binaries are run by a [Java Virtual Machine](https://en.wikipedia.org/wiki/Java_virtual_machine). Scala code is interoperable with Java and therefore has native compatibility with Java libraries (and vice versa).
 
@@ -41,7 +41,7 @@ Below are the main streaming input connectors for available for Beam and Spark D
    </td>
    <td>Local<br>(Using the <code>file://</code> URI)
    </td>
-   <td><a href="https://beam.apache.org/documentation/sdks/javadoc/2.6.0/org/apache/beam/sdk/io/TextIO.html">TextIO</a>
+   <td><a href="{{ site.baseurl }}/documentation/sdks/javadoc/{{ site.release_latest }}/org/apache/beam/sdk/io/TextIO.html">TextIO</a>
    </td>
    <td><a href="https://spark.apache.org/docs/latest/api/java/org/apache/spark/streaming/StreamingContext.html#textFileStream-java.lang.String-">textFileStream</a><br>(Spark treats most Unix systems as HDFS-compatible, but the location should be accessible from all nodes)
    </td>
@@ -49,7 +49,7 @@ Below are the main streaming input connectors for available for Beam and Spark D
   <tr>
    <td>HDFS<br>(Using the <code>hdfs://</code> URI)
    </td>
-    <td><a href="https://beam.apache.org/documentation/sdks/javadoc/2.6.0/org/apache/beam/sdk/io/FileIO.html">FileIO</a> + <a href="https://beam.apache.org/documentation/sdks/javadoc/2.6.0/org/apache/beam/sdk/io/hdfs/HadoopFileSystemOptions.html">HadoopFileSystemOptions</a>
+    <td><a href="{{ site.baseurl }}/documentation/sdks/javadoc/{{ site.release_latest }}/org/apache/beam/sdk/io/FileIO.html">FileIO</a> + <a href="{{ site.baseurl }}/documentation/sdks/javadoc/{{ site.release_latest }}/org/apache/beam/sdk/io/hdfs/HadoopFileSystemOptions.html">HadoopFileSystemOptions</a>
    </td>
    <td><a href="https://spark.apache.org/docs/latest/api/java/org/apache/spark/streaming/util/HdfsUtils.html">HdfsUtils</a>
    </td>
@@ -59,7 +59,7 @@ Below are the main streaming input connectors for available for Beam and Spark D
    </td>
    <td>Cloud Storage<br>(Using the <code>gs://</code> URI)
    </td>
-   <td><a href="https://beam.apache.org/documentation/sdks/javadoc/2.6.0/org/apache/beam/sdk/io/FileIO.html">FileIO</a> + <a href="https://beam.apache.org/documentation/sdks/javadoc/2.6.0/org/apache/beam/sdk/extensions/gcp/options/GcsOptions.html">GcsOptions</a>
+   <td><a href="{{ site.baseurl }}/documentation/sdks/javadoc/{{ site.release_latest }}/org/apache/beam/sdk/io/FileIO.html">FileIO</a> + <a href="{{ site.baseurl }}/documentation/sdks/javadoc/{{ site.release_latest }}/org/apache/beam/sdk/extensions/gcp/options/GcsOptions.html">GcsOptions</a>
    </td>
    <td rowspan="2" ><a href="https://spark.apache.org/docs/latest/api/java/org/apache/spark/SparkContext.html#hadoopConfiguration--">hadoopConfiguration</a>
 <p>
@@ -69,7 +69,7 @@ and <a href="https://spark.apache.org/docs/latest/api/java/org/apache/spark/stre
   <tr>
    <td>S3<br>(Using the <code>s3://</code> URI)
    </td>
-    <td><a href="https://beam.apache.org/documentation/sdks/javadoc/2.6.0/org/apache/beam/sdk/io/FileIO.html">FileIO</a> + <a href="https://beam.apache.org/documentation/sdks/javadoc/2.6.0/org/apache/beam/sdk/io/aws/options/S3Options.html">S3Options</a>
+    <td><a href="{{ site.baseurl }}/documentation/sdks/javadoc/{{ site.release_latest }}/org/apache/beam/sdk/io/FileIO.html">FileIO</a> + <a href="{{ site.baseurl }}/documentation/sdks/javadoc/{{ site.release_latest }}/org/apache/beam/sdk/io/aws/options/S3Options.html">S3Options</a>
    </td>
   </tr>
   <tr>
@@ -77,7 +77,7 @@ and <a href="https://spark.apache.org/docs/latest/api/java/org/apache/spark/stre
    </td>
    <td>Kafka
    </td>
-   <td><a href="https://beam.apache.org/documentation/sdks/javadoc/2.6.0/org/apache/beam/sdk/io/kafka/KafkaIO.html">KafkaIO</a>
+   <td><a href="{{ site.baseurl }}/documentation/sdks/javadoc/{{ site.release_latest }}/org/apache/beam/sdk/io/kafka/KafkaIO.html">KafkaIO</a>
    </td>
    <td><a href="https://spark.apache.org/docs/latest/streaming-kafka-0-10-integration.html">spark-streaming-kafka</a>
    </td>
@@ -85,7 +85,7 @@ and <a href="https://spark.apache.org/docs/latest/api/java/org/apache/spark/stre
   <tr>
    <td>Kinesis
    </td>
-   <td><a href="https://beam.apache.org/documentation/sdks/javadoc/2.6.0/org/apache/beam/sdk/io/kinesis/KinesisIO.html">KinesisIO</a>
+   <td><a href="{{ site.baseurl }}/documentation/sdks/javadoc/{{ site.release_latest }}/org/apache/beam/sdk/io/kinesis/KinesisIO.html">KinesisIO</a>
    </td>
    <td><a href="https://spark.apache.org/docs/latest/streaming-kinesis-integration.html">spark-streaming-kinesis</a>
    </td>
@@ -93,7 +93,7 @@ and <a href="https://spark.apache.org/docs/latest/api/java/org/apache/spark/stre
   <tr>
    <td>Cloud Pub/Sub
    </td>
-   <td><a href="https://beam.apache.org/documentation/sdks/javadoc/2.6.0/org/apache/beam/sdk/io/gcp/pubsub/PubsubIO.html">PubsubIO</a>
+   <td><a href="{{ site.baseurl }}/documentation/sdks/javadoc/{{ site.release_latest }}/org/apache/beam/sdk/io/gcp/pubsub/PubsubIO.html">PubsubIO</a>
    </td>
    <td><a href="https://github.com/apache/bahir/tree/master/streaming-pubsub">spark-streaming-pubsub</a> from <a href="http://bahir.apache.org">Apache Bahir</a>
    </td>
@@ -103,7 +103,7 @@ and <a href="https://spark.apache.org/docs/latest/api/java/org/apache/spark/stre
    </td>
    <td>Custom receivers
    </td>
-   <td><a href="https://beam.apache.org/documentation/io/authoring-overview/#read-transforms">Read Transforms</a>
+   <td><a href="{{ site.baseurl }}/documentation/io/authoring-overview/#read-transforms">Read Transforms</a>
    </td>
    <td><a href="https://spark.apache.org/docs/latest/streaming-custom-receivers.html">receiverStream</a>
    </td>
@@ -112,7 +112,7 @@ and <a href="https://spark.apache.org/docs/latest/api/java/org/apache/spark/stre
 
 ## Connectors for Python
 
-Beam has an official [Python SDK](https://beam.apache.org/documentation/sdks/python/) that currently supports a subset of the streaming features available in the Java SDK. Active development is underway to bridge the gap between the featuresets in the two SDKs. Currently for Python, the [Direct Runner](https://beam.apache.org/documentation/runners/direct/) and [Dataflow Runner](https://beam.apache.org/documentation/runners/dataflow/) are supported, and [several streaming options](https:/ [...]
+Beam has an official [Python SDK]({{ site.baseurl }}/documentation/sdks/python/) that currently supports a subset of the streaming features available in the Java SDK. Active development is underway to bridge the gap between the featuresets in the two SDKs. Currently for Python, the [Direct Runner]({{ site.baseurl }}/documentation/runners/direct/) and [Dataflow Runner]({{ site.baseurl }}/documentation/runners/dataflow/) are supported, and [several streaming options]({{ site.baseurl }}/doc [...]
 
 Spark also has a Python SDK called [PySpark](http://spark.apache.org/docs/latest/api/python/pyspark.html). As mentioned earlier, Scala code compiles to a bytecode that is executed by the JVM. PySpark uses [Py4J](https://www.py4j.org/), a library that enables Python programs to interact with the JVM and therefore access Java libraries, interact with Java objects, and register callbacks from Java. This allows PySpark to access native Spark objects like RDDs. Spark Structured Streaming supp [...]
 
@@ -134,7 +134,7 @@ Below are the main streaming input connectors for available for Beam and Spark D
    </td>
    <td>Local
    </td>
-   <td><a href="https://beam.apache.org/documentation/sdks/pydoc/2.6.0/apache_beam.io.textio.html">io.textio</a>
+   <td><a href="{{ site.baseurl }}/documentation/sdks/pydoc/{{ site.release_latest }}/apache_beam.io.textio.html">io.textio</a>
    </td>
    <td><a href="http://spark.apache.org/docs/latest/api/python/pyspark.streaming.html#pyspark.streaming.StreamingContext.textFileStream">textFileStream</a>
    </td>
@@ -142,7 +142,7 @@ Below are the main streaming input connectors for available for Beam and Spark D
   <tr>
    <td>HDFS
    </td>
-   <td><a href="https://beam.apache.org/documentation/sdks/pydoc/2.6.0/apache_beam.io.hadoopfilesystem.html">io.hadoopfilesystem</a>
+   <td><a href="{{ site.baseurl }}/documentation/sdks/pydoc/{{ site.release_latest }}/apache_beam.io.hadoopfilesystem.html">io.hadoopfilesystem</a>
    </td>
    <td><a href="https://spark.apache.org/docs/latest/api/java/org/apache/spark/SparkContext.html#hadoopConfiguration--">hadoopConfiguration</a> (Access through <code>sc._jsc</code> with Py4J)
 and <a href="http://spark.apache.org/docs/latest/api/python/pyspark.streaming.html#pyspark.streaming.StreamingContext.textFileStream">textFileStream</a>
@@ -153,7 +153,7 @@ and <a href="http://spark.apache.org/docs/latest/api/python/pyspark.streaming.ht
    </td>
    <td>Google Cloud Storage
    </td>
-   <td><a href="https://beam.apache.org/documentation/sdks/pydoc/2.6.0/apache_beam.io.gcp.gcsio.html">io.gcp.gcsio</a>
+   <td><a href="{{ site.baseurl }}/documentation/sdks/pydoc/{{ site.release_latest }}/apache_beam.io.gcp.gcsio.html">io.gcp.gcsio</a>
    </td>
    <td rowspan="2" ><a href="http://spark.apache.org/docs/latest/api/python/pyspark.streaming.html#pyspark.streaming.StreamingContext.textFileStream">textFileStream</a>
    </td>
@@ -185,7 +185,7 @@ and <a href="http://spark.apache.org/docs/latest/api/python/pyspark.streaming.ht
   <tr>
    <td>Cloud Pub/Sub
    </td>
-   <td><a href="https://beam.apache.org/documentation/sdks/pydoc/2.6.0/apache_beam.io.gcp.pubsub.html">io.gcp.pubsub</a>
+   <td><a href="{{ site.baseurl }}/documentation/sdks/pydoc/{{ site.release_latest }}/apache_beam.io.gcp.pubsub.html">io.gcp.pubsub</a>
    </td>
    <td>N/A
    </td>
@@ -195,7 +195,7 @@ and <a href="http://spark.apache.org/docs/latest/api/python/pyspark.streaming.ht
    </td>
    <td>Custom receivers
    </td>
-   <td><a href="https://beam.apache.org/documentation/sdks/python-custom-io/">BoundedSource and RangeTracker</a>
+   <td><a href="{{ site.baseurl }}/documentation/sdks/python-custom-io/">BoundedSource and RangeTracker</a>
    </td>
    <td>N/A
    </td>
@@ -210,7 +210,7 @@ Since Scala code is interoperable with Java and therefore has native compatibili
 
 ### **Go**
 
-A [Go SDK](https://beam.apache.org/documentation/sdks/go/) for Apache Beam is under active development. It is currently experimental and is not recommended for production. Spark does not have an official Go SDK.
+A [Go SDK]({{ site.baseurl }}/documentation/sdks/go/) for Apache Beam is under active development. It is currently experimental and is not recommended for production. Spark does not have an official Go SDK.
 
 ### **R**
 
@@ -222,5 +222,5 @@ We hope this article inspired you to try new and interesting ways of connecting
 
 Check out the following links for further information:
 
-*   See a full list of all built-in and in-progress [I/O Transforms](https://beam.apache.org/documentation/io/built-in/) for Apache Beam.
-*   Learn about some Apache Beam mobile gaming pipeline [examples](https://beam.apache.org/get-started/mobile-gaming-example/).
+*   See a full list of all built-in and in-progress [I/O Transforms]({{ site.baseurl }}/documentation/io/built-in/) for Apache Beam.
+*   Learn about some Apache Beam mobile gaming pipeline [examples]({{ site.baseurl }}/get-started/mobile-gaming-example/).


[beam-site] 11/11: This closes #521

Posted by me...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

mergebot-role pushed a commit to branch mergebot
in repository https://gitbox.apache.org/repos/asf/beam-site.git

commit 67d7fba416419e527bd8bb8ff5e4d744a25828b9
Merge: 2bf7d7d 15c765f
Author: Mergebot <me...@apache.org>
AuthorDate: Mon Aug 20 21:39:57 2018 +0000

    This closes #521

 src/_data/authors.yml                              |   8 +
 ...2018-08-20-review-input-streaming-connectors.md | 225 +++++++++++++++++++++
 2 files changed, 233 insertions(+)


[beam-site] 06/11: Updates to streaming connectors blog post

Posted by me...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

mergebot-role pushed a commit to branch mergebot
in repository https://gitbox.apache.org/repos/asf/beam-site.git

commit 3cd63cdd4666bf38dfbd5448dd40155b4d6f6015
Author: Julien Phalip <jp...@gmail.com>
AuthorDate: Tue Aug 14 11:23:28 2018 -0400

    Updates to streaming connectors blog post
---
 ...2018-08-XX-review-input-streaming-connectors.md | 22 ++++++++++++----------
 1 file changed, 12 insertions(+), 10 deletions(-)

diff --git a/src/_posts/2018-08-XX-review-input-streaming-connectors.md b/src/_posts/2018-08-XX-review-input-streaming-connectors.md
index fded813..5816292 100644
--- a/src/_posts/2018-08-XX-review-input-streaming-connectors.md
+++ b/src/_posts/2018-08-XX-review-input-streaming-connectors.md
@@ -41,7 +41,7 @@ Below are the main streaming input connectors for available for Beam and Spark D
    </td>
    <td>Local<br>(Using the <code>file://</code> URI)
    </td>
-   <td><a href="https://beam.apache.org/documentation/sdks/javadoc/2.5.0/org/apache/beam/sdk/io/TextIO.html">TextIO</a>
+   <td><a href="https://beam.apache.org/documentation/sdks/javadoc/2.6.0/org/apache/beam/sdk/io/TextIO.html">TextIO</a>
    </td>
    <td><a href="https://spark.apache.org/docs/latest/api/java/org/apache/spark/streaming/StreamingContext.html#textFileStream-java.lang.String-">textFileStream</a><br>(Spark treats most Unix systems as HDFS-compatible, but the location should be accessible from all nodes)
    </td>
@@ -49,7 +49,7 @@ Below are the main streaming input connectors for available for Beam and Spark D
   <tr>
    <td>HDFS<br>(Using the <code>hdfs://</code> URI)
    </td>
-    <td><a href="https://beam.apache.org/documentation/sdks/javadoc/2.5.0/org/apache/beam/sdk/io/FileIO.html">FileIO</a> + <a href="https://beam.apache.org/documentation/sdks/javadoc/2.5.0/org/apache/beam/sdk/io/hdfs/HadoopFileSystemOptions.html">HadoopFileSystemOptions</a>
+    <td><a href="https://beam.apache.org/documentation/sdks/javadoc/2.6.0/org/apache/beam/sdk/io/FileIO.html">FileIO</a> + <a href="https://beam.apache.org/documentation/sdks/javadoc/2.6.0/org/apache/beam/sdk/io/hdfs/HadoopFileSystemOptions.html">HadoopFileSystemOptions</a>
    </td>
    <td><a href="https://spark.apache.org/docs/latest/api/java/org/apache/spark/streaming/util/HdfsUtils.html">HdfsUtils</a>
    </td>
@@ -59,7 +59,7 @@ Below are the main streaming input connectors for available for Beam and Spark D
    </td>
    <td>Cloud Storage<br>(Using the <code>gs://</code> URI)
    </td>
-   <td rowspan="2" ><a href="https://beam.apache.org/documentation/sdks/javadoc/2.5.0/org/apache/beam/sdk/io/hdfs/HadoopFileSystemOptions.html">HadoopFileSystemOptions</a>
+   <td><a href="https://beam.apache.org/documentation/sdks/javadoc/2.6.0/org/apache/beam/sdk/io/FileIO.html">FileIO</a> + <a href="https://beam.apache.org/documentation/sdks/javadoc/2.6.0/org/apache/beam/sdk/extensions/gcp/options/GcsOptions.html">GcsOptions</a>
    </td>
    <td rowspan="2" ><a href="https://spark.apache.org/docs/latest/api/java/org/apache/spark/SparkContext.html#hadoopConfiguration--">hadoopConfiguration</a>
 <p>
@@ -69,13 +69,15 @@ and <a href="https://spark.apache.org/docs/latest/api/java/org/apache/spark/stre
   <tr>
    <td>S3<br>(Using the <code>s3://</code> URI)
    </td>
+    <td><a href="https://beam.apache.org/documentation/sdks/javadoc/2.6.0/org/apache/beam/sdk/io/FileIO.html">FileIO</a> + <a href="https://beam.apache.org/documentation/sdks/javadoc/2.6.0/org/apache/beam/sdk/io/aws/options/S3Options.html">S3Options</a>
+   </td>
   </tr>
   <tr>
    <td rowspan="3" >Messaging Queues
    </td>
    <td>Kafka
    </td>
-   <td><a href="https://beam.apache.org/documentation/sdks/javadoc/2.5.0/org/apache/beam/sdk/io/kafka/KafkaIO.html">KafkaIO</a>
+   <td><a href="https://beam.apache.org/documentation/sdks/javadoc/2.6.0/org/apache/beam/sdk/io/kafka/KafkaIO.html">KafkaIO</a>
    </td>
    <td><a href="https://spark.apache.org/docs/latest/streaming-kafka-0-10-integration.html">spark-streaming-kafka</a>
    </td>
@@ -83,7 +85,7 @@ and <a href="https://spark.apache.org/docs/latest/api/java/org/apache/spark/stre
   <tr>
    <td>Kinesis
    </td>
-   <td><a href="https://beam.apache.org/documentation/sdks/javadoc/2.5.0/org/apache/beam/sdk/io/kinesis/KinesisIO.html">KinesisIO</a>
+   <td><a href="https://beam.apache.org/documentation/sdks/javadoc/2.6.0/org/apache/beam/sdk/io/kinesis/KinesisIO.html">KinesisIO</a>
    </td>
    <td><a href="https://spark.apache.org/docs/latest/streaming-kinesis-integration.html">spark-streaming-kinesis</a>
    </td>
@@ -91,7 +93,7 @@ and <a href="https://spark.apache.org/docs/latest/api/java/org/apache/spark/stre
   <tr>
    <td>Cloud Pub/Sub
    </td>
-   <td><a href="https://beam.apache.org/documentation/sdks/javadoc/2.5.0/org/apache/beam/sdk/io/gcp/pubsub/PubsubIO.html">PubsubIO</a>
+   <td><a href="https://beam.apache.org/documentation/sdks/javadoc/2.6.0/org/apache/beam/sdk/io/gcp/pubsub/PubsubIO.html">PubsubIO</a>
    </td>
    <td><a href="https://github.com/apache/bahir/tree/master/streaming-pubsub">spark-streaming-pubsub</a> from <a href="http://bahir.apache.org">Apache Bahir</a>
    </td>
@@ -132,7 +134,7 @@ Below are the main streaming input connectors for available for Beam and Spark D
    </td>
    <td>Local
    </td>
-   <td><a href="https://beam.apache.org/documentation/sdks/pydoc/2.5.0/apache_beam.io.textio.html">io.textio</a>
+   <td><a href="https://beam.apache.org/documentation/sdks/pydoc/2.6.0/apache_beam.io.textio.html">io.textio</a>
    </td>
    <td><a href="http://spark.apache.org/docs/latest/api/python/pyspark.streaming.html#pyspark.streaming.StreamingContext.textFileStream">textFileStream</a>
    </td>
@@ -140,7 +142,7 @@ Below are the main streaming input connectors for available for Beam and Spark D
   <tr>
    <td>HDFS
    </td>
-   <td><a href="https://beam.apache.org/documentation/sdks/pydoc/2.5.0/apache_beam.io.hadoopfilesystem.html">io.hadoopfilesystem</a>
+   <td><a href="https://beam.apache.org/documentation/sdks/pydoc/2.6.0/apache_beam.io.hadoopfilesystem.html">io.hadoopfilesystem</a>
    </td>
    <td><a href="https://spark.apache.org/docs/latest/api/java/org/apache/spark/SparkContext.html#hadoopConfiguration--">hadoopConfiguration</a> (Access through <code>sc._jsc</code> with Py4J)
 and <a href="http://spark.apache.org/docs/latest/api/python/pyspark.streaming.html#pyspark.streaming.StreamingContext.textFileStream">textFileStream</a>
@@ -151,7 +153,7 @@ and <a href="http://spark.apache.org/docs/latest/api/python/pyspark.streaming.ht
    </td>
    <td>Google Cloud Storage
    </td>
-   <td><a href="https://beam.apache.org/documentation/sdks/pydoc/2.5.0/apache_beam.io.gcp.gcsio.html">io.gcp.gcsio</a>
+   <td><a href="https://beam.apache.org/documentation/sdks/pydoc/2.6.0/apache_beam.io.gcp.gcsio.html">io.gcp.gcsio</a>
    </td>
    <td rowspan="2" ><a href="http://spark.apache.org/docs/latest/api/python/pyspark.streaming.html#pyspark.streaming.StreamingContext.textFileStream">textFileStream</a>
    </td>
@@ -183,7 +185,7 @@ and <a href="http://spark.apache.org/docs/latest/api/python/pyspark.streaming.ht
   <tr>
    <td>Cloud Pub/Sub
    </td>
-   <td><a href="https://beam.apache.org/documentation/sdks/pydoc/2.5.0/apache_beam.io.gcp.pubsub.html">io.gcp.pubsub</a>
+   <td><a href="https://beam.apache.org/documentation/sdks/pydoc/2.6.0/apache_beam.io.gcp.pubsub.html">io.gcp.pubsub</a>
    </td>
    <td>N/A
    </td>


[beam-site] 09/11: Fix extraneous p tag and add table borders

Posted by me...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

mergebot-role pushed a commit to branch mergebot
in repository https://gitbox.apache.org/repos/asf/beam-site.git

commit d23c9960cd415e77724b3a6878e3aafae7d1370a
Author: Melissa Pashniak <me...@google.com>
AuthorDate: Mon Aug 20 14:09:06 2018 -0700

    Fix extraneous p tag and add table borders
---
 src/_posts/2018-08-16-review-input-streaming-connectors.md | 11 +++++------
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/src/_posts/2018-08-16-review-input-streaming-connectors.md b/src/_posts/2018-08-16-review-input-streaming-connectors.md
index 72983b8..1edbc9a 100644
--- a/src/_posts/2018-08-16-review-input-streaming-connectors.md
+++ b/src/_posts/2018-08-16-review-input-streaming-connectors.md
@@ -25,7 +25,7 @@ Spark Structured Streaming supports [file sources](https://spark.apache.org/docs
 
 Below are the main streaming input connectors available for Beam and Spark DStreams in Java:
 
-<table>
+<table class="table table-bordered">
   <tr>
    <td>
    </td>
@@ -62,7 +62,6 @@ Below are the main streaming input connectors for available for Beam and Spark D
    <td><a href="{{ site.baseurl }}/documentation/sdks/javadoc/{{ site.release_latest }}/org/apache/beam/sdk/io/FileIO.html">FileIO</a> + <a href="{{ site.baseurl }}/documentation/sdks/javadoc/{{ site.release_latest }}/org/apache/beam/sdk/extensions/gcp/options/GcsOptions.html">GcsOptions</a>
    </td>
    <td rowspan="2" ><a href="https://spark.apache.org/docs/latest/api/java/org/apache/spark/SparkContext.html#hadoopConfiguration--">hadoopConfiguration</a>
-<p>
 and <a href="https://spark.apache.org/docs/latest/api/java/org/apache/spark/streaming/StreamingContext.html#textFileStream-java.lang.String-">textFileStream</a>
    </td>
   </tr>
@@ -118,7 +117,7 @@ Spark also has a Python SDK called [PySpark](http://spark.apache.org/docs/latest
 
 Below are the main streaming input connectors available for Beam and Spark DStreams in Python:
 
-<table>
+<table class="table table-bordered">
   <tr>
    <td>
    </td>
@@ -204,15 +203,15 @@ and <a href="http://spark.apache.org/docs/latest/api/python/pyspark.streaming.ht
 
 ## Connectors for other languages
 
-### **Scala**
+### Scala
 
 Since Scala code is interoperable with Java and therefore has native compatibility with Java libraries (and vice versa), you can use the same Java connectors described above in your Scala programs. Apache Beam also has a [Scala API](https://github.com/spotify/scio) open-sourced [by Spotify](https://labs.spotify.com/2017/10/16/big-data-processing-at-spotify-the-road-to-scio-part-1/).
 
-### **Go**
+### Go
 
 A [Go SDK]({{ site.baseurl }}/documentation/sdks/go/) for Apache Beam is under active development. It is currently experimental and is not recommended for production. Spark does not have an official Go SDK.
 
-### **R**
+### R
 
 Apache Beam does not have an official R SDK. Spark Structured Streaming is supported by an [R SDK](https://spark.apache.org/docs/latest/sparkr.html#structured-streaming), but only for [file sources](https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#input-sources) as a streaming input.
 


[beam-site] 04/11: Fix other typo in author's name for blog post #521

Posted by me...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

mergebot-role pushed a commit to branch mergebot
in repository https://gitbox.apache.org/repos/asf/beam-site.git

commit d2cf4a797653a1e259de2d7c0a7ea00aec4fecc1
Author: Julien Phalip <jp...@gmail.com>
AuthorDate: Mon Aug 6 00:39:07 2018 -0700

    Fix other typo in author's name for blog post #521
---
 src/_posts/2018-08-XX-review-input-streaming-connectors.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/_posts/2018-08-XX-review-input-streaming-connectors.md b/src/_posts/2018-08-XX-review-input-streaming-connectors.md
index c324d80..aa19675 100644
--- a/src/_posts/2018-08-XX-review-input-streaming-connectors.md
+++ b/src/_posts/2018-08-XX-review-input-streaming-connectors.md
@@ -6,7 +6,7 @@ excerpt_separator: <!--more-->
 categories: blog
 authors:
   - lkuligin
-  - julienphalip
+  - jphalip
 ---
 
 In this post, you'll learn about the current state of support for input streaming connectors in [Apache Beam](https://beam.apache.org/). For more context, you'll also learn about the corresponding state of support in [Apache Spark](https://spark.apache.org/).<!--more-->