Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2020/12/02 01:42:59 UTC

[GitHub] [beam] boyuanzz opened a new pull request #13456: Add a small announcement for Splittable DoFn.

boyuanzz opened a new pull request #13456:
URL: https://github.com/apache/beam/pull/13456


   **Please** add a meaningful description for your change here
   
   ------------------------
   
   Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:
   
    - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`).
    - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.
    - [ ] Update `CHANGES.md` with noteworthy changes.
    - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   ------------------------------------------------------------------------------------------------
   
   Lang | SDK | Dataflow | Flink | Samza | Spark | Twister2
   --- | --- | --- | --- | --- | --- | ---
   Go | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/) | --- | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/) | ---
   Java | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_VR_Dataflow_V2/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_VR_Dataflow_V2/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Twister2/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Twister2/lastCompletedBuild/)
   Python | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Python36/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python36/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Python37/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python37/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Python38/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python38/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Py_VR_Dataflow_V2/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Py_VR_Dataflow_V2/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PreCommit_Python_PVR_Flink_Cron/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PreCommit_Python_PVR_Flink_Cron/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Python_VR_Flink/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python_VR_Flink/lastCompletedBuild/) | --- | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Python_VR_Spark/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python_VR_Spark/lastCompletedBuild/) | ---
   XLang | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_XVR_Direct/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_XVR_Direct/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_XVR_Dataflow/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_XVR_Dataflow/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_XVR_Flink/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_XVR_Flink/lastCompletedBuild/) | --- | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_XVR_Spark/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_XVR_Spark/lastCompletedBuild/) | ---
   
   Pre-Commit Tests Status (on master branch)
   ------------------------------------------------------------------------------------------------
   
   --- |Java | Python | Go | Website | Whitespace | Typescript
   --- | --- | --- | --- | --- | --- | ---
   Non-portable | [![Build Status](https://ci-beam.apache.org/job/beam_PreCommit_Java_Cron/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PreCommit_Java_Cron/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PreCommit_Python_Cron/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PreCommit_Python_Cron/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PreCommit_PythonLint_Cron/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PreCommit_PythonLint_Cron/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PreCommit_PythonDocker_Cron/badge/icon)](https://ci-beam.apache.org/job/beam_PreCommit_PythonDocker_Cron/lastCompletedBuild/) <br>[![Build Status](https://ci-beam.apache.org/job/beam_PreCommit_PythonDocs_Cron/badge/icon)](https://ci-beam.apache.org/job/beam_PreCommit_PythonDocs_Cron/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PreCommit_Go_Cron/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PreCommit_Go_Cron/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PreCommit_Website_Cron/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PreCommit_Website_Cron/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PreCommit_Whitespace_Cron/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PreCommit_Whitespace_Cron/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PreCommit_Typescript_Cron/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PreCommit_Typescript_Cron/lastCompletedBuild/)
   Portable | --- | [![Build Status](https://ci-beam.apache.org/job/beam_PreCommit_Portable_Python_Cron/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PreCommit_Portable_Python_Cron/lastCompletedBuild/) | --- | --- | --- | ---
   
   See [.test-infra/jenkins/README](https://github.com/apache/beam/blob/master/.test-infra/jenkins/README.md) for trigger phrase, status and link of all Jenkins jobs.
   
   
   GitHub Actions Tests Status (on master branch)
   ------------------------------------------------------------------------------------------------
   [![Build python source distribution and wheels](https://github.com/apache/beam/workflows/Build%20python%20source%20distribution%20and%20wheels/badge.svg?branch=master&event=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Build+python+source+distribution+and+wheels%22+branch%3Amaster+event%3Aschedule)
   [![Python tests](https://github.com/apache/beam/workflows/Python%20tests/badge.svg?branch=master&event=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Python+Tests%22+branch%3Amaster+event%3Aschedule)
   [![Java tests](https://github.com/apache/beam/workflows/Java%20Tests/badge.svg?branch=master&event=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Java+Tests%22+branch%3Amaster+event%3Aschedule)
   
   See [CI.md](https://github.com/apache/beam/blob/master/CI.md) for more information about GitHub Actions CI.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] lostluck commented on a change in pull request #13456: Add a small announcement for Splittable DoFn.

Posted by GitBox <gi...@apache.org>.
lostluck commented on a change in pull request #13456:
URL: https://github.com/apache/beam/pull/13456#discussion_r537932640



##########
File path: website/www/site/content/en/blog/splittable-do-fn-is-available.md
##########
@@ -0,0 +1,85 @@
+---
+title:  "Splittable DoFn in Apache Beam is Ready to Use"
+date:   2020-12-16 00:00:01 -0800
+categories:
+  - blog
+aliases:
+  - /blog/2020/12/16/splittable-do-fn-is-available.html
+authors:
+  - boyuanzz
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+In 2017, [Splittable DoFn Blog Post](https://beam.apache.org/blog/splittable-do-fn/) proposed
+to build [Splittable DoFn](https://s.apache.org/splittable-do-fn) APIs as the new recommended way of
+building I/O connectors. Splittable DoFn is a generalization of `DoFn` that gives it the core
+capabilities of `Source` while retaining `DoFn`'s syntax, flexibility, modularity, and ease of
+coding. Thus, it becomes much easier to develop complex I/O connectors with simpler and reusable
+code.
+
+Splittable DoFn has three advantages over the existing `UnboundedSource` and `BoundedSource`:
+* Splittable DoFn provides a unified set of APIs to handle both unbounded and bounded cases.
+* Splittable DoFn enables reading from source descriptors dynamically.
+  - Taking Kafka IO as an example, within `UnboundedSource`/`BoundedSource` API, you must specify
+  the topic and partition you want to read from during pipeline construction time. There is no way
+  for `UnboundedSource`/`BoundedSource` to accept topics and partitions as inputs during execution
+  time. But it's native to Splittable DoFn.
+* Splittable DoFn fits in as any node on a pipeline freely with the ability of splitting.
+  - `UnboundedSource`/`BoundedSource` has to be the root node of the pipeline to gain performance
+  benefits from splitting strategies, which limits many real-world usages. This is no longer a limit
+  for a Splittable DoFn.
+
+As Splittable DoFn is now ready to use with all the mentioned improvements, it is the recommended
+way to build the new I/O connectors.Try out building your own Splittable DoFn by following the
+[programming guide](https://beam.apache.org/documentation/programming-guide/#splittable-dofns). We
+have provided tones of common utility classes such as common types of `RestrictionTracker` and
+`WatermarkEstimator` in Beam SDK, which will help you onboard easily. As for the existing I/O
+connectors, we have wrapped `UnboundedSource` and `BoundedSource` implementations into Splittable
+DoFns, yet we still encourage developers to convert `UnboundedSource`/`BoundedSource` into actual
+Splittable DoFn implementation to gain more performance benefits.
+
+Many thanks to every contributor who brought this highly expected design into the data processing
+world. We are really excited to see that users benefit from Splittable DoFn.
+
+At the end, hope you enjoy exploring some real-world Splittable DoFn examples.
+
+## Real world Splittable DoFn examples
+
+**Java Examples**
+
+* [Kafka](https://github.com/apache/beam/blob/571338b0cc96e2e80f23620fe86de5c92dffaccc/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/ReadFromKafkaDoFn.java#L118):
+An I/O connector for [Apache Kafka](https://kafka.apache.org/)
+(an open-source distributed event streaming platform).
+* [Watch](https://github.com/apache/beam/blob/571338b0cc96e2e80f23620fe86de5c92dffaccc/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Watch.java#L787):
+Uses a polling function producing a growing set of outputs for each input until a per-input
+termination condition is met.
+* [Parquet](https://github.com/apache/beam/blob/571338b0cc96e2e80f23620fe86de5c92dffaccc/sdks/java/io/parquet/src/main/java/org/apache/beam/sdk/io/parquet/ParquetIO.java#L365):
+An I/O connector for [Apache Parquet](https://parquet.apache.org/)
+(an open-source columnar storage format).
+* [HL7v2](https://github.com/apache/beam/blob/6fdde4f4eab72b49b10a8bb1cb3be263c5c416b5/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/healthcare/HL7v2IO.java#L493):
+An I/O connector for HL7v2 messages (a clinical messaging format that provides data about events
+that occur inside an organization) part of
+[Google’s Cloud Healthcare API](https://cloud.google.com/healthcare).
+* [BoundedSource wrapper](https://github.com/apache/beam/blob/571338b0cc96e2e80f23620fe86de5c92dffaccc/sdks/java/core/src/main/java/org/apache/beam/sdk/io/Read.java#L248):
+A wrapper which converts an existing [BoundedSource](https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/BoundedSource.html)
+implementation to a splittable DoFn.
+* [UnboundedSource wrapper](https://github.com/apache/beam/blob/571338b0cc96e2e80f23620fe86de5c92dffaccc/sdks/java/core/src/main/java/org/apache/beam/sdk/io/Read.java#L432):
+A wrapper which converts an existing [UnboundedSource](https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/UnboundedSource.html)
+implementation to a splittable DoFn.
+
+**Python Examples**
+* [BoundedSourceWrapper](https://github.com/apache/beam/blob/571338b0cc96e2e80f23620fe86de5c92dffaccc/sdks/python/apache_beam/io/iobase.py#L1375):
+A wrapper which converts an existing [BoundedSource](https://beam.apache.org/releases/pydoc/current/apache_beam.io.iobase.html#apache_beam.io.iobase.BoundedSource)
+implementation to a splittable DoFn.

Review comment:
       ```suggestion
   implementation to a splittable DoFn.
   
   **Go Examples**
    *  [textio.ReadSdf](https://github.com/apache/beam/blob/ce190e11332469ea59b6c9acf16ee7c673ccefdd/sdks/go/pkg/beam/io/textio/sdf.go#L40) implements reading from text files using a splittable DoFn.
   ```

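The blog text quoted above mentions restriction trackers, which let a Splittable DoFn claim positions as it processes them and let the runner split off work that has not been claimed yet. As a rough illustration only, the plain-Python sketch below models that idea with a hypothetical `SimpleRange`/`SimpleOffsetTracker` pair and a hypothetical `process` helper. It does not use Beam, and Beam's real trackers (for example the offset-range tracker in the Python SDK's `apache_beam.io.restriction_trackers` module) differ in API and detail.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class SimpleRange:
    """Hypothetical half-open range [start, stop) of positions (e.g. record offsets)."""
    start: int
    stop: int


class SimpleOffsetTracker:
    """Hypothetical, simplified restriction tracker (NOT Beam's API).

    Models two ideas: a DoFn must claim each position before processing it,
    and the runner may split off the unclaimed remainder at any time.
    Assumes positions are claimed in increasing order.
    """

    def __init__(self, restriction: SimpleRange):
        self.restriction = restriction
        self.last_claimed: Optional[int] = None

    def try_claim(self, position: int) -> bool:
        # Only positions still inside the (possibly shrunken) restriction may be claimed.
        if position >= self.restriction.stop:
            return False
        self.last_claimed = position
        return True

    def try_split(self, fraction: float) -> Optional[Tuple[SimpleRange, SimpleRange]]:
        # Split the unclaimed remainder: the primary keeps everything claimed so far
        # plus `fraction` of what is left; the residual can be processed elsewhere.
        current = (self.restriction.start if self.last_claimed is None
                   else self.last_claimed + 1)
        split_pos = current + int((self.restriction.stop - current) * fraction)
        if split_pos >= self.restriction.stop:
            return None  # nothing left to hand off
        primary = SimpleRange(self.restriction.start, split_pos)
        residual = SimpleRange(split_pos, self.restriction.stop)
        self.restriction = primary
        return primary, residual


def process(tracker: SimpleOffsetTracker, records: List[str],
            split_after: Optional[int] = None) -> Tuple[List[str], Optional[SimpleRange]]:
    """Hypothetical DoFn body: emits records[pos] for every position it can claim."""
    out: List[str] = []
    residual: Optional[SimpleRange] = None
    pos = tracker.restriction.start
    while tracker.try_claim(pos):
        out.append(records[pos])
        if split_after is not None and pos == split_after:
            # Simulate a runner-initiated checkpoint: keep what was claimed and
            # hand the rest of the range to someone else.
            result = tracker.try_split(0.0)
            if result is not None:
                residual = result[1]
        pos += 1
    return out, residual


if __name__ == "__main__":
    records = list("abcdefghij")
    first, rest = process(SimpleOffsetTracker(SimpleRange(0, 10)), records, split_after=3)
    second, _ = process(SimpleOffsetTracker(rest), records)
    print(first, rest, second)
```

Splitting only the unclaimed remainder is what allows work to be rebalanced or checkpointed in the middle of a single element; in this sketch a checkpoint is modelled, roughly, as a split at fraction `0.0`.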






[GitHub] [beam] lostluck commented on pull request #13456: Add a small announcement for Splittable DoFn.

Posted by GitBox <gi...@apache.org>.
lostluck commented on pull request #13456:
URL: https://github.com/apache/beam/pull/13456#issuecomment-740261199


   Overall LGTM to me, modulo adding the Go SDF line.
   
   You should also link to the new section in the programming guide about how to write them. That's probably a bit more important than the examples.
   
   I'm ambivalent about whether this comes out before or after the 2.26.0 release. If I make the RC1 artifacts tonight, then we can probably have it out by Friday the 11th, and publish both blogs in sequence (release, SDF). If it's not conditional, then there shouldn't be any issues.





[GitHub] [beam] youngoli commented on a change in pull request #13456: Add a small announcement for Splittable DoFn.

Posted by GitBox <gi...@apache.org>.
youngoli commented on a change in pull request #13456:
URL: https://github.com/apache/beam/pull/13456#discussion_r539644754



##########
File path: website/www/site/content/en/blog/splittable-do-fn-is-available.md
##########
@@ -0,0 +1,88 @@
+In 2017, [Splittable DoFn Blog Post](https://beam.apache.org/blog/splittable-do-fn/) proposed
+to build [Splittable DoFn](https://s.apache.org/splittable-do-fn) APIs as the new recommended way of
+building I/O connectors. Splittable DoFn is a generalization of `DoFn` that gives it the core
+capabilities of `Source` while retaining `DoFn`'s syntax, flexibility, modularity, and ease of
+coding. Thus, it becomes much easier to develop complex I/O connectors with simpler and reusable
+code.

Review comment:
       It's more so to help the flow of the post. It makes sense to go from background info -> main point -> details. Plus, in my experience it's relatively common for articles and blog posts to restate the title. So while the title is great, I still think having this sentence helps.







[GitHub] [beam] boyuanzz commented on pull request #13456: Add a small announcement for Splittable DoFn.

Posted by GitBox <gi...@apache.org>.
boyuanzz commented on pull request #13456:
URL: https://github.com/apache/beam/pull/13456#issuecomment-742818601


   > I think it looks right. Have you added yourself to the authors data?
   > 
   > website/www/site/data/authors.yml
   
   Yeah, I'm there.





[GitHub] [beam] lostluck edited a comment on pull request #13456: Add a small announcement for Splittable DoFn.

Posted by GitBox <gi...@apache.org>.
lostluck edited a comment on pull request #13456:
URL: https://github.com/apache/beam/pull/13456#issuecomment-740261199









[GitHub] [beam] tysonjh commented on a change in pull request #13456: Add a small announcement for Splittable DoFn.

Posted by GitBox <gi...@apache.org>.
tysonjh commented on a change in pull request #13456:
URL: https://github.com/apache/beam/pull/13456#discussion_r539452514



##########
File path: website/www/site/content/en/blog/splittable-do-fn-is-available.md
##########
@@ -0,0 +1,88 @@
+In 2017, [Splittable DoFn Blog Post](https://beam.apache.org/blog/splittable-do-fn/) proposed
+to build [Splittable DoFn](https://s.apache.org/splittable-do-fn) APIs as the new recommended way of
+building I/O connectors. Splittable DoFn is a generalization of `DoFn` that gives it the core
+capabilities of `Source` while retaining `DoFn`'s syntax, flexibility, modularity, and ease of
+coding. Thus, it becomes much easier to develop complex I/O connectors with simpler and reusable
+code.
+
+Splittable DoFn has three advantages over the existing `UnboundedSource` and `BoundedSource`:
+* Splittable DoFn provides a unified set of APIs to handle both unbounded and bounded cases.
+* Splittable DoFn enables reading from source descriptors dynamically.
+  - Taking Kafka IO as an example, within `UnboundedSource`/`BoundedSource` API, you must specify
+  the topic and partition you want to read from during pipeline construction time. There is no way
+  for `UnboundedSource`/`BoundedSource` to accept topics and partitions as inputs during execution
+  time. But it's native to Splittable DoFn.
+* Splittable DoFn fits in as any node on a pipeline freely with the ability of splitting.
+  - `UnboundedSource`/`BoundedSource` has to be the root node of the pipeline to gain performance
+  benefits from splitting strategies, which limits many real-world usages. This is no longer a limit
+  for a Splittable DoFn.
+
+As Splittable DoFn is now ready to use with all the mentioned improvements, it is the recommended
+way to build the new I/O connectors.Try out building your own Splittable DoFn by following the

Review comment:
       ```suggestion
   way to build the new I/O connectors. Try out building your own Splittable DoFn by following the
   ```

##########
File path: website/www/site/content/en/blog/splittable-do-fn-is-available.md
##########
@@ -0,0 +1,88 @@
+---
+title:  "Splittable DoFn in Apache Beam is Ready to Use"
+date:   2020-12-16 00:00:01 -0800
+categories:
+  - blog
+aliases:
+  - /blog/2020/12/16/splittable-do-fn-is-available.html
+authors:
+  - boyuanzz
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+In 2017, [Splittable DoFn Blog Post](https://beam.apache.org/blog/splittable-do-fn/) proposed
+to build [Splittable DoFn](https://s.apache.org/splittable-do-fn) APIs as the new recommended way of
+building I/O connectors. Splittable DoFn is a generalization of `DoFn` that gives it the core
+capabilities of `Source` while retaining `DoFn`'s syntax, flexibility, modularity, and ease of
+coding. Thus, it becomes much easier to develop complex I/O connectors with simpler and reusable
+code.
+
+Splittable DoFn has three advantages over the existing `UnboundedSource` and `BoundedSource`:
+* Splittable DoFn provides a unified set of APIs to handle both unbounded and bounded cases.
+* Splittable DoFn enables reading from source descriptors dynamically.
+  - Taking Kafka IO as an example, within `UnboundedSource`/`BoundedSource` API, you must specify
+  the topic and partition you want to read from during pipeline construction time. There is no way
+  for `UnboundedSource`/`BoundedSource` to accept topics and partitions as inputs during execution
+  time. But it's native to Splittable DoFn.
+* Splittable DoFn fits in as any node on a pipeline freely with the ability of splitting.
+  - `UnboundedSource`/`BoundedSource` has to be the root node of the pipeline to gain performance
+  benefits from splitting strategies, which limits many real-world usages. This is no longer a limit
+  for a Splittable DoFn.
+
+As Splittable DoFn is now ready to use with all the mentioned improvements, it is the recommended
+way to build the new I/O connectors.Try out building your own Splittable DoFn by following the
+[programming guide](https://beam.apache.org/documentation/programming-guide/#splittable-dofns). We
+have provided tones of common utility classes such as common types of `RestrictionTracker` and

Review comment:
       ```suggestion
   have provided tonnes of common utility classes such as common types of `RestrictionTracker` and
   ```

##########
File path: website/www/site/content/en/blog/splittable-do-fn-is-available.md
##########
@@ -0,0 +1,88 @@
+---
+title:  "Splittable DoFn in Apache Beam is Ready to Use"
+date:   2020-12-16 00:00:01 -0800
+categories:
+  - blog
+aliases:
+  - /blog/2020/12/16/splittable-do-fn-is-available.html
+authors:
+  - boyuanzz
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+In 2017, [Splittable DoFn Blog Post](https://beam.apache.org/blog/splittable-do-fn/) proposed
+to build [Splittable DoFn](https://s.apache.org/splittable-do-fn) APIs as the new recommended way of
+building I/O connectors. Splittable DoFn is a generalization of `DoFn` that gives it the core
+capabilities of `Source` while retaining `DoFn`'s syntax, flexibility, modularity, and ease of
+coding. Thus, it becomes much easier to develop complex I/O connectors with simpler and reusable
+code.
+
+Splittable DoFn has three advantages over the existing `UnboundedSource` and `BoundedSource`:
+* Splittable DoFn provides a unified set of APIs to handle both unbounded and bounded cases.
+* Splittable DoFn enables reading from source descriptors dynamically.
+  - Taking Kafka IO as an example, within `UnboundedSource`/`BoundedSource` API, you must specify
+  the topic and partition you want to read from during pipeline construction time. There is no way
+  for `UnboundedSource`/`BoundedSource` to accept topics and partitions as inputs during execution
+  time. But it's native to Splittable DoFn.
+* Splittable DoFn fits in as any node on a pipeline freely with the ability of splitting.
+  - `UnboundedSource`/`BoundedSource` has to be the root node of the pipeline to gain performance
+  benefits from splitting strategies, which limits many real-world usages. This is no longer a limit
+  for a Splittable DoFn.
+
+As Splittable DoFn is now ready to use with all the mentioned improvements, it is the recommended
+way to build the new I/O connectors.Try out building your own Splittable DoFn by following the
+[programming guide](https://beam.apache.org/documentation/programming-guide/#splittable-dofns). We
+have provided tones of common utility classes such as common types of `RestrictionTracker` and
+`WatermarkEstimator` in Beam SDK, which will help you onboard easily. As for the existing I/O
+connectors, we have wrapped `UnboundedSource` and `BoundedSource` implementations into Splittable
+DoFns, yet we still encourage developers to convert `UnboundedSource`/`BoundedSource` into actual
+Splittable DoFn implementation to gain more performance benefits.
+
+Many thanks to every contributor who brought this highly expected design into the data processing
+world. We are really excited to see that users benefit from Splittable DoFn.
+
+At the end, hope you enjoy exploring some real-world Splittable DoFn examples.

Review comment:
       ```suggestion
   Below are some real-world Splittable DoFn examples for you to explore.
   ```

##########
File path: website/www/site/content/en/blog/splittable-do-fn-is-available.md
##########
@@ -0,0 +1,88 @@
+---
+title:  "Splittable DoFn in Apache Beam is Ready to Use"
+date:   2020-12-16 00:00:01 -0800
+categories:
+  - blog
+aliases:
+  - /blog/2020/12/16/splittable-do-fn-is-available.html
+authors:
+  - boyuanzz
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+In 2017, [Splittable DoFn Blog Post](https://beam.apache.org/blog/splittable-do-fn/) proposed
+to build [Splittable DoFn](https://s.apache.org/splittable-do-fn) APIs as the new recommended way of
+building I/O connectors. Splittable DoFn is a generalization of `DoFn` that gives it the core
+capabilities of `Source` while retaining `DoFn`'s syntax, flexibility, modularity, and ease of
+coding. Thus, it becomes much easier to develop complex I/O connectors with simpler and reusable
+code.

Review comment:
       +1

##########
File path: website/www/site/content/en/blog/splittable-do-fn-is-available.md
##########
@@ -0,0 +1,88 @@
+---
+title:  "Splittable DoFn in Apache Beam is Ready to Use"
+date:   2020-12-16 00:00:01 -0800
+categories:
+  - blog
+aliases:
+  - /blog/2020/12/16/splittable-do-fn-is-available.html
+authors:
+  - boyuanzz
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+In 2017, [Splittable DoFn Blog Post](https://beam.apache.org/blog/splittable-do-fn/) proposed
+to build [Splittable DoFn](https://s.apache.org/splittable-do-fn) APIs as the new recommended way of
+building I/O connectors. Splittable DoFn is a generalization of `DoFn` that gives it the core
+capabilities of `Source` while retaining `DoFn`'s syntax, flexibility, modularity, and ease of
+coding. Thus, it becomes much easier to develop complex I/O connectors with simpler and reusable
+code.
+
+Splittable DoFn has three advantages over the existing `UnboundedSource` and `BoundedSource`:
+* Splittable DoFn provides a unified set of APIs to handle both unbounded and bounded cases.
+* Splittable DoFn enables reading from source descriptors dynamically.
+  - Taking Kafka IO as an example, within `UnboundedSource`/`BoundedSource` API, you must specify
+  the topic and partition you want to read from during pipeline construction time. There is no way
+  for `UnboundedSource`/`BoundedSource` to accept topics and partitions as inputs during execution
+  time. But it's native to Splittable DoFn.
+* Splittable DoFn fits in as any node on a pipeline freely with the ability of splitting.
+  - `UnboundedSource`/`BoundedSource` has to be the root node of the pipeline to gain performance
+  benefits from splitting strategies, which limits many real-world usages. This is no longer a limit
+  for a Splittable DoFn.
+
+As Splittable DoFn is now ready to use with all the mentioned improvements, it is the recommended
+way to build the new I/O connectors. Try out building your own Splittable DoFn by following the
+[programming guide](https://beam.apache.org/documentation/programming-guide/#splittable-dofns). We
+have provided tons of common utility classes such as common types of `RestrictionTracker` and
+`WatermarkEstimator` in Beam SDK, which will help you onboard easily. As for the existing I/O
+connectors, we have wrapped `UnboundedSource` and `BoundedSource` implementations into Splittable
+DoFns, yet we still encourage developers to convert `UnboundedSource`/`BoundedSource` into actual
+Splittable DoFn implementation to gain more performance benefits.
+
+Many thanks to every contributor who brought this highly expected design into the data processing

Review comment:
       ```suggestion
   Many thanks to every contributor who brought this highly anticipated design into the data processing
   ```
   
   I think this is what you mean?
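To illustrate the "unified set of APIs" claim in the post, here is a hedged sketch, assuming a toy in-memory partition, of one process body that serves both a bounded restriction and an unbounded one (`stop = math.inf`). When it runs out of available data on an unbounded restriction, it checkpoints by returning a residual; this roughly corresponds to calling `defer_remainder()` on the tracker in the Python SDK or returning `ProcessContinuation.resume()` in Java. `Restriction`, `Outcome`, and `process` are hypothetical names, and the `max()` update is a stand-in for a monotonic watermark estimator; the SDK's `WatermarkEstimator` implementations do more than this.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Restriction:
    start: int
    stop: float  # math.inf models an unbounded (streaming) partition


@dataclass
class Outcome:
    outputs: List[str]
    residual: Optional[Restriction]  # None: restriction fully processed
    watermark: float


def process(records: List[Tuple[float, str]], r: Restriction,
            watermark: float = -math.inf) -> Outcome:
    """Hypothetical process() body shared by bounded and unbounded reads.

    records[i] is the (event_time, payload) pair at offset i; the list holds
    only what is available right now.
    """
    outputs: List[str] = []
    offset = r.start
    while offset < r.stop:
        if offset >= len(records):
            # Nothing more to read yet: checkpoint and ask to resume later.
            return Outcome(outputs, Restriction(offset, r.stop), watermark)
        event_time, payload = records[offset]
        outputs.append(payload)
        # Monotonic estimate: the watermark never moves backwards.
        watermark = max(watermark, event_time)
        offset += 1
    return Outcome(outputs, None, watermark)
```

Calling `process` again with the returned residual and watermark picks up at the first unread offset, so the same code path handles a file-like bounded range and a streaming partition.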




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] boyuanzz commented on pull request #13456: Add a small announcement for Splittable DoFn.

Posted by GitBox <gi...@apache.org>.
boyuanzz commented on pull request #13456:
URL: https://github.com/apache/beam/pull/13456#issuecomment-742772118


   > It's kind of weird that I cannot find this new blog post on the staging website. Is it because I set it to a future date? Did I miss anything? @rosetn
   
   I think the future date is the cause, so I'll set 12/14 (next Monday) as the target date.





[GitHub] [beam] boyuanzz merged pull request #13456: [BEAM-10480] Add a small announcement for Splittable DoFn.

Posted by GitBox <gi...@apache.org>.
boyuanzz merged pull request #13456:
URL: https://github.com/apache/beam/pull/13456


   





[GitHub] [beam] boyuanzz commented on a change in pull request #13456: Add a small announcement for Splittable DoFn.

Posted by GitBox <gi...@apache.org>.
boyuanzz commented on a change in pull request #13456:
URL: https://github.com/apache/beam/pull/13456#discussion_r539562822



##########
File path: website/www/site/content/en/blog/splittable-do-fn-is-available.md
##########
@@ -0,0 +1,88 @@
+---
+title:  "Splittable DoFn in Apache Beam is Ready to Use"
+date:   2020-12-16 00:00:01 -0800
+categories:
+  - blog
+aliases:
+  - /blog/2020/12/16/splittable-do-fn-is-available.html
+authors:
+  - boyuanzz
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+In 2017, [Splittable DoFn Blog Post](https://beam.apache.org/blog/splittable-do-fn/) proposed
+to build [Splittable DoFn](https://s.apache.org/splittable-do-fn) APIs as the new recommended way of
+building I/O connectors. Splittable DoFn is a generalization of `DoFn` that gives it the core
+capabilities of `Source` while retaining `DoFn`'s syntax, flexibility, modularity, and ease of
+coding. Thus, it becomes much easier to develop complex I/O connectors with simpler and reusable
+code.

Review comment:
       This blog post is titled `Splittable DoFn in Apache Beam is Ready to Use`. Doesn't the title already serve this purpose?







[GitHub] [beam] boyuanzz commented on pull request #13456: Add a small announcement for Splittable DoFn.

Posted by GitBox <gi...@apache.org>.
boyuanzz commented on pull request #13456:
URL: https://github.com/apache/beam/pull/13456#issuecomment-742819656


   Thanks for all the help! I'm going to merge this PR now, and it will be published next Monday.





[GitHub] [beam] boyuanzz commented on pull request #13456: Add a small announcement for Splittable DoFn.

Posted by GitBox <gi...@apache.org>.
boyuanzz commented on pull request #13456:
URL: https://github.com/apache/beam/pull/13456#issuecomment-740263874


   > Overall LGTM, modulo adding the Go SDF line.
   > 
   > You should also link to the new section in the programming guide about how to write them. That's probably as important as the examples.
   > 
   > I'm ambivalent whether this comes out before or after the 2.26.0 release. If I make the RC1 artifacts tonight, then we can probably have it out by Friday the 11th, and publish both blogs in sequence (release, then SDF). If it's not conditional, then there shouldn't be any issues.
   
   Thanks, Rebo! I have linked the SDF programming guide in the post. Publishing this blog after the 2.26.0 release sounds good.





[GitHub] [beam] rosetn commented on pull request #13456: Add a small announcement for Splittable DoFn.

Posted by GitBox <gi...@apache.org>.
rosetn commented on pull request #13456:
URL: https://github.com/apache/beam/pull/13456#issuecomment-742801771


   I think it looks right. Have you added yourself to the authors data? 
   
   website/www/site/data/authors.yml





[GitHub] [beam] jkff commented on pull request #13456: Add a small announcement for Splittable DoFn.

Posted by GitBox <gi...@apache.org>.
jkff commented on pull request #13456:
URL: https://github.com/apache/beam/pull/13456#issuecomment-742807522


   I cannot emphasize enough how happy I am about this PR! Thanks to everyone for the huge amount of work involved in getting SDF to general availability!





[GitHub] [beam] rosetn commented on a change in pull request #13456: Add a small announcement for Splittable DoFn.

Posted by GitBox <gi...@apache.org>.
rosetn commented on a change in pull request #13456:
URL: https://github.com/apache/beam/pull/13456#discussion_r539711633



##########
File path: website/www/site/content/en/blog/splittable-do-fn-is-available.md
##########
@@ -0,0 +1,91 @@
+---
+title:  "Splittable DoFn in Apache Beam is Ready to Use"
+date:   2020-12-16 00:00:01 -0800
+categories:
+  - blog
+aliases:
+  - /blog/2020/12/16/splittable-do-fn-is-available.html
+authors:
+  - boyuanzz
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+In 2017, [Splittable DoFn Blog Post](https://beam.apache.org/blog/splittable-do-fn/) proposed
+to build [Splittable DoFn](https://s.apache.org/splittable-do-fn) APIs as the new recommended way of
+building I/O connectors. Splittable DoFn is a generalization of `DoFn` that gives it the core
+capabilities of `Source` while retaining `DoFn`'s syntax, flexibility, modularity, and ease of
+coding. Thus, it becomes much easier to develop complex I/O connectors with simpler and reusable
+code.
+
+Now we are pleased to announce that Splittable DoFn is ready for use in Beam Python/Java/Go SDKs

Review comment:
       I think this sentence is fine here, but I'd move it to the beginning of the post. Get to the announcement immediately and then explain the context. 
   
   Also, if you define the abbreviation Splittable DoFn (SDF), you can use it throughout the post to make it easier to read. Up to you how you want to talk about it :)
   
   "We are pleased to announce that Splittable DoFn (SDF) is ready for use in the Beam Python, Java, and Go SDKs for versions 2.25.0 and later."

##########
File path: website/www/site/content/en/blog/splittable-do-fn-is-available.md
##########
@@ -0,0 +1,91 @@
+---
+title:  "Splittable DoFn in Apache Beam is Ready to Use"
+date:   2020-12-16 00:00:01 -0800
+categories:
+  - blog
+aliases:
+  - /blog/2020/12/16/splittable-do-fn-is-available.html
+authors:
+  - boyuanzz
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+In 2017, [Splittable DoFn Blog Post](https://beam.apache.org/blog/splittable-do-fn/) proposed
+to build [Splittable DoFn](https://s.apache.org/splittable-do-fn) APIs as the new recommended way of
+building I/O connectors. Splittable DoFn is a generalization of `DoFn` that gives it the core
+capabilities of `Source` while retaining `DoFn`'s syntax, flexibility, modularity, and ease of
+coding. Thus, it becomes much easier to develop complex I/O connectors with simpler and reusable
+code.
+
+Now we are pleased to announce that Splittable DoFn is ready for use in Beam Python/Java/Go SDKs
+starting in version 2.25.0.
+
+Splittable DoFn has three advantages over the existing `UnboundedSource` and `BoundedSource`:
+* Splittable DoFn provides a unified set of APIs to handle both unbounded and bounded cases.
+* Splittable DoFn enables reading from source descriptors dynamically.
+  - Taking Kafka IO as an example, within `UnboundedSource`/`BoundedSource` API, you must specify

Review comment:
       I think we talk about this I/O as one word in the docs? "KafkaIO"? 

##########
File path: website/www/site/content/en/blog/splittable-do-fn-is-available.md
##########
@@ -0,0 +1,91 @@
+---
+title:  "Splittable DoFn in Apache Beam is Ready to Use"
+date:   2020-12-16 00:00:01 -0800
+categories:
+  - blog
+aliases:
+  - /blog/2020/12/16/splittable-do-fn-is-available.html
+authors:
+  - boyuanzz
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+In 2017, [Splittable DoFn Blog Post](https://beam.apache.org/blog/splittable-do-fn/) proposed
+to build [Splittable DoFn](https://s.apache.org/splittable-do-fn) APIs as the new recommended way of
+building I/O connectors. Splittable DoFn is a generalization of `DoFn` that gives it the core
+capabilities of `Source` while retaining `DoFn`'s syntax, flexibility, modularity, and ease of
+coding. Thus, it becomes much easier to develop complex I/O connectors with simpler and reusable
+code.
+
+Now we are pleased to announce that Splittable DoFn is ready for use in Beam Python/Java/Go SDKs
+starting in version 2.25.0.
+
+Splittable DoFn has three advantages over the existing `UnboundedSource` and `BoundedSource`:
+* Splittable DoFn provides a unified set of APIs to handle both unbounded and bounded cases.
+* Splittable DoFn enables reading from source descriptors dynamically.
+  - Taking Kafka IO as an example, within `UnboundedSource`/`BoundedSource` API, you must specify
+  the topic and partition you want to read from during pipeline construction time. There is no way
+  for `UnboundedSource`/`BoundedSource` to accept topics and partitions as inputs during execution
+  time. But it's native to Splittable DoFn.

Review comment:
       Replace "native" with "built-in" (if that's what you mean).
   
   https://developers.google.com/style/word-list#letter-n
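The KafkaIO example in this hunk is about where source descriptors come from. A minimal sketch, assuming an in-memory dict stands in for a Kafka cluster: the (topic, partition) pairs are produced by an ordinary upstream step at execution time and flow into the reader as input elements, which is the pattern an SDF-based reader enables. `PartitionDescriptor`, `discover`, and `read_partition` are hypothetical names for illustration, not KafkaIO APIs.

```python
from dataclasses import dataclass
from typing import Dict, Iterator, List, Tuple


@dataclass(frozen=True)
class PartitionDescriptor:
    """Hypothetical stand-in for a Kafka-style source descriptor."""
    topic: str
    partition: int


# Toy "cluster": (topic, partition) -> records, indexed by offset.
Log = Dict[Tuple[str, int], List[str]]


def discover(topic_partitions: Dict[str, int]) -> Iterator[PartitionDescriptor]:
    # Upstream step: descriptors are computed while the pipeline runs
    # (here from a dict) instead of being fixed at construction time.
    for topic in sorted(topic_partitions):
        for p in range(topic_partitions[topic]):
            yield PartitionDescriptor(topic, p)


def read_partition(desc: PartitionDescriptor, log: Log) -> Iterator[str]:
    # SDF-style reader: each input element gets its own offset restriction,
    # [0, len(records)), read sequentially here for brevity.
    records = log.get((desc.topic, desc.partition), [])
    for offset in range(0, len(records)):
        yield records[offset]
```

Because the reader takes descriptors as ordinary input, adding a topic only changes the data fed to `discover`, not the pipeline graph.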







[GitHub] [beam] aaltay commented on a change in pull request #13456: Add a small announcement for Splittable DoFn.

Posted by GitBox <gi...@apache.org>.
aaltay commented on a change in pull request #13456:
URL: https://github.com/apache/beam/pull/13456#discussion_r533842262



##########
File path: website/www/site/content/en/blog/splittable-do-fn-is-available.md
##########
@@ -0,0 +1,58 @@
+---
+title:  "Splittable DoFn in Apache Beam is Ready to Use"
+date:   2020-12-16 00:00:01 -0800
+categories:
+  - blog
+aliases:
+  - /blog/2020/12/16/splittable-do-fn-is-available.html
+authors:
+  - boyuanzz
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+

Review comment:
       It is a bit spartan. Maybe add a little bit of a story, or thank the people who contributed, etc.
   
   I will defer to Rose for the review.







[GitHub] [beam] boyuanzz commented on pull request #13456: Add a small announcement for Splittable DoFn.

Posted by GitBox <gi...@apache.org>.
boyuanzz commented on pull request #13456:
URL: https://github.com/apache/beam/pull/13456#issuecomment-742181313


   It's kind of weird that I cannot find this new blog post on the staging website. Is it because I set it to a future date? Did I miss anything? @rosetn





[GitHub] [beam] boyuanzz commented on a change in pull request #13456: Add a small announcement for Splittable DoFn.

Posted by GitBox <gi...@apache.org>.
boyuanzz commented on a change in pull request #13456:
URL: https://github.com/apache/beam/pull/13456#discussion_r535804631



##########
File path: website/www/site/content/en/blog/splittable-do-fn-is-available.md
##########
@@ -0,0 +1,58 @@
+---
+title:  "Splittable DoFn in Apache Beam is Ready to Use"
+date:   2020-12-16 00:00:01 -0800
+categories:
+  - blog
+aliases:
+  - /blog/2020/12/16/splittable-do-fn-is-available.html
+authors:
+  - boyuanzz
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+In 2017, [Splittable DoFn Blog Post](https://beam.apache.org/blog/splittable-do-fn/) proposed
+to build [Splittable DoFn](https://s.apache.org/splittable-do-fn) APIs as the new recommended way of
+building I/O connectors. Now Splittable DoFn is ready to use. Try out building your Splittable DoFn

Review comment:
       Just updated the blog with more details. Would you like to take another look?







[GitHub] [beam] boyuanzz commented on a change in pull request #13456: Add a small announcement for Splittable DoFn.

Posted by GitBox <gi...@apache.org>.
boyuanzz commented on a change in pull request #13456:
URL: https://github.com/apache/beam/pull/13456#discussion_r539667707



##########
File path: website/www/site/content/en/blog/splittable-do-fn-is-available.md
##########
@@ -0,0 +1,88 @@
+---
+title:  "Splittable DoFn in Apache Beam is Ready to Use"
+date:   2020-12-16 00:00:01 -0800
+categories:
+  - blog
+aliases:
+  - /blog/2020/12/16/splittable-do-fn-is-available.html
+authors:
+  - boyuanzz
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+In 2017, [Splittable DoFn Blog Post](https://beam.apache.org/blog/splittable-do-fn/) proposed
+to build [Splittable DoFn](https://s.apache.org/splittable-do-fn) APIs as the new recommended way of
+building I/O connectors. Splittable DoFn is a generalization of `DoFn` that gives it the core
+capabilities of `Source` while retaining `DoFn`'s syntax, flexibility, modularity, and ease of
+coding. Thus, it becomes much easier to develop complex I/O connectors with simpler and reusable
+code.

Review comment:
       Thanks! Just updated the blog.







[GitHub] [beam] youngoli commented on a change in pull request #13456: Add a small announcement for Splittable DoFn.

Posted by GitBox <gi...@apache.org>.
youngoli commented on a change in pull request #13456:
URL: https://github.com/apache/beam/pull/13456#discussion_r537966854



##########
File path: website/www/site/content/en/blog/splittable-do-fn-is-available.md
##########
@@ -0,0 +1,88 @@
+---
+title:  "Splittable DoFn in Apache Beam is Ready to Use"
+date:   2020-12-16 00:00:01 -0800
+categories:
+  - blog
+aliases:
+  - /blog/2020/12/16/splittable-do-fn-is-available.html
+authors:
+  - boyuanzz
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+In 2017, [Splittable DoFn Blog Post](https://beam.apache.org/blog/splittable-do-fn/) proposed
+to build [Splittable DoFn](https://s.apache.org/splittable-do-fn) APIs as the new recommended way of
+building I/O connectors. Splittable DoFn is a generalization of `DoFn` that gives it the core
+capabilities of `Source` while retaining `DoFn`'s syntax, flexibility, modularity, and ease of
+coding. Thus, it becomes much easier to develop complex I/O connectors with simpler and reusable
+code.

Review comment:
       I think an additional sentence here outlining the goal of the blog post would help. Something like:
   
   "Thanks to the hard work of many contributors, we are pleased to announce that Splittable DoFn is ready for use in all Beam SDKs starting in version 2.XX."







[GitHub] [beam] boyuanzz commented on pull request #13456: Add a small announcement for Splittable DoFn.

Posted by GitBox <gi...@apache.org>.
boyuanzz commented on pull request #13456:
URL: https://github.com/apache/beam/pull/13456#issuecomment-740155784


   Kindly pinging





[GitHub] [beam] tysonjh commented on a change in pull request #13456: Add a small announcement for Splittable DoFn.

Posted by GitBox <gi...@apache.org>.
tysonjh commented on a change in pull request #13456:
URL: https://github.com/apache/beam/pull/13456#discussion_r534335179



##########
File path: website/www/site/content/en/blog/splittable-do-fn-is-available.md
##########
@@ -0,0 +1,58 @@
+---
+title:  "Splittable DoFn in Apache Beam is Ready to Use"
+date:   2020-12-16 00:00:01 -0800
+categories:
+  - blog
+aliases:
+  - /blog/2020/12/16/splittable-do-fn-is-available.html
+authors:
+  - boyuanzz
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+In 2017, [Splittable DoFn Blog Post](https://beam.apache.org/blog/splittable-do-fn/) proposed
+to build [Splittable DoFn](https://s.apache.org/splittable-do-fn) APIs as the new recommended way of
+building I/O connectors. Now Splittable DoFn is ready to use. Try out building your Splittable DoFn

Review comment:
       Summarize the goals of SDF, romanticize it, and tell users why they should be excited to use the new feature. Some questions you should answer here:
    
     * Why should I prefer SDF over the 'old' I/O connectors? (new capabilities? better perf? cleaner api?)
     * Is it worth migrating my existing I/O connector to SDF? Why?
     * What's next for SDFs? (optional)
     * Why did it take 3 years to do this? (optional)
     * Is there a specific place/component/label to report bugs? (optional)
   
   These details may duplicate content from the other blog posts but that is fine. A high level, short summary, of them would help me avoid having to sift through the links.







[GitHub] [beam] lostluck commented on pull request #13456: Add a small announcement for Splittable DoFn.

Posted by GitBox <gi...@apache.org>.
lostluck commented on pull request #13456:
URL: https://github.com/apache/beam/pull/13456#issuecomment-740266386


   Ah! so you did. I've been doing too much reading today and missed it. Thank you!
   

