
Posted to issues@flink.apache.org by GitBox <gi...@apache.org> on 2020/08/20 03:42:11 UTC

[GitHub] [flink] hequn8128 opened a new pull request #13203: [FLINK-18984][python][docs] Add tutorial documentation for Python DataStream API

hequn8128 opened a new pull request #13203:
URL: https://github.com/apache/flink/pull/13203


   
   ## What is the purpose of the change
   
   This pull request adds tutorial documentation for Python DataStream API.
   
   
   ## Brief change log
   
     - Adds tutorial documentation for Python DataStream API.
     - Adds a pointer to the Python DataStream API tutorial in Try Flink.
   
   ## Verifying this change
   
   This change is a trivial rework / code cleanup without any test coverage.
   
   ## Does this pull request potentially affect one of the following parts:
   
     - Dependencies (does it add or upgrade a dependency): (no)
     - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (no)
     - The serializers: (no)
     - The runtime per-record code paths (performance sensitive): (no)
     - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn/Mesos, ZooKeeper: (no)
     - The S3 file system connector: (no)
   
   ## Documentation
   
     - Does this pull request introduce a new feature? (no)
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [flink] hequn8128 commented on pull request #13203: [FLINK-18984][python][docs] Add tutorial documentation for Python DataStream API

Posted by GitBox <gi...@apache.org>.

hequn8128 commented on pull request #13203:
URL: https://github.com/apache/flink/pull/13203#issuecomment-682003680


   @morsapaes Hi, thanks a lot for your suggestions. I have addressed the comments and updated the PR with three separate commits.



[GitHub] [flink] flinkbot edited a comment on pull request #13203: [FLINK-18984][python][docs] Add tutorial documentation for Python DataStream API

Posted by GitBox <gi...@apache.org>.

flinkbot edited a comment on pull request #13203:
URL: https://github.com/apache/flink/pull/13203#issuecomment-676985457


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "7170e6d063f5c8e7ddf4104d4beca360be3d8136",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5734",
       "triggerID" : "7170e6d063f5c8e7ddf4104d4beca360be3d8136",
       "triggerType" : "PUSH"
     }, {
       "hash" : "688923779cd4c4a10c3f05d7a5aa030ad4434785",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5934",
       "triggerID" : "688923779cd4c4a10c3f05d7a5aa030ad4434785",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 7170e6d063f5c8e7ddf4104d4beca360be3d8136 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5734) 
   * 688923779cd4c4a10c3f05d7a5aa030ad4434785 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5934) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run travis` re-run the last Travis build
    - `@flinkbot run azure` re-run the last Azure build
   </details>



[GitHub] [flink] hequn8128 commented on a change in pull request #13203: [FLINK-18984][python][docs] Add tutorial documentation for Python DataStream API

Posted by GitBox <gi...@apache.org>.

hequn8128 commented on a change in pull request #13203:
URL: https://github.com/apache/flink/pull/13203#discussion_r478443841



##########
File path: docs/dev/python/getting-started/tutorial/datastream_tutorial.md
##########
@@ -0,0 +1,126 @@
+---
+title: "Python DataStream API Tutorial"
+nav-parent_id: python_tutorial
+nav-pos: 30
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+This walkthrough will quickly get you started building a pure Python Flink DataStream project.
+
+Please refer to the PyFlink [installation guide]({{ site.baseurl }}/dev/python/getting-started/installation.html) on how to set up the Python execution environments.
+
+* This will be replaced by the TOC
+{:toc}
+
+## Setting up a Python Project
+
+You can begin by creating a Python project and installing the PyFlink package following the [installation guide]({{ site.baseurl }}/dev/python/getting-started/installation.html#installation-of-pyflink).
+
+## Writing a Flink Python DataStream API Program
+
+DataStream API applications begin by declaring a `StreamExecutionEnvironment`.
+This is the context in which a streaming program is executed.
+It can be used for setting execution parameters such as restart strategy, default parallelism, etc.
+
+{% highlight python %}
+env = StreamExecutionEnvironment.get_execution_environment()
+env.set_parallelism(1)
+{% endhighlight %}
+
+Once a `StreamExecutionEnvironment` created, you can declare your source with it.
+
+{% highlight python %}
+ds = env.from_collection(
+    collection=[(1, 'aaa'), (2, 'bbb')],
+    type_info=Types.ROW([Types.INT(), Types.STRING()]))
+{% endhighlight %}
+
+This creates a data stream from the given collection. The type is that of the elements in the collection. In this example, the type is a Row type with two fields. The type of the first field is integer type while the second is string type.
+
+You can now perform transformations on the datastream or writes the data into external system with sink.
+
+{% highlight python %}
+ds.add_sink(StreamingFileSink
+    .for_row_format('/tmp/output', SimpleStringEncoder())
+    .build())
+{% endhighlight %}
+
+Finally you must execute the actual Flink Python DataStream API job.
+All operations, such as creating sources, transformations and sinks are lazy.
+Only when `env.execute(job_name)` is called will runs the job.
+
+{% highlight python %}
+env.execute("tutorial_job")
+{% endhighlight %}
+
+The complete code so far:
+
+{% highlight python %}
+from pyflink.common.serialization import SimpleStringEncoder
+from pyflink.common.typeinfo import Types
+from pyflink.datastream import StreamExecutionEnvironment
+from pyflink.datastream.connectors import StreamingFileSink
+
+
+def tutorial():
+    env = StreamExecutionEnvironment.get_execution_environment()
+    env.set_parallelism(1)
+    ds = env.from_collection(
+        collection=[(1, 'aaa'), (2, 'bbb')],
+        type_info=Types.ROW([Types.INT(), Types.STRING()]))
+    ds.add_sink(StreamingFileSink
+                .for_row_format('/tmp/output', SimpleStringEncoder())
+                .build())
+    env.execute("tutorial_job")
+
+
+if __name__ == '__main__':
+    tutorial()
+{% endhighlight %}
+
+## Executing a Flink Python DataStream API Program

Review comment:
       PyFlink includes the Python DataStream API, the Python Table API, etc., in Flink. :)





[GitHub] [flink] hequn8128 commented on a change in pull request #13203: [FLINK-18984][python][docs] Add tutorial documentation for Python DataStream API

Posted by GitBox <gi...@apache.org>.

hequn8128 commented on a change in pull request #13203:
URL: https://github.com/apache/flink/pull/13203#discussion_r478430827



##########
File path: docs/dev/python/getting-started/tutorial/datastream_tutorial.md
##########
@@ -0,0 +1,126 @@
+---
+title: "Python DataStream API Tutorial"
+nav-parent_id: python_tutorial
+nav-pos: 30
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+This walkthrough will quickly get you started building a pure Python Flink DataStream project.

Review comment:
       Good idea.
   How about replacing the `stateful` keyword with `simple` on this page, since the application below contains no stateful processing? Currently, we only support stateless `DataStream` processing for Python. Stateful processing will be supported later.
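
To make the stateless/stateful distinction above concrete, here is a minimal plain-Python sketch. It does not use the PyFlink API; the function names `stateless_upper` and `make_running_count` are hypothetical, and the sample records only mirror the shape of the tutorial's rows.

```python
from collections import defaultdict

# Hypothetical records shaped like the tutorial's (id, text) rows.
records = [(1, 'aaa'), (2, 'bbb'), (1, 'ccc')]


def stateless_upper(record):
    """Stateless: the output depends only on the current record."""
    key, text = record
    return key, text.upper()


def make_running_count():
    """Stateful: the output depends on every record seen so far for a key."""
    counts = defaultdict(int)

    def count(record):
        key, _ = record
        counts[key] += 1
        return key, counts[key]

    return count


stateless_result = [stateless_upper(r) for r in records]
running_count = make_running_count()
stateful_result = [running_count(r) for r in records]
```

The tutorial's job only reads and writes records, so it falls into the first category; a per-key running count is the kind of logic that would need state.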





[GitHub] [flink] flinkbot edited a comment on pull request #13203: [FLINK-18984][python][docs] Add tutorial documentation for Python DataStream API

Posted by GitBox <gi...@apache.org>.

flinkbot edited a comment on pull request #13203:
URL: https://github.com/apache/flink/pull/13203#issuecomment-676985457


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "7170e6d063f5c8e7ddf4104d4beca360be3d8136",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5734",
       "triggerID" : "7170e6d063f5c8e7ddf4104d4beca360be3d8136",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 7170e6d063f5c8e7ddf4104d4beca360be3d8136 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5734) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run travis` re-run the last Travis build
    - `@flinkbot run azure` re-run the last Azure build
   </details>



[GitHub] [flink] hequn8128 merged pull request #13203: [FLINK-18984][python][docs] Add tutorial documentation for Python DataStream API

Posted by GitBox <gi...@apache.org>.

hequn8128 merged pull request #13203:
URL: https://github.com/apache/flink/pull/13203


   



[GitHub] [flink] flinkbot edited a comment on pull request #13203: [FLINK-18984][python][docs] Add tutorial documentation for Python DataStream API

Posted by GitBox <gi...@apache.org>.

flinkbot edited a comment on pull request #13203:
URL: https://github.com/apache/flink/pull/13203#issuecomment-676985457


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "7170e6d063f5c8e7ddf4104d4beca360be3d8136",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5734",
       "triggerID" : "7170e6d063f5c8e7ddf4104d4beca360be3d8136",
       "triggerType" : "PUSH"
     }, {
       "hash" : "688923779cd4c4a10c3f05d7a5aa030ad4434785",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "688923779cd4c4a10c3f05d7a5aa030ad4434785",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 7170e6d063f5c8e7ddf4104d4beca360be3d8136 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5734) 
   * 688923779cd4c4a10c3f05d7a5aa030ad4434785 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run travis` re-run the last Travis build
    - `@flinkbot run azure` re-run the last Azure build
   </details>



[GitHub] [flink] hequn8128 commented on pull request #13203: [FLINK-18984][python][docs] Add tutorial documentation for Python DataStream API

Posted by GitBox <gi...@apache.org>.

hequn8128 commented on pull request #13203:
URL: https://github.com/apache/flink/pull/13203#issuecomment-684804665


   @morsapaes Thank you! :) 
   @sjwiesman Do you also want to take a look? 



[GitHub] [flink] morsapaes commented on pull request #13203: [FLINK-18984][python][docs] Add tutorial documentation for Python DataStream API

Posted by GitBox <gi...@apache.org>.

morsapaes commented on pull request #13203:
URL: https://github.com/apache/flink/pull/13203#issuecomment-678426366


   👌🏻



[GitHub] [flink] morsapaes commented on a change in pull request #13203: [FLINK-18984][python][docs] Add tutorial documentation for Python DataStream API

Posted by GitBox <gi...@apache.org>.

morsapaes commented on a change in pull request #13203:
URL: https://github.com/apache/flink/pull/13203#discussion_r476650361



##########
File path: docs/dev/python/getting-started/tutorial/datastream_tutorial.md
##########
@@ -0,0 +1,126 @@
+---
+title: "Python DataStream API Tutorial"
+nav-parent_id: python_tutorial
+nav-pos: 30
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+This walkthrough will quickly get you started building a pure Python Flink DataStream project.
+
+Please refer to the PyFlink [installation guide]({{ site.baseurl }}/dev/python/getting-started/installation.html) on how to set up the Python execution environments.
+
+* This will be replaced by the TOC
+{:toc}
+
+## Setting up a Python Project
+
+You can begin by creating a Python project and installing the PyFlink package following the [installation guide]({{ site.baseurl }}/dev/python/getting-started/installation.html#installation-of-pyflink).
+
+## Writing a Flink Python DataStream API Program
+
+DataStream API applications begin by declaring a `StreamExecutionEnvironment`.
+This is the context in which a streaming program is executed.
+It can be used for setting execution parameters such as restart strategy, default parallelism, etc.
+
+{% highlight python %}
+env = StreamExecutionEnvironment.get_execution_environment()
+env.set_parallelism(1)
+{% endhighlight %}
+
+Once a `StreamExecutionEnvironment` created, you can declare your source with it.
+
+{% highlight python %}
+ds = env.from_collection(
+    collection=[(1, 'aaa'), (2, 'bbb')],
+    type_info=Types.ROW([Types.INT(), Types.STRING()]))
+{% endhighlight %}
+
+This creates a data stream from the given collection. The type is that of the elements in the collection. In this example, the type is a Row type with two fields. The type of the first field is integer type while the second is string type.
+
+You can now perform transformations on the datastream or writes the data into external system with sink.

Review comment:
       ```suggestion
   You can now perform transformations on this data stream, or just write the data to an external system using a _sink_. This walkthrough uses the `StreamingFileSink` sink connector to write the data into a file in the `/tmp/output` directory.
   ```

##########
File path: docs/dev/python/getting-started/tutorial/datastream_tutorial.md
##########
@@ -0,0 +1,126 @@
+---
+title: "Python DataStream API Tutorial"
+nav-parent_id: python_tutorial
+nav-pos: 30
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+This walkthrough will quickly get you started building a pure Python Flink DataStream project.
+
+Please refer to the PyFlink [installation guide]({{ site.baseurl }}/dev/python/getting-started/installation.html) on how to set up the Python execution environments.

Review comment:
       Installation with `pip` is pretty straightforward, so why not just add this to the tutorial instead of making the user go to a different page?
   
   If we are restructuring these anyway, I'd suggest following the same structure as the existing tutorials: https://ci.apache.org/projects/flink/flink-docs-release-1.11/try-flink/datastream_api.html

##########
File path: docs/dev/python/getting-started/tutorial/datastream_tutorial.md
##########
@@ -0,0 +1,126 @@
+---
+title: "Python DataStream API Tutorial"
+nav-parent_id: python_tutorial
+nav-pos: 30
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+This walkthrough will quickly get you started building a pure Python Flink DataStream project.
+
+Please refer to the PyFlink [installation guide]({{ site.baseurl }}/dev/python/getting-started/installation.html) on how to set up the Python execution environments.
+
+* This will be replaced by the TOC
+{:toc}
+
+## Setting up a Python Project
+
+You can begin by creating a Python project and installing the PyFlink package following the [installation guide]({{ site.baseurl }}/dev/python/getting-started/installation.html#installation-of-pyflink).
+
+## Writing a Flink Python DataStream API Program
+
+DataStream API applications begin by declaring a `StreamExecutionEnvironment`.
+This is the context in which a streaming program is executed.
+It can be used for setting execution parameters such as restart strategy, default parallelism, etc.
+
+{% highlight python %}
+env = StreamExecutionEnvironment.get_execution_environment()
+env.set_parallelism(1)
+{% endhighlight %}
+
+Once a `StreamExecutionEnvironment` created, you can declare your source with it.
+
+{% highlight python %}
+ds = env.from_collection(
+    collection=[(1, 'aaa'), (2, 'bbb')],
+    type_info=Types.ROW([Types.INT(), Types.STRING()]))
+{% endhighlight %}
+
+This creates a data stream from the given collection. The type is that of the elements in the collection. In this example, the type is a Row type with two fields. The type of the first field is integer type while the second is string type.
+
+You can now perform transformations on the datastream or writes the data into external system with sink.
+
+{% highlight python %}
+ds.add_sink(StreamingFileSink
+    .for_row_format('/tmp/output', SimpleStringEncoder())
+    .build())
+{% endhighlight %}
+
+Finally you must execute the actual Flink Python DataStream API job.
+All operations, such as creating sources, transformations and sinks are lazy.
+Only when `env.execute(job_name)` is called will runs the job.
+
+{% highlight python %}
+env.execute("tutorial_job")
+{% endhighlight %}
+
+The complete code so far:
+
+{% highlight python %}
+from pyflink.common.serialization import SimpleStringEncoder
+from pyflink.common.typeinfo import Types
+from pyflink.datastream import StreamExecutionEnvironment
+from pyflink.datastream.connectors import StreamingFileSink
+
+
+def tutorial():
+    env = StreamExecutionEnvironment.get_execution_environment()
+    env.set_parallelism(1)
+    ds = env.from_collection(
+        collection=[(1, 'aaa'), (2, 'bbb')],
+        type_info=Types.ROW([Types.INT(), Types.STRING()]))
+    ds.add_sink(StreamingFileSink
+                .for_row_format('/tmp/output', SimpleStringEncoder())
+                .build())
+    env.execute("tutorial_job")
+
+
+if __name__ == '__main__':
+    tutorial()
+{% endhighlight %}
+
+## Executing a Flink Python DataStream API Program
+Firstly, make sure the output directory is not existed:

Review comment:
       ```suggestion
   Now that you have defined your PyFlink program, you can run it! First, make sure that the output directory doesn't exist:
   ```
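
For readers who would rather do this check from Python than from a shell, a minimal sketch follows. The helper name `ensure_absent` is hypothetical and not part of PyFlink. The demo runs against a throwaway temporary directory rather than the tutorial's `/tmp/output`, so nothing real is deleted.

```python
import shutil
import tempfile
from pathlib import Path


def ensure_absent(path):
    """Remove `path` (a file or a directory tree) if it exists.

    Returns True if something was removed, False if the path was already absent.
    """
    target = Path(path)
    if target.is_dir() and not target.is_symlink():
        shutil.rmtree(target)
        return True
    if target.exists() or target.is_symlink():
        target.unlink()
        return True
    return False


# Demonstrate on a temporary directory standing in for the output path.
demo_root = Path(tempfile.mkdtemp())
demo_dir = demo_root / "output"
demo_dir.mkdir()
(demo_dir / "part-0").write_text("1,aaa\n")

removed_first = ensure_absent(demo_dir)   # the directory existed
removed_again = ensure_absent(demo_dir)   # nothing left to remove

shutil.rmtree(demo_root)                  # clean up the temporary parent
```

In the tutorial's setting, the call would presumably be `ensure_absent('/tmp/output')` before running the job.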

##########
File path: docs/dev/python/getting-started/tutorial/datastream_tutorial.md
##########
@@ -0,0 +1,126 @@
+---
+title: "Python DataStream API Tutorial"
+nav-parent_id: python_tutorial
+nav-pos: 30
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+This walkthrough will quickly get you started building a pure Python Flink DataStream project.
+
+Please refer to the PyFlink [installation guide]({{ site.baseurl }}/dev/python/getting-started/installation.html) on how to set up the Python execution environments.
+
+* This will be replaced by the TOC
+{:toc}
+
+## Setting up a Python Project
+
+You can begin by creating a Python project and installing the PyFlink package following the [installation guide]({{ site.baseurl }}/dev/python/getting-started/installation.html#installation-of-pyflink).
+
+## Writing a Flink Python DataStream API Program
+
+DataStream API applications begin by declaring a `StreamExecutionEnvironment`.
+This is the context in which a streaming program is executed.
+It can be used for setting execution parameters such as restart strategy, default parallelism, etc.
+
+{% highlight python %}
+env = StreamExecutionEnvironment.get_execution_environment()
+env.set_parallelism(1)
+{% endhighlight %}
+
+Once a `StreamExecutionEnvironment` created, you can declare your source with it.
+
+{% highlight python %}
+ds = env.from_collection(
+    collection=[(1, 'aaa'), (2, 'bbb')],
+    type_info=Types.ROW([Types.INT(), Types.STRING()]))
+{% endhighlight %}
+
+This creates a data stream from the given collection. The type is that of the elements in the collection. In this example, the type is a Row type with two fields. The type of the first field is integer type while the second is string type.
+
+You can now perform transformations on the datastream or writes the data into external system with sink.
+
+{% highlight python %}
+ds.add_sink(StreamingFileSink
+    .for_row_format('/tmp/output', SimpleStringEncoder())
+    .build())
+{% endhighlight %}
+
+Finally you must execute the actual Flink Python DataStream API job.

Review comment:
       ```suggestion
   The last step is to execute the actual PyFlink DataStream API job. PyFlink applications are built lazily and shipped to the cluster for execution only once fully formed. To execute an application, you simply call `env.execute(job_name)`.
   ```
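
The lazy-build behaviour described in this suggestion can be illustrated with a toy model in plain Python. `LazyPipeline` is a hypothetical stand-in written for this sketch and is not the PyFlink API; it only shows that declaring sources and transformations records work, and that nothing runs until `execute` is called.

```python
class LazyPipeline:
    """Toy stand-in for an execution environment; not the PyFlink API."""

    def __init__(self):
        self._source = []
        self._steps = []   # recorded transformations, not yet applied
        self.output = []   # filled only by execute()

    def from_collection(self, collection):
        self._source = list(collection)
        return self

    def map(self, func):
        self._steps.append(func)   # record only; nothing runs here
        return self

    def execute(self, job_name):
        results = []
        for record in self._source:
            for step in self._steps:
                record = step(record)
            results.append(record)
        self.output = results
        return job_name


calls = []


def tag(record):
    """Transformation that also logs each invocation."""
    calls.append(record)
    return record[0], record[1].upper()


pipeline = LazyPipeline().from_collection([(1, 'aaa'), (2, 'bbb')]).map(tag)
calls_before_execute = len(calls)   # building the pipeline ran nothing
pipeline.execute("tutorial_job")
calls_after_execute = len(calls)    # one call per record
```

In this model `tag` is never invoked while the pipeline is being assembled, which is the point the suggested wording makes about `env.execute(job_name)`.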

##########
File path: docs/dev/python/getting-started/tutorial/datastream_tutorial.md
##########
@@ -0,0 +1,126 @@
+---
+title: "Python DataStream API Tutorial"
+nav-parent_id: python_tutorial
+nav-pos: 30
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+This walkthrough will quickly get you started building a pure Python Flink DataStream project.
+
+Please refer to the PyFlink [installation guide]({{ site.baseurl }}/dev/python/getting-started/installation.html) on how to set up the Python execution environments.
+
+* This will be replaced by the TOC
+{:toc}
+
+## Setting up a Python Project
+
+You can begin by creating a Python project and installing the PyFlink package following the [installation guide]({{ site.baseurl }}/dev/python/getting-started/installation.html#installation-of-pyflink).
+
+## Writing a Flink Python DataStream API Program
+
+DataStream API applications begin by declaring a `StreamExecutionEnvironment`.
+This is the context in which a streaming program is executed.
+It can be used for setting execution parameters such as restart strategy, default parallelism, etc.
+
+{% highlight python %}
+env = StreamExecutionEnvironment.get_execution_environment()
+env.set_parallelism(1)
+{% endhighlight %}
+
+Once a `StreamExecutionEnvironment` created, you can declare your source with it.
+
+{% highlight python %}
+ds = env.from_collection(
+    collection=[(1, 'aaa'), (2, 'bbb')],
+    type_info=Types.ROW([Types.INT(), Types.STRING()]))
+{% endhighlight %}
+
+This creates a data stream from the given collection. The type is that of the elements in the collection. In this example, the type is a Row type with two fields. The type of the first field is integer type while the second is string type.
+
+You can now perform transformations on the datastream or writes the data into external system with sink.
+
+{% highlight python %}
+ds.add_sink(StreamingFileSink
+    .for_row_format('/tmp/output', SimpleStringEncoder())
+    .build())
+{% endhighlight %}
+
+Finally you must execute the actual Flink Python DataStream API job.
+All operations, such as creating sources, transformations and sinks are lazy.
+Only when `env.execute(job_name)` is called will runs the job.
+
+{% highlight python %}
+env.execute("tutorial_job")
+{% endhighlight %}
+
+The complete code so far:
+
+{% highlight python %}
+from pyflink.common.serialization import SimpleStringEncoder
+from pyflink.common.typeinfo import Types
+from pyflink.datastream import StreamExecutionEnvironment
+from pyflink.datastream.connectors import StreamingFileSink
+
+
+def tutorial():
+    env = StreamExecutionEnvironment.get_execution_environment()
+    env.set_parallelism(1)
+    ds = env.from_collection(
+        collection=[(1, 'aaa'), (2, 'bbb')],
+        type_info=Types.ROW([Types.INT(), Types.STRING()]))
+    ds.add_sink(StreamingFileSink
+                .for_row_format('/tmp/output', SimpleStringEncoder())
+                .build())
+    env.execute("tutorial_job")
+
+
+if __name__ == '__main__':
+    tutorial()
+{% endhighlight %}
+
+## Executing a Flink Python DataStream API Program

Review comment:
       Is there a reason to use "Flink Python" instead of PyFlink (the question applies to the whole walkthrough)?

##########
File path: docs/dev/python/getting-started/tutorial/datastream_tutorial.md
##########
@@ -0,0 +1,126 @@
+---
+title: "Python DataStream API Tutorial"
+nav-parent_id: python_tutorial
+nav-pos: 30
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+This walkthrough will quickly get you started building a pure Python Flink DataStream project.
+
+Please refer to the PyFlink [installation guide]({{ site.baseurl }}/dev/python/getting-started/installation.html) on how to set up the Python execution environments.
+
+* This will be replaced by the TOC
+{:toc}
+
+## Setting up a Python Project
+
+You can begin by creating a Python project and installing the PyFlink package following the [installation guide]({{ site.baseurl }}/dev/python/getting-started/installation.html#installation-of-pyflink).
+
+## Writing a Flink Python DataStream API Program
+
+DataStream API applications begin by declaring a `StreamExecutionEnvironment`.
+This is the context in which a streaming program is executed.
+It can be used for setting execution parameters such as restart strategy, default parallelism, etc.
+
+{% highlight python %}
+env = StreamExecutionEnvironment.get_execution_environment()
+env.set_parallelism(1)
+{% endhighlight %}
+
+Once a `StreamExecutionEnvironment` created, you can declare your source with it.
+
+{% highlight python %}
+ds = env.from_collection(
+    collection=[(1, 'aaa'), (2, 'bbb')],
+    type_info=Types.ROW([Types.INT(), Types.STRING()]))
+{% endhighlight %}
+
+This creates a data stream from the given collection. The type is that of the elements in the collection. In this example, the type is a Row type with two fields. The type of the first field is integer type while the second is string type.
+
+You can now perform transformations on the datastream or writes the data into external system with sink.
+
+{% highlight python %}
+ds.add_sink(StreamingFileSink
+    .for_row_format('/tmp/output', SimpleStringEncoder())
+    .build())
+{% endhighlight %}
+
+Finally you must execute the actual Flink Python DataStream API job.
+All operations, such as creating sources, transformations and sinks are lazy.
+Only when `env.execute(job_name)` is called will runs the job.
+
+{% highlight python %}
+env.execute("tutorial_job")
+{% endhighlight %}
+
+The complete code so far:
+
+{% highlight python %}
+from pyflink.common.serialization import SimpleStringEncoder
+from pyflink.common.typeinfo import Types
+from pyflink.datastream import StreamExecutionEnvironment
+from pyflink.datastream.connectors import StreamingFileSink
+
+
+def tutorial():
+    env = StreamExecutionEnvironment.get_execution_environment()
+    env.set_parallelism(1)
+    ds = env.from_collection(
+        collection=[(1, 'aaa'), (2, 'bbb')],
+        type_info=Types.ROW([Types.INT(), Types.STRING()]))
+    ds.add_sink(StreamingFileSink
+                .for_row_format('/tmp/output', SimpleStringEncoder())
+                .build())
+    env.execute("tutorial_job")
+
+
+if __name__ == '__main__':
+    tutorial()
+{% endhighlight %}
+
+## Executing a Flink Python DataStream API Program
+Firstly, make sure the output directory is not existed:
+
+{% highlight bash %}
+rm -rf /tmp/output
+{% endhighlight %}
+
+Next, you can run this example on the command line:
+
+{% highlight bash %}
+$ python datastream_tutorial.py
+{% endhighlight %}
+
+The command builds and runs the Python DataStream API program in a local mini cluster.
+You can also submit the Python DataStream API program to a remote cluster, you can refer
+[Job Submission Examples]({{ site.baseurl }}/ops/cli.html#job-submission-examples)
+for more details.

Review comment:
       ```suggestion
   The command builds and runs your PyFlink program in a local mini cluster. You can alternatively submit it to a remote cluster using the instructions detailed in [Job Submission Examples]({{ site.baseurl }}/ops/cli.html#job-submission-examples).
   ```

##########
File path: docs/dev/python/getting-started/tutorial/datastream_tutorial.md
##########
@@ -0,0 +1,126 @@
+---
+title: "Python DataStream API Tutorial"
+nav-parent_id: python_tutorial
+nav-pos: 30
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+This walkthrough will quickly get you started building a pure Python Flink DataStream project.

Review comment:
       Some context on what the DataStream API is would be helpful (and the same goes for the Table API tutorial). There is a nice snippet in the "Fraud Detection with the DataStream API" tutorial that could be used here, for example:
   
   _"Apache Flink offers a DataStream API for building robust, stateful streaming applications. It provides fine-grained control over state and time, which allows for the implementation of advanced event-driven systems. In this step-by-step guide, you’ll learn how to build a stateful streaming application with PyFlink and the DataStream API."_
   
   (In the same way, the Table API tutorial can use the introduction from "Real Time Reporting with the Table API".)

##########
File path: docs/dev/python/getting-started/tutorial/datastream_tutorial.md
##########
@@ -0,0 +1,126 @@
+---
+title: "Python DataStream API Tutorial"
+nav-parent_id: python_tutorial
+nav-pos: 30
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+This walkthrough will quickly get you started building a pure Python Flink DataStream project.
+
+Please refer to the PyFlink [installation guide]({{ site.baseurl }}/dev/python/getting-started/installation.html) on how to set up the Python execution environments.
+
+* This will be replaced by the TOC
+{:toc}
+
+## Setting up a Python Project
+
+You can begin by creating a Python project and installing the PyFlink package following the [installation guide]({{ site.baseurl }}/dev/python/getting-started/installation.html#installation-of-pyflink).
+
+## Writing a Flink Python DataStream API Program
+
+DataStream API applications begin by declaring a `StreamExecutionEnvironment`.
+This is the context in which a streaming program is executed.
+It can be used for setting execution parameters such as restart strategy, default parallelism, etc.

Review comment:
       ```suggestion
   DataStream API applications begin by declaring an execution environment (`StreamExecutionEnvironment`), the context in which a streaming program is executed. This is what you will use to set the properties of your job (e.g. default parallelism, restart strategy), create your sources and finally trigger the execution of the job.
   ```
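The "declare first, trigger last" flow described in the suggestion above can be pictured without PyFlink at all. What follows is a minimal plain-Python sketch, not the PyFlink implementation: `ToyEnvironment` and `ToyStream` are hypothetical names invented for this illustration. It only mimics the idea that declaring a source and a sink records work, and that nothing runs until `execute` is called.

```python
class ToyEnvironment:
    """Hypothetical stand-in for an execution environment (not PyFlink)."""

    def __init__(self):
        self._pipelines = []  # declared (elements, sink) pairs, not yet run

    def from_collection(self, collection):
        # Declaring a source just wraps the elements; nothing is emitted.
        return ToyStream(self, list(collection))

    def execute(self, job_name):
        # Only here do elements actually flow from sources to sinks.
        for elements, sink in self._pipelines:
            for element in elements:
                sink(element)
        return job_name


class ToyStream:
    """Hypothetical stand-in for a data stream (not PyFlink)."""

    def __init__(self, env, elements):
        self._env = env
        self._elements = elements

    def add_sink(self, sink):
        # Declaring a sink only records it on the environment.
        self._env._pipelines.append((self._elements, sink))


env = ToyEnvironment()
collected = []
env.from_collection([(1, 'aaa'), (2, 'bbb')]).add_sink(collected.append)
before = list(collected)   # snapshot taken before execute()
env.execute("tutorial_job")
after = list(collected)    # snapshot taken after execute()
```

In this toy model `before` is empty and `after` holds both tuples, which is the behaviour the tutorial text attributes to `env.execute(job_name)`.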

##########
File path: docs/dev/python/getting-started/tutorial/datastream_tutorial.md
##########
@@ -0,0 +1,126 @@
+---
+title: "Python DataStream API Tutorial"
+nav-parent_id: python_tutorial
+nav-pos: 30
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+This walkthrough will quickly get you started building a pure Python Flink DataStream project.
+
+Please refer to the PyFlink [installation guide]({{ site.baseurl }}/dev/python/getting-started/installation.html) on how to set up the Python execution environments.
+
+* This will be replaced by the TOC
+{:toc}
+
+## Setting up a Python Project
+
+You can begin by creating a Python project and installing the PyFlink package following the [installation guide]({{ site.baseurl }}/dev/python/getting-started/installation.html#installation-of-pyflink).
+
+## Writing a Flink Python DataStream API Program
+
+DataStream API applications begin by declaring a `StreamExecutionEnvironment`.
+This is the context in which a streaming program is executed.
+It can be used for setting execution parameters such as restart strategy, default parallelism, etc.
+
+{% highlight python %}
+env = StreamExecutionEnvironment.get_execution_environment()
+env.set_parallelism(1)
+{% endhighlight %}
+
+Once a `StreamExecutionEnvironment` created, you can declare your source with it.

Review comment:
       ```suggestion
   Once a `StreamExecutionEnvironment` is created, you can use it to declare your _source_. Sources ingest data from external systems, such as Apache Kafka, RabbitMQ, or Apache Pulsar, into Flink jobs.
   
   To keep things simple, this walkthrough uses a source that is backed by a collection of elements.
   ```

##########
File path: docs/dev/python/getting-started/tutorial/datastream_tutorial.md
##########
@@ -0,0 +1,126 @@
+---
+title: "Python DataStream API Tutorial"
+nav-parent_id: python_tutorial
+nav-pos: 30
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+This walkthrough will quickly get you started building a pure Python Flink DataStream project.
+
+Please refer to the PyFlink [installation guide]({{ site.baseurl }}/dev/python/getting-started/installation.html) on how to set up the Python execution environments.
+
+* This will be replaced by the TOC
+{:toc}
+
+## Setting up a Python Project
+
+You can begin by creating a Python project and installing the PyFlink package following the [installation guide]({{ site.baseurl }}/dev/python/getting-started/installation.html#installation-of-pyflink).
+
+## Writing a Flink Python DataStream API Program
+
+DataStream API applications begin by declaring a `StreamExecutionEnvironment`.
+This is the context in which a streaming program is executed.
+It can be used for setting execution parameters such as restart strategy, default parallelism, etc.
+
+{% highlight python %}
+env = StreamExecutionEnvironment.get_execution_environment()
+env.set_parallelism(1)
+{% endhighlight %}
+
+Once a `StreamExecutionEnvironment` created, you can declare your source with it.
+
+{% highlight python %}
+ds = env.from_collection(
+    collection=[(1, 'aaa'), (2, 'bbb')],
+    type_info=Types.ROW([Types.INT(), Types.STRING()]))
+{% endhighlight %}
+
+This creates a data stream from the given collection. The type is that of the elements in the collection. In this example, the type is a Row type with two fields. The type of the first field is integer type while the second is string type.

Review comment:
       ```suggestion
   This creates a data stream from the given collection, with the same type as that of the elements in it (here, a `ROW` type with an `INT` field and a `STRING` field).
   ```
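One rough way to picture "the same type as the elements": every tuple in the collection has to line up with the declared fields. The sketch below is plain Python rather than PyFlink. `FIELD_TYPES` and `conforms` are hypothetical names, and mapping `Types.INT()` and `Types.STRING()` to Python's `int` and `str` is an assumption made only for this illustration.

```python
# Assumed stand-ins for Types.INT() and Types.STRING(); not PyFlink types.
FIELD_TYPES = (int, str)


def conforms(element, field_types=FIELD_TYPES):
    """Return True if the tuple has one value of the expected type per field."""
    if len(element) != len(field_types):
        return False
    return all(isinstance(value, expected)
               for value, expected in zip(element, field_types))


collection = [(1, 'aaa'), (2, 'bbb')]
all_ok = all(conforms(e) for e in collection)
bad = conforms(('1', 'aaa'))  # first field is a str, not an int
```

The tutorial's collection passes this check, while a tuple whose first field is a string does not.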

##########
File path: docs/dev/python/getting-started/tutorial/datastream_tutorial.md
##########
@@ -0,0 +1,126 @@
+---
+title: "Python DataStream API Tutorial"
+nav-parent_id: python_tutorial
+nav-pos: 30
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+This walkthrough will quickly get you started building a pure Python Flink DataStream project.
+
+Please refer to the PyFlink [installation guide]({{ site.baseurl }}/dev/python/getting-started/installation.html) on how to set up the Python execution environments.
+
+* This will be replaced by the TOC
+{:toc}
+
+## Setting up a Python Project
+
+You can begin by creating a Python project and installing the PyFlink package following the [installation guide]({{ site.baseurl }}/dev/python/getting-started/installation.html#installation-of-pyflink).
+
+## Writing a Flink Python DataStream API Program
+
+DataStream API applications begin by declaring a `StreamExecutionEnvironment`.
+This is the context in which a streaming program is executed.
+It can be used for setting execution parameters such as restart strategy, default parallelism, etc.
+
+{% highlight python %}
+env = StreamExecutionEnvironment.get_execution_environment()
+env.set_parallelism(1)
+{% endhighlight %}
+
+Once a `StreamExecutionEnvironment` created, you can declare your source with it.
+
+{% highlight python %}
+ds = env.from_collection(
+    collection=[(1, 'aaa'), (2, 'bbb')],

Review comment:
       I get that this is reusing existing sample code, but it'd be nice to evolve the example to use a more relevant use case in the future. 
   
   (This is actually a reminder to myself, as I get my hands on PyFlink. 🙃 )

##########
File path: docs/dev/python/getting-started/tutorial/datastream_tutorial.md
##########
@@ -0,0 +1,126 @@
+---
+title: "Python DataStream API Tutorial"
+nav-parent_id: python_tutorial
+nav-pos: 30
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+This walkthrough will quickly get you started building a pure Python Flink DataStream project.
+
+Please refer to the PyFlink [installation guide]({{ site.baseurl }}/dev/python/getting-started/installation.html) on how to set up the Python execution environments.
+
+* This will be replaced by the TOC
+{:toc}
+
+## Setting up a Python Project
+
+You can begin by creating a Python project and installing the PyFlink package following the [installation guide]({{ site.baseurl }}/dev/python/getting-started/installation.html#installation-of-pyflink).
+
+## Writing a Flink Python DataStream API Program
+
+DataStream API applications begin by declaring a `StreamExecutionEnvironment`.
+This is the context in which a streaming program is executed.
+It can be used for setting execution parameters such as restart strategy, default parallelism, etc.
+
+{% highlight python %}
+env = StreamExecutionEnvironment.get_execution_environment()
+env.set_parallelism(1)
+{% endhighlight %}
+
+Once a `StreamExecutionEnvironment` created, you can declare your source with it.
+
+{% highlight python %}
+ds = env.from_collection(
+    collection=[(1, 'aaa'), (2, 'bbb')],
+    type_info=Types.ROW([Types.INT(), Types.STRING()]))
+{% endhighlight %}
+
+This creates a data stream from the given collection. The type is that of the elements in the collection. In this example, the type is a Row type with two fields. The type of the first field is integer type while the second is string type.
+
+You can now perform transformations on the datastream or writes the data into external system with sink.
+
+{% highlight python %}
+ds.add_sink(StreamingFileSink
+    .for_row_format('/tmp/output', SimpleStringEncoder())
+    .build())
+{% endhighlight %}
+
+Finally you must execute the actual Flink Python DataStream API job.
+All operations, such as creating sources, transformations and sinks are lazy.
+Only when `env.execute(job_name)` is called will runs the job.
+
+{% highlight python %}
+env.execute("tutorial_job")
+{% endhighlight %}
+
+The complete code so far:
+
+{% highlight python %}
+from pyflink.common.serialization import SimpleStringEncoder
+from pyflink.common.typeinfo import Types
+from pyflink.datastream import StreamExecutionEnvironment
+from pyflink.datastream.connectors import StreamingFileSink
+
+
+def tutorial():
+    env = StreamExecutionEnvironment.get_execution_environment()
+    env.set_parallelism(1)
+    ds = env.from_collection(
+        collection=[(1, 'aaa'), (2, 'bbb')],
+        type_info=Types.ROW([Types.INT(), Types.STRING()]))
+    ds.add_sink(StreamingFileSink
+                .for_row_format('/tmp/output', SimpleStringEncoder())
+                .build())
+    env.execute("tutorial_job")
+
+
+if __name__ == '__main__':
+    tutorial()
+{% endhighlight %}
+
+## Executing a Flink Python DataStream API Program
+Firstly, make sure the output directory is not existed:
+
+{% highlight bash %}
+rm -rf /tmp/output
+{% endhighlight %}
+
+Next, you can run this example on the command line:
+
+{% highlight bash %}
+$ python datastream_tutorial.py
+{% endhighlight %}
+
+The command builds and runs the Python DataStream API program in a local mini cluster.
+You can also submit the Python DataStream API program to a remote cluster, you can refer
+[Job Submission Examples]({{ site.baseurl }}/ops/cli.html#job-submission-examples)
+for more details.
+
+Finally, you can see the execution result on the command line:
+
+{% highlight bash %}
+$ find /tmp/output -type f -exec cat {} \;
+1,aaa
+2,bbb
+{% endhighlight %}
+
+This should get you started with writing your own Flink Python DataStream API programs.
+To learn more about the Python DataStream API, you can refer
+[Flink Python API Docs]({{ site.pythondocs_baseurl }}/api/python) for more details.

Review comment:
       ```suggestion
   This walkthrough gives you the foundations to get started writing your own PyFlink DataStream API programs. To learn more about the Python DataStream API, please refer to the [Flink Python API Docs]({{ site.pythondocs_baseurl }}/api/python).
   ```
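The `find /tmp/output -type f -exec cat {} \;` step quoted above can also be done from Python, which may be handy when checking the result inside a script. This is a small standard-library sketch; `read_output` is a hypothetical helper name, and sorting the lines is a choice made here so the result does not depend on how the sink names or orders its part files.

```python
import os


def read_output(root):
    """Return the lines of every regular file under root, sorted."""
    lines = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            with open(os.path.join(dirpath, name)) as f:
                lines.extend(line.rstrip('\n') for line in f)
    return sorted(lines)
```

Calling `read_output('/tmp/output')` after the job has finished returns whatever lines the sink wrote. Going by the expected output quoted in the tutorial, those should be `1,aaa` and `2,bbb`.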

##########
File path: docs/dev/python/getting-started/tutorial/datastream_tutorial.md
##########
@@ -0,0 +1,126 @@
+---
+title: "Python DataStream API Tutorial"
+nav-parent_id: python_tutorial
+nav-pos: 30
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+This walkthrough will quickly get you started building a pure Python Flink DataStream project.
+
+Please refer to the PyFlink [installation guide]({{ site.baseurl }}/dev/python/getting-started/installation.html) on how to set up the Python execution environments.
+
+* This will be replaced by the TOC
+{:toc}
+
+## Setting up a Python Project
+
+You can begin by creating a Python project and installing the PyFlink package following the [installation guide]({{ site.baseurl }}/dev/python/getting-started/installation.html#installation-of-pyflink).
+
+## Writing a Flink Python DataStream API Program
+
+DataStream API applications begin by declaring a `StreamExecutionEnvironment`.
+This is the context in which a streaming program is executed.
+It can be used for setting execution parameters such as restart strategy, default parallelism, etc.
+
+{% highlight python %}
+env = StreamExecutionEnvironment.get_execution_environment()
+env.set_parallelism(1)
+{% endhighlight %}
+
+Once a `StreamExecutionEnvironment` created, you can declare your source with it.
+
+{% highlight python %}
+ds = env.from_collection(
+    collection=[(1, 'aaa'), (2, 'bbb')],
+    type_info=Types.ROW([Types.INT(), Types.STRING()]))
+{% endhighlight %}
+
+This creates a data stream from the given collection. The type is that of the elements in the collection. In this example, the type is a Row type with two fields. The type of the first field is integer type while the second is string type.
+
+You can now perform transformations on the datastream or writes the data into external system with sink.
+
+{% highlight python %}
+ds.add_sink(StreamingFileSink
+    .for_row_format('/tmp/output', SimpleStringEncoder())
+    .build())
+{% endhighlight %}
+
+Finally you must execute the actual Flink Python DataStream API job.
+All operations, such as creating sources, transformations and sinks are lazy.
+Only when `env.execute(job_name)` is called will runs the job.
+
+{% highlight python %}
+env.execute("tutorial_job")
+{% endhighlight %}
+
+The complete code so far:
+
+{% highlight python %}
+from pyflink.common.serialization import SimpleStringEncoder
+from pyflink.common.typeinfo import Types
+from pyflink.datastream import StreamExecutionEnvironment
+from pyflink.datastream.connectors import StreamingFileSink
+
+
+def tutorial():
+    env = StreamExecutionEnvironment.get_execution_environment()
+    env.set_parallelism(1)
+    ds = env.from_collection(
+        collection=[(1, 'aaa'), (2, 'bbb')],
+        type_info=Types.ROW([Types.INT(), Types.STRING()]))
+    ds.add_sink(StreamingFileSink
+                .for_row_format('/tmp/output', SimpleStringEncoder())
+                .build())
+    env.execute("tutorial_job")
+
+
+if __name__ == '__main__':
+    tutorial()
+{% endhighlight %}
+
+## Executing a Flink Python DataStream API Program
+Firstly, make sure the output directory is not existed:
+
+{% highlight bash %}
+rm -rf /tmp/output
+{% endhighlight %}
+
+Next, you can run this example on the command line:

Review comment:
       ```suggestion
   Next, you can run the example you just created on the command line:
   ```
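The `rm -rf /tmp/output` step that precedes the run can likewise be done from Python if the tutorial is driven by a script. The sketch below uses only the standard library; `clean_output_dir` is a hypothetical helper name. Note that `ignore_errors=True` also swallows failures such as permission errors, so the return value reports whether the path is really gone.

```python
import os
import shutil


def clean_output_dir(path):
    """Remove path and everything under it; do nothing if it is absent."""
    shutil.rmtree(path, ignore_errors=True)
    # Report whether the directory is actually gone afterwards.
    return not os.path.exists(path)
```

Calling `clean_output_dir('/tmp/output')` before `python datastream_tutorial.py` plays the same role as the shell command in the quoted tutorial.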





[GitHub] [flink] hequn8128 commented on pull request #13203: [FLINK-18984][python][docs] Add tutorial documentation for Python DataStream API

Posted by GitBox <gi...@apache.org>.

hequn8128 commented on pull request #13203:
URL: https://github.com/apache/flink/pull/13203#issuecomment-678613482


   Thank you! @sjwiesman @morsapaes 



[GitHub] [flink] flinkbot commented on pull request #13203: [FLINK-18984][python][docs] Add tutorial documentation for Python DataStream API

Posted by GitBox <gi...@apache.org>.

flinkbot commented on pull request #13203:
URL: https://github.com/apache/flink/pull/13203#issuecomment-676964944


   Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community
   to review your pull request. We will use this comment to track the progress of the review.
   
   
   ## Automated Checks
   Last check on commit 7170e6d063f5c8e7ddf4104d4beca360be3d8136 (Thu Aug 20 03:45:00 UTC 2020)
   
    ✅no warnings
   
   <sub>Mention the bot in a comment to re-run the automated checks.</sub>
   ## Review Progress
   
   * ❓ 1. The [description] looks good.
   * ❓ 2. There is [consensus] that the contribution should go into Flink.
   * ❓ 3. Needs [attention] from.
   * ❓ 4. The change fits into the overall [architecture].
   * ❓ 5. Overall code [quality] is good.
   
   Please see the [Pull Request Review Guide](https://flink.apache.org/contributing/reviewing-prs.html) for a full explanation of the review process.
   
   The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer or PMC member is required.
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot approve description` to approve one or more aspects (aspects: `description`, `consensus`, `architecture` and `quality`)
    - `@flinkbot approve all` to approve all aspects
    - `@flinkbot approve-until architecture` to approve everything until `architecture`
    - `@flinkbot attention @username1 [@username2 ..]` to require somebody's attention
    - `@flinkbot disapprove architecture` to remove an approval you gave earlier
   </details>



[GitHub] [flink] hequn8128 commented on pull request #13203: [FLINK-18984][python][docs] Add tutorial documentation for Python DataStream API

Posted by GitBox <gi...@apache.org>.

hequn8128 commented on pull request #13203:
URL: https://github.com/apache/flink/pull/13203#issuecomment-676961064


   CC @sjwiesman @shuiqiangchen 



[GitHub] [flink] flinkbot edited a comment on pull request #13203: [FLINK-18984][python][docs] Add tutorial documentation for Python DataStream API

Posted by GitBox <gi...@apache.org>.

flinkbot edited a comment on pull request #13203:
URL: https://github.com/apache/flink/pull/13203#issuecomment-676985457


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "7170e6d063f5c8e7ddf4104d4beca360be3d8136",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5734",
       "triggerID" : "7170e6d063f5c8e7ddf4104d4beca360be3d8136",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 7170e6d063f5c8e7ddf4104d4beca360be3d8136 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5734) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run travis` re-run the last Travis build
    - `@flinkbot run azure` re-run the last Azure build
   </details>



[GitHub] [flink] flinkbot commented on pull request #13203: [FLINK-18984][python][docs] Add tutorial documentation for Python DataStream API

Posted by GitBox <gi...@apache.org>.

flinkbot commented on pull request #13203:
URL: https://github.com/apache/flink/pull/13203#issuecomment-676985457


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "7170e6d063f5c8e7ddf4104d4beca360be3d8136",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "7170e6d063f5c8e7ddf4104d4beca360be3d8136",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 7170e6d063f5c8e7ddf4104d4beca360be3d8136 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run travis` re-run the last Travis build
    - `@flinkbot run azure` re-run the last Azure build
   </details>



[GitHub] [flink] sjwiesman commented on pull request #13203: [FLINK-18984][python][docs] Add tutorial documentation for Python DataStream API

Posted by GitBox <gi...@apache.org>.

sjwiesman commented on pull request #13203:
URL: https://github.com/apache/flink/pull/13203#issuecomment-678424666


   I'm going to be on vacation for the next week; maybe @morsapaes can take a look at this?



[GitHub] [flink] morsapaes commented on pull request #13203: [FLINK-18984][python][docs] Add tutorial documentation for Python DataStream API

Posted by GitBox <gi...@apache.org>.

morsapaes commented on pull request #13203:
URL: https://github.com/apache/flink/pull/13203#issuecomment-684512433


   LGTM, @hequn8128, thanks!



[GitHub] [flink] flinkbot edited a comment on pull request #13203: [FLINK-18984][python][docs] Add tutorial documentation for Python DataStream API

Posted by GitBox <gi...@apache.org>.

flinkbot edited a comment on pull request #13203:
URL: https://github.com/apache/flink/pull/13203#issuecomment-676985457


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "7170e6d063f5c8e7ddf4104d4beca360be3d8136",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5734",
       "triggerID" : "7170e6d063f5c8e7ddf4104d4beca360be3d8136",
       "triggerType" : "PUSH"
     }, {
       "hash" : "688923779cd4c4a10c3f05d7a5aa030ad4434785",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5934",
       "triggerID" : "688923779cd4c4a10c3f05d7a5aa030ad4434785",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 688923779cd4c4a10c3f05d7a5aa030ad4434785 Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5934) 
   

