Posted to commits@zeppelin.apache.org by zj...@apache.org on 2020/06/22 05:15:34 UTC

[zeppelin] branch master updated: [ZEPPELIN-4839]. Update flink interpreter doc

This is an automated email from the ASF dual-hosted git repository.

zjffdu pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/zeppelin.git


The following commit(s) were added to refs/heads/master by this push:
     new b09d57e  [ZEPPELIN-4839]. Update flink interpreter doc
b09d57e is described below

commit b09d57e99f4cbdc9d1a1d71f14b53998424b82fc
Author: Jeff Zhang <zj...@apache.org>
AuthorDate: Thu May 28 10:39:25 2020 +0800

    [ZEPPELIN-4839]. Update flink interpreter doc
    
    ### What is this PR for?
    
    This PR updates the flink interpreter doc for the recent improvements in the flink interpreter, and adds some screenshots.
    
    ### What type of PR is it?
    [Documentation]
    
    ### Todos
    * [ ] - Task
    
    ### What is the Jira issue?
    * https://issues.apache.org/jira/browse/ZEPPELIN-4839
    
    ### How should this be tested?
    * CI pass
    
    ### Screenshots (if appropriate)
    
    ### Questions:
    * Do the license files need updating? No
    * Are there breaking changes for older versions? No
    * Does this need documentation? No
    
    Author: Jeff Zhang <zj...@apache.org>
    
    Closes #3779 from zjffdu/ZEPPELIN-4839 and squashes the following commits:
    
    ddc774078 [Jeff Zhang] [ZEPPELIN-4839]. Update flink interpreter doc
---
 .../zeppelin/img/docs-img/flink_append_mode.gif    | Bin 0 -> 294307 bytes
 .../zeppelin/img/docs-img/flink_single_mode.gif    | Bin 0 -> 58198 bytes
 .../zeppelin/img/docs-img/flink_update_mode.gif    | Bin 0 -> 131055 bytes
 .../zeppelin/img/docs-img/flink_z_batch_table.png  | Bin 0 -> 189710 bytes
 .../zeppelin/img/docs-img/flink_z_dataset.png      | Bin 0 -> 160627 bytes
 .../zeppelin/img/docs-img/flink_z_stream_table.gif | Bin 0 -> 226356 bytes
 docs/interpreter/flink.md                          | 174 ++++++++++++++++-----
 7 files changed, 137 insertions(+), 37 deletions(-)

diff --git a/docs/assets/themes/zeppelin/img/docs-img/flink_append_mode.gif b/docs/assets/themes/zeppelin/img/docs-img/flink_append_mode.gif
new file mode 100644
index 0000000..3c827f4
Binary files /dev/null and b/docs/assets/themes/zeppelin/img/docs-img/flink_append_mode.gif differ
diff --git a/docs/assets/themes/zeppelin/img/docs-img/flink_single_mode.gif b/docs/assets/themes/zeppelin/img/docs-img/flink_single_mode.gif
new file mode 100644
index 0000000..91b49ed
Binary files /dev/null and b/docs/assets/themes/zeppelin/img/docs-img/flink_single_mode.gif differ
diff --git a/docs/assets/themes/zeppelin/img/docs-img/flink_update_mode.gif b/docs/assets/themes/zeppelin/img/docs-img/flink_update_mode.gif
new file mode 100644
index 0000000..fe7e2e9
Binary files /dev/null and b/docs/assets/themes/zeppelin/img/docs-img/flink_update_mode.gif differ
diff --git a/docs/assets/themes/zeppelin/img/docs-img/flink_z_batch_table.png b/docs/assets/themes/zeppelin/img/docs-img/flink_z_batch_table.png
new file mode 100644
index 0000000..216675e
Binary files /dev/null and b/docs/assets/themes/zeppelin/img/docs-img/flink_z_batch_table.png differ
diff --git a/docs/assets/themes/zeppelin/img/docs-img/flink_z_dataset.png b/docs/assets/themes/zeppelin/img/docs-img/flink_z_dataset.png
new file mode 100644
index 0000000..052b74d
Binary files /dev/null and b/docs/assets/themes/zeppelin/img/docs-img/flink_z_dataset.png differ
diff --git a/docs/assets/themes/zeppelin/img/docs-img/flink_z_stream_table.gif b/docs/assets/themes/zeppelin/img/docs-img/flink_z_stream_table.gif
new file mode 100644
index 0000000..fb52e99
Binary files /dev/null and b/docs/assets/themes/zeppelin/img/docs-img/flink_z_stream_table.gif differ
diff --git a/docs/interpreter/flink.md b/docs/interpreter/flink.md
index 40cf058..108d7b5 100644
--- a/docs/interpreter/flink.md
+++ b/docs/interpreter/flink.md
@@ -26,7 +26,7 @@ limitations under the License.
 ## Overview
 [Apache Flink](https://flink.apache.org) is an open source platform for distributed stream and batch data processing. Flink’s core is a streaming dataflow engine that provides data distribution, communication, and fault tolerance for distributed computations over data streams. Flink also builds batch processing on top of the streaming engine, overlaying native iteration support, managed memory, and program optimization.
 
-In Zeppelin 0.9, we refactor the Flink interpreter in Zeppelin to support the latest version of Flink. **Only Flink 1.10+ is supported, old version of flink may not work.**
+In Zeppelin 0.9, we refactored the Flink interpreter to support the latest version of Flink. **Only Flink 1.10+ is supported; older versions of flink won't work.**
 Apache Flink is supported in Zeppelin with the Flink interpreter group, which consists of the five interpreters below.
 
 <table class="table-configuration">
@@ -65,11 +65,10 @@ Apache Flink is supported in Zeppelin with Flink interpreter group which consist
 ## Prerequisites
 
 * Download Flink 1.10 for scala 2.11 (only scala-2.11 is supported; scala-2.12 is not yet supported in Zeppelin)
-* Download [flink-hadoop-shaded](https://repo1.maven.org/maven2/org/apache/flink/flink-shaded-hadoop-2/2.8.3-10.0/flink-shaded-hadoop-2-2.8.3-10.0.jar) and put it under lib folder of flink (flink interpreter need that to support yarn mode)
 
 ## Configuration
 The Flink interpreter can be configured with properties provided by Zeppelin (see the following table).
-You can also set other flink properties which are not listed in the table. For a list of additional properties, refer to [Flink Available Properties](https://ci.apache.org/projects/flink/flink-docs-master/ops/config.html).
+You can also add and set other flink properties which are not listed in the table. For a list of additional properties, refer to [Flink Available Properties](https://ci.apache.org/projects/flink/flink-docs-master/ops/config.html).
 <table class="table-configuration">
   <tr>
     <th>Property</th>
@@ -191,11 +190,7 @@ You can also set other flink properties which are not listed in the table. For a
     <td>true</td>
     <td>Whether display scala shell output in colorful format</td>
   </tr>
-  <tr>
-    <td>zeppelin.flink.enableHive</td>
-    <td>false</td>
-    <td>Whether enable hive</td>
-  </tr>
+
   <tr>
     <td>zeppelin.flink.enableHive</td>
     <td>false</td>
@@ -249,6 +244,16 @@ And will create 6 variables as pyflink (`%flink.pyflink` or `%flink.ipyflink`) e
 * `st_env_2`   (StreamTableEnvironment for flink planner) 
 * `bt_env_2`   (BatchTableEnvironment for flink planner)
 
+## Blink/Flink Planner
+
+Flink's table api supports 2 planners: `flink` & `blink`.
+
+* If you want to use the DataSet api and convert it to a flink table, then use the flink planner (`btenv_2` and `stenv_2`).
+* In all other cases, we recommend the `blink` planner. This is also what the flink batch/streaming sql interpreters (`%flink.bsql` & `%flink.ssql`) use.
+
+Check this [page](https://ci.apache.org/projects/flink/flink-docs-release-1.10/dev/table/common.html#main-differences-between-the-two-planners) for the differences between the two planners.
+
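+A minimal sketch of picking a planner in PyFlink (the inline data is hypothetical; in PyFlink the corresponding variables are `bt_env`/`st_env` and `bt_env_2`/`st_env_2`):
+
+```
+%flink.pyflink
+
+# blink planner (recommended): use the predefined bt_env / st_env variables
+t = bt_env.from_elements([(1, 'hello'), (2, 'world')], ['id', 'word'])
+
+# flink planner: use bt_env_2 / st_env_2 instead
+t2 = bt_env_2.from_elements([(1, 'hello'), (2, 'world')], ['id', 'word'])
+```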
+
 ## Execution mode (Local/Remote/Yarn)
 
 Flink in Zeppelin supports 3 execution modes (`flink.execution.mode`):
@@ -274,15 +279,7 @@ In order to run flink in Yarn mode, you need to make the following settings:
 
 * Set `flink.execution.mode` to `yarn`
 * Set `HADOOP_CONF_DIR` in flink's interpreter setting.
-* Make sure `hadoop` command is your PATH. Because internally flink will call command `hadoop classpath` and load all the hadoop related jars in the flink interpreter process
-
-
-## Blink/Flink Planner
-
-There're 2 planners supported by Flink's table api: `flink` & `blink`.
-
-* If you want to use DataSet api, and convert it to flink table then please use flink planner (`btenv_2` and `stenv_2`).
-* In other cases, we would always recommend you to use `blink` planner. This is also what flink batch/streaming sql interpreter use (`%flink.bsql` & `%flink.ssql`)
+* Make sure the `hadoop` command is on your PATH, because internally flink will run `hadoop classpath` and load all the hadoop-related jars into the flink interpreter process.
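+
+For example, the relevant interpreter properties might look like the following sketch (the paths are hypothetical; adjust them to your environment):
+
+```
+flink.execution.mode    yarn
+FLINK_HOME              /opt/flink-1.10.0
+HADOOP_CONF_DIR         /etc/hadoop/conf
+```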
 
 
 ## How to use Hive
@@ -291,31 +288,63 @@ In order to use Hive in Flink, you have to make the following setting.
 
 * Set `zeppelin.flink.enableHive` to be true
 * Set `zeppelin.flink.hive.version` to be the hive version you are using.
-* Set `HIVE_CONF_DIR` to be the location where `hive-site.xml` is located. Make sure hive metastore is started and you have configure `hive.metastore.uris` in `hive-site.xml`
+* Set `HIVE_CONF_DIR` to be the location where `hive-site.xml` is located. Make sure the hive metastore is started and that you have configured `hive.metastore.uris` in `hive-site.xml`.
 * Copy the following dependencies to the lib folder of the flink installation.
     * flink-connector-hive_2.11-1.10.0.jar
     * flink-hadoop-compatibility_2.11-1.10.0.jar
     * hive-exec-2.x.jar (for hive 1.x, you need to copy hive-exec-1.x.jar, hive-metastore-1.x.jar, libfb303-0.9.2.jar and libthrift-0.9.2.jar)
 
-After these settings, you will be able to query hive table via either table api `%flink` or batch sql `%flink.bsql`
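+After these settings, you will be able to query hive tables via either the table api (`%flink`) or batch sql (`%flink.bsql`). A minimal sketch of the sql route, assuming the hive catalog is registered under the name `hive` (the `employee` table is hypothetical; adjust names to your metastore):
+
+```
+%flink.bsql
+
+use catalog hive;
+show tables;
+select * from employee limit 10;
+```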
 
 ## Flink Batch SQL
 
-`%flink.bsql` is used for flink's batch sql. You just type `help` to get all the available commands.
+`%flink.bsql` is used for flink's batch sql. You can type `help` to get all the available commands.
+It supports all flink sql statements, including DML/DDL/DQL.
 
 * Use `insert into` statement for batch ETL
-* Use `select` statement for exploratory data analytics 
+* Use `select` statement for batch data analytics 
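+
+For example, a hypothetical batch ETL paragraph (`source_table` and `sink_table` are assumed to have been created earlier via DDL):
+
+```
+%flink.bsql
+
+insert into sink_table select word, count(1) from source_table group by word
+```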
 
 ## Flink Streaming SQL
 
-`%flink.ssql` is used for flink's streaming sql. You just type `help` to get all the available commands. Mainlly there're 2 cases:
+`%flink.ssql` is used for flink's streaming sql. You can type `help` to get all the available commands.
+It supports all flink sql statements, including DML/DDL/DQL.
 
-* Use `insert into` statement for streaming processing
+* Use `insert into` statement for streaming ETL
 * Use `select` statement for streaming data analytics
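+
+For example, a hypothetical streaming ETL paragraph (the kafka source and sink tables are assumed to have been defined earlier via DDL):
+
+```
+%flink.ssql
+
+insert into sink_kafka select * from source_kafka where status = 'ok'
+```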
 
+## Streaming Data Visualization
+
+Zeppelin supports 3 types of streaming data analytics:
+* Single
+* Update
+* Append
+
+### type=single
+Single mode is for the case where the result of the sql statement is always one row, as in the following example. The output format is HTML,
+and you can specify the paragraph local property `template` for the final output content.
+You can use `{i}` as a placeholder for the ith column of the result.
+
+ <center>
+   ![Interactive Help]({{BASE_PATH}}/assets/themes/zeppelin/img/docs-img/flink_single_mode.gif)
+ </center>
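+
+A hypothetical paragraph for this mode (the `log` table is assumed to exist):
+
+```
+%flink.ssql(type=single, template=Total count is {0})
+
+select count(1) from log
+```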
+
+### type=update
+Update mode is suitable for the case where the output is more than one row and will be updated continuously.
+Here's one example where we use group by.
+
+ <center>
+   ![Interactive Help]({{BASE_PATH}}/assets/themes/zeppelin/img/docs-img/flink_update_mode.gif)
+ </center>
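+
+A hypothetical paragraph for this mode (the `log` table is assumed to exist):
+
+```
+%flink.ssql(type=update)
+
+select url, count(1) as pv from log group by url
+```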
+
+### type=append
+Append mode is suitable for the scenario where output data is always appended, e.g. the following example, which uses a tumble window.
+
+ <center>
+   ![Interactive Help]({{BASE_PATH}}/assets/themes/zeppelin/img/docs-img/flink_append_mode.gif)
+ </center>
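+
+A hypothetical paragraph for this mode (the `log` table, with a `rowtime` event-time attribute, is assumed to exist):
+
+```
+%flink.ssql(type=append)
+
+select TUMBLE_START(rowtime, INTERVAL '5' SECOND) as start_time, url, count(1) as pv
+from log group by TUMBLE(rowtime, INTERVAL '5' SECOND), url
+```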
+ 
 ## Flink UDF
 
-You can use Flink scala UDF or Python UDF in sql. UDF for batch and streaming sql is the same. Here's 2 examples.
+You can use Flink scala UDFs or Python UDFs in sql. UDFs for batch and streaming sql are the same. Here are 2 examples.
 
 * Scala UDF
 
@@ -343,29 +372,100 @@ bt_env.register_function("python_upper", udf(PythonUpper(), DataTypes.STRING(),
 
 ```
 
-Besides defining udf in Zeppelin, you can also load udfs in jars via `flink.udf.jars`. For example, you can create
-udfs in intellij and then build these udfs in one jar. After that you can specify `flink.udf.jars` to this jar, and flink
+Zeppelin only supports scala and python for the flink interpreter. If you want to write a java udf, or your udf is complicated enough that it is not suitable to write in Zeppelin,
+you can write the udf in an IDE and build a udf jar.
+In Zeppelin you then just need to point `flink.udf.jars` to this jar, and the flink
 interpreter will detect all the udfs in this jar and register them to the TableEnvironment; the udf name is the class name.
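+
+For example, assuming the jar contains a hypothetical udf class `MyUpper`, after pointing `flink.udf.jars` to the jar you could call it directly in sql (`my_table` is also hypothetical; the registered name follows the class name, per the note above):
+
+```
+%flink.bsql
+
+select MyUpper(name) from my_table
+```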
 
-## ZeppelinContext
-Zeppelin automatically injects `ZeppelinContext` as variable `z` in your Scala/Python environment. `ZeppelinContext` provides some additional functions and utilities.
-See [Zeppelin-Context](../usage/other_features/zeppelin_context.html) for more details.
+## PyFlink (%flink.pyflink)
+In order to use PyFlink in Zeppelin, you just need to do the following configuration:
+* Install apache-flink (e.g. pip install apache-flink)
+* Set `zeppelin.pyflink.python` to the python executable where apache-flink is installed, in case you have multiple pythons installed.
+* Copy flink-python_2.11-1.10.0.jar from the flink opt folder to the flink lib folder
+
+And PyFlink will create 6 variables for you:
+
+* `s_env`    (StreamExecutionEnvironment)
+* `b_env`     (ExecutionEnvironment)
+* `st_env`   (StreamTableEnvironment for blink planner) 
+* `bt_env`   (BatchTableEnvironment for blink planner)
+* `st_env_2`   (StreamTableEnvironment for flink planner) 
+* `bt_env_2`   (BatchTableEnvironment for flink planner)
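+
+A minimal sanity check using these variables (the inline data is hypothetical):
+
+```
+%flink.pyflink
+
+t = bt_env.from_elements([(1, 'hello'), (2, 'world')], ['id', 'word'])
+bt_env.register_table("demo", t)  # now also queryable from %flink.bsql, since the interpreters share the same environments
+```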
 
-## IPython Support
+### IPython Support (%flink.ipyflink)
 
 By default, zeppelin uses IPython in `%flink.pyflink` when IPython is available; otherwise it falls back to the original python implementation.
 For the IPython features, you can refer to the doc [Python Interpreter](python.html).
 
-## Tutorial Notes
+## ZeppelinContext
+Zeppelin automatically injects `ZeppelinContext` as variable `z` in your Scala/Python environment. `ZeppelinContext` provides some additional functions and utilities.
+See [Zeppelin-Context](../usage/other_features/zeppelin_context.html) for more details. You can use `z` to display both flink DataSets and batch/stream tables.
+
+* Display DataSet
+ <center>
+   ![Interactive Help]({{BASE_PATH}}/assets/themes/zeppelin/img/docs-img/flink_z_dataset.png)
+ </center>
+ 
+* Display Batch Table
+ <center>
+   ![Interactive Help]({{BASE_PATH}}/assets/themes/zeppelin/img/docs-img/flink_z_batch_table.png)
+ </center>
+* Display Stream Table
+ <center>
+   ![Interactive Help]({{BASE_PATH}}/assets/themes/zeppelin/img/docs-img/flink_z_stream_table.gif)
+ </center>
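+
+A minimal sketch of displaying a batch table via `z` (this assumes `z.show` accepts a flink Table in the pyflink environment, as in the screenshots above; the inline data is hypothetical):
+
+```
+%flink.pyflink
+
+t = bt_env.from_elements([(1, 'hello'), (2, 'world')], ['id', 'word'])
+z.show(t)
+```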
+
+## Paragraph local properties
+
+In the `Streaming Data Visualization` section, we demonstrated the different visualization types via the paragraph local property `type`.
+In this section, we will list and explain all the supported local properties of the flink interpreter.
 
-Zeppelin is shipped with several Flink tutorial notes which may be helpful for you. Except the first one, the below 4 notes cover the 4 main scenarios of flink.
+<table class="table-configuration">
+  <tr>
+    <th>Property</th>
+    <th>Default</th>
+    <th>Description</th>
+  </tr>
+  <tr>
+    <td>type</td>
+    <td></td>
+    <td>Used in `%flink.ssql` to specify the streaming visualization type (single, update, append)</td>
+  </tr>
+  <tr>
+    <td>refreshInterval</td>
+    <td>3000</td>
+    <td>Used in `%flink.ssql` to specify frontend refresh interval for streaming data visualization.</td>
+  </tr>
+  <tr>
+    <td>template</td>
+    <td>{0}</td>
+    <td>Used in `%flink.ssql` to specify the html template for the `single` type of streaming data visualization. You can use `{i}` as a placeholder for the ith column of the result.</td>
+  </tr>
+  <tr>
+    <td>parallelism</td>
+    <td></td>
+    <td>Used in `%flink.ssql` & `%flink.bsql` to specify the flink sql job parallelism</td>
+  </tr>
+  <tr>
+    <td>maxParallelism</td>
+    <td></td>
+    <td>Used in `%flink.ssql` & `%flink.bsql` to specify the flink sql job max parallelism, in case you want to change the parallelism later. For more details, refer to this [link](https://ci.apache.org/projects/flink/flink-docs-release-1.10/dev/parallel.html#setting-the-maximum-parallelism)</td>
+  </tr>
+  <tr>
+    <td>savepointDir</td>
+    <td></td>
+    <td>If you specify it, then when you cancel your flink job in Zeppelin, it will also take a savepoint and store the state in this directory. When you resume the job, it will resume from this savepoint.</td>
+  </tr>
+  <tr>
+    <td>runAsOne</td>
+    <td>false</td>
+    <td>All the insert-into sql statements will run in a single flink job if this is true.</td>
+  </tr>
+</table>
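+
+For example, a hypothetical streaming paragraph that combines several of these properties:
+
+```
+%flink.ssql(type=update, refreshInterval=2000, parallelism=2)
+
+select status, count(1) as cnt from log group by status
+```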
 
-* Flink Basic
-* Batch ETL
-* Exploratory Data Analytics
-* Streaming ETL
-* Streaming Data Analytics
+## Tutorial Notes
 
+Zeppelin ships with several Flink tutorial notes that may be helpful to you. You can explore more features in the tutorial notes.