Posted to commits@dolphinscheduler.apache.org by zh...@apache.org on 2022/02/12 12:43:42 UTC

[dolphinscheduler-website] branch master updated: [Feature-8024][Document] Add example and notice about task type Flink (#688)

This is an automated email from the ASF dual-hosted git repository.

zhongjiajie pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/dolphinscheduler-website.git


The following commit(s) were added to refs/heads/master by this push:
     new 938c814  [Feature-8024][Document] Add example and notice about task type Flink (#688)
938c814 is described below

commit 938c8144c7e1881c94f17674c6512088ffedd046
Author: QuakeWang <45...@users.noreply.github.com>
AuthorDate: Sat Feb 12 20:43:35 2022 +0800

    [Feature-8024][Document] Add example and notice about task type Flink (#688)
---
 docs/en-us/2.0.3/user_doc/guide/task/flink.md |  84 +++++++++++++++++++-------
 docs/en-us/dev/user_doc/guide/task/flink.md   |  84 +++++++++++++++++++-------
 docs/zh-cn/2.0.3/user_doc/guide/task/flink.md |  82 ++++++++++++++++++-------
 docs/zh-cn/dev/user_doc/guide/task/flink.md   |  82 ++++++++++++++++++-------
 img/tasks/demo/flink_task.png                 | Bin 0 -> 247959 bytes
 img/tasks/demo/upload_flink.png               | Bin 0 -> 106645 bytes
 img/tasks/icons/flink.png                     | Bin 0 -> 1443 bytes
 7 files changed, 248 insertions(+), 84 deletions(-)

diff --git a/docs/en-us/2.0.3/user_doc/guide/task/flink.md b/docs/en-us/2.0.3/user_doc/guide/task/flink.md
index 88e29ba..18c15f0 100644
--- a/docs/en-us/2.0.3/user_doc/guide/task/flink.md
+++ b/docs/en-us/2.0.3/user_doc/guide/task/flink.md
@@ -1,23 +1,65 @@
-
 # Flink
 
-- Drag in the toolbar<img src="/img/flink.png" width="35"/>The task node to the drawing board, as shown in the following figure:
-
-<p align="center">
-  <img src="/img/flink-en.png" width="80%" />
-</p>
-
-- Program type: supports JAVA, Scala and Python three languages
-- The class of the main function: is the full path of the Main Class, the entry point of the Flink program
-- Main jar package: is the Flink jar package
-- Deployment mode: support three modes of cluster and local
-- Number of slots: You can set the number of slots
-- Number of taskManage: You can set the number of taskManage
-- JobManager memory number: You can set the jobManager memory number
-- TaskManager memory number: You can set the taskManager memory number
-- Command line parameters: Set the input parameters of the Flink program and support the substitution of custom parameter variables.
-- Other parameters: support --jars, --files, --archives, --conf format
-- Resource: If the resource file is referenced in other parameters, you need to select and specify in the resource
-- Custom parameter: It is a local user-defined parameter of Flink, which will replace the content with \${variable} in the script
-
-Note: JAVA and Scala are only used for identification, there is no difference, if it is Flink developed by Python, there is no class of the main function, the others are the same
+## Overview
+
+The Flink task type is used to execute Flink programs. For Flink nodes, the worker submits the task with the Flink command `flink run`. See [flink cli](https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/deployment/cli/) for more details.
+
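As a rough illustration (a hedged sketch, not DolphinScheduler's actual source code), the worker ends up running a command along these lines; the flags shown are standard Flink CLI options, and the exact command assembled from the task parameters may differ:

```shell
# Illustrative sketch only: roughly the kind of `flink run` command the worker
# assembles from the task parameters below. Flag names are standard Flink CLI
# options; the main class and jar are the WordCount example shipped with Flink,
# used here as a hypothetical target.
CMD="flink run -m yarn-cluster -ys 1 -yjm 1G -ytm 2G -p 2 \
-c org.apache.flink.examples.java.wordcount.WordCount WordCount.jar"
echo "$CMD"
```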
+## Create task
+
+- Click Project Management -> Project Name -> Workflow Definition, and click the "Create Workflow" button to enter the DAG editing page.
+- Drag the <img src="/img/tasks/icons/flink.png" width="15"/> task node from the toolbar to the drawing board.
+
+## Task Parameter
+
+- **Node name**: The node name in a workflow definition is unique.
+- **Run flag**: Identifies whether this node can be scheduled normally; if it does not need to be executed, turn on the prohibition switch.
+- **Descriptive information**: describe the function of the node.
+- **Task priority**: When the number of worker threads is insufficient, they are executed in order from high to low, and when the priority is the same, they are executed according to the first-in first-out principle.
+- **Worker grouping**: Tasks are assigned to the machines of the worker group to execute. If Default is selected, a worker machine will be randomly selected for execution.
+- **Environment Name**: Configure the environment name in which to run the script.
+- **Number of failed retry attempts**: The number of times the task is resubmitted after a failure.
+- **Failed retry interval**: The time interval, in minutes, for resubmitting the task after a failure.
+- **Delayed execution time**: The time, in minutes, that the execution of a task is delayed.
+- **Timeout alarm**: Check the timeout alarm and timeout failure. When the task exceeds the "timeout period", an alarm email will be sent and the task execution will fail.
+- **Program type**: Supports Java, Scala, and Python.
+- **The class of the main function**: The full path of the Main Class, the entry point of the Flink program.
+- **Resource**: The list of resource files that the task needs to call; upload or create these files under Resource Center -> File Management.
+- **Main jar package**: The jar package of the Flink program (uploaded through the Resource Center).
+- **Deployment mode**: Supports two modes: cluster and local.
+- **Task name** (optional): The name of the Flink task.
+- **JobManager memory size**: Used to set the JobManager memory size; set it according to the actual production environment.
+- **Number of slots**: Used to set the number of slots; set it according to the actual production environment.
+- **TaskManager memory size**: Used to set the TaskManager memory size; set it according to the actual production environment.
+- **Number of TaskManagers**: Used to set the number of TaskManagers; set it according to the actual production environment.
+- **Parallelism**: Used to set the degree of parallelism for executing Flink tasks.
+- **Main program parameters**: Set the input parameters of the Flink program; supports the substitution of custom parameter variables.
+- **Other parameters**: Supports `--jars`, `--files`, `--archives`, and `--conf` formats.
+- **Resource**: If the resource file is referenced in other parameters, you need to select and specify in the resource.
+- **Custom parameter**: A local user-defined parameter of Flink, which replaces the `${variable}` placeholders in the script.
+- **Predecessor task**: Selecting a predecessor task for the current task will set the selected predecessor task as upstream of the current task.
+
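To make the custom-parameter substitution above concrete, here is a minimal sketch, under the assumption of a hypothetical custom parameter named `input_path` (DolphinScheduler performs this replacement internally; this is not its actual implementation):

```shell
# Minimal sketch of custom-parameter substitution: a ${variable} placeholder in
# the main program parameters is replaced by the custom parameter's value before
# submission. The parameter name and paths here are hypothetical examples.
input_path="/data/wordcount/input.txt"            # value of custom parameter "input_path"
raw_args='--input ${input_path} --output /tmp/result'
args=$(printf '%s' "$raw_args" | sed "s|\${input_path}|$input_path|g")
echo "$args"
```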
+## Task Example
+
+### Execute the WordCount program
+
+This is a common introductory case in the big data ecosystem, often applied to computing frameworks such as MapReduce, Flink, and Spark. Its main purpose is to count the number of identical words in the input text. (Flink releases ship with this example job.)
+
+#### Uploading the main package
+
+When using the Flink task node, you need to use the Resource Center to upload the jar package of the executable program. Refer to the [resource center](../resource.md).
+
+After configuring the Resource Center, you can upload the required target files directly by dragging and dropping.
+
+![resource_upload](/img/tasks/demo/upload_flink.png)
+
+#### Configuring Flink nodes
+
+Simply configure the required content according to the parameter descriptions above.
+
+![demo-flink-simple](/img/tasks/demo/flink_task.png)
+
+## Notice
+
+Java and Scala are only used for identification; there is no difference between them. If the Flink program is developed in Python, there is no main function class; the other parameters are the same.
diff --git a/docs/en-us/dev/user_doc/guide/task/flink.md b/docs/en-us/dev/user_doc/guide/task/flink.md
index 88e29ba..18c15f0 100644
--- a/docs/en-us/dev/user_doc/guide/task/flink.md
+++ b/docs/en-us/dev/user_doc/guide/task/flink.md
@@ -1,23 +1,65 @@
-
 # Flink
 
-- Drag in the toolbar<img src="/img/flink.png" width="35"/>The task node to the drawing board, as shown in the following figure:
-
-<p align="center">
-  <img src="/img/flink-en.png" width="80%" />
-</p>
-
-- Program type: supports JAVA, Scala and Python three languages
-- The class of the main function: is the full path of the Main Class, the entry point of the Flink program
-- Main jar package: is the Flink jar package
-- Deployment mode: support three modes of cluster and local
-- Number of slots: You can set the number of slots
-- Number of taskManage: You can set the number of taskManage
-- JobManager memory number: You can set the jobManager memory number
-- TaskManager memory number: You can set the taskManager memory number
-- Command line parameters: Set the input parameters of the Flink program and support the substitution of custom parameter variables.
-- Other parameters: support --jars, --files, --archives, --conf format
-- Resource: If the resource file is referenced in other parameters, you need to select and specify in the resource
-- Custom parameter: It is a local user-defined parameter of Flink, which will replace the content with \${variable} in the script
-
-Note: JAVA and Scala are only used for identification, there is no difference, if it is Flink developed by Python, there is no class of the main function, the others are the same
+## Overview
+
+The Flink task type is used to execute Flink programs. For Flink nodes, the worker submits the task with the Flink command `flink run`. See [flink cli](https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/deployment/cli/) for more details.
+
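As a rough illustration (a hedged sketch, not DolphinScheduler's actual source code), the worker ends up running a command along these lines; the flags shown are standard Flink CLI options, and the exact command assembled from the task parameters may differ:

```shell
# Illustrative sketch only: roughly the kind of `flink run` command the worker
# assembles from the task parameters below. Flag names are standard Flink CLI
# options; the main class and jar are the WordCount example shipped with Flink,
# used here as a hypothetical target.
CMD="flink run -m yarn-cluster -ys 1 -yjm 1G -ytm 2G -p 2 \
-c org.apache.flink.examples.java.wordcount.WordCount WordCount.jar"
echo "$CMD"
```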
+## Create task
+
+- Click Project Management -> Project Name -> Workflow Definition, and click the "Create Workflow" button to enter the DAG editing page.
+- Drag the <img src="/img/tasks/icons/flink.png" width="15"/> task node from the toolbar to the drawing board.
+
+## Task Parameter
+
+- **Node name**: The node name in a workflow definition is unique.
+- **Run flag**: Identifies whether this node can be scheduled normally; if it does not need to be executed, turn on the prohibition switch.
+- **Descriptive information**: describe the function of the node.
+- **Task priority**: When the number of worker threads is insufficient, they are executed in order from high to low, and when the priority is the same, they are executed according to the first-in first-out principle.
+- **Worker grouping**: Tasks are assigned to the machines of the worker group to execute. If Default is selected, a worker machine will be randomly selected for execution.
+- **Environment Name**: Configure the environment name in which to run the script.
+- **Number of failed retry attempts**: The number of times the task is resubmitted after a failure.
+- **Failed retry interval**: The time interval, in minutes, for resubmitting the task after a failure.
+- **Delayed execution time**: The time, in minutes, that the execution of a task is delayed.
+- **Timeout alarm**: Check the timeout alarm and timeout failure. When the task exceeds the "timeout period", an alarm email will be sent and the task execution will fail.
+- **Program type**: Supports Java, Scala, and Python.
+- **The class of the main function**: The full path of the Main Class, the entry point of the Flink program.
+- **Resource**: The list of resource files that the task needs to call; upload or create these files under Resource Center -> File Management.
+- **Main jar package**: The jar package of the Flink program (uploaded through the Resource Center).
+- **Deployment mode**: Supports two modes: cluster and local.
+- **Task name** (optional): The name of the Flink task.
+- **JobManager memory size**: Used to set the JobManager memory size; set it according to the actual production environment.
+- **Number of slots**: Used to set the number of slots; set it according to the actual production environment.
+- **TaskManager memory size**: Used to set the TaskManager memory size; set it according to the actual production environment.
+- **Number of TaskManagers**: Used to set the number of TaskManagers; set it according to the actual production environment.
+- **Parallelism**: Used to set the degree of parallelism for executing Flink tasks.
+- **Main program parameters**: Set the input parameters of the Flink program; supports the substitution of custom parameter variables.
+- **Other parameters**: Supports `--jars`, `--files`, `--archives`, and `--conf` formats.
+- **Resource**: If the resource file is referenced in other parameters, you need to select and specify in the resource.
+- **Custom parameter**: A local user-defined parameter of Flink, which replaces the `${variable}` placeholders in the script.
+- **Predecessor task**: Selecting a predecessor task for the current task will set the selected predecessor task as upstream of the current task.
+
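To make the custom-parameter substitution above concrete, here is a minimal sketch, under the assumption of a hypothetical custom parameter named `input_path` (DolphinScheduler performs this replacement internally; this is not its actual implementation):

```shell
# Minimal sketch of custom-parameter substitution: a ${variable} placeholder in
# the main program parameters is replaced by the custom parameter's value before
# submission. The parameter name and paths here are hypothetical examples.
input_path="/data/wordcount/input.txt"            # value of custom parameter "input_path"
raw_args='--input ${input_path} --output /tmp/result'
args=$(printf '%s' "$raw_args" | sed "s|\${input_path}|$input_path|g")
echo "$args"
```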
+## Task Example
+
+### Execute the WordCount program
+
+This is a common introductory case in the big data ecosystem, often applied to computing frameworks such as MapReduce, Flink, and Spark. Its main purpose is to count the number of identical words in the input text. (Flink releases ship with this example job.)
+
+#### Uploading the main package
+
+When using the Flink task node, you need to use the Resource Center to upload the jar package of the executable program. Refer to the [resource center](../resource.md).
+
+After configuring the Resource Center, you can upload the required target files directly by dragging and dropping.
+
+![resource_upload](/img/tasks/demo/upload_flink.png)
+
+#### Configuring Flink nodes
+
+Simply configure the required content according to the parameter descriptions above.
+
+![demo-flink-simple](/img/tasks/demo/flink_task.png)
+
+## Notice
+
+Java and Scala are only used for identification; there is no difference between them. If the Flink program is developed in Python, there is no main function class; the other parameters are the same.
diff --git a/docs/zh-cn/2.0.3/user_doc/guide/task/flink.md b/docs/zh-cn/2.0.3/user_doc/guide/task/flink.md
index 641a37e..b972a0c 100644
--- a/docs/zh-cn/2.0.3/user_doc/guide/task/flink.md
+++ b/docs/zh-cn/2.0.3/user_doc/guide/task/flink.md
@@ -1,23 +1,63 @@
 # Flink节点
 
-- 拖动工具栏中的<img src="/img/flink.png" width="35"/>任务节点到画板中,如下图所示:
-
-<p align="center">
-  <img src="/img/flink_edit.png" width="80%" />
-</p>
-
-
-- 程序类型:支持JAVA、Scala和Python三种语言
-- 主函数的class:是Flink程序的入口Main Class的全路径
-- 主jar包:是Flink的jar包
-- 部署方式:支持cluster、local三种模式
-- slot数量:可以设置slot数
-- taskManage数量:可以设置taskManage数
-- jobManager内存数:可以设置jobManager内存数
-- taskManager内存数:可以设置taskManager内存数
-- 命令行参数:是设置Flink程序的输入参数,支持自定义参数变量的替换。
-- 其他参数:支持 --jars、--files、--archives、--conf格式
-- 资源:如果其他参数中引用了资源文件,需要在资源中选择指定
-- 自定义参数:是Flink局部的用户自定义参数,会替换脚本中以${变量}的内容
-
-注意:JAVA和Scala只是用来标识,没有区别,如果是Python开发的Flink则没有主函数的class,其他都是一样
+## 综述
+
+Flink 任务类型,用于执行 Flink 程序。对于 Flink 节点,worker 会通过使用 flink 命令 `flink run` 的方式提交任务。更多详情查看 [flink cli](https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/deployment/cli/)。
+
+## 创建任务
+
+- 点击项目管理-项目名称-工作流定义,点击“创建工作流”按钮,进入 DAG 编辑页面;
+- 拖动工具栏的 <img src="/img/tasks/icons/flink.png" width="15"/> 任务节点到画板中。
+
+## 任务参数
+
+- 节点名称:设置任务的名称。一个工作流定义中的节点名称是唯一的。
+- 运行标志:标识这个节点是否能正常调度,如果不需要执行,可以打开禁止执行开关。
+- 描述:描述该节点的功能。
+- 任务优先级:worker 线程数不足时,根据优先级从高到低依次执行,优先级一样时根据先进先出原则执行。
+- Worker 分组:任务分配给 worker 组的机器执行,选择 Default,会随机选择一台 worker 机执行。
+- 环境名称:配置运行脚本的环境。
+- 失败重试次数:任务失败重新提交的次数。
+- 失败重试间隔:任务失败重新提交任务的时间间隔,以分为单位。
+- 延迟执行时间:任务延迟执行的时间,以分为单位。
+- 超时告警:勾选超时告警、超时失败,当任务超过"超时时长"后,会发送告警邮件并且任务执行失败。
+- 程序类型:支持 Java、Scala 和 Python 三种语言。
+- 主函数的 Class:Flink 程序的入口 Main Class 的**全路径**。
+- 主程序包:执行 Flink 程序的 jar 包(通过资源中心上传)。
+- 部署方式:支持 cluster 和 local 两种模式的部署。
+- Flink 版本:根据所需环境选择对应的版本即可。
+- 任务名称(选填):Flink 程序的名称。
+- jobManager 内存数:用于设置 jobManager 内存数,可根据实际生产环境设置对应的内存数。
+- Slot 数量:用于设置 Slot 的数量,可根据实际生产环境设置对应的数量。
+- taskManager 内存数:用于设置 taskManager 内存数,可根据实际生产环境设置对应的内存数。
+- taskManager 数量:用于设置 taskManager 的数量,可根据实际生产环境设置对应的数量。
+- 并行度:用于设置执行 Flink 任务的并行度。
+- 主程序参数:设置 Flink 程序的输入参数,支持自定义参数变量的替换。
+- 选项参数:支持 `--jars`、`--files`、`--archives`、`--conf` 格式。
+- 资源:如果其他参数中引用了资源文件,需要在资源中选择指定。
+- 自定义参数:是 Flink 局部的用户自定义参数,会替换脚本中以 ${变量} 的内容。
+- 前置任务:选择当前任务的前置任务,会将被选择的前置任务设置为当前任务的上游。
+
+## 任务样例
+
+### 执行 WordCount 程序
+
+本案例为大数据生态中常见的入门案例,常应用于 MapReduce、Flink、Spark 等计算框架。主要为统计输入的文本中,相同的单词的数量有多少。(Flink 的 Releases 附带了此示例作业)
+
+#### 上传主程序包
+
+在使用 Flink 任务节点时,需要利用资源中心上传执行程序的 jar 包,可参考[资源中心](../resource.md)。
+
+当配置完成资源中心之后,直接使用拖拽的方式,即可上传所需目标文件。
+
+![resource_upload](/img/tasks/demo/upload_flink.png)
+
+#### 配置 Flink 节点
+
+根据上述参数说明,配置所需的内容即可。
+
+![demo-flink-simple](/img/tasks/demo/flink_task.png)
+
+## 注意事项
+
+Java 和 Scala 只是用来标识,没有区别,如果是 Python 开发的 Flink 则没有主函数的 class,其余的都一样。
diff --git a/docs/zh-cn/dev/user_doc/guide/task/flink.md b/docs/zh-cn/dev/user_doc/guide/task/flink.md
index 641a37e..b972a0c 100644
--- a/docs/zh-cn/dev/user_doc/guide/task/flink.md
+++ b/docs/zh-cn/dev/user_doc/guide/task/flink.md
@@ -1,23 +1,63 @@
 # Flink节点
 
-- 拖动工具栏中的<img src="/img/flink.png" width="35"/>任务节点到画板中,如下图所示:
-
-<p align="center">
-  <img src="/img/flink_edit.png" width="80%" />
-</p>
-
-
-- 程序类型:支持JAVA、Scala和Python三种语言
-- 主函数的class:是Flink程序的入口Main Class的全路径
-- 主jar包:是Flink的jar包
-- 部署方式:支持cluster、local三种模式
-- slot数量:可以设置slot数
-- taskManage数量:可以设置taskManage数
-- jobManager内存数:可以设置jobManager内存数
-- taskManager内存数:可以设置taskManager内存数
-- 命令行参数:是设置Flink程序的输入参数,支持自定义参数变量的替换。
-- 其他参数:支持 --jars、--files、--archives、--conf格式
-- 资源:如果其他参数中引用了资源文件,需要在资源中选择指定
-- 自定义参数:是Flink局部的用户自定义参数,会替换脚本中以${变量}的内容
-
-注意:JAVA和Scala只是用来标识,没有区别,如果是Python开发的Flink则没有主函数的class,其他都是一样
+## 综述
+
+Flink 任务类型,用于执行 Flink 程序。对于 Flink 节点,worker 会通过使用 flink 命令 `flink run` 的方式提交任务。更多详情查看 [flink cli](https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/deployment/cli/)。
+
+## 创建任务
+
+- 点击项目管理-项目名称-工作流定义,点击“创建工作流”按钮,进入 DAG 编辑页面;
+- 拖动工具栏的 <img src="/img/tasks/icons/flink.png" width="15"/> 任务节点到画板中。
+
+## 任务参数
+
+- 节点名称:设置任务的名称。一个工作流定义中的节点名称是唯一的。
+- 运行标志:标识这个节点是否能正常调度,如果不需要执行,可以打开禁止执行开关。
+- 描述:描述该节点的功能。
+- 任务优先级:worker 线程数不足时,根据优先级从高到低依次执行,优先级一样时根据先进先出原则执行。
+- Worker 分组:任务分配给 worker 组的机器执行,选择 Default,会随机选择一台 worker 机执行。
+- 环境名称:配置运行脚本的环境。
+- 失败重试次数:任务失败重新提交的次数。
+- 失败重试间隔:任务失败重新提交任务的时间间隔,以分为单位。
+- 延迟执行时间:任务延迟执行的时间,以分为单位。
+- 超时告警:勾选超时告警、超时失败,当任务超过"超时时长"后,会发送告警邮件并且任务执行失败。
+- 程序类型:支持 Java、Scala 和 Python 三种语言。
+- 主函数的 Class:Flink 程序的入口 Main Class 的**全路径**。
+- 主程序包:执行 Flink 程序的 jar 包(通过资源中心上传)。
+- 部署方式:支持 cluster 和 local 两种模式的部署。
+- Flink 版本:根据所需环境选择对应的版本即可。
+- 任务名称(选填):Flink 程序的名称。
+- jobManager 内存数:用于设置 jobManager 内存数,可根据实际生产环境设置对应的内存数。
+- Slot 数量:用于设置 Slot 的数量,可根据实际生产环境设置对应的数量。
+- taskManager 内存数:用于设置 taskManager 内存数,可根据实际生产环境设置对应的内存数。
+- taskManager 数量:用于设置 taskManager 的数量,可根据实际生产环境设置对应的数量。
+- 并行度:用于设置执行 Flink 任务的并行度。
+- 主程序参数:设置 Flink 程序的输入参数,支持自定义参数变量的替换。
+- 选项参数:支持 `--jars`、`--files`、`--archives`、`--conf` 格式。
+- 资源:如果其他参数中引用了资源文件,需要在资源中选择指定。
+- 自定义参数:是 Flink 局部的用户自定义参数,会替换脚本中以 ${变量} 的内容。
+- 前置任务:选择当前任务的前置任务,会将被选择的前置任务设置为当前任务的上游。
+
+## 任务样例
+
+### 执行 WordCount 程序
+
+本案例为大数据生态中常见的入门案例,常应用于 MapReduce、Flink、Spark 等计算框架。主要为统计输入的文本中,相同的单词的数量有多少。(Flink 的 Releases 附带了此示例作业)
+
+#### 上传主程序包
+
+在使用 Flink 任务节点时,需要利用资源中心上传执行程序的 jar 包,可参考[资源中心](../resource.md)。
+
+当配置完成资源中心之后,直接使用拖拽的方式,即可上传所需目标文件。
+
+![resource_upload](/img/tasks/demo/upload_flink.png)
+
+#### 配置 Flink 节点
+
+根据上述参数说明,配置所需的内容即可。
+
+![demo-flink-simple](/img/tasks/demo/flink_task.png)
+
+## 注意事项
+
+Java 和 Scala 只是用来标识,没有区别,如果是 Python 开发的 Flink 则没有主函数的 class,其余的都一样。
diff --git a/img/tasks/demo/flink_task.png b/img/tasks/demo/flink_task.png
new file mode 100644
index 0000000..cda455f
Binary files /dev/null and b/img/tasks/demo/flink_task.png differ
diff --git a/img/tasks/demo/upload_flink.png b/img/tasks/demo/upload_flink.png
new file mode 100644
index 0000000..4c13f17
Binary files /dev/null and b/img/tasks/demo/upload_flink.png differ
diff --git a/img/tasks/icons/flink.png b/img/tasks/icons/flink.png
new file mode 100644
index 0000000..568efbe
Binary files /dev/null and b/img/tasks/icons/flink.png differ