Posted to commits@dolphinscheduler.apache.org by li...@apache.org on 2022/11/12 16:09:11 UTC
[dolphinscheduler] branch dev updated: [Feature-8030][docs] Add sqoop task doc (#12855)
This is an automated email from the ASF dual-hosted git repository.
lidongdai pushed a commit to branch dev
in repository https://gitbox.apache.org/repos/asf/dolphinscheduler.git
The following commit(s) were added to refs/heads/dev by this push:
new 0373e06615 [Feature-8030][docs] Add sqoop task doc (#12855)
0373e06615 is described below
commit 0373e0661586c48523cedaab8a80fc7298f50a4d
Author: baihongbin <48...@users.noreply.github.com>
AuthorDate: Sun Nov 13 00:09:05 2022 +0800
[Feature-8030][docs] Add sqoop task doc (#12855)
* [Feature-8030][docs] Add sqoop task doc
* Update docs/docs/zh/guide/task/sqoop.md
Co-authored-by: Eric Gao <er...@gmail.com>
* Update docs/docs/en/guide/task/sqoop.md
Co-authored-by: Eric Gao <er...@gmail.com>
* [Feature-8030][docs] Add sqoop task doc
Co-authored-by: Eric Gao <er...@gmail.com>
---
docs/docs/en/guide/task/sqoop.md | 90 ++++++++++++++++++++++++++++++++++
docs/docs/zh/guide/task/sqoop.md | 91 +++++++++++++++++++++++++++++++++++
docs/img/tasks/demo/sqoop_task01.png | Bin 0 -> 2530 bytes
docs/img/tasks/demo/sqoop_task02.png | Bin 0 -> 216945 bytes
docs/img/tasks/demo/sqoop_task03.png | Bin 0 -> 2455 bytes
docs/img/tasks/icons/sqoop.png | Bin 0 -> 815 bytes
6 files changed, 181 insertions(+)
diff --git a/docs/docs/en/guide/task/sqoop.md b/docs/docs/en/guide/task/sqoop.md
new file mode 100644
index 0000000000..e2d65debbd
--- /dev/null
+++ b/docs/docs/en/guide/task/sqoop.md
@@ -0,0 +1,90 @@
+# Sqoop Node
+
+## Overview
+
+The Sqoop task type executes a Sqoop application. For Sqoop nodes, the worker runs the `sqoop` command to execute the task.
+
+## Create Task
+
+- Click `Project Management -> Project Name -> Workflow Definition`, and click the `Create Workflow` button to enter the DAG editing page.
+- Drag from the toolbar <img src="../../../../img/tasks/icons/sqoop.png" width="15"/> to the canvas.
+
+## Task Parameters
+
+[//]: # (TODO: use the commented anchor below once our website template supports this syntax)
+[//]: # (- Please refer to [DolphinScheduler Task Parameters Appendix](appendix.md#default-task-parameters) `Default Task Parameters` section for default parameters.)
+
+- Please refer to [DolphinScheduler Task Parameters Appendix](appendix.md) `Default Task Parameters` section for default parameters.
+
+| **Parameter** | **Description** |
+|-------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| Job Name                            | The map-reduce job name.                                                                                                                                                     |
+| Direct                              | (1) import: Imports an individual table from an RDBMS into HDFS or Hive. (2) export: Exports a set of files from HDFS or Hive back to an RDBMS.                               |
+| Hadoop Params                       | Custom Hadoop parameters for the sqoop job.                                                                                                                                  |
+| Sqoop Advanced Parameters           | Advanced Sqoop parameters for the sqoop job.                                                                                                                                 |
+| Data Source - Type                  | Select the corresponding data source type.                                                                                                                                   |
+| Data Source - Datasource            | Select the corresponding DataSource.                                                                                                                                         |
+| Data Source - ModelType             | (1) Form: Synchronizes data from a table; requires `Table` and `ColumnType`. (2) SQL: Synchronizes the result of a SQL query; requires `SQL Statement`.                       |
+| Data Source - Table                 | Sets the table name to use when importing into Hive.                                                                                                                         |
+| Data Source - ColumnType            | (1) All Columns: Imports all fields of the selected table. (2) Some Columns: Imports the specified fields of the selected table; requires `Column`.                           |
+| Data Source - Column                | Fill in the field names, separated by commas.                                                                                                                                |
+| Data Source - SQL Statement         | Fill in the SQL query statement.                                                                                                                                             |
+| Data Source - Map Column Hive       | Overrides the SQL-to-Hive type mapping for the configured columns.                                                                                                           |
+| Data Source - Map Column Java       | Overrides the SQL-to-Java type mapping for the configured columns.                                                                                                           |
+| Data Target - Type                  | Select the corresponding data target type.                                                                                                                                   |
+| Data Target - Database              | Fill in the Hive database name.                                                                                                                                              |
+| Data Target - Table                 | Fill in the Hive table name.                                                                                                                                                 |
+| Data Target - CreateHiveTable       | Imports a table definition into Hive. If set, the job fails if the target Hive table already exists.                                                                         |
+| Data Target - DropDelimiter         | Drops `\n`, `\r`, and `\01` from string fields when importing into Hive.                                                                                                     |
+| Data Target - OverWriteSrc          | Overwrites existing data in the Hive table.                                                                                                                                  |
+| Data Target - Hive Target Dir       | Explicitly specify the target directory for the Hive import.                                                                                                                 |
+| Data Target - ReplaceDelimiter      | Replaces `\n`, `\r`, and `\01` in string fields with a user-defined string when importing into Hive.                                                                         |
+| Data Target - Hive partition Keys   | Fill in the Hive partition keys, separated by commas.                                                                                                                        |
+| Data Target - Hive partition Values | Fill in the Hive partition values, separated by commas.                                                                                                                      |
+| Data Target - Target Dir            | Fill in the HDFS target directory.                                                                                                                                           |
+| Data Target - DeleteTargetDir       | Deletes the target directory if it already exists.                                                                                                                           |
+| Data Target - CompressionCodec      | Choose the Hadoop compression codec.                                                                                                                                         |
+| Data Target - FileType              | Choose the storage file type.                                                                                                                                                |
+| Data Target - FieldsTerminated      | Sets the field separator character.                                                                                                                                          |
+| Data Target - LinesTerminated       | Sets the end-of-line character.                                                                                                                                              |
+
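Conceptually, the Data Source and Data Target fields above map onto flags of the Sqoop CLI. The sketch below illustrates that mapping for an import task; the helper function and its parameter names are hypothetical (not DolphinScheduler internals), while the flag names follow the Sqoop 1.x CLI:

```python
# Illustrative sketch only: shows how form fields such as CreateHiveTable,
# OverWriteSrc, Column, and FieldsTerminated could translate into Sqoop CLI
# flags. The helper is hypothetical; DolphinScheduler builds the real
# command internally.
def build_sqoop_import_cmd(jdbc_url, username, table, hive_db, hive_table,
                           create_hive_table=False, overwrite=False,
                           columns=None, fields_terminated=None):
    cmd = [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--username", username,
        "--table", table,
        "--hive-import",
        "--hive-table", f"{hive_db}.{hive_table}",
    ]
    if create_hive_table:
        cmd.append("--create-hive-table")   # fails if the Hive table exists
    if overwrite:
        cmd.append("--hive-overwrite")      # replace existing Hive data
    if columns:
        cmd += ["--columns", ",".join(columns)]  # "Some Columns" mode
    if fields_terminated:
        cmd += ["--fields-terminated-by", fields_terminated]
    return cmd

cmd = build_sqoop_import_cmd(
    "jdbc:mysql://localhost:3306/test", "root", "example",
    "tmp", "example", create_hive_table=True, overwrite=True)
print(" ".join(cmd))
```

Fields left empty in the form simply contribute no flag, so the generated command stays minimal.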
+## Task Example
+
+This example demonstrates importing data from MySQL into Hive. The MySQL database name is `test` and the table name is `example`. The following figure shows sample data.
+
+![sqoop_task01](../../../../img/tasks/demo/sqoop_task01.png)
+
+### Configuring the Sqoop environment
+
+If you are using the Sqoop task type in a production environment, you must ensure that the worker can execute the `sqoop` command.
+
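A minimal pre-flight check for this requirement can be sketched with Python's standard library (`shutil.which`); the warning message is illustrative:

```python
# Minimal sketch: verify that the `sqoop` binary is resolvable on the
# worker's PATH before relying on the Sqoop task type.
import shutil

def command_available(name: str) -> bool:
    """Return True if `name` resolves to an executable on PATH."""
    return shutil.which(name) is not None

if not command_available("sqoop"):
    print("warning: `sqoop` not found on PATH; Sqoop tasks will fail on this worker")
```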
+### Configuring Sqoop Task Node
+
+You can configure the node as shown in the diagram below.
+
+![sqoop_task02](../../../../img/tasks/demo/sqoop_task02.png)
+
+The key configuration in this sample is shown in the following table.
+
+| **Parameter** | **Value** |
+|-------------------------------------|----------------------------------------------------------------------|
+| Job Name                            | sqoop_mysql_to_hive_test                                              |
+| Direct                              | import                                                                |
+| Data Source - Type                  | MYSQL                                                                 |
+| Data Source - Datasource            | MYSQL MyTestMySQL (you can replace MyTestMySQL with a name of your choice) |
+| Data Source - ModelType | Form |
+| Data Source - Table | example |
+| Data Source - ColumnType | All Columns |
+| Data Target - Type | HIVE |
+| Data Target - Database | tmp |
+| Data Target - Table | example |
+| Data Target - CreateHiveTable | true |
+| Data Target - DropDelimiter | false |
+| Data Target - OverWriteSrc | true |
+| Data Target - Hive Target Dir | (No need to fill in) |
+| Data Target - ReplaceDelimiter | , |
+| Data Target - Hive partition Keys | (No need to fill in) |
+| Data Target - Hive partition Values | (No need to fill in) |
+
+### View run results
+
+![sqoop_task03](../../../../img/tasks/demo/sqoop_task03.png)
diff --git a/docs/docs/zh/guide/task/sqoop.md b/docs/docs/zh/guide/task/sqoop.md
new file mode 100644
index 0000000000..cfea038a31
--- /dev/null
+++ b/docs/docs/zh/guide/task/sqoop.md
@@ -0,0 +1,91 @@
+# SQOOP Node
+
+## Overview
+
+The SQOOP task type executes a SQOOP program. For SQOOP nodes, the worker runs the `sqoop` command to execute the task.
+
+## Create Task
+
+- Click `Project Management -> Project Name -> Workflow Definition`, then click the `Create Workflow` button to enter the DAG editing page;
+- Drag the <img src="../../../../img/tasks/icons/sqoop.png" width="15"/> task node from the toolbar onto the canvas.
+
+## Task Parameters
+
+[//]: # (TODO: use the commented anchor below once our website template supports this syntax)
+[//]: # (- For default parameters, please refer to the `Default Task Parameters` section of the [DolphinScheduler Task Parameters Appendix](appendix.md#默认任务参数).)
+
+- For default parameters, please refer to the `Default Task Parameters` section of the [DolphinScheduler Task Parameters Appendix](appendix.md).
+
+| **Parameter**                       | **Description**                                                                                                                                        |
+|-------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------|
+| Task Name                           | The map-reduce job name.                                                                                                                               |
+| Direction                           | (1) import: imports from an RDBMS into HDFS or Hive. (2) export: exports from HDFS or Hive to an RDBMS.                                                 |
+| Hadoop Params                       | Add custom Hadoop parameters.                                                                                                                          |
+| Sqoop Params                        | Add custom Sqoop parameters.                                                                                                                           |
+| Data Source - Type                  | Select the data source type.                                                                                                                           |
+| Data Source - Datasource            | Select the data source.                                                                                                                                |
+| Data Source - Mode                  | (1) Form: synchronizes data from a single table; requires `Table` and `ColumnType`. (2) SQL: synchronizes the result of a SQL query; requires `SQL Statement`. |
+| Data Source - Table                 | Set the name of the table to import into Hive.                                                                                                         |
+| Data Source - ColumnType            | (1) All Columns: imports all fields of the table. (2) Some Columns: imports the specified columns of the table; requires `Column`.                      |
+| Data Source - Column                | Fill in the field names, separated by commas.                                                                                                          |
+| Data Source - SQL Statement         | Fill in the SQL query statement.                                                                                                                       |
+| Data Source - Hive Type Mapping     | Custom SQL-to-Hive type mapping.                                                                                                                       |
+| Data Source - Java Type Mapping     | Custom SQL-to-Java type mapping.                                                                                                                       |
+| Data Target - Type                  | Select the data target type.                                                                                                                           |
+| Data Target - Database              | Fill in the Hive database name.                                                                                                                        |
+| Data Target - Table                 | Fill in the Hive table name.                                                                                                                           |
+| Data Target - Create Table          | Whether to automatically create the target table based on the imported data types; the task fails if the target table already exists.                   |
+| Data Target - Drop Delimiter        | Automatically drops the `\n`, `\r`, and `\01` characters from strings.                                                                                 |
+| Data Target - Overwrite             | Overwrites existing data in the Hive table.                                                                                                            |
+| Data Target - Hive Target Dir       | Custom Hive target directory.                                                                                                                          |
+| Data Target - Replace Delimiter     | Replaces the `\n`, `\r`, and `\01` characters in strings.                                                                                              |
+| Data Target - Hive Partition Keys   | Fill in the Hive partition keys, separated by commas.                                                                                                  |
+| Data Target - Hive Partition Values | Fill in the Hive partition values, separated by commas.                                                                                                |
+| Data Target - Target Dir            | Fill in the HDFS target directory.                                                                                                                     |
+| Data Target - Delete Target Dir     | Deletes the target directory if it already exists.                                                                                                     |
+| Data Target - Compression Type      | Select the HDFS file compression type.                                                                                                                 |
+| Data Target - File Format           | Select the file storage format.                                                                                                                        |
+| Data Target - Fields Terminated     | Custom column separator.                                                                                                                               |
+| Data Target - Lines Terminated      | Custom line separator.                                                                                                                                 |
+
+## Task Example
+
+This example demonstrates importing data from MySQL into Hive. The MySQL database name is `test` and the table name is `example`. The figure below shows the sample data.
+
+![sqoop_task01](../../../../img/tasks/demo/sqoop_task01.png)
+
+### Configure the Sqoop Environment
+
+If you use the Sqoop task type in a production environment, you must first set up the required environment and ensure that the task node can execute the `sqoop` command.
+
+### Configure the Sqoop Task Node
+
+You can configure the node as shown in the diagram below.
+
+![sqoop_task02](../../../../img/tasks/demo/sqoop_task02.png)
+
+The key configuration in this sample is shown in the following table.
+
+| **Parameter**                       | **Value**                                                                  |
+|-------------------------------------|----------------------------------------------------------------------------|
+| Task Name                           | sqoop_mysql_to_hive_test                                                   |
+| Direction                           | import                                                                     |
+| Data Source - Type                  | MYSQL                                                                      |
+| Data Source - Datasource            | MYSQL MyTestMySQL (you can replace MyTestMySQL with a name of your choice) |
+| Data Source - Mode                  | Form                                                                       |
+| Data Source - Table                 | example                                                                    |
+| Data Source - ColumnType            | All Columns                                                                |
+| Data Target - Type                  | HIVE                                                                       |
+| Data Target - Database              | tmp                                                                        |
+| Data Target - Table                 | example                                                                    |
+| Data Target - Create Table          | true                                                                       |
+| Data Target - Drop Delimiter        | false                                                                      |
+| Data Target - Overwrite             | true                                                                       |
+| Data Target - Hive Target Dir       | (not required)                                                             |
+| Data Target - Replace Delimiter     | ,                                                                          |
+| Data Target - Hive Partition Keys   | (not required)                                                             |
+| Data Target - Hive Partition Values | (not required)                                                             |
+
+### View the Run Results
+
+![sqoop_task03](../../../../img/tasks/demo/sqoop_task03.png)
diff --git a/docs/img/tasks/demo/sqoop_task01.png b/docs/img/tasks/demo/sqoop_task01.png
new file mode 100644
index 0000000000..ec63a52337
Binary files /dev/null and b/docs/img/tasks/demo/sqoop_task01.png differ
diff --git a/docs/img/tasks/demo/sqoop_task02.png b/docs/img/tasks/demo/sqoop_task02.png
new file mode 100644
index 0000000000..18215eb98b
Binary files /dev/null and b/docs/img/tasks/demo/sqoop_task02.png differ
diff --git a/docs/img/tasks/demo/sqoop_task03.png b/docs/img/tasks/demo/sqoop_task03.png
new file mode 100644
index 0000000000..1a197ea79a
Binary files /dev/null and b/docs/img/tasks/demo/sqoop_task03.png differ
diff --git a/docs/img/tasks/icons/sqoop.png b/docs/img/tasks/icons/sqoop.png
new file mode 100644
index 0000000000..6ff06de10e
Binary files /dev/null and b/docs/img/tasks/icons/sqoop.png differ