Posted to commits@dolphinscheduler.apache.org by GitBox <gi...@apache.org> on 2022/11/10 07:23:52 UTC

[GitHub] [dolphinscheduler] EricGao888 commented on a diff in pull request #12855: [Feature-8030][docs] Add sqoop task doc

EricGao888 commented on code in PR #12855:
URL: https://github.com/apache/dolphinscheduler/pull/12855#discussion_r1018727009


##########
docs/docs/en/guide/task/sqoop.md:
##########
@@ -0,0 +1,91 @@
+# Sqoop Node
+
+## Overview
+
+Sqoop task type for executing Sqoop application. The workers run `sqoop` to execute  sqoop tasks.

Review Comment:
   ```suggestion
   Sqoop task type for executing Sqoop application. The workers run `sqoop` to execute sqoop tasks.
   ```



##########
docs/docs/zh/guide/task/sqoop.md:
##########
@@ -0,0 +1,94 @@
+# SQOOP Node
+
+## Overview
+
+SQOOP task type, used to run SQOOP programs. For SQOOP nodes, the worker executes SQOOP tasks by running the `sqoop` command.
+
+## Create Task
+
+- Click `Project Management -> Project Name -> Workflow Definition`, then click the `Create Workflow` button to enter the DAG editing page.
+- Drag the <img src="../../../../img/tasks/icons/sqoop.png" width="15"/> task node from the toolbar onto the canvas.
+
+## Task Parameters
+
+[//]: # (TODO: use the commented anchor below once our website template supports this syntax)
+[//]: # (- For default parameters, please refer to the `Default Task Parameters` column of the [DolphinScheduler Task Parameters Appendix]&#40;appendix.md#默认任务参数&#41;.)
+
+- For default parameters, please refer to the `Default Task Parameters` column of the [DolphinScheduler Task Parameters Appendix](appendix.md).
+
+| **Task Parameter**                  | **Description**                                                                                                                                                            |
+|-------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| Job Name                            | map-reduce job name                                                                                                                                                        |
+| Direct                              | (1) import: import data from an RDBMS into HDFS or Hive. (2) export: export data from HDFS or Hive to an RDBMS.                                                           |
+| Hadoop Params                       | Add custom Hadoop parameters.                                                                                                                                              |
+| Sqoop Params                        | Add custom Sqoop parameters.                                                                                                                                               |
+| Data Source - Type                  | Select the data source type.                                                                                                                                               |
+| Data Source - Datasource            | Select the data source.                                                                                                                                                    |
+| Data Source - ModelType             | (1) Form: synchronize data from a single table, need to fill in `Table` and `ColumnType`. (2) SQL: synchronize the result of an SQL query, need to fill in `SQL Statement`. |
+| Data Source - Table                 | Set the name of the table to import into Hive.                                                                                                                             |
+| Data Source - ColumnType            | (1) All Columns: import all fields of the table. (2) Some Columns: import the specified columns of the table, need to fill in `Column`.                                    |
+| Data Source - Column                | Fill in the column names, separated by commas.                                                                                                                             |
+| Data Source - SQL Statement         | Fill in the SQL query statement.                                                                                                                                           |
+| Data Source - Map Column Hive       | Custom mapping from SQL types to Hive types.                                                                                                                               |
+| Data Source - Map Column Java       | Custom mapping from SQL types to Java types.                                                                                                                               |
+| Data Target - Type                  | Select the data target type.                                                                                                                                               |
+| Data Target - Database              | Fill in the Hive database name.                                                                                                                                            |
+| Data Target - Table                 | Fill in the Hive table name.                                                                                                                                               |
+| Data Target - CreateHiveTable       | Whether to create the target table automatically based on the types of the imported data; the task fails if the target table already exists.                              |
+| Data Target - DropDelimiter         | Automatically drop the `\n`, `\r`, and `\01` characters from strings.                                                                                                      |
+| Data Target - OverWriteSrc          | Overwrite the existing data in the Hive table.                                                                                                                             |
+| Data Target - Hive Target Dir       | Custom Hive target directory.                                                                                                                                              |
+| Data Target - ReplaceDelimiter      | Replace the `\n`, `\r`, and `\01` characters in strings.                                                                                                                   |
+| Data Target - Hive Partition Keys   | Fill in the Hive partition keys, separated by commas.                                                                                                                      |
+| Data Target - Hive Partition Values | Fill in the Hive partition values, separated by commas.                                                                                                                    |
+| Data Target - Target Dir            | Fill in the HDFS target directory.                                                                                                                                         |
+| Data Target - DeleteTargetDir       | Delete the target directory if it already exists.                                                                                                                          |
+| Data Target - CompressionCodec      | Select the HDFS file compression type.                                                                                                                                     |
+| Data Target - FileType              | Select the file storage format.                                                                                                                                            |
+| Data Target - FieldsTerminated      | Custom column separator.                                                                                                                                                   |
+| Data Target - LinesTerminated       | Custom row separator.                                                                                                                                                      |
+
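+To make the mapping between these form fields and the underlying tool more concrete, the command below is a rough sketch of the kind of `sqoop import` call such a task corresponds to. It is only an illustration: the connection URL, credentials, table name, and delimiters are placeholders, not values that DolphinScheduler generates verbatim.
+
+```shell
+# Hypothetical sketch: import the MySQL table `example` (database `test`) into the Hive table `test.example`.
+# The connection URL, user, password, and delimiters below are placeholders.
+sqoop import \
+  --connect jdbc:mysql://mysql-host:3306/test \
+  --username root \
+  --password '******' \
+  --table example \
+  --hive-import \
+  --hive-table test.example \
+  --hive-overwrite \
+  --hive-drop-import-delims \
+  --fields-terminated-by ',' \
+  --lines-terminated-by '\n' \
+  -m 1
+```
+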
+## Task Example
+
+This example demonstrates importing data from MySQL into Hive. The MySQL database is named `test` and the table is named `example`. The figure below shows the sample data.
+
+![sqoop_task01](../../../../img/tasks/demo/sqoop_task01.png)
+
+### Configure the Sqoop Environment
+
+If you want to use the Sqoop task type in a production environment, you need to set up the required environment first and make sure that the task nodes can execute the `sqoop` command.
+
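+A quick sanity check on each worker (a minimal sketch; run it as the tenant user that actually executes tasks) looks like this:
+
+```shell
+# Verify that the sqoop launcher is on PATH and runs without errors.
+which sqoop
+sqoop version
+```
+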
+### Configure the Sqoop Task Node
+
+You can configure the node content by following the steps in the figure below.
+
+![sqoop_task02](../../../../img/tasks/demo/sqoop_task02.png)
+
+The key configurations in this example are listed in the following table.
+
+| **Task Parameter**       | **Value**                                                          |
+|--------------------------|--------------------------------------------------------------------|
+| Job Name                 | sqoop_mysql_to_hive_test                                           |
+| Direct                   | import                                                             |
+| Data Source - Type       | MYSQL                                                              |
+| Data Source - Datasource | MYSQL MyTestMySQL (MyTestMySQL is the name of my test data source) |

Review Comment:
   ```suggestion
    | Data Source - Datasource | MYSQL MyTestMySQL (you can change MyTestMySQL to your own data source name) |
   ```



##########
docs/docs/en/guide/task/sqoop.md:
##########
@@ -0,0 +1,91 @@
+# Sqoop Node
+
+## Overview
+
+Sqoop task type for executing Sqoop application. The workers run `sqoop` to execute  sqoop tasks.
+
+## Create Task
+
+- Click `Project Management -> Project Name -> Workflow Definition`, and click the `Create Workflow` button to enter the DAG editing page.
+- Drag from the toolbar <img src="../../../../img/tasks/icons/sqoop.png" width="15"/> to the canvas.
+
+## Task Parameters
+
+[//]: # (TODO: use the commented anchor below once our website template supports this syntax)
+[//]: # (- Please refer to [DolphinScheduler Task Parameters Appendix]&#40;appendix.md#default-task-parameters&#41; `Default Task Parameters` section for default parameters.)
+
+- Please refer to [DolphinScheduler Task Parameters Appendix](appendix.md) `Default Task Parameters` section for default parameters.
+
+| **Parameter**                       | **Description**                                                                                                                                                            |
+|-------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| Job Name                            | map-reduce job name                                                                                                                                                        |
+| Direct                              | (1) import: Imports an individual table from an RDBMS to HDFS or Hive. (2) export: Exports a set of files from HDFS or Hive back to an RDBMS.                              |
+| Hadoop Params                       | Hadoop custom param for sqoop job.                                                                                                                                         |
+| Sqoop Advanced Parameters           | Sqoop advanced param for sqoop job.                                                                                                                                        |
+| Data Source - Type                  | Select the corresponding data source type.                                                                                                                                 |
+| Data Source - Datasource            | Select the corresponding DataSource.                                                                                                                                       |
+| Data Source - ModelType             | (1) Form: Synchronize data from a table, need to fill in the `Table` and `ColumnType`. (2) SQL: Synchronize the result of an SQL query, need to fill in the `SQL Statement`. |
+| Data Source - Table                 | Sets the table name to use when importing to Hive.                                                                                                                         |
+| Data Source - ColumnType            | (1) All Columns: Import all fields in the selected table. (2) Some Columns: Import the specified fields in the selected table, need to fill in the `Column`.                |
+| Data Source - Column                | Fill in the field name, and separate with commas.                                                                                                                          |
+| Data Source - SQL Statement         | Fill in SQL query statement.                                                                                                                                               |
+| Data Source - Map Column Hive       | Override mapping from SQL to Hive type for configured columns.                                                                                                             |
+| Data Source - Map Column Java       | Override mapping from SQL to Java type for configured columns.                                                                                                             |
+| Data Target - Type                  | Select the corresponding data target type.                                                                                                                                 |
+| Data Target - Database              | Fill in the Hive database name.                                                                                                                                            |
+| Data Target - Table                 | Fill in the Hive table name.                                                                                                                                               |
+| Data Target - CreateHiveTable       | Import a table definition into Hive. If set, the job will fail if the target Hive table already exists.                                                                    |
+| Data Target - DropDelimiter         | Drops `\n`, `\r`, and `\01` from string fields when importing to Hive.                                                                                                     |
+| Data Target - OverWriteSrc          | Overwrite existing data in the Hive table.                                                                                                                                 |
+| Data Target - Hive Target Dir       | You can also explicitly choose the target directory.                                                                                                                       |
+| Data Target - ReplaceDelimiter      | Replace `\n`, `\r`, and `\01` from string fields with a user-defined string when importing to Hive.                                                                        |
+| Data Target - Hive partition Keys   | Fill in the Hive partition keys, and separate with commas.                                                                                                                 |
+| Data Target - Hive partition Values | Fill in the Hive partition values, and separate with commas.                                                                                                               |
+| Data Target - Target Dir            | Fill in the HDFS target directory.                                                                                                                                         |
+| Data Target - DeleteTargetDir       | Delete the target directory if it exists.                                                                                                                                  |
+| Data Target - CompressionCodec      | Choose the Hadoop compression codec.                                                                                                                                       |
+| Data Target - FileType              | Choose the storage type.                                                                                                                                                   |
+| Data Target - FieldsTerminated      | Sets the field separator character.                                                                                                                                        |
+| Data Target - LinesTerminated       | Sets the end-of-line character.                                                                                                                                            |
+
+
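+For the `export` direction, the following is a rough sketch of the kind of `sqoop export` command such a task corresponds to. It is illustrative only: the connection URL, credentials, HDFS path, and delimiter are placeholders rather than values produced by DolphinScheduler.
+
+```shell
+# Hypothetical sketch: export files under an HDFS directory back into the MySQL table `example`.
+# The connection URL, user, password, export directory, and delimiter below are placeholders.
+sqoop export \
+  --connect jdbc:mysql://mysql-host:3306/test \
+  --username root \
+  --password '******' \
+  --table example \
+  --export-dir /user/hive/warehouse/test.db/example \
+  --fields-terminated-by ',' \
+  -m 1
+```
+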
+## Task Example
+
+This example demonstrates importing data from MySQL into Hive. The MySQL database name is `test` and the table name is `example`. The following figure shows sample data.
+
+![sqoop_task01](../../../../img/tasks/demo/sqoop_task01.png)
+
+### Configuring the Sqoop environment
+
+If you are using the Sqoop task type in a production environment, you must ensure that the worker can execute the `sqoop` command.
+
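+One common way to do this is to export the relevant variables in the worker's environment file. The snippet below is a sketch only: the install locations are assumptions, and `bin/env/dolphinscheduler_env.sh` is assumed to be the environment file sourced before task execution.
+
+```shell
+# Append to bin/env/dolphinscheduler_env.sh (assumed path) so spawned task processes can find sqoop.
+export HADOOP_HOME=/opt/hadoop   # assumed install location
+export SQOOP_HOME=/opt/sqoop     # assumed install location
+export PATH=$SQOOP_HOME/bin:$HADOOP_HOME/bin:$PATH
+```
+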
+### Configuring Sqoop Task Node
+
+You can configure the node content by following the steps in the figure below.
+
+![sqoop_task02](../../../../img/tasks/demo/sqoop_task02.png)
+
+The key configuration in this sample is shown in the following table.
+
+| **Parameter**                       | **Value**                                                         |
+|-------------------------------------|-------------------------------------------------------------------|
+| Job Name                            | sqoop_mysql_to_hive_test                                          |
+| Data Source - Type                  | MYSQL                                                             |
+| Data Source - Datasource            | MYSQL MyTestMySQL(MyTestMySQL is the name of my test data source) |

Review Comment:
   ```suggestion
    | Data Source - Datasource            | MYSQL MyTestMySQL (you can change MyTestMySQL to your own data source name) |
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
