Posted to commits@dolphinscheduler.apache.org by GitBox <gi...@apache.org> on 2022/04/29 09:39:24 UTC

[GitHub] [dolphinscheduler] sq-q opened a new pull request, #9851: [Improvement-9772][docs/docs-zh] add spark sql docs for Docs

sq-q opened a new pull request, #9851:
URL: https://github.com/apache/dolphinscheduler/pull/9851

   
   
   ## Purpose of the pull request
   
   Add Spark SQL instructions.
   
   ## Brief change log
   
   Add documentation on how to use the Spark task type to execute a Spark SQL program.
   
   ## Verify this pull request
   
   This pull request is a documentation-only change without any test coverage.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@dolphinscheduler.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [dolphinscheduler] sq-q commented on pull request #9851: [Improvement-9772][docs/docs-zh] add spark sql docs for Docs

Posted by GitBox <gi...@apache.org>.
sq-q commented on PR #9851:
URL: https://github.com/apache/dolphinscheduler/pull/9851#issuecomment-1118099166

   > Hi @sq-q, docs LGTM, but please change screenshot from Chinese to English
   
   OK, I'll retake the screenshot in English.




[GitHub] [dolphinscheduler] zhongjiajie commented on a diff in pull request #9851: [Improvement-9772][docs/docs-zh] add spark sql docs for Docs

Posted by GitBox <gi...@apache.org>.
zhongjiajie commented on code in PR #9851:
URL: https://github.com/apache/dolphinscheduler/pull/9851#discussion_r862454306


##########
docs/docs/zh/guide/task/spark.md:
##########
@@ -2,7 +2,11 @@
 
 ## 综述
 
-Spark  任务类型,用于执行 Spark 程序。对于 Spark 节点,worker 会通过使用 spark 命令 `spark submit` 方式提交任务。更多详情查看 [spark-submit](https://spark.apache.org/docs/3.2.1/submitting-applications.html#launching-applications-with-spark-submit)。
+Spark  任务类型,用于执行 Spark 程序。对于 Spark 节点,worker 会通过使用 spark 命令:

Review Comment:
   ```suggestion
   Spark 任务类型用于执行 Spark 应用。对于 Spark 节点,worker 支持两个不同类型的 spark 命令提交任务:
   ```



##########
docs/docs/zh/guide/task/spark.md:
##########
@@ -40,30 +46,44 @@ Spark  任务类型,用于执行 Spark 程序。对于 Spark 节点,worker 
 
 ## 任务样例
 
-### 执行 WordCount 程序
+### (1) spark submit
+
+#### 执行 WordCount 程序
 
 本案例为大数据生态中常见的入门案例,常应用于 MapReduce、Flink、Spark 等计算框架。主要为统计输入的文本中,相同的单词的数量有多少。
 
-#### 在 DolphinScheduler 中配置 Spark 环境
+##### 在 DolphinScheduler 中配置 Spark 环境
 
 若生产环境中要是使用到 Spark 任务类型,则需要先配置好所需的环境。配置文件如下:`bin/env/dolphinscheduler_env.sh`。
 
 ![spark_configure](/img/tasks/demo/spark_task01.png)
 
-####  上传主程序包
+#####  上传主程序包
 
 在使用 Spark 任务节点时,需要利用资源中心上传执行程序的 jar 包,可参考[资源中心](../resource.md)。
 
 当配置完成资源中心之后,直接使用拖拽的方式,即可上传所需目标文件。
 
 ![resource_upload](/img/tasks/demo/upload_jar.png)
 
-#### 配置 Spark 节点
+##### 配置 Spark 节点
 
 根据上述参数说明,配置所需的内容即可。
 
 ![demo-spark-simple](/img/tasks/demo/spark_task02.png)
 
+### (2) spark sql

Review Comment:
   ```suggestion
   ### spark sql
   ```



##########
docs/docs/en/guide/task/spark.md:
##########
@@ -21,11 +25,13 @@ Spark task type used to execute Spark program. For Spark nodes, the worker submi
 - **Failed retry interval**: The time interval (unit minute) for resubmitting the task after a failed task.
 - **Delayed execution time**: The time (unit minute) that a task delays in execution.
 - **Timeout alarm**: Check the timeout alarm and timeout failure. When the task runs exceed the "timeout", an alarm email will send and the task execution will fail.
-- **Program type**: Supports Java, Scala and Python.
+- **Program type**: Supports Java, Scala, Python and Sql.

Review Comment:
   ```suggestion
   - **Program type**: Supports Java, Scala, Python and SQL.
   ```



##########
docs/docs/zh/guide/task/spark.md:
##########
@@ -40,30 +46,44 @@ Spark  任务类型,用于执行 Spark 程序。对于 Spark 节点,worker 
 
 ## 任务样例
 
-### 执行 WordCount 程序
+### (1) spark submit

Review Comment:
   ```suggestion
   ### spark submit
   ```



##########
docs/docs/en/guide/task/spark.md:
##########
@@ -39,30 +45,42 @@ Spark task type used to execute Spark program. For Spark nodes, the worker submi
 
 ## Task Example
 
-### Execute the WordCount Program
+### (1) spark submit
+
+#### Execute the WordCount Program
 
 This is a common introductory case in the big data ecosystem, which often apply to computational frameworks such as MapReduce, Flink and Spark. The main purpose is to count the number of identical words in the input text. (Flink's releases attach this example job)
 
-#### Configure the Spark Environment in DolphinScheduler
+##### Configure the Spark Environment in DolphinScheduler
 
 If you are using the Spark task type in a production environment, it is necessary to configure the required environment first. The following is the configuration file: `bin/env/dolphinscheduler_env.sh`.
 
 ![spark_configure](/img/tasks/demo/spark_task01.png)
 
-#### Upload the Main Package
+##### Upload the Main Package
 
 When using the Spark task node, you need to upload the jar package to the Resource Centre for the execution, refer to the [resource center](../resource.md).
 
 After finish the Resource Centre configuration, upload the required target files directly by dragging and dropping.
 
 ![resource_upload](/img/tasks/demo/upload_jar.png)
 
-#### Configure Spark Nodes
+##### Configure Spark Nodes
 
 Configure the required content according to the parameter descriptions above.
 
 ![demo-spark-simple](/img/tasks/demo/spark_task02.png)
 
+### (2) spark sql
+
+#### Execute DDL and DML statements
+
+This case is to create a view table terms and write three rows of data and a table wc in parquet format and determine whether the table exists. The program type is SQL. Insert the data of the view table terms into the table wc in parquet format.
+
+![spark_sql](/img/tasks/demo/spark_sql.png)
+
 ## Notice
 
-JAVA and Scala only used for identification, there is no difference. If you use Python to develop Spark application, there is no class of the main function and the rest is the same.
+JAVA and Scala are only used for identification, and there is no difference. If it is Spark developed by Python, there is no class for the main function. Others are the same. JAVA, Scala and Python do not have SQL scripts.

Review Comment:
   ```suggestion
   JAVA and Scala are only used for identification, and there is no difference when you use the Spark task. If your application is developed by Python, you could just ignore the parameter **Main Class** in the form. Parameter **SQL scripts** is only for SQL type and could be ignored in JAVA, Scala and Python.
   ```
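
As a side note on the suggested wording above: the reason **Main Class** can be ignored for Python is that a PySpark application's entry point is the script itself. A minimal sketch, with an illustrative app name that is not taken from the PR:

```python
# Minimal PySpark script: the file itself is the entry point, so the Spark task
# form's "Main Class" field has nothing to reference for the Python program type.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("python-entry-point-demo").getOrCreate()
print(spark.range(5).count())  # trivial job: counts the rows 0..4
spark.stop()
```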



##########
docs/docs/en/guide/task/spark.md:
##########
@@ -39,30 +45,42 @@ Spark task type used to execute Spark program. For Spark nodes, the worker submi
 
 ## Task Example
 
-### Execute the WordCount Program
+### (1) spark submit

Review Comment:
   ```suggestion
   ### spark submit
   ```
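
For context on the WordCount example this hunk refers to: the docs excerpt only describes it in prose, so the following is a hedged PySpark sketch of the same idea. The input path and app name are placeholders, not values from the PR; the script would be submitted with `spark-submit`.

```python
# Minimal PySpark WordCount sketch, mirroring the example described in the docs.
from operator import add

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("wordcount").getOrCreate()

lines = spark.read.text("hdfs:///input/words.txt").rdd.map(lambda row: row[0])
counts = (
    lines.flatMap(lambda line: line.split(" "))  # split each line into words
         .map(lambda word: (word, 1))            # pair each word with a count of 1
         .reduceByKey(add)                       # sum the counts per distinct word
)

for word, count in counts.collect():
    print(word, count)

spark.stop()
```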



##########
docs/docs/en/guide/task/spark.md:
##########
@@ -2,7 +2,11 @@
 
 ## Overview
 
-Spark task type used to execute Spark program. For Spark nodes, the worker submits the task by using the spark command `spark submit`. See [spark-submit](https://spark.apache.org/docs/3.2.1/submitting-applications.html#launching-applications-with-spark-submit) for more details.
+Spark task type for executing Spark programs. For Spark nodes, the worker will do this by using the spark command:

Review Comment:
   ```suggestion
   Spark task type for executing Spark application. When executing the Spark task, the worker will submits a job to the Spark cluster by following commands:
   ```
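
To make the two submission modes mentioned in this suggestion concrete, here is a rough illustration, driven from Python, of the kind of commands involved: a `spark-submit` call for packaged applications and a `spark-sql -f` call for SQL scripts. This is not DolphinScheduler's worker code, and the master URL, class name, and file names are placeholders.

```python
# Illustrative sketch only: the two kinds of Spark commands the docs refer to,
# invoked here via subprocess. Requires a local Spark installation on PATH.
import subprocess

# 1) spark submit: run a packaged JAR (or Python) application.
subprocess.run([
    "spark-submit",
    "--master", "yarn",
    "--deploy-mode", "cluster",
    "--class", "org.example.WordCount",  # omitted for Python applications
    "wordcount.jar",
    "hdfs:///input/words.txt",
], check=True)

# 2) spark sql: execute a SQL script file with the spark-sql CLI.
subprocess.run(["spark-sql", "-f", "example.sql"], check=True)
```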



##########
docs/docs/zh/guide/task/spark.md:
##########
@@ -22,11 +26,13 @@ Spark  任务类型,用于执行 Spark 程序。对于 Spark 节点,worker 
 - 失败重试间隔:任务失败重新提交任务的时间间隔,以分为单位。
 - 延迟执行时间:任务延迟执行的时间,以分为单位。
 - 超时警告:勾选超时警告、超时失败,当任务超过“超时时长”后,会发送告警邮件并且任务执行失败。
-- 程序类型:支持 Java、Scala 和 Python 三种语言。
+- 程序类型:支持 Java、Scala、Python 和 Sql 四种语言。

Review Comment:
   ```suggestion
   - 程序类型:支持 Java、Scala、Python 和 SQL 四种语言。
   ```



##########
docs/docs/en/guide/task/spark.md:
##########
@@ -39,30 +45,42 @@ Spark task type used to execute Spark program. For Spark nodes, the worker submi
 
 ## Task Example
 
-### Execute the WordCount Program
+### (1) spark submit
+
+#### Execute the WordCount Program
 
 This is a common introductory case in the big data ecosystem, which often apply to computational frameworks such as MapReduce, Flink and Spark. The main purpose is to count the number of identical words in the input text. (Flink's releases attach this example job)
 
-#### Configure the Spark Environment in DolphinScheduler
+##### Configure the Spark Environment in DolphinScheduler
 
 If you are using the Spark task type in a production environment, it is necessary to configure the required environment first. The following is the configuration file: `bin/env/dolphinscheduler_env.sh`.
 
 ![spark_configure](/img/tasks/demo/spark_task01.png)
 
-#### Upload the Main Package
+##### Upload the Main Package
 
 When using the Spark task node, you need to upload the jar package to the Resource Centre for the execution, refer to the [resource center](../resource.md).
 
 After finish the Resource Centre configuration, upload the required target files directly by dragging and dropping.
 
 ![resource_upload](/img/tasks/demo/upload_jar.png)
 
-#### Configure Spark Nodes
+##### Configure Spark Nodes
 
 Configure the required content according to the parameter descriptions above.
 
 ![demo-spark-simple](/img/tasks/demo/spark_task02.png)
 
+### (2) spark sql
+
+#### Execute DDL and DML statements
+
+This case is to create a view table terms and write three rows of data and a table wc in parquet format and determine whether the table exists. The program type is SQL. Insert the data of the view table terms into the table wc in parquet format.
+
+![spark_sql](/img/tasks/demo/spark_sql.png)
+
 ## Notice
 
-JAVA and Scala only used for identification, there is no difference. If you use Python to develop Spark application, there is no class of the main function and the rest is the same.
+JAVA and Scala are only used for identification, and there is no difference. If it is Spark developed by Python, there is no class for the main function. Others are the same. JAVA, Scala and Python do not have SQL scripts.

Review Comment:
   ```suggestion
   JAVA and Scala are only used for identification, and there is no difference when you use the Spark task. If your application is developed by Python, you could just ignore the parameter **Main Class** in the form. Parameter **SQL scripts** is only for SQL type and could be ignored in JAVA, Scala and Python.
   ```



##########
docs/docs/en/guide/task/spark.md:
##########
@@ -39,30 +45,42 @@ Spark task type used to execute Spark program. For Spark nodes, the worker submi
 
 ## Task Example
 
-### Execute the WordCount Program
+### (1) spark submit
+
+#### Execute the WordCount Program
 
 This is a common introductory case in the big data ecosystem, which often apply to computational frameworks such as MapReduce, Flink and Spark. The main purpose is to count the number of identical words in the input text. (Flink's releases attach this example job)
 
-#### Configure the Spark Environment in DolphinScheduler
+##### Configure the Spark Environment in DolphinScheduler
 
 If you are using the Spark task type in a production environment, it is necessary to configure the required environment first. The following is the configuration file: `bin/env/dolphinscheduler_env.sh`.
 
 ![spark_configure](/img/tasks/demo/spark_task01.png)
 
-#### Upload the Main Package
+##### Upload the Main Package
 
 When using the Spark task node, you need to upload the jar package to the Resource Centre for the execution, refer to the [resource center](../resource.md).
 
 After finish the Resource Centre configuration, upload the required target files directly by dragging and dropping.
 
 ![resource_upload](/img/tasks/demo/upload_jar.png)
 
-#### Configure Spark Nodes
+##### Configure Spark Nodes
 
 Configure the required content according to the parameter descriptions above.
 
 ![demo-spark-simple](/img/tasks/demo/spark_task02.png)
 
+### (2) spark sql

Review Comment:
   ```suggestion
   ### spark sql
   ```
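
The DDL/DML example described in the hunk above (a view `terms` with three rows, a parquet-format table `wc`, and an insert from one into the other) is only shown as a screenshot in the PR. The snippet below is a guess at what such a script could contain, expressed through PySpark's `spark.sql` so it stays self-contained; the column name and sample values are assumptions, not taken from the screenshot.

```python
# Hedged sketch of the kind of DDL/DML script the spark sql example describes.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("spark-sql-demo").getOrCreate()

# View `terms` with three rows of sample data.
spark.sql("CREATE OR REPLACE TEMPORARY VIEW terms AS "
          "SELECT * FROM VALUES ('hello'), ('world'), ('spark') AS t(word)")

# Parquet-format table `wc`, created only if it does not already exist.
spark.sql("CREATE TABLE IF NOT EXISTS wc (word STRING) USING parquet")

# Copy the view's rows into the parquet table.
spark.sql("INSERT INTO wc SELECT word FROM terms")

spark.sql("SELECT * FROM wc").show()
spark.stop()
```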





[GitHub] [dolphinscheduler] sq-q commented on a diff in pull request #9851: [Improvement-9772][docs/docs-zh] add spark sql docs for Docs

Posted by GitBox <gi...@apache.org>.
sq-q commented on code in PR #9851:
URL: https://github.com/apache/dolphinscheduler/pull/9851#discussion_r862457003


##########
docs/docs/en/guide/task/spark.md:
##########
@@ -39,30 +45,42 @@ Spark task type used to execute Spark program. For Spark nodes, the worker submi
 
 ## Task Example
 
-### Execute the WordCount Program
+### (1) spark submit
+
+#### Execute the WordCount Program
 
 This is a common introductory case in the big data ecosystem, which often apply to computational frameworks such as MapReduce, Flink and Spark. The main purpose is to count the number of identical words in the input text. (Flink's releases attach this example job)
 
-#### Configure the Spark Environment in DolphinScheduler
+##### Configure the Spark Environment in DolphinScheduler
 
 If you are using the Spark task type in a production environment, it is necessary to configure the required environment first. The following is the configuration file: `bin/env/dolphinscheduler_env.sh`.
 
 ![spark_configure](/img/tasks/demo/spark_task01.png)
 
-#### Upload the Main Package
+##### Upload the Main Package
 
 When using the Spark task node, you need to upload the jar package to the Resource Centre for the execution, refer to the [resource center](../resource.md).
 
 After finish the Resource Centre configuration, upload the required target files directly by dragging and dropping.
 
 ![resource_upload](/img/tasks/demo/upload_jar.png)
 
-#### Configure Spark Nodes
+##### Configure Spark Nodes
 
 Configure the required content according to the parameter descriptions above.
 
 ![demo-spark-simple](/img/tasks/demo/spark_task02.png)
 
+### (2) spark sql
+
+#### Execute DDL and DML statements
+
+This case is to create a view table terms and write three rows of data and a table wc in parquet format and determine whether the table exists. The program type is SQL. Insert the data of the view table terms into the table wc in parquet format.
+
+![spark_sql](/img/tasks/demo/spark_sql.png)
+
 ## Notice
 
-JAVA and Scala only used for identification, there is no difference. If you use Python to develop Spark application, there is no class of the main function and the rest is the same.
+JAVA and Scala are only used for identification, and there is no difference. If it is Spark developed by Python, there is no class for the main function. Others are the same. JAVA, Scala and Python do not have SQL scripts.

Review Comment:
   OK, I'll revise it now.





[GitHub] [dolphinscheduler] sq-q commented on pull request #9851: [Improve][docs] Add spark sql docs to task spark

Posted by GitBox <gi...@apache.org>.
sq-q commented on PR #9851:
URL: https://github.com/apache/dolphinscheduler/pull/9851#issuecomment-1118114312

   > LGTM, thanks @sq-q
   
   Thanks!




[GitHub] [dolphinscheduler] zhongjiajie merged pull request #9851: [Improvement-9772][docs/docs-zh] add spark sql docs for Docs

Posted by GitBox <gi...@apache.org>.
zhongjiajie merged PR #9851:
URL: https://github.com/apache/dolphinscheduler/pull/9851




[GitHub] [dolphinscheduler] SbloodyS commented on pull request #9851: [Improvement-9772][docs/docs-zh] add spark sql docs for Docs

Posted by GitBox <gi...@apache.org>.
SbloodyS commented on PR #9851:
URL: https://github.com/apache/dolphinscheduler/pull/9851#issuecomment-1113119084

   Related PR: #9790

