Posted to commits@doris.apache.org by GitBox <gi...@apache.org> on 2020/08/31 13:04:15 UTC

[GitHub] [incubator-doris] xy720 opened a new pull request #4489: [Spark load][Document] Add docs about spark and yarn client for spark load

xy720 opened a new pull request #4489:
URL: https://github.com/apache/incubator-doris/pull/4489


   ## Proposed changes
   
   Add docs about spark and yarn client for spark load
   
   ## Types of changes
   
   What types of changes does your code introduce to Doris?
   _Put an `x` in the boxes that apply_
   
   - [x] Documentation Update (if none of the other choices apply)
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] xy720 commented on a change in pull request #4489: [Spark load][Document] Add docs about spark and yarn client for spark load

Posted by GitBox <gi...@apache.org>.
xy720 commented on a change in pull request #4489:
URL: https://github.com/apache/incubator-doris/pull/4489#discussion_r480224432



##########
File path: docs/zh-CN/administrator-guide/load-data/spark-load-manual.md
##########
@@ -229,7 +230,43 @@ GRANT USAGE_PRIV ON RESOURCE * TO ROLE "role0";
 REVOKE USAGE_PRIV ON RESOURCE "spark0" FROM "user0"@"%";
 ```
 
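+### Configure the SPARK client
+
+Under the hood, FE submits Spark jobs by running the spark-submit command, so a Spark client must be configured for FE. An official Spark 2 release, version 2.4 or later, is recommended ([Spark download page](https://archive.apache.org/dist/spark/)). After downloading, complete the configuration steps below.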

Review comment:
       OK, done.






[GitHub] [incubator-doris] morningman merged pull request #4489: [Spark load][Document] Add docs about spark and yarn client for spark load

Posted by GitBox <gi...@apache.org>.
morningman merged pull request #4489:
URL: https://github.com/apache/incubator-doris/pull/4489


   




[GitHub] [incubator-doris] morningman commented on a change in pull request #4489: [Spark load][Document] Add docs about spark and yarn client for spark load

Posted by GitBox <gi...@apache.org>.
morningman commented on a change in pull request #4489:
URL: https://github.com/apache/incubator-doris/pull/4489#discussion_r480200690



##########
File path: docs/zh-CN/administrator-guide/load-data/spark-load-manual.md
##########
@@ -229,7 +230,43 @@ GRANT USAGE_PRIV ON RESOURCE * TO ROLE "role0";
 REVOKE USAGE_PRIV ON RESOURCE "spark0" FROM "user0"@"%";
 ```
 
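+### Configure the SPARK client
+
+Under the hood, FE submits Spark jobs by running the spark-submit command, so a Spark client must be configured for FE. An official Spark 2 release, version 2.4 or later, is recommended ([Spark download page](https://archive.apache.org/dist/spark/)). After downloading, complete the configuration steps below.
+
+#### Configure the SPARK_HOME environment variable
+
+Place the Spark client in a directory on the same machine as FE, and set the `spark_home_default_dir` item in the FE configuration file to point to that directory. This item defaults to the `lib/spark2x` path under the FE root directory and must not be empty.
+
+#### Configure the SPARK dependencies
+
+Archive all jar files in the `jars` folder of the Spark client into a single zip file, and set the `spark_resource_path` item in the FE configuration file to point to that zip file. If this item is empty, FE looks for the file `lib/spark2x/jars/spark-2x.zip` under the FE root directory; if the file is not found, FE reports a file-not-found error.
+
+When a spark load job is submitted, the archived dependency files are uploaded to a remote repository. By default the repository is located under the `working_dir/{cluster_id}` directory and is named `__spark_repository__{resource_name}`, so each resource in the cluster corresponds to one remote repository. The directory structure of the remote repository looks like this: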
+
+```
+__spark_repository__spark0/
+    |-__archive_1.0.0/
+    |        |-__lib_990325d2c0d1d5e45bf675e54e44fb16_spark-dpp-1.0.0-jar-with-dependencies.jar
+    |        |-__lib_7670c29daf535efe3c9b923f778f61fc_spark-2x.zip
+    |-__archive_1.1.0/
+    |        |-__lib_64d5696f99c379af2bee28c1c84271d5_spark-dpp-1.1.0-jar-with-dependencies.jar
+    |        |-__lib_1bbb74bb6b264a270bc7fca3e964160f_spark-2x.zip
+    |-__archive_1.2.0/
+    |        |-...
+```
+
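+Besides the Spark dependencies (named `spark-2x.zip` by default), FE also uploads the DPP dependency package to the remote repository. If all dependency files needed by a spark load submission already exist in the remote repository, they are not uploaded again, which saves the time previously spent re-uploading a large number of files on every submission.
 
+### Configure the YARN client
+
+Under the hood, FE runs yarn commands to get the status of a running application and to kill an application, so a YARN client must be configured for FE. An official Hadoop release, version 2.5 or later, is recommended ([Hadoop download page](https://archive.apache.org/dist/hadoop/common/)). After downloading, complete the configuration steps below.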

Review comment:
       Could you give a specific version number?
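
       For readers following the repository layout quoted in the hunk above, here is a minimal Python sketch of how such a remote path could be assembled and how the "skip upload if already present" check might look. It is an illustration only, not the FE implementation: the function names `remote_lib_path` and `needs_upload` are hypothetical, and it assumes that the 32-character hex prefix in `__lib_..._` is the MD5 digest of the file content, which the diff does not state.

```python
import hashlib


def remote_lib_path(working_dir, cluster_id, resource_name, version, file_name, content):
    """Build a remote path following the layout shown in the quoted diff.

    Assumption: the hex string in the file name is the MD5 of the file content.
    """
    digest = hashlib.md5(content).hexdigest()
    parts = [
        working_dir.rstrip("/"),
        str(cluster_id),
        f"__spark_repository__{resource_name}",
        f"__archive_{version}",
        f"__lib_{digest}_{file_name}",
    ]
    # Plain string joining keeps URI schemes such as "hdfs://" intact.
    return "/".join(parts)


def needs_upload(existing_remote_paths, remote_path):
    """Return True when the dependency is not yet in the remote repository."""
    return remote_path not in set(existing_remote_paths)
```

       With a layout like this, a changed dependency file gets a different name, so a plain existence check is enough to decide whether to upload it again.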

##########
File path: docs/zh-CN/administrator-guide/load-data/spark-load-manual.md
##########
@@ -229,7 +230,43 @@ GRANT USAGE_PRIV ON RESOURCE * TO ROLE "role0";
 REVOKE USAGE_PRIV ON RESOURCE "spark0" FROM "user0"@"%";
 ```
 
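+### Configure the SPARK client
+
+Under the hood, FE submits Spark jobs by running the spark-submit command, so a Spark client must be configured for FE. An official Spark 2 release, version 2.4 or later, is recommended ([Spark download page](https://archive.apache.org/dist/spark/)). After downloading, complete the configuration steps below.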

Review comment:
       Could you give a specific version number, for example 2.4.3?
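
       The dependency packaging step described in this PR (archiving every jar under the Spark client's `jars` folder into one zip for `spark_resource_path`) could be scripted roughly as follows. This is a sketch under assumptions: the helper name `archive_spark_jars` is hypothetical, and it assumes the jars should sit at the top level of the archive, which the documentation does not specify.

```python
import zipfile
from pathlib import Path


def archive_spark_jars(spark_home, zip_path):
    """Zip every *.jar under <spark_home>/jars into zip_path.

    Returns the number of jars written. Assumption: jars are stored
    flat at the top level of the archive.
    """
    jars_dir = Path(spark_home) / "jars"
    # Collect the jar list first so the output zip is never picked up itself.
    jars = sorted(jars_dir.glob("*.jar"))
    if not jars:
        raise FileNotFoundError(f"no jar files found in {jars_dir}")
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for jar in jars:
            zf.write(jar, arcname=jar.name)
    return len(jars)
```

       For example, calling `archive_spark_jars("lib/spark2x", "lib/spark2x/jars/spark-2x.zip")` from the FE root directory would produce the file at the default location FE looks for when `spark_resource_path` is empty.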



