Posted to commits@doris.apache.org by zy...@apache.org on 2023/06/09 08:32:24 UTC

[doris] branch branch-1.2-lts updated: [typo](doc) update spark connector doc for branch 1.2 (#20640)

This is an automated email from the ASF dual-hosted git repository.

zykkk pushed a commit to branch branch-1.2-lts
in repository https://gitbox.apache.org/repos/asf/doris.git


The following commit(s) were added to refs/heads/branch-1.2-lts by this push:
     new 485554f5f7 [typo](doc) update spark connector doc for branch 1.2 (#20640)
485554f5f7 is described below

commit 485554f5f7f7476bea6aca549b5baa6aca81e50d
Author: gnehil <ad...@gmail.com>
AuthorDate: Fri Jun 9 16:32:15 2023 +0800

    [typo](doc) update spark connector doc for branch 1.2 (#20640)
---
 docs/en/docs/ecosystem/spark-doris-connector.md    |  98 ++++++--------------
 docs/zh-CN/docs/ecosystem/spark-doris-connector.md | 103 ++++++---------------
 2 files changed, 54 insertions(+), 147 deletions(-)
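
Note on the jar names in the diff below: the renamed artifacts appear to follow the pattern spark-doris-connector-<spark>_<scala>-<version>.jar. The following is a minimal shell sketch of that pattern. The function name connector_jar_name is hypothetical and not part of build.sh, and the pattern is inferred only from the example names in this commit.

```bash
# Sketch only: compose the connector jar name from its parts.
# Assumed pattern: spark-doris-connector-<spark>_<scala>-<version>.jar
# connector_jar_name is a hypothetical helper, not part of build.sh.
connector_jar_name() {
  # $1 = Spark version, $2 = Scala version, $3 = connector version
  printf 'spark-doris-connector-%s_%s-%s.jar\n' "$1" "$2" "$3"
}

jar="$(connector_jar_name 3.1 2.12 1.2.0-SNAPSHOT)"
echo "$jar"

# Print (without running) the upload command from the docs with this name.
echo "hdfs dfs -put /your_local_path/${jar} /spark-jars/"
```

Called with 3.2, 2.12 and 1.2.0-SNAPSHOT, the same function yields the jar name used in the zh-CN document.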

diff --git a/docs/en/docs/ecosystem/spark-doris-connector.md b/docs/en/docs/ecosystem/spark-doris-connector.md
index 56d403f2e7..b39d8ff678 100644
--- a/docs/en/docs/ecosystem/spark-doris-connector.md
+++ b/docs/en/docs/ecosystem/spark-doris-connector.md
@@ -1,7 +1,7 @@
 ---
 {
-    "title": "Spark Doris Connector",
-    "language": "en"
+  "title": "Spark Doris Connector",
+  "language": "en"
 }
 ---
 
@@ -28,7 +28,7 @@ under the License.
 
 Spark Doris Connector can support reading data stored in Doris and writing data to Doris through Spark.
 
-Github: https://github.com/apache/incubator-doris-spark-connector
+Github: https://github.com/apache/doris-spark-connector
 
 - Support reading data from `Doris`.
 - Support `Spark DataFrame` batch/stream writing data to `Doris`
@@ -37,94 +37,48 @@ Github: https://github.com/apache/incubator-doris-spark-connector
 
 ## Version Compatibility
 
-| Connector     | Spark | Doris  | Java | Scala |
-|---------------| ----- | ------ | ---- | ----- |
-| 2.3.4-2.11.xx | 2.x   | 0.12+  | 8    | 2.11  |
-| 3.1.2-2.12.xx | 3.x   | 0.12.+ | 8    | 2.12  |
-| 3.2.0-2.12.xx | 3.2.x | 0.12.+ | 8    | 2.12  |
+| Connector | Spark         | Doris       | Java | Scala      |
+| --------- | ------------- |-------------| ---- | ---------- |
+| 1.2.0     | 3.2, 3.1, 2.3 | 1.0 +       | 8    | 2.12, 2.11 |
+| 1.1.0     | 3.2, 3.1, 2.3 | 1.0 +       | 8    | 2.12, 2.11 |
+| 1.0.1     | 3.1, 2.3      | 0.12 - 0.15 | 8    | 2.12, 2.11 |
 
 ## Build and Install
 
 Preparation
 
-1.Modify the `custom_env.sh.tpl` file and rename it to `custom_env.sh`
-
-2.Specify the thrift installation directory
-
-```bash
-##source file content
-#export THRIFT_BIN=
-#export MVN_BIN=
-#export JAVA_HOME=
-
-##amend as below,MacOS as an example
-export THRIFT_BIN=/opt/homebrew/Cellar/thrift@0.13.0/0.13.0/bin/thrift
-#export MVN_BIN=
-#export JAVA_HOME=
-
-Install `thrift` 0.13.0 (Note: `Doris` 0.15 and the latest builds are based on `thrift` 0.13.0, previous versions are still built with `thrift` 0.9.3)
-Windows:
-  1. Download: `http://archive.apache.org/dist/thrift/0.13.0/thrift-0.13.0.exe`
-  2. Modify thrift-0.13.0.exe to thrift 
- 
-MacOS:
-  1. Download: `brew install thrift@0.13.0`
-  2. default address: /opt/homebrew/Cellar/thrift@0.13.0/0.13.0/bin/thrift
-
-Note: Executing `brew install thrift@0.13.0` on MacOS may report an error that the version cannot be found. The solution is as follows, execute it in the terminal:
-    1. `brew tap-new $USER/local-tap`
-    2. `brew extract --version='0.13.0' thrift $USER/local-tap`
-    3. `brew install thrift@0.13.0`
- Reference link: `https://gist.github.com/tonydeng/02e571f273d6cce4230dc8d5f394493c`
- 
-Linux:
-    1.Download source package:`wget https://archive.apache.org/dist/thrift/0.13.0/thrift-0.13.0.tar.gz`
-    2.Install dependencies:`yum install -y autoconf automake libtool cmake ncurses-devel openssl-devel lzo-devel zlib-devel gcc gcc-c++`
-    3.`tar zxvf thrift-0.13.0.tar.gz`
-    4.`cd thrift-0.13.0`
-    5.`./configure --without-tests`
-    6.`make`
-    7.`make install`
-   Check the version after installation is complete:thrift --version
-   Note: If you have compiled Doris, you do not need to install thrift, you can directly use $DORIS_HOME/thirdparty/installed/bin/thrift
-```
+1. Modify the `custom_env.sh.tpl` file and rename it to `custom_env.sh`
 
-Execute following command in source dir
+2. Execute the following command in the source directory:
+   `sh build.sh`
+   Follow the prompts to enter the Scala and Spark versions you need to start compiling.
 
-```bash
-sh build.sh --spark 2.3.4 --scala 2.11 ## spark 2.3.4, scala 2.11
-sh build.sh --spark 3.1.2 --scala 2.12 ## spark 3.1.2, scala 2.12
-sh build.sh --spark 3.2.0 --scala 2.12 \
---mvn-args "-Dnetty.version=4.1.68.Final -Dfasterxml.jackson.version=2.12.3" ## spark 3.2.0, scala 2.12
-```
-> Note: If you check out the source code from tag, you can just run sh build.sh --tag without specifying the spark and scala versions. This is because the version in the tag source code is fixed.
+After the compilation is successful, the target jar package will be generated in the `dist` directory, such as: `spark-doris-connector-3.1_2.12-1.2.0-SNAPSHOT.jar`.
+Copy this file to the `ClassPath` of `Spark` to use `Spark-Doris-Connector`. For example, if `Spark` runs in `Local` mode, put this file in the `jars/` folder; if `Spark` runs in `Yarn` cluster mode, put this file in the pre-deployment package.
 
-After successful compilation, the file `doris-spark-2.3.4-2.11-1.0.0-SNAPSHOT.jar` will be generated in the `output/` directory. Copy this file to `ClassPath` in `Spark` to use `Spark-Doris-Connector`. For example, `Spark` running in `Local` mode, put this file in the `jars/` folder. `Spark` running in `Yarn` cluster mode, put this file in the pre-deployment package ,for example upload `doris-spark-2.3.4-2.11-1.0.0-SNAPSHOT.jar` to hdfs and add hdfs file path in spark.yarn.jars.
+For example, upload `spark-doris-connector-3.1_2.12-1.2.0-SNAPSHOT.jar` to hdfs and add the hdfs file path to `spark.yarn.jars`.
 
-1. Upload  doris-spark-connector-3.1.2-2.12-1.0.0.jar  Jar to hdfs.
+1. Upload the `spark-doris-connector-3.1_2.12-1.2.0-SNAPSHOT.jar` jar to hdfs.
 
 ```
 hdfs dfs -mkdir /spark-jars/
-hdfs dfs -put /your_local_path/doris-spark-connector-3.1.2-2.12-1.0.0.jar /spark-jars/
-
+hdfs dfs -put /your_local_path/spark-doris-connector-3.1_2.12-1.2.0-SNAPSHOT.jar /spark-jars/
 ```
 
-2. Add doris-spark-connector-3.1.2-2.12-1.0.0.jar  depence in Cluster.
+2. Add the `spark-doris-connector-3.1_2.12-1.2.0-SNAPSHOT.jar` dependency in the cluster.
 
 ```
-spark.yarn.jars=hdfs:///spark-jars/doris-spark-connector-3.1.2-2.12-1.0.0.jar
+spark.yarn.jars=hdfs:///spark-jars/spark-doris-connector-3.1_2.12-1.2.0-SNAPSHOT.jar
 ```
 
 
-
 ## Using Maven
 
 ```
 <dependency>
-  <groupId>org.apache.doris</groupId>
-  <artifactId>spark-doris-connector-3.1_2.12</artifactId>
-  <!--artifactId>spark-doris-connector-2.3_2.11</artifactId-->
-  <version>1.0.1</version>
+    <groupId>org.apache.doris</groupId>
+    <artifactId>spark-doris-connector-3.1_2.12</artifactId>
+    <version>1.2.0</version>
 </dependency>
 ```
 
@@ -155,7 +109,7 @@ SELECT * FROM spark_doris;
 ```scala
 val dorisSparkDF = spark.read.format("doris")
   .option("doris.table.identifier", "$YOUR_DORIS_DATABASE_NAME.$YOUR_DORIS_TABLE_NAME")
-	.option("doris.fenodes", "$YOUR_DORIS_FE_HOSTNAME:$YOUR_DORIS_FE_RESFUL_PORT")
+  .option("doris.fenodes", "$YOUR_DORIS_FE_HOSTNAME:$YOUR_DORIS_FE_RESFUL_PORT")
   .option("user", "$YOUR_DORIS_USERNAME")
   .option("password", "$YOUR_DORIS_PASSWORD")
   .load()
@@ -187,7 +141,7 @@ dorisSparkDF = spark.read.format("doris")
 .option("user", "$YOUR_DORIS_USERNAME")
 .option("password", "$YOUR_DORIS_PASSWORD")
 .load()
-# show 5 lines data 
+// show 5 lines data 
 dorisSparkDF.show(5)
 ```
 
@@ -222,7 +176,7 @@ mockDataDF.show(5)
 
 mockDataDF.write.format("doris")
   .option("doris.table.identifier", "$YOUR_DORIS_DATABASE_NAME.$YOUR_DORIS_TABLE_NAME")
-	.option("doris.fenodes", "$YOUR_DORIS_FE_HOSTNAME:$YOUR_DORIS_FE_RESFUL_PORT")
+  .option("doris.fenodes", "$YOUR_DORIS_FE_HOSTNAME:$YOUR_DORIS_FE_RESFUL_PORT")
   .option("user", "$YOUR_DORIS_USERNAME")
   .option("password", "$YOUR_DORIS_PASSWORD")
   //other options
@@ -242,7 +196,7 @@ kafkaSource.selectExpr("CAST(key AS STRING)", "CAST(value as STRING)")
   .format("doris")
   .option("checkpointLocation", "$YOUR_CHECKPOINT_LOCATION")
   .option("doris.table.identifier", "$YOUR_DORIS_DATABASE_NAME.$YOUR_DORIS_TABLE_NAME")
-	.option("doris.fenodes", "$YOUR_DORIS_FE_HOSTNAME:$YOUR_DORIS_FE_RESFUL_PORT")
+  .option("doris.fenodes", "$YOUR_DORIS_FE_HOSTNAME:$YOUR_DORIS_FE_RESFUL_PORT")
   .option("user", "$YOUR_DORIS_USERNAME")
   .option("password", "$YOUR_DORIS_PASSWORD")
   //other options
diff --git a/docs/zh-CN/docs/ecosystem/spark-doris-connector.md b/docs/zh-CN/docs/ecosystem/spark-doris-connector.md
index 6c0b8a5c17..702dc343ab 100644
--- a/docs/zh-CN/docs/ecosystem/spark-doris-connector.md
+++ b/docs/zh-CN/docs/ecosystem/spark-doris-connector.md
@@ -1,7 +1,7 @@
 ---
 {
-    "title": "Spark Doris Connector",
-    "language": "zh-CN"
+  "title": "Spark Doris Connector",
+  "language": "zh-CN"
 }
 ---
 
@@ -28,7 +28,7 @@ under the License.
 
 Spark Doris Connector supports reading data stored in Doris through Spark, and also supports writing data to Doris through Spark.
 
-Code repository: https://github.com/apache/incubator-doris-spark-connector
+Code repository: https://github.com/apache/doris-spark-connector
 
 - Supports reading data from `Doris`
 - Supports batch/streaming writes from `Spark DataFrame` to `Doris`
@@ -37,94 +37,49 @@ Spark Doris Connector supports reading data stored in Doris through Spark
 
 ## Version Compatibility
 
-| Connector     | Spark | Doris  | Java | Scala |
-|---------------| ----- | ------ | ---- | ----- |
-| 2.3.4-2.11.xx | 2.x   | 0.12+  | 8    | 2.11  |
-| 3.1.2-2.12.xx | 3.x   | 0.12.+ | 8    | 2.12  |
-| 3.2.0-2.12.xx | 3.2.x | 0.12.+ | 8    | 2.12  |
+| Connector | Spark         | Doris       | Java | Scala      |
+| --------- | ------------- |-------------| ---- | ---------- |
+| 1.2.0     | 3.2, 3.1, 2.3 | 1.0 +       | 8    | 2.12, 2.11 |
+| 1.1.0     | 3.2, 3.1, 2.3 | 1.0 +       | 8    | 2.12, 2.11 |
+| 1.0.1     | 3.1, 2.3      | 0.12 - 0.15 | 8    | 2.12, 2.11 |
 
 ## Build and Install
 
 Preparation
 
-1.Modify the `custom_env.sh.tpl` file and rename it to `custom_env.sh`
-
-2.Specify the thrift installation directory
-
-```bash
-##source file content
-#export THRIFT_BIN=
-#export MVN_BIN=
-#export JAVA_HOME=
-
-##amend as below, using MacOS as an example
-export THRIFT_BIN=/opt/homebrew/Cellar/thrift@0.13.0/0.13.0/bin/thrift
-#export MVN_BIN=
-#export JAVA_HOME=
-
-Install `thrift` 0.13.0 (Note: `Doris` 0.15 and the latest versions are built with `thrift` 0.13.0; earlier versions are still built with `thrift` 0.9.3)
- Windows:
-    1.Download: `http://archive.apache.org/dist/thrift/0.13.0/thrift-0.13.0.exe` (choose the download directory yourself)
-    2.Rename thrift-0.13.0.exe to thrift
- 
- MacOS: 
-    1. Download: `brew install thrift@0.13.0`
-    2. Default download location: /opt/homebrew/Cellar/thrift@0.13.0/0.13.0/bin/thrift
-    
- 
- Note: Running `brew install thrift@0.13.0` on MacOS may fail with a version-not-found error. To work around it, run the following in a terminal:
-    1. `brew tap-new $USER/local-tap`
-    2. `brew extract --version='0.13.0' thrift $USER/local-tap`
-    3. `brew install thrift@0.13.0`
- Reference link: `https://gist.github.com/tonydeng/02e571f273d6cce4230dc8d5f394493c`
- 
- Linux:
-    1.Download the source package: `wget https://archive.apache.org/dist/thrift/0.13.0/thrift-0.13.0.tar.gz`
-    2.Install dependencies: `yum install -y autoconf automake libtool cmake ncurses-devel openssl-devel lzo-devel zlib-devel gcc gcc-c++`
-    3.`tar zxvf thrift-0.13.0.tar.gz`
-    4.`cd thrift-0.13.0`
-    5.`./configure --without-tests`
-    6.`make`
-    7.`make install`
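-   Check the version after installation completes: thrift --version
-   Note: If you have already compiled Doris, you do not need to install thrift; you can use $DORIS_HOME/thirdparty/installed/bin/thrift directly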
-```
+1. Modify the `custom_env.sh.tpl` file and rename it to `custom_env.sh`
 
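-Execute the following command in the source directory:
+2. Execute the following command in the source directory:
+   `sh build.sh`
+   Follow the prompts to enter the Scala and Spark versions you need, then compile.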
 
-```bash
-sh build.sh --spark 2.3.4 --scala 2.11 ## spark 2.3.4, scala 2.11
-sh build.sh --spark 3.1.2 --scala 2.12 ## spark 3.1.2, scala 2.12
-sh build.sh --spark 3.2.0 --scala 2.12 \
---mvn-args "-Dnetty.version=4.1.68.Final -Dfasterxml.jackson.version=2.12.3" ## spark 3.2.0, scala 2.12
-```
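-> Note: If you checked out the source code from a tag, you can run `sh build.sh --tag` directly without specifying the spark and scala versions, because the versions in the tag source code are fixed.
+After successful compilation, the target jar is generated in the `dist` directory, for example: `spark-doris-connector-3.2_2.12-1.2.0-SNAPSHOT.jar`.
+Copy this file to the `ClassPath` of `Spark` to use `Spark-Doris-Connector`.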
 
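-After successful compilation, the file `doris-spark-2.3.4-2.11-1.0.0-SNAPSHOT.jar` is generated in the `output/` directory. Copy this file to the `ClassPath` of `Spark` to use `Spark-Doris-Connector`. For example, for `Spark` running in `Local` mode, put this file in the `jars/` folder. For `Spark` running in `Yarn` cluster mode, put this file in the pre-deployment package.
+For example, for `Spark` running in `Local` mode, put this file in the `jars/` folder. For `Spark` running in `Yarn` cluster mode, put this file in the pre-deployment package.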
 
-For example, upload `doris-spark-2.3.4-2.11-1.0.0-SNAPSHOT.jar` to hdfs and add the hdfs path of the jar to the spark.yarn.jars parameter
+For example, upload `spark-doris-connector-3.2_2.12-1.2.0-SNAPSHOT.jar` to hdfs and add the hdfs path of the jar to the `spark.yarn.jars` parameter
 
-1. Upload doris-spark-connector-3.1.2-2.12-1.0.0.jar to hdfs.
+1. Upload `spark-doris-connector-3.2_2.12-1.2.0-SNAPSHOT.jar` to hdfs.
 
 ```
 hdfs dfs -mkdir /spark-jars/
-hdfs dfs -put /your_local_path/doris-spark-connector-3.1.2-2.12-1.0.0.jar /spark-jars/
+hdfs dfs -put /your_local_path/spark-doris-connector-3.2_2.12-1.2.0-SNAPSHOT.jar /spark-jars/
 ```
 
-2. Add the doris-spark-connector-3.1.2-2.12-1.0.0.jar dependency in the cluster.
+2. Add the `spark-doris-connector-3.2_2.12-1.2.0-SNAPSHOT.jar` dependency in the cluster.
 
 ```
-spark.yarn.jars=hdfs:///spark-jars/doris-spark-connector-3.1.2-2.12-1.0.0.jar
+spark.yarn.jars=hdfs:///spark-jars/spark-doris-connector-3.2_2.12-1.2.0-SNAPSHOT.jar
 ```
 
 ## Using Maven
 
 ```
 <dependency>
-  <groupId>org.apache.doris</groupId>
-  <artifactId>spark-doris-connector-3.1_2.12</artifactId>
-  <!--artifactId>spark-doris-connector-2.3_2.11</artifactId-->
-  <version>1.0.1</version>
+    <groupId>org.apache.doris</groupId>
+    <artifactId>spark-doris-connector-3.2_2.12</artifactId>
+    <version>1.2.0</version>
 </dependency>
 ```
 
@@ -155,7 +110,7 @@ SELECT * FROM spark_doris;
 ```scala
 val dorisSparkDF = spark.read.format("doris")
   .option("doris.table.identifier", "$YOUR_DORIS_DATABASE_NAME.$YOUR_DORIS_TABLE_NAME")
-	.option("doris.fenodes", "$YOUR_DORIS_FE_HOSTNAME:$YOUR_DORIS_FE_RESFUL_PORT")
+  .option("doris.fenodes", "$YOUR_DORIS_FE_HOSTNAME:$YOUR_DORIS_FE_RESFUL_PORT")
   .option("user", "$YOUR_DORIS_USERNAME")
   .option("password", "$YOUR_DORIS_PASSWORD")
   .load()
@@ -188,12 +143,10 @@ dorisSparkDF = spark.read.format("doris")
 .option("user", "$YOUR_DORIS_USERNAME")
 .option("password", "$YOUR_DORIS_PASSWORD")
 .load()
-# show 5 lines data 
+// show 5 lines data 
 dorisSparkDF.show(5)
 ```
 
-
-
 ### Write
 
 #### SQL
@@ -226,7 +179,7 @@ mockDataDF.show(5)
 
 mockDataDF.write.format("doris")
   .option("doris.table.identifier", "$YOUR_DORIS_DATABASE_NAME.$YOUR_DORIS_TABLE_NAME")
-	.option("doris.fenodes", "$YOUR_DORIS_FE_HOSTNAME:$YOUR_DORIS_FE_RESFUL_PORT")
+  .option("doris.fenodes", "$YOUR_DORIS_FE_HOSTNAME:$YOUR_DORIS_FE_RESFUL_PORT")
   .option("user", "$YOUR_DORIS_USERNAME")
   .option("password", "$YOUR_DORIS_PASSWORD")
   //other options
@@ -246,7 +199,7 @@ kafkaSource.selectExpr("CAST(key AS STRING)", "CAST(value as STRING)")
   .format("doris")
   .option("checkpointLocation", "$YOUR_CHECKPOINT_LOCATION")
   .option("doris.table.identifier", "$YOUR_DORIS_DATABASE_NAME.$YOUR_DORIS_TABLE_NAME")
-	.option("doris.fenodes", "$YOUR_DORIS_FE_HOSTNAME:$YOUR_DORIS_FE_RESFUL_PORT")
+  .option("doris.fenodes", "$YOUR_DORIS_FE_HOSTNAME:$YOUR_DORIS_FE_RESFUL_PORT")
   .option("user", "$YOUR_DORIS_USERNAME")
   .option("password", "$YOUR_DORIS_PASSWORD")
   //other options
@@ -256,7 +209,7 @@ kafkaSource.selectExpr("CAST(key AS STRING)", "CAST(value as STRING)")
   .awaitTermination()
 ```
 
-### java example
+### Java Example
 
 A Java example is provided under `samples/doris-demo/spark-demo/` for reference, [here](https://github.com/apache/incubator-doris/tree/master/samples/doris-demo/spark-demo)
 

