Posted to commits@doris.apache.org by zy...@apache.org on 2023/06/09 08:32:24 UTC
[doris] branch branch-1.2-lts updated: [typo](doc) update spark connector doc for branch 1.2 (#20640)
This is an automated email from the ASF dual-hosted git repository.
zykkk pushed a commit to branch branch-1.2-lts
in repository https://gitbox.apache.org/repos/asf/doris.git
The following commit(s) were added to refs/heads/branch-1.2-lts by this push:
new 485554f5f7 [typo](doc) update spark connector doc for branch 1.2 (#20640)
485554f5f7 is described below
commit 485554f5f7f7476bea6aca549b5baa6aca81e50d
Author: gnehil <ad...@gmail.com>
AuthorDate: Fri Jun 9 16:32:15 2023 +0800
[typo](doc) update spark connector doc for branch 1.2 (#20640)
---
docs/en/docs/ecosystem/spark-doris-connector.md | 98 ++++++--------------
docs/zh-CN/docs/ecosystem/spark-doris-connector.md | 103 ++++++---------------
2 files changed, 54 insertions(+), 147 deletions(-)
diff --git a/docs/en/docs/ecosystem/spark-doris-connector.md b/docs/en/docs/ecosystem/spark-doris-connector.md
index 56d403f2e7..b39d8ff678 100644
--- a/docs/en/docs/ecosystem/spark-doris-connector.md
+++ b/docs/en/docs/ecosystem/spark-doris-connector.md
@@ -1,7 +1,7 @@
---
{
- "title": "Spark Doris Connector",
- "language": "en"
+ "title": "Spark Doris Connector",
+ "language": "en"
}
---
@@ -28,7 +28,7 @@ under the License.
Spark Doris Connector can support reading data stored in Doris and writing data to Doris through Spark.
-Github: https://github.com/apache/incubator-doris-spark-connector
+Github: https://github.com/apache/doris-spark-connector
- Support reading data from `Doris`.
- Support `Spark DataFrame` batch/stream writing data to `Doris`
@@ -37,94 +37,48 @@ Github: https://github.com/apache/incubator-doris-spark-connector
## Version Compatibility
-| Connector | Spark | Doris | Java | Scala |
-|---------------| ----- | ------ | ---- | ----- |
-| 2.3.4-2.11.xx | 2.x | 0.12+ | 8 | 2.11 |
-| 3.1.2-2.12.xx | 3.x | 0.12.+ | 8 | 2.12 |
-| 3.2.0-2.12.xx | 3.2.x | 0.12.+ | 8 | 2.12 |
+| Connector | Spark | Doris | Java | Scala |
+| --------- | ------------- |-------------| ---- | ---------- |
+| 1.2.0 | 3.2, 3.1, 2.3 | 1.0 + | 8 | 2.12, 2.11 |
+| 1.1.0 | 3.2, 3.1, 2.3 | 1.0 + | 8 | 2.12, 2.11 |
+| 1.0.1 | 3.1, 2.3 | 0.12 - 0.15 | 8 | 2.12, 2.11 |
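The compatibility table above also determines which jar you end up with: the connector artifact name follows the `spark-doris-connector-<spark>_<scala>` pattern. As an illustrative sketch only (the helper function is hypothetical, not part of the project), the table can be encoded like this:

```python
# Illustrative helper (not part of the connector): builds the artifact
# coordinates for a given Spark/Scala/connector-version combination,
# validating against the compatibility table above.

def connector_artifact(spark: str, scala: str, version: str) -> str:
    """Return the artifactId:version string for the connector jar."""
    supported = {
        "1.2.0": {"spark": ["3.2", "3.1", "2.3"], "scala": ["2.12", "2.11"]},
        "1.1.0": {"spark": ["3.2", "3.1", "2.3"], "scala": ["2.12", "2.11"]},
        "1.0.1": {"spark": ["3.1", "2.3"], "scala": ["2.12", "2.11"]},
    }
    combo = supported.get(version)
    if combo is None or spark not in combo["spark"] or scala not in combo["scala"]:
        raise ValueError(f"unsupported combination: {spark}/{scala}/{version}")
    return f"spark-doris-connector-{spark}_{scala}:{version}"

print(connector_artifact("3.1", "2.12", "1.2.0"))
# e.g. spark-doris-connector-3.1_2.12:1.2.0
```

Picking a row that is not in the table (say Spark 3.3) raises an error, mirroring the fact that such a build is not listed as supported for these connector versions.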
## Build and Install
Preparation
-1.Modify the `custom_env.sh.tpl` file and rename it to `custom_env.sh`
-
-2.Specify the thrift installation directory
-
-```bash
-##source file content
-#export THRIFT_BIN=
-#export MVN_BIN=
-#export JAVA_HOME=
-
-##amend as below,MacOS as an example
-export THRIFT_BIN=/opt/homebrew/Cellar/thrift@0.13.0/0.13.0/bin/thrift
-#export MVN_BIN=
-#export JAVA_HOME=
-
-Install `thrift` 0.13.0 (Note: `Doris` 0.15 and the latest builds are based on `thrift` 0.13.0, previous versions are still built with `thrift` 0.9.3)
-Windows:
- 1. Download: `http://archive.apache.org/dist/thrift/0.13.0/thrift-0.13.0.exe`
- 2. Modify thrift-0.13.0.exe to thrift
-
-MacOS:
- 1. Download: `brew install thrift@0.13.0`
- 2. default address: /opt/homebrew/Cellar/thrift@0.13.0/0.13.0/bin/thrift
-
-Note: Executing `brew install thrift@0.13.0` on MacOS may report an error that the version cannot be found. The solution is as follows, execute it in the terminal:
- 1. `brew tap-new $USER/local-tap`
- 2. `brew extract --version='0.13.0' thrift $USER/local-tap`
- 3. `brew install thrift@0.13.0`
- Reference link: `https://gist.github.com/tonydeng/02e571f273d6cce4230dc8d5f394493c`
-
-Linux:
- 1.Download source package:`wget https://archive.apache.org/dist/thrift/0.13.0/thrift-0.13.0.tar.gz`
- 2.Install dependencies:`yum install -y autoconf automake libtool cmake ncurses-devel openssl-devel lzo-devel zlib-devel gcc gcc-c++`
- 3.`tar zxvf thrift-0.13.0.tar.gz`
- 4.`cd thrift-0.13.0`
- 5.`./configure --without-tests`
- 6.`make`
- 7.`make install`
- Check the version after installation is complete:thrift --version
- Note: If you have compiled Doris, you do not need to install thrift, you can directly use $DORIS_HOME/thirdparty/installed/bin/thrift
-```
+1. Modify the `custom_env.sh.tpl` file and rename it to `custom_env.sh`
-Execute following command in source dir
+2. Execute the following command in the source directory:
+   `sh build.sh`
+   Follow the prompts to enter the Scala and Spark versions you need, and compilation will start.
-```bash
-sh build.sh --spark 2.3.4 --scala 2.11 ## spark 2.3.4, scala 2.11
-sh build.sh --spark 3.1.2 --scala 2.12 ## spark 3.1.2, scala 2.12
-sh build.sh --spark 3.2.0 --scala 2.12 \
---mvn-args "-Dnetty.version=4.1.68.Final -Dfasterxml.jackson.version=2.12.3" ## spark 3.2.0, scala 2.12
-```
-> Note: If you check out the source code from tag, you can just run sh build.sh --tag without specifying the spark and scala versions. This is because the version in the tag source code is fixed.
+After compilation succeeds, the target jar package is generated in the `dist` directory, for example `spark-doris-connector-3.1_2.12-1.2.0-SNAPSHOT.jar`.
+Copy this file to the `ClassPath` of `Spark` to use `Spark-Doris-Connector`. For example, for `Spark` running in `Local` mode, put this file in the `jars/` folder; for `Spark` running in `Yarn` cluster mode, put this file in the pre-deployment package.
-After successful compilation, the file `doris-spark-2.3.4-2.11-1.0.0-SNAPSHOT.jar` will be generated in the `output/` directory. Copy this file to `ClassPath` in `Spark` to use `Spark-Doris-Connector`. For example, `Spark` running in `Local` mode, put this file in the `jars/` folder. `Spark` running in `Yarn` cluster mode, put this file in the pre-deployment package, for example upload `doris-spark-2.3.4-2.11-1.0.0-SNAPSHOT.jar` to hdfs and add hdfs file path in spark.yarn.jars.
+For example, upload `spark-doris-connector-3.1_2.12-1.2.0-SNAPSHOT.jar` to HDFS and add the HDFS file path to `spark.yarn.jars`.
-1. Upload doris-spark-connector-3.1.2-2.12-1.0.0.jar Jar to hdfs.
+1. Upload `spark-doris-connector-3.1_2.12-1.2.0-SNAPSHOT.jar` to HDFS.
```
hdfs dfs -mkdir /spark-jars/
-hdfs dfs -put /your_local_path/doris-spark-connector-3.1.2-2.12-1.0.0.jar /spark-jars/
-
+hdfs dfs -put /your_local_path/spark-doris-connector-3.1_2.12-1.2.0-SNAPSHOT.jar /spark-jars/
```
-2. Add doris-spark-connector-3.1.2-2.12-1.0.0.jar depence in Cluster.
+2. Add the `spark-doris-connector-3.1_2.12-1.2.0-SNAPSHOT.jar` dependency in the cluster.
```
-spark.yarn.jars=hdfs:///spark-jars/doris-spark-connector-3.1.2-2.12-1.0.0.jar
+spark.yarn.jars=hdfs:///spark-jars/spark-doris-connector-3.1_2.12-1.2.0-SNAPSHOT.jar
```
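The two steps above amount to pointing `spark.yarn.jars` at a comma-separated list of HDFS paths. A minimal sketch of assembling that property value, assuming the jars were already uploaded with `hdfs dfs -put` (the helper name is hypothetical, for illustration only):

```python
# Illustrative sketch: compose the spark.yarn.jars property value from jars
# already uploaded to an HDFS directory. Spark expects a comma-separated list.

def yarn_jars_property(hdfs_dir: str, jars: list[str]) -> str:
    # hdfs:// + a leading-slash directory yields the hdfs:///... form used above
    paths = [f"hdfs://{hdfs_dir.rstrip('/')}/{jar}" for jar in jars]
    return "spark.yarn.jars=" + ",".join(paths)

print(yarn_jars_property("/spark-jars",
                         ["spark-doris-connector-3.1_2.12-1.2.0-SNAPSHOT.jar"]))
```

Additional jars (for example other connector dependencies) can simply be appended to the list and will be joined with commas, which is the separator Spark expects for this property.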
-
## Using Maven
```
<dependency>
- <groupId>org.apache.doris</groupId>
- <artifactId>spark-doris-connector-3.1_2.12</artifactId>
- <!--artifactId>spark-doris-connector-2.3_2.11</artifactId-->
- <version>1.0.1</version>
+ <groupId>org.apache.doris</groupId>
+ <artifactId>spark-doris-connector-3.1_2.12</artifactId>
+ <version>1.2.0</version>
</dependency>
```
@@ -155,7 +109,7 @@ SELECT * FROM spark_doris;
```scala
val dorisSparkDF = spark.read.format("doris")
.option("doris.table.identifier", "$YOUR_DORIS_DATABASE_NAME.$YOUR_DORIS_TABLE_NAME")
- .option("doris.fenodes", "$YOUR_DORIS_FE_HOSTNAME:$YOUR_DORIS_FE_RESFUL_PORT")
+ .option("doris.fenodes", "$YOUR_DORIS_FE_HOSTNAME:$YOUR_DORIS_FE_RESFUL_PORT")
.option("user", "$YOUR_DORIS_USERNAME")
.option("password", "$YOUR_DORIS_PASSWORD")
.load()
@@ -187,7 +141,7 @@ dorisSparkDF = spark.read.format("doris")
.option("user", "$YOUR_DORIS_USERNAME")
.option("password", "$YOUR_DORIS_PASSWORD")
.load()
-# show 5 lines data
+// show 5 lines data
dorisSparkDF.show(5)
```
@@ -222,7 +176,7 @@ mockDataDF.show(5)
mockDataDF.write.format("doris")
.option("doris.table.identifier", "$YOUR_DORIS_DATABASE_NAME.$YOUR_DORIS_TABLE_NAME")
- .option("doris.fenodes", "$YOUR_DORIS_FE_HOSTNAME:$YOUR_DORIS_FE_RESFUL_PORT")
+ .option("doris.fenodes", "$YOUR_DORIS_FE_HOSTNAME:$YOUR_DORIS_FE_RESFUL_PORT")
.option("user", "$YOUR_DORIS_USERNAME")
.option("password", "$YOUR_DORIS_PASSWORD")
//other options
@@ -242,7 +196,7 @@ kafkaSource.selectExpr("CAST(key AS STRING)", "CAST(value as STRING)")
.format("doris")
.option("checkpointLocation", "$YOUR_CHECKPOINT_LOCATION")
.option("doris.table.identifier", "$YOUR_DORIS_DATABASE_NAME.$YOUR_DORIS_TABLE_NAME")
- .option("doris.fenodes", "$YOUR_DORIS_FE_HOSTNAME:$YOUR_DORIS_FE_RESFUL_PORT")
+ .option("doris.fenodes", "$YOUR_DORIS_FE_HOSTNAME:$YOUR_DORIS_FE_RESFUL_PORT")
.option("user", "$YOUR_DORIS_USERNAME")
.option("password", "$YOUR_DORIS_PASSWORD")
//other options
diff --git a/docs/zh-CN/docs/ecosystem/spark-doris-connector.md b/docs/zh-CN/docs/ecosystem/spark-doris-connector.md
index 6c0b8a5c17..702dc343ab 100644
--- a/docs/zh-CN/docs/ecosystem/spark-doris-connector.md
+++ b/docs/zh-CN/docs/ecosystem/spark-doris-connector.md
@@ -1,7 +1,7 @@
---
{
- "title": "Spark Doris Connector",
- "language": "zh-CN"
+ "title": "Spark Doris Connector",
+ "language": "zh-CN"
}
---
@@ -28,7 +28,7 @@ under the License.
Spark Doris Connector supports reading data stored in Doris through Spark, and also supports writing data to Doris through Spark.
-Github: https://github.com/apache/incubator-doris-spark-connector
+Github: https://github.com/apache/doris-spark-connector
- Support reading data from `Doris`
- Support `Spark DataFrame` batch/stream writing data to `Doris`
@@ -37,94 +37,49 @@ Spark Doris Connector supports reading data stored in Doris through Spark
## Version Compatibility
-| Connector | Spark | Doris | Java | Scala |
-|---------------| ----- | ------ | ---- | ----- |
-| 2.3.4-2.11.xx | 2.x | 0.12+ | 8 | 2.11 |
-| 3.1.2-2.12.xx | 3.x | 0.12.+ | 8 | 2.12 |
-| 3.2.0-2.12.xx | 3.2.x | 0.12.+ | 8 | 2.12 |
+| Connector | Spark | Doris | Java | Scala |
+| --------- | ------------- |-------------| ---- | ---------- |
+| 1.2.0 | 3.2, 3.1, 2.3 | 1.0 + | 8 | 2.12, 2.11 |
+| 1.1.0 | 3.2, 3.1, 2.3 | 1.0 + | 8 | 2.12, 2.11 |
+| 1.0.1 | 3.1, 2.3 | 0.12 - 0.15 | 8 | 2.12, 2.11 |
## Build and Install
Preparation
-1. Modify the `custom_env.sh.tpl` file and rename it to `custom_env.sh`
-
-2. Specify the thrift installation directory
-
-```bash
-##source file content
-#export THRIFT_BIN=
-#export MVN_BIN=
-#export JAVA_HOME=
-
-##amend as below, MacOS as an example
-export THRIFT_BIN=/opt/homebrew/Cellar/thrift@0.13.0/0.13.0/bin/thrift
-#export MVN_BIN=
-#export JAVA_HOME=
-
-Install `thrift` 0.13.0 (Note: `Doris` 0.15 and later versions are built with `thrift` 0.13.0; earlier versions are still built with `thrift` 0.9.3)
- Windows:
- 1. Download: `http://archive.apache.org/dist/thrift/0.13.0/thrift-0.13.0.exe` (choose your own download directory)
- 2. Rename thrift-0.13.0.exe to thrift
-
- MacOS:
- 1. Download: `brew install thrift@0.13.0`
- 2. Default install path: /opt/homebrew/Cellar/thrift@0.13.0/0.13.0/bin/thrift
-
-
- Note: Executing `brew install thrift@0.13.0` on MacOS may report an error that the version cannot be found. To resolve it, execute the following in the terminal:
- 1. `brew tap-new $USER/local-tap`
- 2. `brew extract --version='0.13.0' thrift $USER/local-tap`
- 3. `brew install thrift@0.13.0`
- Reference link: `https://gist.github.com/tonydeng/02e571f273d6cce4230dc8d5f394493c`
-
- Linux:
- 1. Download the source package: `wget https://archive.apache.org/dist/thrift/0.13.0/thrift-0.13.0.tar.gz`
- 2. Install dependencies: `yum install -y autoconf automake libtool cmake ncurses-devel openssl-devel lzo-devel zlib-devel gcc gcc-c++`
- 3. `tar zxvf thrift-0.13.0.tar.gz`
- 4. `cd thrift-0.13.0`
- 5. `./configure --without-tests`
- 6. `make`
- 7. `make install`
- Check the version after installation: thrift --version
- Note: If you have compiled Doris, you do not need to install thrift; you can directly use $DORIS_HOME/thirdparty/installed/bin/thrift
-```
+1. Modify the `custom_env.sh.tpl` file and rename it to `custom_env.sh`
-Execute in the source directory:
+2. Execute the following command in the source directory:
+   `sh build.sh`
+   Follow the prompts to enter the Scala and Spark versions you need, and compilation will start.
-```bash
-sh build.sh --spark 2.3.4 --scala 2.11 ## spark 2.3.4, scala 2.11
-sh build.sh --spark 3.1.2 --scala 2.12 ## spark 3.1.2, scala 2.12
-sh build.sh --spark 3.2.0 --scala 2.12 \
---mvn-args "-Dnetty.version=4.1.68.Final -Dfasterxml.jackson.version=2.12.3" ## spark 3.2.0, scala 2.12
-```
-> Note: If you checked out the source code from a tag, you can simply run `sh build.sh --tag` without specifying the spark and scala versions, because the versions in the tag source code are fixed.
+After compilation succeeds, the target jar package is generated in the `dist` directory, for example `spark-doris-connector-3.2_2.12-1.2.0-SNAPSHOT.jar`.
+Copy this file to the `ClassPath` of `Spark` to use `Spark-Doris-Connector`.
-After successful compilation, the file `doris-spark-2.3.4-2.11-1.0.0-SNAPSHOT.jar` will be generated in the `output/` directory. Copy this file to the `ClassPath` of `Spark` to use `Spark-Doris-Connector`. For example, for `Spark` running in `Local` mode, put this file in the `jars/` folder; for `Spark` running in `Yarn` cluster mode, put this file in the pre-deployment package.
+For example, for `Spark` running in `Local` mode, put this file in the `jars/` folder; for `Spark` running in `Yarn` cluster mode, put this file in the pre-deployment package.
-For example, upload `doris-spark-2.3.4-2.11-1.0.0-SNAPSHOT.jar` to HDFS and add the HDFS jar path to the spark.yarn.jars parameter
+For example, upload `spark-doris-connector-3.2_2.12-1.2.0-SNAPSHOT.jar` to HDFS and add the HDFS jar path to the `spark.yarn.jars` parameter
-1. Upload doris-spark-connector-3.1.2-2.12-1.0.0.jar to HDFS.
+1. Upload `spark-doris-connector-3.2_2.12-1.2.0-SNAPSHOT.jar` to HDFS.
```
hdfs dfs -mkdir /spark-jars/
-hdfs dfs -put /your_local_path/doris-spark-connector-3.1.2-2.12-1.0.0.jar /spark-jars/
+hdfs dfs -put /your_local_path/spark-doris-connector-3.2_2.12-1.2.0-SNAPSHOT.jar /spark-jars/
```
-2. Add the doris-spark-connector-3.1.2-2.12-1.0.0.jar dependency in the cluster.
+2. Add the `spark-doris-connector-3.2_2.12-1.2.0-SNAPSHOT.jar` dependency in the cluster.
```
-spark.yarn.jars=hdfs:///spark-jars/doris-spark-connector-3.1.2-2.12-1.0.0.jar
+spark.yarn.jars=hdfs:///spark-jars/spark-doris-connector-3.2_2.12-1.2.0-SNAPSHOT.jar
```
## Using Maven
```
<dependency>
- <groupId>org.apache.doris</groupId>
- <artifactId>spark-doris-connector-3.1_2.12</artifactId>
- <!--artifactId>spark-doris-connector-2.3_2.11</artifactId-->
- <version>1.0.1</version>
+ <groupId>org.apache.doris</groupId>
+ <artifactId>spark-doris-connector-3.2_2.12</artifactId>
+ <version>1.2.0</version>
</dependency>
```
@@ -155,7 +110,7 @@ SELECT * FROM spark_doris;
```scala
val dorisSparkDF = spark.read.format("doris")
.option("doris.table.identifier", "$YOUR_DORIS_DATABASE_NAME.$YOUR_DORIS_TABLE_NAME")
- .option("doris.fenodes", "$YOUR_DORIS_FE_HOSTNAME:$YOUR_DORIS_FE_RESFUL_PORT")
+ .option("doris.fenodes", "$YOUR_DORIS_FE_HOSTNAME:$YOUR_DORIS_FE_RESFUL_PORT")
.option("user", "$YOUR_DORIS_USERNAME")
.option("password", "$YOUR_DORIS_PASSWORD")
.load()
@@ -188,12 +143,10 @@ dorisSparkDF = spark.read.format("doris")
.option("user", "$YOUR_DORIS_USERNAME")
.option("password", "$YOUR_DORIS_PASSWORD")
.load()
-# show 5 lines data
+// show 5 lines data
dorisSparkDF.show(5)
```
-
-
### Write
#### SQL
@@ -226,7 +179,7 @@ mockDataDF.show(5)
mockDataDF.write.format("doris")
.option("doris.table.identifier", "$YOUR_DORIS_DATABASE_NAME.$YOUR_DORIS_TABLE_NAME")
- .option("doris.fenodes", "$YOUR_DORIS_FE_HOSTNAME:$YOUR_DORIS_FE_RESFUL_PORT")
+ .option("doris.fenodes", "$YOUR_DORIS_FE_HOSTNAME:$YOUR_DORIS_FE_RESFUL_PORT")
.option("user", "$YOUR_DORIS_USERNAME")
.option("password", "$YOUR_DORIS_PASSWORD")
//other options
@@ -246,7 +199,7 @@ kafkaSource.selectExpr("CAST(key AS STRING)", "CAST(value as STRING)")
.format("doris")
.option("checkpointLocation", "$YOUR_CHECKPOINT_LOCATION")
.option("doris.table.identifier", "$YOUR_DORIS_DATABASE_NAME.$YOUR_DORIS_TABLE_NAME")
- .option("doris.fenodes", "$YOUR_DORIS_FE_HOSTNAME:$YOUR_DORIS_FE_RESFUL_PORT")
+ .option("doris.fenodes", "$YOUR_DORIS_FE_HOSTNAME:$YOUR_DORIS_FE_RESFUL_PORT")
.option("user", "$YOUR_DORIS_USERNAME")
.option("password", "$YOUR_DORIS_PASSWORD")
//other options
@@ -256,7 +209,7 @@ kafkaSource.selectExpr("CAST(key AS STRING)", "CAST(value as STRING)")
.awaitTermination()
```
-### java example
+### Java Example
A Java example is provided for reference under `samples/doris-demo/spark-demo/`: [here](https://github.com/apache/incubator-doris/tree/master/samples/doris-demo/spark-demo)