Posted to commits@iotdb.apache.org by ha...@apache.org on 2022/01/20 14:23:13 UTC
[iotdb] branch master updated: [IOTDB-2438] update the user guide of Spark Connector (#4931)
This is an automated email from the ASF dual-hosted git repository.
haonan pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/iotdb.git
The following commit(s) were added to refs/heads/master by this push:
new 91dd7f7 [IOTDB-2438] update the user guide of Spark Connector (#4931)
91dd7f7 is described below
commit 91dd7f7b6c73118aca70ab87004f37bd19a119b9
Author: Xuan Ronaldo <xu...@qq.com>
AuthorDate: Thu Jan 20 22:22:38 2022 +0800
[IOTDB-2438] update the user guide of Spark Connector (#4931)
---
.../UserGuide/Ecosystem Integration/Spark IoTDB.md | 36 +++++++++++++---------
.../UserGuide/Ecosystem Integration/Spark IoTDB.md | 27 ++++++++--------
2 files changed, 36 insertions(+), 27 deletions(-)
diff --git a/docs/UserGuide/Ecosystem Integration/Spark IoTDB.md b/docs/UserGuide/Ecosystem Integration/Spark IoTDB.md
index 790ce9f..b89b2b3 100644
--- a/docs/UserGuide/Ecosystem Integration/Spark IoTDB.md
+++ b/docs/UserGuide/Ecosystem Integration/Spark IoTDB.md
@@ -19,6 +19,7 @@
-->
## Spark-IoTDB
+
### version
The versions required for Spark and Java are as follows:
@@ -28,12 +29,13 @@ The versions required for Spark and Java are as follow:
| `2.4.5` | `2.12` | `1.8` | `0.13.0`|
-> Currently we only support spark version 2.4.3 and there are some known issue on 2.4.7, do no use it
+> Currently, we only support Spark version 2.4.5; there are known issues on 2.4.7, so do not use it.
### Install
+```shell
mvn clean scala:compile compile install
-
+```
#### Maven Dependency
@@ -45,11 +47,15 @@ mvn clean scala:compile compile install
</dependency>
```
-
#### spark-shell user guide
+Notice: There is a thrift version conflict between IoTDB and Spark.
+Therefore, if you want to debug in spark-shell, you need to execute `rm -f $SPARK_HOME/jars/libthrift*` and `cp $IOTDB_HOME/lib/libthrift* $SPARK_HOME/jars/` to resolve it.
+Otherwise, you can only debug the code in an IDE. If you want to run your task with `spark-submit`, you must package it with its dependencies.
+
+
```
-spark-shell --jars spark-iotdb-connector-0.13.0.jar,iotdb-jdbc-0.13.0-jar-with-dependencies.jar
+spark-shell --jars spark-iotdb-connector-0.13.0.jar,iotdb-jdbc-0.13.0-jar-with-dependencies.jar,iotdb-session-0.13.0-jar-with-dependencies.jar
import org.apache.iotdb.spark.db._
@@ -63,7 +69,7 @@ df.show()
To partition rdd:
```
-spark-shell --jars spark-iotdb-connector-0.13.0.jar,iotdb-jdbc-0.13.0-jar-with-dependencies.jar
+spark-shell --jars spark-iotdb-connector-0.13.0.jar,iotdb-jdbc-0.13.0-jar-with-dependencies.jar,iotdb-session-0.13.0-jar-with-dependencies.jar
import org.apache.iotdb.spark.db._
@@ -118,7 +124,7 @@ You can also use narrow table form which as follows: (You can see part 4 about h
* from wide to narrow
-```
+```scala
import org.apache.iotdb.spark.db._
val wide_df = spark.read.format("org.apache.iotdb.spark.db").option("url", "jdbc:iotdb://127.0.0.1:6667/").option("sql", "select * from root where time < 1100 and time > 1000").load
@@ -127,7 +133,7 @@ val narrow_df = Transformer.toNarrowForm(spark, wide_df)
* from narrow to wide
-```
+```scala
import org.apache.iotdb.spark.db._
val wide_df = Transformer.toWideForm(spark, narrow_df)
@@ -135,11 +141,11 @@ val wide_df = Transformer.toWideForm(spark, narrow_df)
#### Java user guide
-```
+```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
-import org.apache.iotdb.spark.db.*
+import org.apache.iotdb.spark.db.*;
public class Example {
@@ -158,15 +164,15 @@ public class Example {
df.show();
- Dataset<Row> narrowTable = Transformer.toNarrowForm(spark, df)
- narrowTable.show()
+ Dataset<Row> narrowTable = Transformer.toNarrowForm(spark, df);
+ narrowTable.show();
}
}
```
-## Write Data to IoTDB
+### Write Data to IoTDB
-### User Guide
+#### User Guide
``` scala
// import narrow table
val df = spark.createDataFrame(List(
@@ -205,6 +211,6 @@ dfWithColumn.write.format("org.apache.iotdb.spark.db")
.save
```
-### Notes
+#### Notes
1. You can directly write data to IoTDB whether the dataframe contains a wide table or a narrow table.
-2. The parameter `numPartition` is used to set the number of partitions. The dataframe that you want to save will be repartitioned based on this parameter before writing. Each partition will open a session to write data, increasing the number of concurrent requests.
\ No newline at end of file
+2. The parameter `numPartition` is used to set the number of partitions. The dataframe that you want to save will be repartitioned based on this parameter before writing. Each partition will open a session to write data, increasing the number of concurrent requests.
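
The libthrift swap described in the notice above can be wrapped in a small helper. This is a minimal sketch: `fix_thrift_jars` is a hypothetical function name, and the two commands inside it are exactly the ones given in the guide.

```shell
# Hypothetical helper wrapping the two commands from the notice above.
# Arguments: path to the Spark installation, path to the IoTDB installation.
fix_thrift_jars() {
  spark_home="$1"
  iotdb_home="$2"
  # Drop Spark's bundled libthrift jars, which conflict with IoTDB's.
  rm -f "$spark_home"/jars/libthrift*
  # Copy IoTDB's libthrift into Spark's jar directory instead.
  cp "$iotdb_home"/lib/libthrift* "$spark_home"/jars/
}
```

Run it once against `$SPARK_HOME` and `$IOTDB_HOME` before starting spark-shell, so Spark picks up IoTDB's thrift version.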
diff --git a/docs/zh/UserGuide/Ecosystem Integration/Spark IoTDB.md b/docs/zh/UserGuide/Ecosystem Integration/Spark IoTDB.md
index 3b6320e..04f5bf0 100644
--- a/docs/zh/UserGuide/Ecosystem Integration/Spark IoTDB.md
+++ b/docs/zh/UserGuide/Ecosystem Integration/Spark IoTDB.md
@@ -45,7 +45,10 @@ mvn clean scala:compile compile install
#### spark-shell user guide
-```
+Notice: because the thrift versions of IoTDB and Spark conflict, you need to resolve this by executing the two commands `rm -f $SPARK_HOME/jars/libthrift*` and `cp $IOTDB_HOME/lib/libthrift* $SPARK_HOME/jars/`.
+Otherwise, you can only debug the code in an IDE. Also, if you submit your task with `spark-submit`, you must package it with its dependencies.
+
+```shell
spark-shell --jars spark-iotdb-connector-0.13.0.jar,iotdb-jdbc-0.13.0-jar-with-dependencies.jar
import org.apache.iotdb.spark.db._
@@ -59,7 +62,7 @@ df.show()
To partition the rdd, run the following:
-```
+```shell
spark-shell --jars spark-iotdb-connector-0.13.0.jar,iotdb-jdbc-0.13.0-jar-with-dependencies.jar
import org.apache.iotdb.spark.db._
@@ -123,7 +126,7 @@ The existing data in the TsFile is as follows:
* from wide to narrow
-```
+```scala
import org.apache.iotdb.spark.db._
val wide_df = spark.read.format("org.apache.iotdb.spark.db").option("url", "jdbc:iotdb://127.0.0.1:6667/").option("sql", "select * from root where time < 1100 and time > 1000").load
@@ -132,7 +135,7 @@ val narrow_df = Transformer.toNarrowForm(spark, wide_df)
* from narrow to wide
-```
+```scala
import org.apache.iotdb.spark.db._
val wide_df = Transformer.toWideForm(spark, narrow_df)
@@ -140,11 +143,11 @@ val wide_df = Transformer.toWideForm(spark, narrow_df)
#### Java user guide
-```
+```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
-import org.apache.iotdb.spark.db.*
+import org.apache.iotdb.spark.db.*;
public class Example {
@@ -163,14 +166,14 @@ public class Example {
df.show();
- Dataset<Row> narrowTable = Transformer.toNarrowForm(spark, df)
- narrowTable.show()
+ Dataset<Row> narrowTable = Transformer.toNarrowForm(spark, df);
+ narrowTable.show();
}
}
```
-## Write Data to IoTDB
-### User Guide
+### Write Data to IoTDB
+#### User Guide
``` scala
// import narrow table
val df = spark.createDataFrame(List(
@@ -209,6 +212,6 @@ dfWithColumn.write.format("org.apache.iotdb.spark.db")
.save
```
-### Notes
+#### Notes
1. Whether the dataframe contains a wide table or a narrow table, you can write the data directly to IoTDB.
-2. The `numPartition` parameter sets the number of partitions; the dataframe is repartitioned based on it before writing. Each partition opens a session to write data, increasing concurrency.
\ No newline at end of file
+2. The `numPartition` parameter sets the number of partitions; the dataframe is repartitioned based on it before writing. Each partition opens a session to write data, increasing concurrency.
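
The three connector jars passed to spark-shell in the updated guide can be assembled from a single version variable, which keeps the jar names in sync when upgrading. A minimal sketch, assuming the `0.13.0` version used throughout the doc; it only prints the resulting command rather than launching Spark.

```shell
# Build the --jars argument for spark-shell from one connector version.
VERSION="0.13.0"
JARS="spark-iotdb-connector-$VERSION.jar"
JARS="$JARS,iotdb-jdbc-$VERSION-jar-with-dependencies.jar"
JARS="$JARS,iotdb-session-$VERSION-jar-with-dependencies.jar"
# Print the command for illustration instead of executing it.
echo "spark-shell --jars $JARS"
```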