Posted to commits@iotdb.apache.org by ha...@apache.org on 2022/01/20 14:23:13 UTC

[iotdb] branch master updated: [IOTDB-2438] update the user guide of Spark Connector (#4931)

This is an automated email from the ASF dual-hosted git repository.

haonan pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/iotdb.git


The following commit(s) were added to refs/heads/master by this push:
     new 91dd7f7  [IOTDB-2438] update the user guide of Spark Connector (#4931)
91dd7f7 is described below

commit 91dd7f7b6c73118aca70ab87004f37bd19a119b9
Author: Xuan Ronaldo <xu...@qq.com>
AuthorDate: Thu Jan 20 22:22:38 2022 +0800

    [IOTDB-2438] update the user guide of Spark Connector (#4931)
---
 .../UserGuide/Ecosystem Integration/Spark IoTDB.md | 36 +++++++++++++---------
 .../UserGuide/Ecosystem Integration/Spark IoTDB.md | 27 ++++++++--------
 2 files changed, 36 insertions(+), 27 deletions(-)

diff --git a/docs/UserGuide/Ecosystem Integration/Spark IoTDB.md b/docs/UserGuide/Ecosystem Integration/Spark IoTDB.md
index 790ce9f..b89b2b3 100644
--- a/docs/UserGuide/Ecosystem Integration/Spark IoTDB.md	
+++ b/docs/UserGuide/Ecosystem Integration/Spark IoTDB.md	
@@ -19,6 +19,7 @@
 
 -->
 ## Spark-IoTDB
+
 ### version
 
 The versions required for Spark and Java are as follows:
@@ -28,12 +29,13 @@ The versions required for Spark and Java are as follow:
 | `2.4.5`        | `2.12`        | `1.8`        | `0.13.0`|
 
 
-> Currently we only support spark version 2.4.3 and there are some known issue on 2.4.7, do no use it
+> Currently we only support Spark version 2.4.5; there are some known issues on 2.4.7, do not use it
 
 
 ### Install
+```shell
 mvn clean scala:compile compile install
-
+```
 
 #### Maven Dependency
 
@@ -45,11 +47,15 @@ mvn clean scala:compile compile install
     </dependency>
 ```
 
-
 #### spark-shell user guide
 
+Note: there is a thrift version conflict between IoTDB and Spark.
+Therefore, if you want to debug in spark-shell, you need to execute `rm -f $SPARK_HOME/jars/libthrift*` and `cp $IOTDB_HOME/lib/libthrift* $SPARK_HOME/jars/` to resolve it.
+Otherwise, you can only debug the code in the IDE. If you want to run your task with `spark-submit`, you must package it with its dependencies.
+
+
 ```
-spark-shell --jars spark-iotdb-connector-0.13.0.jar,iotdb-jdbc-0.13.0-jar-with-dependencies.jar
+spark-shell --jars spark-iotdb-connector-0.13.0.jar,iotdb-jdbc-0.13.0-jar-with-dependencies.jar,iotdb-session-0.13.0-jar-with-dependencies.jar
 
 import org.apache.iotdb.spark.db._
 
@@ -63,7 +69,7 @@ df.show()
 To partition rdd:
 
 ```
-spark-shell --jars spark-iotdb-connector-0.13.0.jar,iotdb-jdbc-0.13.0-jar-with-dependencies.jar
+spark-shell --jars spark-iotdb-connector-0.13.0.jar,iotdb-jdbc-0.13.0-jar-with-dependencies.jar,iotdb-session-0.13.0-jar-with-dependencies.jar
 
 import org.apache.iotdb.spark.db._
 
@@ -118,7 +124,7 @@ You can also use the narrow table form as follows: (see part 4 about how to use the narrow form)
 
 * from wide to narrow
 
-```
+```scala
 import org.apache.iotdb.spark.db._
 
 val wide_df = spark.read.format("org.apache.iotdb.spark.db").option("url", "jdbc:iotdb://127.0.0.1:6667/").option("sql", "select * from root where time < 1100 and time > 1000").load
@@ -127,7 +133,7 @@ val narrow_df = Transformer.toNarrowForm(spark, wide_df)
 
 * from narrow to wide
 
-```
+```scala
 import org.apache.iotdb.spark.db._
 
 val wide_df = Transformer.toWideForm(spark, narrow_df)
@@ -135,11 +141,11 @@ val wide_df = Transformer.toWideForm(spark, narrow_df)
 
 #### Java user guide
 
-```
+```java
 import org.apache.spark.sql.Dataset;
 import org.apache.spark.sql.Row;
 import org.apache.spark.sql.SparkSession;
-import org.apache.iotdb.spark.db.*
+import org.apache.iotdb.spark.db.*;
 
 public class Example {
 
@@ -158,15 +164,15 @@ public class Example {
 
     df.show();
     
-    Dataset<Row> narrowTable = Transformer.toNarrowForm(spark, df)
-    narrowTable.show()
+    Dataset<Row> narrowTable = Transformer.toNarrowForm(spark, df);
+    narrowTable.show();
   }
 }
 ```
 
-## Write Data to IoTDB
+### Write Data to IoTDB
 
-### User Guide
+#### User Guide
 ``` scala
 // import narrow table
 val df = spark.createDataFrame(List(
@@ -205,6 +211,6 @@ dfWithColumn.write.format("org.apache.iotdb.spark.db")
     .save
 ```
 
-### Notes
+#### Notes
 1. You can directly write data to IoTDB whether the dataframe contains a wide table or a narrow table.
-2. The parameter `numPartition` is used to set the number of partitions. The dataframe that you want to save will be repartition based on this parameter before  writing data. Each partition will open a session to write data to increase the number of concurrent requests.
\ No newline at end of file
+2. The parameter `numPartition` is used to set the number of partitions. The dataframe you want to save will be repartitioned based on this parameter before writing. Each partition opens its own session to write data, which increases the number of concurrent requests.
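
For illustration of the two notes above (a minimal Scala sketch, not lines from the patch): a wide table is read through the connector and written back with `numPartition` set. The read options match the examples in this guide; it is assumed the write side accepts the same `url` option, and the SQL filter and partition count are placeholders.

```scala
import org.apache.iotdb.spark.db._

// Read a wide-form dataframe through the connector (same options as the read examples above).
val df = spark.read.format("org.apache.iotdb.spark.db")
  .option("url", "jdbc:iotdb://127.0.0.1:6667/")
  .option("sql", "select * from root where time < 1100 and time > 1000")
  .load

// Write it back. numPartition repartitions the dataframe before writing;
// each partition opens its own session, so a larger value means more concurrent writers.
df.write.format("org.apache.iotdb.spark.db")
  .option("url", "jdbc:iotdb://127.0.0.1:6667/")
  .option("numPartition", "10")
  .save
```

A larger `numPartition` means more concurrent sessions, at the cost of more connections to the server.
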
diff --git a/docs/zh/UserGuide/Ecosystem Integration/Spark IoTDB.md b/docs/zh/UserGuide/Ecosystem Integration/Spark IoTDB.md
index 3b6320e..04f5bf0 100644
--- a/docs/zh/UserGuide/Ecosystem Integration/Spark IoTDB.md	
+++ b/docs/zh/UserGuide/Ecosystem Integration/Spark IoTDB.md	
@@ -45,7 +45,10 @@ mvn clean scala:compile compile install
 
 #### spark-shell user guide
 
-```
+Note: because there is a thrift version conflict between IoTDB and Spark, you need to resolve it by running the two commands `rm -f $SPARK_HOME/jars/libthrift*` and `cp $IOTDB_HOME/lib/libthrift* $SPARK_HOME/jars/`.
+Otherwise, you can only debug the code in the IDE. Also, if you need to submit your task with `spark-submit`, you must package it with its dependencies.
+
+```shell
 spark-shell --jars spark-iotdb-connector-0.13.0.jar,iotdb-jdbc-0.13.0-jar-with-dependencies.jar
 
 import org.apache.iotdb.spark.db._
@@ -59,7 +62,7 @@ df.show()
 
 To partition the rdd, you can do the following:
 
-```
+```shell
 spark-shell --jars spark-iotdb-connector-0.13.0.jar,iotdb-jdbc-0.13.0-jar-with-dependencies.jar
 
 import org.apache.iotdb.spark.db._
@@ -123,7 +126,7 @@ The existing data in the TsFile is as follows:
 
 * from wide to narrow
 
-```
+```scala
 import org.apache.iotdb.spark.db._
 
 val wide_df = spark.read.format("org.apache.iotdb.spark.db").option("url", "jdbc:iotdb://127.0.0.1:6667/").option("sql", "select * from root where time < 1100 and time > 1000").load
@@ -132,7 +135,7 @@ val narrow_df = Transformer.toNarrowForm(spark, wide_df)
 
 * from narrow to wide
 
-```
+```scala
 import org.apache.iotdb.spark.db._
 
 val wide_df = Transformer.toWideForm(spark, narrow_df)
@@ -140,11 +143,11 @@ val wide_df = Transformer.toWideForm(spark, narrow_df)
 
 #### Java user guide
 
-```
+```java
 import org.apache.spark.sql.Dataset;
 import org.apache.spark.sql.Row;
 import org.apache.spark.sql.SparkSession;
-import org.apache.iotdb.spark.db.*
+import org.apache.iotdb.spark.db.*;
 
 public class Example {
 
@@ -163,14 +166,14 @@ public class Example {
 
     df.show();
     
-    Dataset<Row> narrowTable = Transformer.toNarrowForm(spark, df)
-    narrowTable.show()
+    Dataset<Row> narrowTable = Transformer.toNarrowForm(spark, df);
+    narrowTable.show();
   }
 }
 ```
 
-## Write Data to IoTDB
-### User Guide
+### Write Data to IoTDB
+#### User Guide
 ``` scala
 // import narrow table
 val df = spark.createDataFrame(List(
@@ -209,6 +212,6 @@ dfWithColumn.write.format("org.apache.iotdb.spark.db")
     .save
 ```
 
-### Notes
+#### Notes
 1. You can write data directly to IoTDB whether the dataframe holds a narrow table or a wide table.
-2. The numPartition parameter sets the number of partitions; the dataframe will be repartitioned accordingly before the data is written. Each partition opens its own session to write the data, which increases concurrency.
\ No newline at end of file
+2. The numPartition parameter sets the number of partitions; the dataframe will be repartitioned accordingly before the data is written. Each partition opens its own session to write the data, which increases concurrency.
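
The Chinese guide carries the same two notes. As a sketch of note 1 (either form can be written directly), the narrow form produced by `Transformer.toNarrowForm` is written back the same way; again, the write-side options beyond `url` and `numPartition` are assumptions, not taken from the patch.

```scala
import org.apache.iotdb.spark.db._

// Wide-form dataframe read through the connector, as in the guide's read examples.
val wide_df = spark.read.format("org.apache.iotdb.spark.db")
  .option("url", "jdbc:iotdb://127.0.0.1:6667/")
  .option("sql", "select * from root where time < 1100 and time > 1000")
  .load

// Convert to the narrow form; per note 1, either form can be written directly.
val narrow_df = Transformer.toNarrowForm(spark, wide_df)

// Write the narrow table back, repartitioned into 5 partitions (5 concurrent sessions).
narrow_df.write.format("org.apache.iotdb.spark.db")
  .option("url", "jdbc:iotdb://127.0.0.1:6667/")
  .option("numPartition", "5")
  .save
```
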