Posted to commits@iotdb.apache.org by qi...@apache.org on 2020/06/19 07:21:10 UTC
[incubator-iotdb] branch master updated: Fix spark tsfile master
(#1389)
This is an automated email from the ASF dual-hosted git repository.
qiaojialin pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-iotdb.git
The following commit(s) were added to refs/heads/master by this push:
new e42bbfe Fix spark tsfile master (#1389)
e42bbfe is described below
commit e42bbfe3cf02a6a3b61b22073dfbf2f2d301b047
Author: SilverNarcissus <15...@smail.nju.edu.cn>
AuthorDate: Fri Jun 19 15:21:00 2020 +0800
Fix spark tsfile master (#1389)
* fix spark-tsfile query all measurement bug
---
docs/SystemDesign/Connector/Spark-TsFile.md | 22 +++++++++++++++++++++-
docs/zh/SystemDesign/Connector/Spark-TsFile.md | 16 ++++++++++++++++
.../apache/iotdb/spark/tsfile/DefaultSource.scala | 13 ++++++++++++-
3 files changed, 49 insertions(+), 2 deletions(-)
diff --git a/docs/SystemDesign/Connector/Spark-TsFile.md b/docs/SystemDesign/Connector/Spark-TsFile.md
index 8b1aea3..eea8d15 100644
--- a/docs/SystemDesign/Connector/Spark-TsFile.md
+++ b/docs/SystemDesign/Connector/Spark-TsFile.md
@@ -68,7 +68,27 @@ The main logic is the buildReader function in src/main/scala/org/apache
The main logic of the SQL analysis of the wide table structure is in src/main/scala/org/apache/iotdb/spark/tsfile/WideConverter.scala. This structure is basically the same as the TsFile native query structure, so no special processing is required: the SQL statement is converted directly into the corresponding query expression
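The direct mapping for the wide table form can be sketched as follows; the expression classes here are illustrative stand-ins, not the actual TsFile or WideConverter API:

```java
// Illustrative expression model (hypothetical classes) showing how a
// wide-table SQL filter maps one-to-one onto a query expression tree,
// with no device-specific rewriting.
interface Expr {}

final class TimeGt implements Expr {
    final long t;
    TimeGt(long t) { this.t = t; }
    @Override public String toString() { return "[time > " + t + "]"; }
}

final class TimeLt implements Expr {
    final long t;
    TimeLt(long t) { this.t = t; }
    @Override public String toString() { return "[time < " + t + "]"; }
}

final class And implements Expr {
    final Expr left, right;
    And(Expr left, Expr right) { this.left = left; this.right = right; }
    @Override public String toString() { return left + " and " + right; }
}

public class WideTableSketch {
    public static void main(String[] args) {
        // WHERE time > t1 AND time < t2 converts directly into the tree.
        Expr e = new And(new TimeGt(1588953600000L), new TimeLt(1589040000000L));
        System.out.println(e);
        // prints "[time > 1588953600000] and [time < 1589040000000]"
    }
}
```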
#### 4. Narrow table structure
-The main logic of the SQL analysis of the wide table structure is src / main / scala / org / apache / iotdb / spark / tsfile / NarrowConverter.scala. After the SQL is converted to an expression, the narrow table structure is different from the Tsfile native query structure. The expression is converted into a disjunction expression related to the device before it can be converted into a query of Tsfile. The conversion code is in src / main / java / org / apache / iotdb / spark / tsfile / qp
+The main logic of the SQL analysis of the narrow table structure is in src/main/scala/org/apache/iotdb/spark/tsfile/NarrowConverter.scala.
+
+First, we use the required schema to decide which timeseries should be read from the TsFile:
+```
+requiredSchema.foreach((field: StructField) => {
+ if (field.name != QueryConstant.RESERVED_TIME
+ && field.name != NarrowConverter.DEVICE_NAME) {
+ measurementNames += field.name
+ }
+})
+```
+
+After the SQL is converted to an expression, because the narrow table structure differs from the TsFile native query structure, the expression must first be converted into a disjunction expression over devices before it can become a TsFile query. The conversion code is in src/main/java/org/apache/iotdb/spark/tsfile/qp
+
+Example:
+```
+select time, device_name, s1 from tsfile_table where time > 1588953600000 and time < 1589040000000 and device_name = 'root.group1.d1'
+```
+Here we only need the timeseries 'root.group1.d1.s1', and the resulting expression is [time > 1588953600000] and [time < 1589040000000]
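The measurement selection for this example can be sketched in Java as follows; the reserved column names mirror the commit (QueryConstant.RESERVED_TIME, NarrowConverter.DEVICE_NAME), but the helper itself is hypothetical, not connector API:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch of the measurement selection described above: drop the reserved
// time and device_name columns, then prefix the queried device to obtain
// the full timeseries paths to read from the TsFile.
public class NarrowSchemaSketch {
    static final String RESERVED_TIME = "time";
    static final String DEVICE_NAME = "device_name";

    static List<String> requiredTimeseries(List<String> requiredColumns, String device) {
        List<String> series = new ArrayList<>();
        for (String col : requiredColumns) {
            // Everything except the reserved columns is a measurement.
            if (!col.equals(RESERVED_TIME) && !col.equals(DEVICE_NAME)) {
                series.add(device + "." + col);
            }
        }
        return series;
    }

    public static void main(String[] args) {
        List<String> cols = Arrays.asList("time", "device_name", "s1");
        System.out.println(requiredTimeseries(cols, "root.group1.d1"));
        // prints "[root.group1.d1.s1]"
    }
}
```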
+
+
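The device-related disjunction rewrite can be sketched like this; the string-based representation is purely illustrative, while the real conversion in the qp package builds TsFile expression objects:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the rewrite done in the qp package: the filter is
// turned into a disjunction with one branch per matching device, each branch
// carrying the remaining (time) predicates.
public class DisjunctionSketch {
    static String toDisjunction(List<String> devices, String timeFilter) {
        List<String> branches = new ArrayList<>();
        for (String device : devices) {
            branches.add("(device = " + device + " and " + timeFilter + ")");
        }
        // The per-device branches are OR-ed together into one query.
        return String.join(" or ", branches);
    }

    public static void main(String[] args) {
        String filter = "[time > 1588953600000] and [time < 1589040000000]";
        // device_name = 'root.group1.d1' leaves a single matching device.
        System.out.println(toDisjunction(Arrays.asList("root.group1.d1"), filter));
    }
}
```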
#### 5. Query execution
The actual data query execution is performed by the TsFile native component, see:
diff --git a/docs/zh/SystemDesign/Connector/Spark-TsFile.md b/docs/zh/SystemDesign/Connector/Spark-TsFile.md
index bb10bc7..ccf947a 100644
--- a/docs/zh/SystemDesign/Connector/Spark-TsFile.md
+++ b/docs/zh/SystemDesign/Connector/Spark-TsFile.md
@@ -78,9 +78,25 @@ SQL parsing is divided into wide table and narrow table structures
The main logic of the SQL analysis of the narrow table structure is in src/main/scala/org/apache/iotdb/spark/tsfile/NarrowConverter.scala
+First, we determine which timeseries to read based on the required schema, querying in the TsFile only the timeseries that appear in the SQL
+```
+requiredSchema.foreach((field: StructField) => {
+ if (field.name != QueryConstant.RESERVED_TIME
+ && field.name != NarrowConverter.DEVICE_NAME) {
+ measurementNames += field.name
+ }
+})
+```
+
After the SQL is converted to an expression, because the narrow table structure differs from the TsFile native query structure, the expression must first be converted into a disjunction expression over devices before it can become a TsFile query. The conversion code is in src/main/java/org/apache/iotdb/spark/tsfile/qp
+Example:
+```
+select time, device_name, s1 from tsfile_table where time > 1588953600000 and time < 1589040000000 and device_name = 'root.group1.d1'
+```
+Here only the timeseries root.group1.d1.s1 is queried, and the filter expression is [time > 1588953600000] and [time < 1589040000000]
+
#### 5. Query execution
The actual data query execution is performed by the TsFile native component, see:
diff --git a/spark-tsfile/src/main/scala/org/apache/iotdb/spark/tsfile/DefaultSource.scala b/spark-tsfile/src/main/scala/org/apache/iotdb/spark/tsfile/DefaultSource.scala
index 257a0f2..ed613fa 100755
--- a/spark-tsfile/src/main/scala/org/apache/iotdb/spark/tsfile/DefaultSource.scala
+++ b/spark-tsfile/src/main/scala/org/apache/iotdb/spark/tsfile/DefaultSource.scala
@@ -41,6 +41,9 @@ import org.apache.spark.sql.execution.datasources.{FileFormat, OutputWriterFacto
import org.apache.spark.sql.sources.{DataSourceRegister, Filter}
import org.apache.spark.sql.types._
import org.slf4j.LoggerFactory
+import scala.collection.JavaConversions._
+import scala.collection.mutable
+import scala.collection.mutable.ListBuffer
private[tsfile] class DefaultSource extends FileFormat with DataSourceRegister {
@@ -116,7 +119,15 @@ private[tsfile] class DefaultSource extends FileFormat with DataSourceRegister {
if (options.getOrElse(DefaultSource.isNarrowForm, "").equals("narrow_form")) {
val deviceNames = reader.getAllDevices()
- val measurementNames = reader.getAllMeasurements.keySet()
+
+ val measurementNames = new java.util.HashSet[String]()
+
+ requiredSchema.foreach((field: StructField) => {
+ if (field.name != QueryConstant.RESERVED_TIME
+ && field.name != NarrowConverter.DEVICE_NAME) {
+ measurementNames += field.name
+ }
+ })
// construct queryExpression based on queriedSchema and filters
val queryExpressions = NarrowConverter.toQueryExpression(dataSchema, deviceNames,