You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@iotdb.apache.org by xi...@apache.org on 2021/12/19 14:18:47 UTC
[iotdb] branch master updated: [IOTDB-2173] Update document about group by level clause (#4599)

This is an automated email from the ASF dual-hosted git repository.

xiangweiwei pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/iotdb.git


The following commit(s) were added to refs/heads/master by this push:
     new 583c186  [IOTDB-2173] Update document about group by level clause (#4599)
583c186 is described below

commit 583c18675e8df5a85c3dccb41ce1a581b2acb49e
Author: Xiangwei Wei <34...@users.noreply.github.com>
AuthorDate: Sun Dec 19 22:18:13 2021 +0800

    [IOTDB-2173] Update document about group by level clause (#4599)
---
 .../Advanced-Features/Continuous-Query.md          |   2 +-
 .../DML-Data-Manipulation-Language.md              | 105 +++++++++++++++------
 .../Advanced-Features/Continuous-Query.md          |   2 +-
 .../DML-Data-Manipulation-Language.md              |  99 ++++++++++++++-----
 4 files changed, 149 insertions(+), 59 deletions(-)

diff --git a/docs/UserGuide/Advanced-Features/Continuous-Query.md b/docs/UserGuide/Advanced-Features/Continuous-Query.md
index 3cea3ad..505ee65 100644
--- a/docs/UserGuide/Advanced-Features/Continuous-Query.md
+++ b/docs/UserGuide/Advanced-Features/Continuous-Query.md
@@ -47,7 +47,7 @@ END
 * `<path_prefix>` and `<path_suffix>` are spliced into the queried time series path.
 * `<full_path>` or `<node_name>` specifies the result time series path.
 * `<group_by_interval>` specifies the time grouping length. We currently support the units of ns, us, ms, s, m, h, d, w, mo, y.
-* `<level>` refers to grouping according to the `<level>` level of the time series, and aggregates all time series below the `<level>` level. For the specific semantics of the Group By Level statement and the definition of `<level>`, see [aggregation-by-level](../IoTDB-SQL-Language/DML-Data-Manipulation-Language.md)
+* `<level>` refers to grouping according to the `<level>` level of the time series, and grouping the aggregation result of  time series with the same name below the `<level>` level. For the specific semantics of the Group By Level statement and the definition of `<level>`, see [aggregation-by-level](../IoTDB-SQL-Language/DML-Data-Manipulation-Language.md)
 
 
 Note:
diff --git a/docs/UserGuide/IoTDB-SQL-Language/DML-Data-Manipulation-Language.md b/docs/UserGuide/IoTDB-SQL-Language/DML-Data-Manipulation-Language.md
index 8597e7e..1515880 100644
--- a/docs/UserGuide/IoTDB-SQL-Language/DML-Data-Manipulation-Language.md
+++ b/docs/UserGuide/IoTDB-SQL-Language/DML-Data-Manipulation-Language.md
@@ -589,69 +589,112 @@ It costs 0.016s
 
 #### Aggregation By Level
 
-**Aggregation by level statement** is used for aggregating upon specific hierarchical level of timeseries path.
-For all timeseries paths, by convention, "level=0" represents *root* level. 
-That is, to tally the points of any measurements under "root.ln", the level should be set to 1.
+Aggregation by level statement is used to group the query result whose name is the same at the given level. Keyword `LEVEL` is used to specify the level that need to be grouped.  By convention, `level=0` represents *root* level. 
 
-For example, there are multiple series under "root.ln.wf01", such as "root.ln.wf01.wt01.status","root.ln.wf01.wt02.status","root.ln.wf01.wt03.status".
-To count the number of "status" points of all these series, use query:
+For example：there are multiple series named `status` under different storage groups， like "root.ln.wf01.wt01.status", "root.ln.wf02.wt02.status", and "root.sgcc.wf03.wt01.status". If you need to count the number of data points of the `status` sequence under different storage groups, use the following query:
 
 ```sql
-select count(status) from root.ln.wf01.* group by level=2
+select count(status)
+from root.**
+group by level = 1
 ```
-Result:
+
+Result：
+
+```
++-------------------------+---------------------------+
+|count(root.ln.*.*.status)|count(root.sgcc.*.*.status)|
++-------------------------+---------------------------+
+|                    20160|                      10080|
++-------------------------+---------------------------+
+Total line number = 1
+It costs 0.003s
+```
+
+Similarly，if you need to count the number of data points under different devices, you can specify level = 3,
+
+```sql
+select count(status)
+from root.**
+group by level = 3
+```
+
+Result：
 
 ```
-+----------------------------+
-|COUNT(root.ln.wf01.*.status)|
-+----------------------------+
-|                       10080|
-+----------------------------+
++---------------------------+---------------------------+
+|count(root.*.*.wt01.status)|count(root.*.*.wt02.status)|
++---------------------------+---------------------------+
+|                      20160|                      10080|
++---------------------------+---------------------------+
+Total line number = 1
+It costs 0.003s
+```
+
+Attention，the devices named `wt01` under storage groups `ln` and `sgcc` are grouped together, since they are regarded as devices with the same name. If you need to further count the number of data points in different devices under different storage groups, you can use the following query:
+
+```sql
+select count(status)
+from root.**
+group by level = 1, 3
+```
+
+Result：
+
+```
++----------------------------+----------------------------+------------------------------+
+|count(root.ln.*.wt01.status)|count(root.ln.*.wt02.status)|count(root.sgcc.*.wt01.status)|
++----------------------------+----------------------------+------------------------------+
+|                       10080|                       10080|                         10080|
++----------------------------+----------------------------+------------------------------+
 Total line number = 1
 It costs 0.003s
 ```
 
 
-Suppose we add another two timeseries, "root.ln.wf01.wt01.temperature" and "root.ln.wf02.wt01.temperature".
-To query the count and the sum of "temperature" under path "root.ln.*.*", 
-aggregating on level=2, use following statement:
+
+Assuming that you want to query the maximum value of temperature sensor under all time series, you can use the following query statement:
 
 ```sql
-select count(temperature), sum(temperature) from root.ln.*.* group by level=2
+select max_value(temperature)
+from root.**
+group by level = 0
 ```
+
 Result：
 
 ```
-+---------------------------------+---------------------------------+-------------------------------+-------------------------------+
-|count(root.ln.wf02.*.temperature)|count(root.ln.wf01.*.temperature)|sum(root.ln.wf02.*.temperature)|sum(root.ln.wf01.*.temperature)|
-+---------------------------------+---------------------------------+-------------------------------+-------------------------------+
-|                                8|                                4|                          228.0|              91.83000183105469|
-+---------------------------------+---------------------------------+-------------------------------+-------------------------------+
++---------------------------------+
+|max_value(root.*.*.*.temperature)|
++---------------------------------+
+|                             26.0|
++---------------------------------+
 Total line number = 1
 It costs 0.013s
 ```
 
-To query the count and the sum of path "root.ln.\*.\*.temperature" aggregating on "root.ln" level,
-simply set level=1
+The above queries are for a certain sensor. In particular, **if you want to query the total data points owned by all sensors at a certain level**, you need to explicitly specify `*` is selected.
 
 ```sql
-select count(temperature), sum(temperature) from root.ln.*.* group by level=1
+select count(*)
+from root.ln.**
+group by level = 2
 ```
+
 Result：
 
 ```
-+------------------------------+----------------------------+
-|count(root.ln.*.*.temperature)|sum(root.ln.*.*.temperature)|
-+------------------------------+----------------------------+
-|                            12|           319.8300018310547|
-+------------------------------+----------------------------+
++----------------------+----------------------+
+|count(root.*.wf01.*.*)|count(root.*.wf02.*.*)|
++----------------------+----------------------+
+|                 20160|                 20160|
++----------------------+----------------------+
 Total line number = 1
 It costs 0.013s
 ```
 
 All supported aggregation functions are: count, sum, avg, last_value, first_value, min_time, max_time, min_value, max_value, extreme.
-When using four aggregations: sum, avg, min_value, max_value and extreme please make sure all the aggregated series have exactly the same data type.
-Otherwise, it will generate a syntax error.
+When using four aggregations: sum, avg, min_value, max_value and extreme please make sure all the aggregated series have exactly the same data type. Otherwise, it will generate a syntax error.
 
 #### Down-Frequency Aggregate Query
 
diff --git a/docs/zh/UserGuide/Advanced-Features/Continuous-Query.md b/docs/zh/UserGuide/Advanced-Features/Continuous-Query.md
index 68a918f..73fa701 100644
--- a/docs/zh/UserGuide/Advanced-Features/Continuous-Query.md
+++ b/docs/zh/UserGuide/Advanced-Features/Continuous-Query.md
@@ -50,7 +50,7 @@ END
 * `<full_path>` 或 `<node_name>` 指定将查询出的数据写入的结果序列路径。
 * `<group_by_interval>` 指定时间分组长度，支持 ns、us、ms、s、m、h
   、d、w、mo、y 等单位。
-* `<level>`指按照序列第 `<level>` 层分组，将第 `<level>` 层以下的所有序列聚合。Group By Level 语句的具体语义及 `<level>` 的定义见 [路径层级分组聚合](../IoTDB-SQL-Language/DML-Data-Manipulation-Language.md)。
+* `<level>`指按照序列第 `<level>` 层分组，将第 `<level>` 层同名的所有序列聚合。Group By Level 语句的具体语义及 `<level>` 的定义见 [路径层级分组聚合](../IoTDB-SQL-Language/DML-Data-Manipulation-Language.md)。
 
 注：
 
diff --git a/docs/zh/UserGuide/IoTDB-SQL-Language/DML-Data-Manipulation-Language.md b/docs/zh/UserGuide/IoTDB-SQL-Language/DML-Data-Manipulation-Language.md
index db90614..8cefc42 100644
--- a/docs/zh/UserGuide/IoTDB-SQL-Language/DML-Data-Manipulation-Language.md
+++ b/docs/zh/UserGuide/IoTDB-SQL-Language/DML-Data-Manipulation-Language.md
@@ -602,66 +602,113 @@ It costs 0.016s
 
 #### 路径层级分组聚合
 
-在时间序列层级结构中，分层聚合查询用于对某一层级进行聚合查询。
-这里使用 LEVEL 来统计指定层级下的聚合范围，该语句约定 root 为第 0 层序列，若统计"root.ln"下所有序列则需指定 level 为 1。
+在时间序列层级结构中，分层聚合查询用于**对某一层级下同名的序列进行聚合查询**。 这里使用 LEVEL 来统计指定需要聚合的层级，该语句约定 root 为第 0 层序列，若统计 "root.ln" 下所有序列则需指定 level 为 1。
 
-例如：在"root.ln.wf01"下存在多个子序列：wt01，wt02，wt03 等均有名为 status 的序列，
-如果需要统计这些子序列的 status 包含的点个数，使用以下查询：
+例如：不同存储组下均存在名为 status 的序列， 如 "root.ln.wf01.wt01.status", "root.ln.wf02.wt02.status", 以及 "
+root.sgcc.wf03.wt01.status", 如果需要统计不同存储组下 status 序列的数据点个数，使用以下查询：
 
 ```sql
-select count(status) from root.ln.wf01.* group by level=2
+select count(status)
+from root.**
+group by level = 1
 ```
 
 运行结果为：
 
 ```
-+----------------------------+
-|count(root.ln.wf01.*.status)|
-+----------------------------+
-|                       10080|
-+----------------------------+
++-------------------------+---------------------------+
+|count(root.ln.*.*.status)|count(root.sgcc.*.*.status)|
++-------------------------+---------------------------+
+|                    20160|                      10080|
++-------------------------+---------------------------+
 Total line number = 1
 It costs 0.003s
 ```
 
-假设此时添加两条序列，"root.ln.wf01.wt01.temperature" and "root.ln.wf02.wt01.temperature"。
-需要同时查询"root.ln.\*.\*.temperature"在第二层级的 count 聚合结果和 sum 聚合结果，可以使用下列查询语句：
+同理，如果需要统计不同设备下 status 序列的数据点个数，可以规定 level = 3，
 
 ```sql
-select count(temperature), sum(temperature) from root.ln.*.* group by level=2
+select count(status)
+from root.**
+group by level = 3
+```
+
+运行结果为：
+
+```
++---------------------------+---------------------------+
+|count(root.*.*.wt01.status)|count(root.*.*.wt02.status)|
++---------------------------+---------------------------+
+|                      20160|                      10080|
++---------------------------+---------------------------+
+Total line number = 1
+It costs 0.003s
+```
+
+注意，这时会将存储组 `ln` 和 `sgcc` 下名为 `wt01` 的设备视为同名设备聚合在一起。而如果需要进一步统计不同存储组下的不同设备中 status 序列的数据点个数，可以使用以下查询：
+
+```sql
+select count(status)
+from root.**
+group by level = 1, 3
+```
+
+运行结果为：
+
+```
++----------------------------+----------------------------+------------------------------+
+|count(root.ln.*.wt01.status)|count(root.ln.*.wt02.status)|count(root.sgcc.*.wt01.status)|
++----------------------------+----------------------------+------------------------------+
+|                       10080|                       10080|                         10080|
++----------------------------+----------------------------+------------------------------+
+Total line number = 1
+It costs 0.003s
+```
+
+
+
+假设需要查询所有序列下温度传感器 temperature 的最大值，可以使用下列查询语句：
+
+```sql
+select max_value(temperature)
+from root.**
+group by level = 0
 ```
 
 运行结果：
 
 ```
-+---------------------------------+---------------------------------+-------------------------------+-------------------------------+
-|count(root.ln.wf02.*.temperature)|count(root.ln.wf01.*.temperature)|sum(root.ln.wf02.*.temperature)|sum(root.ln.wf01.*.temperature)|
-+---------------------------------+---------------------------------+-------------------------------+-------------------------------+
-|                                8|                                4|                          228.0|              91.83000183105469|
-+---------------------------------+---------------------------------+-------------------------------+-------------------------------+
++---------------------------------+
+|max_value(root.*.*.*.temperature)|
++---------------------------------+
+|                             26.0|
++---------------------------------+
 Total line number = 1
 It costs 0.013s
 ```
 
-若统计"root.ln.\*.\*"下第一层级的 count 聚合结果和 sum 聚合结果，则设置 level=1 即可：
+上面的查询都是针对某一个传感器，特别地，**如果想要查询某一层级下所有传感器拥有的总数据点数，则需要显式规定测点为 `*`**
 
 ```sql
-select count(temperature), sum(temperature) from root.ln.*.* group by level=1
+select count(*)
+from root.ln.**
+group by level = 2
 ```
 
 运行结果：
 
 ```
-+------------------------------+----------------------------+
-|count(root.ln.*.*.temperature)|sum(root.ln.*.*.temperature)|
-+------------------------------+----------------------------+
-|                            12|           319.8300018310547|
-+------------------------------+----------------------------+
++----------------------+----------------------+
+|count(root.*.wf01.*.*)|count(root.*.wf02.*.*)|
++----------------------+----------------------+
+|                 20160|                 20160|
++----------------------+----------------------+
 Total line number = 1
 It costs 0.013s
 ```
 
-分层聚合查询也可被用于其他聚合函数，当前所支持的聚合函数为：count, sum, avg, last_value, first_value, min_time, max_time, min_value, max_value, extreme
+分层聚合查询也可被用于其他聚合函数，当前所支持的聚合函数为：count, sum, avg, last_value, first_value, min_time, max_time,
+min_value, max_value, extreme
 
 对于 sum, avg, min_value, max_value, extreme 五种聚合函数，需保证所有聚合的时间序列数据类型相同。其他聚合函数没有此限制。