Posted to commits@iotdb.apache.org by hu...@apache.org on 2022/08/25 06:27:20 UTC

[iotdb] branch master updated: [IOTDB-4131] Add doc for SessionWindowStrategy and StateWindowStrategy (#7086)

This is an automated email from the ASF dual-hosted git repository.

hui pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/iotdb.git


The following commit(s) were added to refs/heads/master by this push:
     new 5c62ac98c1 [IOTDB-4131] Add doc for SessionWindowStrategy and StateWindowStrategy  (#7086)
5c62ac98c1 is described below

commit 5c62ac98c130c43e4dd5e50a0aa4da563c0c9ed3
Author: AACEPT <34...@users.noreply.github.com>
AuthorDate: Thu Aug 25 14:27:13 2022 +0800

    [IOTDB-4131] Add doc for SessionWindowStrategy and StateWindowStrategy  (#7086)
---
 .../Process-Data/UDF-User-Defined-Function.md      | 36 +++++++++++++++++-----
 .../Process-Data/UDF-User-Defined-Function.md      | 32 ++++++++++++++++---
 2 files changed, 56 insertions(+), 12 deletions(-)

diff --git a/docs/UserGuide/Process-Data/UDF-User-Defined-Function.md b/docs/UserGuide/Process-Data/UDF-User-Defined-Function.md
index 9c1dd6f58e..bf0fbb5f6f 100644
--- a/docs/UserGuide/Process-Data/UDF-User-Defined-Function.md
+++ b/docs/UserGuide/Process-Data/UDF-User-Defined-Function.md
@@ -161,17 +161,19 @@ Note that the raw data access strategy you set here determines which `transform`
 
 The following are the strategies you can set:
 
-| Interface definition              | Description                                                  | The `transform` Method to Call                               |
-| :-------------------------------- | :----------------------------------------------------------- | ------------------------------------------------------------ |
-| `RowByRowAccessStrategy`          | Process raw data row by row. The framework calls the `transform` method once for each row of raw data input. When UDF has only one input sequence, a row of input is one data point in the input sequence. When UDF has multiple input sequences, one row of input is a result record of the raw query (aligned by time) on these input sequences. (In a row, there may be a column with a value of `null`, but not all of them are `null`) | `void transform(Row row, [...]
-| `SlidingTimeWindowAccessStrategy` | Process a batch of data in a fixed time interval each time. We call the container of a data batch a window. The framework calls the `transform` method once for each raw data input window. There may be multiple rows of data in a window, and each row is a result record of the raw query (aligned by time) on these input sequences. (In a row, there may be a column with a value of `null`, but not all of them are `null`) | `void transform(RowWindow rowWindo [...]
-| `SlidingSizeWindowAccessStrategy`    | The raw data is processed batch by batch, and each batch contains a fixed number of raw data rows (except the last batch). We call the container of a data batch a window. The framework calls the `transform` method once for each raw data input window. There may be multiple rows of data in a window, and each row is a result record of the raw query (aligned by time) on these input sequences. (In a row, there may be a column with a value of `null`, bu [...]
-
+| Interface definition              | Description                                                                                                                                                                                                                                                                                                                                                                                                                                                            [...]
+| :-------------------------------- |:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- [...]
+| `RowByRowAccessStrategy`          | Process raw data row by row. The framework calls the `transform` method once for each row of raw data input. When UDF has only one input sequence, a row of input is one data point in the input sequence. When UDF has multiple input sequences, one row of input is a result record of the raw query (aligned by time) on these input sequences. (In a row, there may be a column with a value of `null`, but not all of them are `null`)                            [...]
+| `SlidingTimeWindowAccessStrategy` | Process a batch of data in a fixed time interval each time. We call the container of a data batch a window. The framework calls the `transform` method once for each raw data input window. There may be multiple rows of data in a window, and each row is a result record of the raw query (aligned by time) on these input sequences. (In a row, there may be a column with a value of `null`, but not all of them are `null`)                                      [...]
+| `SlidingSizeWindowAccessStrategy`    | The raw data is processed batch by batch, and each batch contains a fixed number of raw data rows (except the last batch). We call the container of a data batch a window. The framework calls the `transform` method once for each raw data input window. There may be multiple rows of data in a window, and each row is a result record of the raw query (aligned by time) on these input sequences. (In a row, there may be a column with a value of `null`, bu [...]
+| `SessionTimeWindowAccessStrategy` | The raw data is processed batch by batch. We call the container of a data batch a window. The time interval between each two windows is greater than or equal to the `sessionGap` given by the user. The framework calls the `transform` method once for each raw data input window. There may be multiple rows of data in a window, and each row is a result record of the raw query (aligned by time) on these input sequences. (In a row, there may be a column wit [...]
+| `StateWindowAccessStrategy` | The raw data is processed batch by batch. We call the container of a data batch a window. In the state window, for text type or boolean type data, each value of the point in window is equal to the value of the first point in the window, and for numerical data, the distance between each value of the point in window and the value of the first point in the window is less than the threshold `delta` given by the user. The framework calls the `transform` method  [...]
 
 
 `RowByRowAccessStrategy`: The construction of `RowByRowAccessStrategy` does not require any parameters.
 
-
+The `SlidingTimeWindowAccessStrategy` is shown schematically below.
+<img style="width:100%; max-width:800px; max-height:600px; margin-left:auto; margin-right:auto; display:block;" src="https://raw.githubusercontent.com/apache/iotdb-bin-resources/main/docs/UserGuide/Process-Data/UDF-User-Defined-Function/timeWindow.png">
 
 `SlidingTimeWindowAccessStrategy`: `SlidingTimeWindowAccessStrategy` has several constructors; you can pass 3 types of parameters to them:
 
@@ -189,7 +191,8 @@ The relationship between the three types of parameters can be seen in the figure
 
 Note that the actual time interval of some of the last time windows may be less than the specified time interval parameter. In addition, there may be cases where the number of data rows in some time windows is 0. In these cases, the framework will also call the `transform` method for the empty windows.
 
-
+The `SlidingSizeWindowAccessStrategy` is shown schematically below.
+<img style="width:100%; max-width:800px; max-height:600px; margin-left:auto; margin-right:auto; display:block;" src="https://raw.githubusercontent.com/apache/iotdb-bin-resources/main/docs/UserGuide/Process-Data/UDF-User-Defined-Function/countWindow.png">
 
 `SlidingSizeWindowAccessStrategy`: `SlidingSizeWindowAccessStrategy` has several constructors; you can pass 2 types of parameters to them:
 
@@ -198,6 +201,23 @@ Note that the actual time interval of some of the last time windows may be less
 
 The sliding step parameter is optional. If the parameter is not provided, the sliding step will be set to the same as the window size.
 
+The `SessionTimeWindowAccessStrategy` is shown schematically below.
+<img style="width:100%; max-width:800px; max-height:600px; margin-left:auto; margin-right:auto; display:block;" src="https://raw.githubusercontent.com/apache/iotdb-bin-resources/main/docs/UserGuide/Process-Data/UDF-User-Defined-Function/sessionWindow.png">
+
+`SessionTimeWindowAccessStrategy`: `SessionTimeWindowAccessStrategy` has several constructors; you can pass 2 types of parameters to them:
+- Parameter 1: The start and end time of the display window on the time axis.
+- Parameter 2: The minimum time interval `sessionGap` between two adjacent windows.
+
+
+The `StateWindowAccessStrategy` is shown schematically below.
+<img style="width:100%; max-width:800px; max-height:600px; margin-left:auto; margin-right:auto; display:block;" src="https://raw.githubusercontent.com/apache/iotdb-bin-resources/main/docs/UserGuide/Process-Data/UDF-User-Defined-Function/stateWindow.png">
+
+`StateWindowAccessStrategy` has four constructors.
+- Constructor 1: For numerical data, there are 3 parameters: the start time and end time of the display window on the time axis, and the threshold `delta` for the allowable change within a single window.
+- Constructor 2: For text data and boolean data, there are 2 parameters: the start time and end time of the display window on the time axis. For both data types, all values within a single window are the same, so no change threshold is needed.
+- Constructor 3: For numerical data, there is 1 parameter: the threshold `delta` for the allowable change within a single window. The start time of the display window is set to the smallest timestamp in the entire query result set, and the end time is set to the largest timestamp in the entire query result set.
+- Constructor 4: For text data and boolean data, no parameters are required. The start and end times are determined as described for Constructor 3.
+
 Please see the Javadoc for more details. 
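As a rough illustration of the session and state window descriptions in the diff above, the following self-contained Java sketch groups a sorted series into windows. It does not use the IoTDB API: the class `WindowSketch` and its methods are hypothetical, and the boundary rules (a new session window starts when the gap to the previous row is at least `sessionGap`; a new state window starts when a value differs from the window's first value by at least `delta`) are assumptions read from the table text, not confirmed framework behavior.

```java
import java.util.ArrayList;
import java.util.List;

public class WindowSketch {

    // Splits sorted timestamps into session windows. Assumption: a new window
    // starts when the gap to the previous timestamp is >= sessionGap.
    static List<List<Long>> sessionWindows(long[] timestamps, long sessionGap) {
        List<List<Long>> windows = new ArrayList<>();
        List<Long> current = new ArrayList<>();
        for (int i = 0; i < timestamps.length; i++) {
            if (i > 0 && timestamps[i] - timestamps[i - 1] >= sessionGap) {
                windows.add(current);
                current = new ArrayList<>();
            }
            current.add(timestamps[i]);
        }
        if (!current.isEmpty()) {
            windows.add(current);
        }
        return windows;
    }

    // Splits numerical values into state windows. Assumption: a value stays in
    // the current window while its distance to the window's first value is
    // strictly less than delta.
    static List<List<Double>> stateWindows(double[] values, double delta) {
        List<List<Double>> windows = new ArrayList<>();
        List<Double> current = new ArrayList<>();
        for (double v : values) {
            if (!current.isEmpty() && Math.abs(v - current.get(0)) >= delta) {
                windows.add(current);
                current = new ArrayList<>();
            }
            current.add(v);
        }
        if (!current.isEmpty()) {
            windows.add(current);
        }
        return windows;
    }

    public static void main(String[] args) {
        // Gaps of 7 and 19 exceed sessionGap = 5, so three windows result.
        System.out.println(sessionWindows(new long[] {1, 2, 3, 10, 11, 30}, 5));
        // prints [[1, 2, 3], [10, 11], [30]]

        // 3.0 and 9.0 each differ from the current first value by >= 2.0.
        System.out.println(stateWindows(new double[] {1.0, 1.5, 3.0, 3.2, 9.0}, 2.0));
        // prints [[1.0, 1.5], [3.0, 3.2], [9.0]]
    }
}
```

In an actual UDF, the framework would build these windows itself and pass each one to `transform(RowWindow rowWindow, PointCollector collector)`; the sketch only shows where the window boundaries would fall under the stated assumptions.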
 
 
diff --git a/docs/zh/UserGuide/Process-Data/UDF-User-Defined-Function.md b/docs/zh/UserGuide/Process-Data/UDF-User-Defined-Function.md
index c789b3a597..949627d567 100644
--- a/docs/zh/UserGuide/Process-Data/UDF-User-Defined-Function.md
+++ b/docs/zh/UserGuide/Process-Data/UDF-User-Defined-Function.md
@@ -141,14 +141,19 @@ void beforeStart(UDFParameters parameters, UDTFConfigurations configurations) th
 
 The following are the raw data access strategies you can set:
 
-| Interface definition              | Description                                                  | The `transform` Method to Call                               |
-| :-------------------------------- | :----------------------------------------------------------- | ------------------------------------------------------------ |
+| Interface definition              | Description                                                  | The `transform` Method to Call                               |
+| :-------------------------------- |:-----------------------------------------------------------------------------------------------------------------------------------------------------------| ------------------------------------------------------------ |
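 | `RowByRowAccessStrategy`          | Process raw data row by row. The framework calls the `transform` method once for each row of raw data input. When the UDF has only one input series, a row of input is one data point in that series. When the UDF has multiple input series, a row of input is the result of aligning these input series by time (within a row, some columns may be `null`, but not all of them). | `void transform(Row row, PointCollector collector) throws Exception` |
-| `SlidingTimeWindowAccessStrategy` | Process raw data in sliding time windows. The framework calls the `transform` method once for each raw data input window. A window may contain multiple rows, and each row is the result of aligning the input series by time (within a row, some columns may be `null`, but not all of them). | `void transform(RowWindow rowWindow, PointCollector collector) throws Exception` |
-| `SlidingSizeWindowAccessStrategy`    | Process raw data in windows of a fixed number of rows; that is, every window contains a fixed number of rows (except the last window). The framework calls the `transform` method once for each raw data input window. A window may contain multiple rows, and each row is the result of aligning the input series by time (within a row, some columns may be `null`, but not all of them). | `void transform(RowWindow rowWindow, PointCollector collector) throws Exception` |
+| `SlidingTimeWindowAccessStrategy` | Process raw data in sliding time windows. The framework calls the `transform` method once for each raw data input window. A window may contain multiple rows, and each row is the result of aligning the input series by time (within a row, some columns may be `null`, but not all of them). | `void transform(RowWindow rowWindow, PointCollector collector) throws Exception` |
+| `SlidingSizeWindowAccessStrategy`    | Process raw data in windows of a fixed number of rows; that is, every window contains a fixed number of rows (except the last window). The framework calls the `transform` method once for each raw data input window. A window may contain multiple rows, and each row is the result of aligning the input series by time (within a row, some columns may be `null`, but not all of them). | `void transform(RowWindow rowWindow, PointCollector collector) throws Exception` |
+| `SessionTimeWindowAccessStrategy`    | Process raw data in session windows. The framework calls the `transform` method once for each raw data input window. A window may contain multiple rows, and each row is the result of aligning the input series by time (within a row, some columns may be `null`, but not all of them). | `void transform(RowWindow rowWindow, PointCollector collector) throws Exception` |
+| `StateWindowAccessStrategy`    | Process raw data in state windows. The framework calls the `transform` method once for each raw data input window. A window may contain multiple rows. Currently, windowing is supported on only one measurement, that is, one column of data. | `void transform(RowWindow rowWindow, PointCollector collector) throws Exception` |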
 
 The construction of `RowByRowAccessStrategy` does not require any parameters.
 
+The figure below is a schematic of how `SlidingTimeWindowAccessStrategy` forms windows.
+<img style="width:100%; max-width:800px; max-height:600px; margin-left:auto; margin-right:auto; display:block;" src="https://raw.githubusercontent.com/apache/iotdb-bin-resources/main/docs/UserGuide/Process-Data/UDF-User-Defined-Function/timeWindow.png">
+
 `SlidingTimeWindowAccessStrategy` has several constructors; you can pass 3 types of parameters to them:
 
 1. The start and end time of the display window on the time axis
@@ -165,6 +170,9 @@ void beforeStart(UDFParameters parameters, UDTFConfigurations configurations) th
 
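 Note that the actual time interval of the last few time windows may be shorter than the specified time interval parameter. In addition, some time windows may contain 0 data rows; in that case, the framework still calls the `transform` method once for the window.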
 
+The figure below is a schematic of how `SlidingSizeWindowAccessStrategy` forms windows.
+<img style="width:100%; max-width:800px; max-height:600px; margin-left:auto; margin-right:auto; display:block;" src="https://raw.githubusercontent.com/apache/iotdb-bin-resources/main/docs/UserGuide/Process-Data/UDF-User-Defined-Function/countWindow.png">
+
 `SlidingSizeWindowAccessStrategy` has several constructors; you can pass 2 parameters to them:
 
 1. The window size, that is, the number of data rows in one window. Note that the last few windows may contain fewer rows than the specified number.
@@ -172,6 +180,22 @@ void beforeStart(UDFParameters parameters, UDTFConfigurations configurations) th
 
 The sliding step parameter is optional. If you do not provide it, the sliding step is set to the window size.
 
+The figure below is a schematic of how `SessionTimeWindowAccessStrategy` forms windows.
+<img style="width:100%; max-width:800px; max-height:600px; margin-left:auto; margin-right:auto; display:block;" src="https://raw.githubusercontent.com/apache/iotdb-bin-resources/main/docs/UserGuide/Process-Data/UDF-User-Defined-Function/sessionWindow.png">
+
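+`SessionTimeWindowAccessStrategy` has several constructors; you can pass 2 types of parameters to them:
+1. The start and end time of the display window on the time axis.
+2. The minimum time interval between session windows.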
+   
+The figure below is a schematic of how `StateWindowAccessStrategy` forms windows.
+<img style="width:100%; max-width:800px; max-height:600px; margin-left:auto; margin-right:auto; display:block;" src="https://raw.githubusercontent.com/apache/iotdb-bin-resources/main/docs/UserGuide/Process-Data/UDF-User-Defined-Function/stateWindow.png">
+
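+`StateWindowAccessStrategy` has four constructors.
+1. For numerical data, you can provide the start and end time of the display window on the time axis, together with the threshold delta for the allowable change within a single window.
+2. For text data and boolean data, you can provide the start and end time of the display window on the time axis. For these two data types, all values within a single window are the same, so no change threshold is needed.
+3. For numerical data, you can provide only the threshold delta for the allowable change within a single window. The start time of the display window is set to the smallest timestamp in the entire query result set, and the end time is set to the largest timestamp in the entire query result set.
+4. For text data and boolean data, you can provide no parameters. The start and end timestamps are determined as explained in item 3.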
+
 See the Javadoc for details on the strategy constructors.
 
  * setOutputDataType