Posted to commits@iotdb.apache.org by ja...@apache.org on 2022/01/02 14:18:27 UTC

[iotdb] branch master updated: [IOTDB-2241] Library-UDF Data Repairing Documents (#4696)

This is an automated email from the ASF dual-hosted git repository.

jackietien pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/iotdb.git


The following commit(s) were added to refs/heads/master by this push:
     new 86a6fcd  [IOTDB-2241] Library-UDF Data Repairing Documents (#4696)
86a6fcd is described below

commit 86a6fcd5e44d86b82cbede35b7eeda2a0069dfdc
Author: Pengyu Chen <48...@users.noreply.github.com>
AuthorDate: Sun Jan 2 22:17:53 2022 +0800

    [IOTDB-2241] Library-UDF Data Repairing Documents (#4696)
---
 docs/UserGuide/Library-UDF/Data-Repair.md    | 349 +++++++++++++++++++++++++++
 docs/zh/UserGuide/Library-UDF/Data-Repair.md | 341 ++++++++++++++++++++++++++
 site/src/main/.vuepress/config.js            |   6 +-
 3 files changed, 694 insertions(+), 2 deletions(-)

diff --git a/docs/UserGuide/Library-UDF/Data-Repair.md b/docs/UserGuide/Library-UDF/Data-Repair.md
new file mode 100644
index 0000000..2848fee
--- /dev/null
+++ b/docs/UserGuide/Library-UDF/Data-Repair.md
@@ -0,0 +1,349 @@
+<!--
+
+    Licensed to the Apache Software Foundation (ASF) under one
+    or more contributor license agreements.  See the NOTICE file
+    distributed with this work for additional information
+    regarding copyright ownership.  The ASF licenses this file
+    to you under the Apache License, Version 2.0 (the
+    "License"); you may not use this file except in compliance
+    with the License.  You may obtain a copy of the License at
+    
+        http://www.apache.org/licenses/LICENSE-2.0
+    
+    Unless required by applicable law or agreed to in writing,
+    software distributed under the License is distributed on an
+    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+    KIND, either express or implied.  See the License for the
+    specific language governing permissions and limitations
+    under the License.
+
+-->
+
+# TimestampRepair
+
+## Usage
+
+This function repairs timestamps.
+Given a standard time interval, it fine-tunes the timestamps in the way that minimizes the repair cost,
+turning data with unstable timestamp intervals into strictly equispaced data.
+If no standard time interval is given,
+this function estimates it from the **median**, **mode**, or **cluster** center of the time intervals.
+
+**Name:** TIMESTAMPREPAIR
+
+**Input Series:** Only supports a single input series. The data type is INT32 / INT64 / FLOAT / DOUBLE.
+
+**Parameters:**
+
++ `interval`: The standard time interval in milliseconds, given as a positive integer. By default, it is estimated with the method specified by `method`.
++ `method`: The method used to estimate the standard time interval: 'median', 'mode', or 'cluster'. It only takes effect when `interval` is not given. By default, 'median' is used.
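The median and mode estimates described above can be sketched as follows. This is a minimal illustration, not the library's implementation: `estimate_interval` is a hypothetical helper, it works on millisecond offsets from the first point of the example input used below, and the 'cluster' method is not sketched.

```python
from statistics import median, mode


def estimate_interval(timestamps_ms, method="median"):
    """Estimate the standard interval (ms) from the gaps between consecutive timestamps.

    Only 'median' and 'mode' are sketched here; 'cluster' is not.
    """
    gaps = [b - a for a, b in zip(timestamps_ms, timestamps_ms[1:])]
    if method == "median":
        return median(gaps)
    if method == "mode":
        return mode(gaps)
    raise ValueError(f"method not covered by this sketch: {method}")


# Offsets (ms) of the example input series, relative to 12:00:00.
ts = [0, 10_000, 19_000, 30_000, 40_000, 50_000, 61_000, 71_000, 81_000, 91_000]
print(estimate_interval(ts, "median"))  # 10000
print(estimate_interval(ts, "mode"))    # 10000
```

Six of the nine gaps are exactly 10000 ms, so both estimates agree on this input.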
+
+**Output Series:** Outputs a single series of the same type as the input, namely the repaired input series.
+
+## Examples
+
+### Manually Specify the Standard Time Interval
+
+When `interval` is given, this function repairs according to the given standard time interval.
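To give a rough picture of what "repairing to strictly equispaced data" means, the sketch below (hypothetical helper `snap_to_grid`) simply moves each timestamp to the nearest point of a grid that starts at the first timestamp. This is a simplification under stated assumptions: the function itself minimizes a repair cost, and this sketch ignores cases such as two points landing on the same grid slot.

```python
import math


def snap_to_grid(timestamps_ms, interval_ms):
    """Move each timestamp to the nearest point of an equispaced grid.

    The grid starts at the first timestamp. Collisions on one grid slot
    and gaps that would need extra points are not handled.
    """
    start = timestamps_ms[0]
    snapped = []
    for t in timestamps_ms:
        k = math.floor((t - start) / interval_ms + 0.5)  # round half up
        snapped.append(start + k * interval_ms)
    return snapped


ts = [0, 10_000, 19_000, 30_000, 40_000, 50_000, 61_000, 71_000, 81_000, 91_000]
print(snap_to_grid(ts, 10_000))
# [0, 10000, 20000, 30000, 40000, 50000, 60000, 70000, 80000, 90000]
```

On the offsets of the example input with an interval of 10000 ms, this naive snapping happens to yield the same timestamps as the example output.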
+
+Input series:
+
+```
++-----------------------------+---------------+
+|                         Time|root.test.d2.s1|
++-----------------------------+---------------+
+|2021-07-01T12:00:00.000+08:00|            1.0|
+|2021-07-01T12:00:10.000+08:00|            2.0|
+|2021-07-01T12:00:19.000+08:00|            3.0|
+|2021-07-01T12:00:30.000+08:00|            4.0|
+|2021-07-01T12:00:40.000+08:00|            5.0|
+|2021-07-01T12:00:50.000+08:00|            6.0|
+|2021-07-01T12:01:01.000+08:00|            7.0|
+|2021-07-01T12:01:11.000+08:00|            8.0|
+|2021-07-01T12:01:21.000+08:00|            9.0|
+|2021-07-01T12:01:31.000+08:00|           10.0|
++-----------------------------+---------------+
+```
+
+SQL for query:
+
+```sql
+select timestamprepair(s1,'interval'='10000') from root.test.d2
+```
+
+Output series:
+
+
+```
++-----------------------------+----------------------------------------------------+
+|                         Time|timestamprepair(root.test.d2.s1, "interval"="10000")|
++-----------------------------+----------------------------------------------------+
+|2021-07-01T12:00:00.000+08:00|                                                 1.0|
+|2021-07-01T12:00:10.000+08:00|                                                 2.0|
+|2021-07-01T12:00:20.000+08:00|                                                 3.0|
+|2021-07-01T12:00:30.000+08:00|                                                 4.0|
+|2021-07-01T12:00:40.000+08:00|                                                 5.0|
+|2021-07-01T12:00:50.000+08:00|                                                 6.0|
+|2021-07-01T12:01:00.000+08:00|                                                 7.0|
+|2021-07-01T12:01:10.000+08:00|                                                 8.0|
+|2021-07-01T12:01:20.000+08:00|                                                 9.0|
+|2021-07-01T12:01:30.000+08:00|                                                10.0|
++-----------------------------+----------------------------------------------------+
+```
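The text does not define the repair cost. Assuming, for illustration only, that it is the total absolute timestamp shift, the repair in the example above can be checked like this (`total_shift_ms` is a hypothetical helper):

```python
def total_shift_ms(original, repaired):
    """Hypothetical repair cost: sum of absolute timestamp changes, in ms."""
    return sum(abs(a - b) for a, b in zip(original, repaired))


# Offsets (ms) relative to 12:00:00 before and after the repair shown above.
original = [0, 10_000, 19_000, 30_000, 40_000, 50_000, 61_000, 71_000, 81_000, 91_000]
repaired = [i * 10_000 for i in range(10)]
print(total_shift_ms(original, repaired))  # 5000
```

Under this assumption, five points each move by one second.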
+
+### Automatically Estimate the Standard Time Interval
+
+When `interval` is not given, this function estimates the standard time interval.
+
+The input series is the same as above. The SQL for query is shown below:
+
+```sql
+select timestamprepair(s1) from root.test.d2
+```
+
+Output series:
+
+```
++-----------------------------+--------------------------------+
+|                         Time|timestamprepair(root.test.d2.s1)|
++-----------------------------+--------------------------------+
+|2021-07-01T12:00:00.000+08:00|                             1.0|
+|2021-07-01T12:00:10.000+08:00|                             2.0|
+|2021-07-01T12:00:20.000+08:00|                             3.0|
+|2021-07-01T12:00:30.000+08:00|                             4.0|
+|2021-07-01T12:00:40.000+08:00|                             5.0|
+|2021-07-01T12:00:50.000+08:00|                             6.0|
+|2021-07-01T12:01:00.000+08:00|                             7.0|
+|2021-07-01T12:01:10.000+08:00|                             8.0|
+|2021-07-01T12:01:20.000+08:00|                             9.0|
+|2021-07-01T12:01:30.000+08:00|                            10.0|
++-----------------------------+--------------------------------+
+```
+
+# ValueFill
+
+## Usage
+This function imputes missing values (`NaN`) in a time series. Several methods are supported.
+
+**Name:** ValueFill
+
+**Input Series:** Only supports a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE.
+
+**Parameters:**
+
++ `method`: The imputation method, one of {"mean", "previous", "linear", "likelihood", "AR", "MA", "SCREEN"}. By default, "linear" is used.
+  + "mean": fill holes with the global mean value;
+  + "previous": propagate the last valid observation forward to the next valid one;
+  + "linear": linear interpolation;
+  + "likelihood": maximum likelihood estimation based on the normal distribution of speed;
+  + "AR": auto regression;
+  + "MA": moving average;
+  + "SCREEN": speed-constraint-based imputation.
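As an illustration of the "mean" method, here is a minimal sketch (hypothetical helper `mean_fill`, not taken from the library source) that replaces every `NaN` with the mean of the observed values:

```python
import math


def mean_fill(values):
    """Replace every NaN with the mean of the non-NaN values."""
    observed = [v for v in values if not math.isnan(v)]
    if not observed:  # nothing to average: leave the series unchanged
        return list(values)
    mean = sum(observed) / len(observed)
    return [mean if math.isnan(v) else v for v in values]


print(mean_fill([1.0, float("nan"), 3.0]))  # [1.0, 2.0, 3.0]
```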
+
+**Output Series:** Outputs a single series of the same type as the input, namely the repaired input series.
+
+**Note:** The AR method uses an AR(1) model. The input series should be auto-correlated; otherwise, the function outputs a single point (0, 0.0).
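To make the AR(1) note more concrete, the sketch below estimates the AR(1) coefficient as the lag-1 autocorrelation of the observed values and fills each `NaN` with a one-step prediction from the value before it. The helper name `ar1_fill` and these estimation details are assumptions; the library's AR method may estimate and apply the model differently, and this sketch does not reproduce the (0, 0.0) fallback.

```python
import math


def ar1_fill(values):
    """Fill NaNs with one-step AR(1) predictions from the preceding value."""
    observed = [v for v in values if not math.isnan(v)]
    if not observed:
        return list(values)
    mu = sum(observed) / len(observed)
    # Lag-1 autocorrelation over neighbouring pairs that are both observed.
    pairs = [(a, b) for a, b in zip(values, values[1:])
             if not (math.isnan(a) or math.isnan(b))]
    num = sum((a - mu) * (b - mu) for a, b in pairs)
    den = sum((v - mu) ** 2 for v in observed)
    phi = num / den if den else 0.0
    filled = []
    for v in values:
        if math.isnan(v) and filled and not math.isnan(filled[-1]):
            v = mu + phi * (filled[-1] - mu)  # x_t = mu + phi * (x_{t-1} - mu)
        filled.append(v)
    return filled


print(ar1_fill([1.0, 2.0, float("nan"), 4.0, 5.0]))
```

A leading `NaN` has no predecessor, so it is left unfilled in this sketch.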
+
+## Examples
+
+### Linear Fill
+
+When `method` is "linear" or not given, linear interpolation is used.
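A minimal sketch of linear filling, assuming interpolation by sample position rather than by timestamp (an assumption; with it, the sketch reproduces the 118.7 and 121.3 values in the example output below). `linear_fill` is a hypothetical helper; a leading `NaN` has no earlier neighbour and stays `NaN`, as in the example.

```python
import math


def linear_fill(values):
    """Fill interior NaN runs by linear interpolation over sample positions.

    Leading or trailing NaNs lack a neighbour on one side and stay NaN.
    """
    out = list(values)
    last = None  # index of the most recent observed value
    for i, v in enumerate(values):
        if math.isnan(v):
            continue
        if last is not None and i - last > 1:
            step = (v - values[last]) / (i - last)
            for j in range(last + 1, i):
                out[j] = values[last] + step * (j - last)
        last = i
    return out


nan = float("nan")
s1 = [nan, 101.0, 102.0, 104.0, 126.0, 108.0, nan, 113.0,
      114.0, 116.0, nan, nan, 124.0, 126.0, 128.0]
print([round(v, 1) for v in linear_fill(s1)])
# the NaNs at 20 s and 22 s become 118.7 and 121.3
```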
+
+Input series:
+
+```
++-----------------------------+---------------+
+|                         Time|root.test.d2.s1|
++-----------------------------+---------------+
+|2020-01-01T00:00:02.000+08:00|            NaN|
+|2020-01-01T00:00:03.000+08:00|          101.0|
+|2020-01-01T00:00:04.000+08:00|          102.0|
+|2020-01-01T00:00:06.000+08:00|          104.0|
+|2020-01-01T00:00:08.000+08:00|          126.0|
+|2020-01-01T00:00:10.000+08:00|          108.0|
+|2020-01-01T00:00:14.000+08:00|            NaN|
+|2020-01-01T00:00:15.000+08:00|          113.0|
+|2020-01-01T00:00:16.000+08:00|          114.0|
+|2020-01-01T00:00:18.000+08:00|          116.0|
+|2020-01-01T00:00:20.000+08:00|            NaN|
+|2020-01-01T00:00:22.000+08:00|            NaN|
+|2020-01-01T00:00:26.000+08:00|          124.0|
+|2020-01-01T00:00:28.000+08:00|          126.0|
+|2020-01-01T00:00:30.000+08:00|          128.0|
++-----------------------------+---------------+
+```
+
+SQL for query:
+
+```sql
+select valuefill(s1) from root.test.d2
+```
+
+Output series:
+
+```
++-----------------------------+--------------------------+
+|                         Time|valuefill(root.test.d2.s1)|
++-----------------------------+--------------------------+
+|2020-01-01T00:00:02.000+08:00|                       NaN|
+|2020-01-01T00:00:03.000+08:00|                     101.0|
+|2020-01-01T00:00:04.000+08:00|                     102.0|
+|2020-01-01T00:00:06.000+08:00|                     104.0|
+|2020-01-01T00:00:08.000+08:00|                     126.0|
+|2020-01-01T00:00:10.000+08:00|                     108.0|
+|2020-01-01T00:00:14.000+08:00|                     110.5|
+|2020-01-01T00:00:15.000+08:00|                     113.0|
+|2020-01-01T00:00:16.000+08:00|                     114.0|
+|2020-01-01T00:00:18.000+08:00|                     116.0|
+|2020-01-01T00:00:20.000+08:00|                     118.7|
+|2020-01-01T00:00:22.000+08:00|                     121.3|
+|2020-01-01T00:00:26.000+08:00|                     124.0|
+|2020-01-01T00:00:28.000+08:00|                     126.0|
+|2020-01-01T00:00:30.000+08:00|                     128.0|
++-----------------------------+--------------------------+
+```
+
+### Previous Fill
+
+When `method` is "previous", each missing value is filled with the last valid observation before it.
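The "previous" method amounts to carrying the last observed value forward. A minimal sketch, with `previous_fill` as a hypothetical helper:

```python
import math


def previous_fill(values):
    """Carry the most recent observed value forward into each NaN."""
    filled, last = [], math.nan  # last stays NaN until a value is observed
    for v in values:
        if not math.isnan(v):
            last = v
        filled.append(last)
    return filled


nan = float("nan")
print(previous_fill([nan, 101.0, 108.0, nan, 113.0, 116.0, nan, nan, 124.0]))
# [nan, 101.0, 108.0, 108.0, 113.0, 116.0, 116.0, 116.0, 124.0]
```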
+
+The input series is the same as above. The SQL for query is shown below:
+
+```sql
+select valuefill(s1,"method"="previous") from root.test.d2
+```
+
+Output series:
+
+```
++-----------------------------+-----------------------------------------------+
+|                         Time|valuefill(root.test.d2.s1, "method"="previous")|
++-----------------------------+-----------------------------------------------+
+|2020-01-01T00:00:02.000+08:00|                                            NaN|
+|2020-01-01T00:00:03.000+08:00|                                          101.0|
+|2020-01-01T00:00:04.000+08:00|                                          102.0|
+|2020-01-01T00:00:06.000+08:00|                                          104.0|
+|2020-01-01T00:00:08.000+08:00|                                          126.0|
+|2020-01-01T00:00:10.000+08:00|                                          108.0|
+|2020-01-01T00:00:14.000+08:00|                                          108.0|
+|2020-01-01T00:00:15.000+08:00|                                          113.0|
+|2020-01-01T00:00:16.000+08:00|                                          114.0|
+|2020-01-01T00:00:18.000+08:00|                                          116.0|
+|2020-01-01T00:00:20.000+08:00|                                          116.0|
+|2020-01-01T00:00:22.000+08:00|                                          116.0|
+|2020-01-01T00:00:26.000+08:00|                                          124.0|
+|2020-01-01T00:00:28.000+08:00|                                          126.0|
+|2020-01-01T00:00:30.000+08:00|                                          128.0|
++-----------------------------+-----------------------------------------------+
+```
+
+# ValueRepair
+
+## Usage
+This function repairs the values of a time series.
+Currently, two methods are supported:
+
++ **Screen** is based on speed thresholds; it makes all speeds satisfy the thresholds while changing the data as little as possible.
++ **LsGreedy** is based on the likelihood of speed changes; it models speed changes as a Gaussian distribution and uses a greedy algorithm to maximize the likelihood.
+
+**Name:** VALUEREPAIR
+
+**Input Series:** Only supports a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE.
+
+**Parameters:**
+
++ `method`: The repair method, either 'Screen' or 'LsGreedy'. By default, Screen is used.
++ `minSpeed`: Only valid with Screen. The lower speed threshold; speeds below it are regarded as outliers. By default, it is the median minus 3 times the median absolute deviation.
++ `maxSpeed`: Only valid with Screen. The upper speed threshold; speeds above it are regarded as outliers. By default, it is the median plus 3 times the median absolute deviation.
++ `center`: Only valid with LsGreedy. The center of the Gaussian distribution of speed changes. By default, it is 0.
++ `sigma`: Only valid with LsGreedy. The standard deviation of the Gaussian distribution of speed changes. By default, it is the median absolute deviation.
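The default thresholds can be illustrated as follows. Two details are assumptions, since the parameter descriptions do not spell them out: a speed is taken as the value difference divided by the time difference of consecutive points, and the median absolute deviation is unscaled. `default_speed_bounds` is a hypothetical helper.

```python
from statistics import median


def default_speed_bounds(times, values, k=3.0):
    """Return (minSpeed, maxSpeed) as median -/+ k * MAD of the speeds."""
    speeds = [(values[i] - values[i - 1]) / (times[i] - times[i - 1])
              for i in range(1, len(values))]
    med = median(speeds)
    mad = median([abs(s - med) for s in speeds])  # unscaled MAD
    return med - k * mad, med + k * mad


print(default_speed_bounds([0, 1, 2, 3, 4], [0.0, 1.0, 3.0, 4.0, 10.0]))  # (0.0, 3.0)
```

Here the last jump has speed 6.0, which lies above the estimated `maxSpeed` of 3.0 and would be treated as an outlier. Speeds come out in value units per timestamp unit.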
+
+**Output Series:** Outputs a single series of the same type as the input, namely the repaired input series.
+
+**Note:** `NaN` will be filled with linear interpolation before repairing.
+
+## Examples
+
+### Repair with Screen
+
+When `method` is 'Screen' or not given, the Screen method is used.
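The sketch below only illustrates the speed constraint behind Screen. It is a simplified greedy clamp (hypothetical helper `clamp_speeds`), not the Screen algorithm itself: each value is pulled into the range that `minSpeed` and `maxSpeed` allow relative to the previously repaired value. It therefore does not guarantee minimal changes and need not match the example output.

```python
def clamp_speeds(times, values, min_speed, max_speed):
    """Greedy left-to-right repair that forces every speed into [min_speed, max_speed]."""
    repaired = [values[0]]
    for i in range(1, len(values)):
        dt = times[i] - times[i - 1]
        lo = repaired[-1] + min_speed * dt  # lowest value allowed at this step
        hi = repaired[-1] + max_speed * dt  # highest value allowed at this step
        repaired.append(min(max(values[i], lo), hi))
    return repaired


print(clamp_speeds([0, 1, 2, 3], [0.0, 1.0, 10.0, 3.0], 0.0, 1.0))  # [0.0, 1.0, 2.0, 3.0]
```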
+
+Input series:
+
+```
++-----------------------------+---------------+
+|                         Time|root.test.d2.s1|
++-----------------------------+---------------+
+|2020-01-01T00:00:02.000+08:00|          100.0|
+|2020-01-01T00:00:03.000+08:00|          101.0|
+|2020-01-01T00:00:04.000+08:00|          102.0|
+|2020-01-01T00:00:06.000+08:00|          104.0|
+|2020-01-01T00:00:08.000+08:00|          126.0|
+|2020-01-01T00:00:10.000+08:00|          108.0|
+|2020-01-01T00:00:14.000+08:00|          112.0|
+|2020-01-01T00:00:15.000+08:00|          113.0|
+|2020-01-01T00:00:16.000+08:00|          114.0|
+|2020-01-01T00:00:18.000+08:00|          116.0|
+|2020-01-01T00:00:20.000+08:00|          118.0|
+|2020-01-01T00:00:22.000+08:00|          100.0|
+|2020-01-01T00:00:26.000+08:00|          124.0|
+|2020-01-01T00:00:28.000+08:00|          126.0|
+|2020-01-01T00:00:30.000+08:00|            NaN|
++-----------------------------+---------------+
+```
+
+SQL for query:
+
+```sql
+select valuerepair(s1) from root.test.d2
+```
+
+Output series:
+
+```
++-----------------------------+----------------------------+
+|                         Time|valuerepair(root.test.d2.s1)|
++-----------------------------+----------------------------+
+|2020-01-01T00:00:02.000+08:00|                       100.0|
+|2020-01-01T00:00:03.000+08:00|                       101.0|
+|2020-01-01T00:00:04.000+08:00|                       102.0|
+|2020-01-01T00:00:06.000+08:00|                       104.0|
+|2020-01-01T00:00:08.000+08:00|                       106.0|
+|2020-01-01T00:00:10.000+08:00|                       108.0|
+|2020-01-01T00:00:14.000+08:00|                       112.0|
+|2020-01-01T00:00:15.000+08:00|                       113.0|
+|2020-01-01T00:00:16.000+08:00|                       114.0|
+|2020-01-01T00:00:18.000+08:00|                       116.0|
+|2020-01-01T00:00:20.000+08:00|                       118.0|
+|2020-01-01T00:00:22.000+08:00|                       120.0|
+|2020-01-01T00:00:26.000+08:00|                       124.0|
+|2020-01-01T00:00:28.000+08:00|                       126.0|
+|2020-01-01T00:00:30.000+08:00|                       128.0|
++-----------------------------+----------------------------+
+```
+
+### Repair with LsGreedy
+
+When `method` is 'LsGreedy', the LsGreedy method is used.
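LsGreedy maximizes the likelihood of speed changes under a Gaussian model with parameters `center` and `sigma`. The sketch below only computes that objective for a given series (hypothetical helper `speed_change_log_likelihood`). Taking speed changes as differences of consecutive speeds is an assumption, and the greedy search that increases the likelihood is not shown.

```python
import math


def speed_change_log_likelihood(times, values, center=0.0, sigma=1.0):
    """Gaussian log-likelihood of the changes between consecutive speeds."""
    speeds = [(values[i] - values[i - 1]) / (times[i] - times[i - 1])
              for i in range(1, len(values))]
    changes = [b - a for a, b in zip(speeds, speeds[1:])]
    log_norm = -math.log(sigma * math.sqrt(2 * math.pi))
    return sum(log_norm - (u - center) ** 2 / (2 * sigma ** 2) for u in changes)


t = [0, 1, 2, 3, 4]
spiky = [0.0, 1.0, 5.0, 3.0, 4.0]
smooth = [0.0, 1.0, 2.0, 3.0, 4.0]
print(speed_change_log_likelihood(t, spiky) < speed_change_log_likelihood(t, smooth))  # True
```

Removing the spike makes every speed change zero, which is the most likely outcome when `center` is 0.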
+
+The input series is the same as above. The SQL for query is shown below:
+
+```sql
+select valuerepair(s1,'method'='LsGreedy') from root.test.d2
+```
+
+Output series:
+
+```
++-----------------------------+-------------------------------------------------+
+|                         Time|valuerepair(root.test.d2.s1, "method"="LsGreedy")|
++-----------------------------+-------------------------------------------------+
+|2020-01-01T00:00:02.000+08:00|                                            100.0|
+|2020-01-01T00:00:03.000+08:00|                                            101.0|
+|2020-01-01T00:00:04.000+08:00|                                            102.0|
+|2020-01-01T00:00:06.000+08:00|                                            104.0|
+|2020-01-01T00:00:08.000+08:00|                                            106.0|
+|2020-01-01T00:00:10.000+08:00|                                            108.0|
+|2020-01-01T00:00:14.000+08:00|                                            112.0|
+|2020-01-01T00:00:15.000+08:00|                                            113.0|
+|2020-01-01T00:00:16.000+08:00|                                            114.0|
+|2020-01-01T00:00:18.000+08:00|                                            116.0|
+|2020-01-01T00:00:20.000+08:00|                                            118.0|
+|2020-01-01T00:00:22.000+08:00|                                            120.0|
+|2020-01-01T00:00:26.000+08:00|                                            124.0|
+|2020-01-01T00:00:28.000+08:00|                                            126.0|
+|2020-01-01T00:00:30.000+08:00|                                            128.0|
++-----------------------------+-------------------------------------------------+
+```
diff --git a/docs/zh/UserGuide/Library-UDF/Data-Repair.md b/docs/zh/UserGuide/Library-UDF/Data-Repair.md
new file mode 100644
index 0000000..2ea3173
--- /dev/null
+++ b/docs/zh/UserGuide/Library-UDF/Data-Repair.md
@@ -0,0 +1,341 @@
+<!--
+
+    Licensed to the Apache Software Foundation (ASF) under one
+    or more contributor license agreements.  See the NOTICE file
+    distributed with this work for additional information
+    regarding copyright ownership.  The ASF licenses this file
+    to you under the Apache License, Version 2.0 (the
+    "License"); you may not use this file except in compliance
+    with the License.  You may obtain a copy of the License at
+    
+        http://www.apache.org/licenses/LICENSE-2.0
+    
+    Unless required by applicable law or agreed to in writing,
+    software distributed under the License is distributed on an
+    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+    KIND, either express or implied.  See the License for the
+    specific language governing permissions and limitations
+    under the License.
+
+-->
+
+# TimestampRepair
+
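+## Usage
+
+This function repairs timestamps. Given a standard time interval, it fine-tunes the timestamps in the way that minimizes the repair cost, turning data whose timestamp intervals are unstable into strictly equispaced data. If no standard time interval is given, it estimates the standard time interval from the median, mode, or cluster center of the time intervals.
+
+**Name:** TIMESTAMPREPAIR
+
+**Input Series:** Only supports a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE.
+
+**Parameters:**
+
++ `interval`: The standard time interval in milliseconds, given as a positive integer. By default, it is estimated with the specified method.
++ `method`: The method used to estimate the standard time interval: 'median', 'mode', or 'cluster'. It only takes effect when `interval` is not given. By default, the median method is used.
+
+**Output Series:** Outputs a single series of the same type as the input, namely the repaired input series.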
+
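+## Examples
+
+### Manually Specify the Standard Time Interval
+
+When `interval` is given, this function repairs according to the given standard time interval.
+
+Input series: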
+
+```
++-----------------------------+---------------+
+|                         Time|root.test.d2.s1|
++-----------------------------+---------------+
+|2021-07-01T12:00:00.000+08:00|            1.0|
+|2021-07-01T12:00:10.000+08:00|            2.0|
+|2021-07-01T12:00:19.000+08:00|            3.0|
+|2021-07-01T12:00:30.000+08:00|            4.0|
+|2021-07-01T12:00:40.000+08:00|            5.0|
+|2021-07-01T12:00:50.000+08:00|            6.0|
+|2021-07-01T12:01:01.000+08:00|            7.0|
+|2021-07-01T12:01:11.000+08:00|            8.0|
+|2021-07-01T12:01:21.000+08:00|            9.0|
+|2021-07-01T12:01:31.000+08:00|           10.0|
++-----------------------------+---------------+
+```
+
+SQL for query:
+
+```sql
+select timestamprepair(s1,'interval'='10000') from root.test.d2
+```
+
+Output series:
+
+```
++-----------------------------+----------------------------------------------------+
+|                         Time|timestamprepair(root.test.d2.s1, "interval"="10000")|
++-----------------------------+----------------------------------------------------+
+|2021-07-01T12:00:00.000+08:00|                                                 1.0|
+|2021-07-01T12:00:10.000+08:00|                                                 2.0|
+|2021-07-01T12:00:20.000+08:00|                                                 3.0|
+|2021-07-01T12:00:30.000+08:00|                                                 4.0|
+|2021-07-01T12:00:40.000+08:00|                                                 5.0|
+|2021-07-01T12:00:50.000+08:00|                                                 6.0|
+|2021-07-01T12:01:00.000+08:00|                                                 7.0|
+|2021-07-01T12:01:10.000+08:00|                                                 8.0|
+|2021-07-01T12:01:20.000+08:00|                                                 9.0|
+|2021-07-01T12:01:30.000+08:00|                                                10.0|
++-----------------------------+----------------------------------------------------+
+```
+
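+### Automatically Estimate the Standard Time Interval
+
+When `interval` is not given, this function repairs according to the estimated standard time interval.
+
+The input series is the same as above. The SQL for query is shown below: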
+
+```sql
+select timestamprepair(s1) from root.test.d2
+```
+
+Output series:
+
+```
++-----------------------------+--------------------------------+
+|                         Time|timestamprepair(root.test.d2.s1)|
++-----------------------------+--------------------------------+
+|2021-07-01T12:00:00.000+08:00|                             1.0|
+|2021-07-01T12:00:10.000+08:00|                             2.0|
+|2021-07-01T12:00:20.000+08:00|                             3.0|
+|2021-07-01T12:00:30.000+08:00|                             4.0|
+|2021-07-01T12:00:40.000+08:00|                             5.0|
+|2021-07-01T12:00:50.000+08:00|                             6.0|
+|2021-07-01T12:01:00.000+08:00|                             7.0|
+|2021-07-01T12:01:10.000+08:00|                             8.0|
+|2021-07-01T12:01:20.000+08:00|                             9.0|
+|2021-07-01T12:01:30.000+08:00|                            10.0|
++-----------------------------+--------------------------------+
+```
+
+# ValueFill
+
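+## Usage
+
+**Name:** ValueFill
+
+**Input Series:** A single time series. The type is INT32 / INT64 / FLOAT / DOUBLE.
+
+**Parameters:**
+
++ `method`: One of {"mean", "previous", "linear", "likelihood", "AR", "MA", "SCREEN"}. "mean" fills with the mean value; "previous" fills with the previous value; "linear" uses linear interpolation; "likelihood" is maximum likelihood estimation based on the normal distribution of speed; "AR" fills by auto regression; "MA" fills with a moving average; "SCREEN" fills under constraints. By default, "linear" is used.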
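As an illustration of the "MA" (moving average) method, the sketch below fills each `NaN` with the mean of up to `window` observed values on each side. The helper `moving_average_fill` and its `window` parameter are hypothetical; the library's MA method may use a different window or weighting.

```python
import math


def moving_average_fill(values, window=2):
    """Fill each NaN with the mean of up to `window` observed values on each side."""
    out = list(values)
    for i, v in enumerate(values):
        if not math.isnan(v):
            continue
        left = [x for x in values[max(0, i - window):i] if not math.isnan(x)]
        right = [x for x in values[i + 1:i + 1 + window] if not math.isnan(x)]
        neighbours = left + right
        if neighbours:  # a NaN with no observed neighbours stays NaN
            out[i] = sum(neighbours) / len(neighbours)
    return out


print(moving_average_fill([1.0, 2.0, float("nan"), 4.0, 5.0]))  # [1.0, 2.0, 3.0, 4.0, 5.0]
```

Neighbours are taken from the original series, so a filled value never feeds into the average of another gap.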
+
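+**Output Series:** The imputed single series.
+
+**Note:** The AR method uses an AR(1) model. The input series must be auto-correlated; otherwise, a single point (0, 0.0) is output.
+
+## Examples
+
+### Fill with the linear Method
+
+When `method` is not given or is 'linear', this function fills the gaps by linear interpolation.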
+
+Input series:
+
+```
++-----------------------------+---------------+
+|                         Time|root.test.d2.s1|
++-----------------------------+---------------+
+|2020-01-01T00:00:02.000+08:00|            NaN|
+|2020-01-01T00:00:03.000+08:00|          101.0|
+|2020-01-01T00:00:04.000+08:00|          102.0|
+|2020-01-01T00:00:06.000+08:00|          104.0|
+|2020-01-01T00:00:08.000+08:00|          126.0|
+|2020-01-01T00:00:10.000+08:00|          108.0|
+|2020-01-01T00:00:14.000+08:00|            NaN|
+|2020-01-01T00:00:15.000+08:00|          113.0|
+|2020-01-01T00:00:16.000+08:00|          114.0|
+|2020-01-01T00:00:18.000+08:00|          116.0|
+|2020-01-01T00:00:20.000+08:00|            NaN|
+|2020-01-01T00:00:22.000+08:00|            NaN|
+|2020-01-01T00:00:26.000+08:00|          124.0|
+|2020-01-01T00:00:28.000+08:00|          126.0|
+|2020-01-01T00:00:30.000+08:00|          128.0|
++-----------------------------+---------------+
+```
+
+SQL for query:
+
+```sql
+select valuefill(s1) from root.test.d2
+```
+
+Output series:
+
+```
++-----------------------------+--------------------------+
+|                         Time|valuefill(root.test.d2.s1)|
++-----------------------------+--------------------------+
+|2020-01-01T00:00:02.000+08:00|                       NaN|
+|2020-01-01T00:00:03.000+08:00|                     101.0|
+|2020-01-01T00:00:04.000+08:00|                     102.0|
+|2020-01-01T00:00:06.000+08:00|                     104.0|
+|2020-01-01T00:00:08.000+08:00|                     126.0|
+|2020-01-01T00:00:10.000+08:00|                     108.0|
+|2020-01-01T00:00:14.000+08:00|                     110.5|
+|2020-01-01T00:00:15.000+08:00|                     113.0|
+|2020-01-01T00:00:16.000+08:00|                     114.0|
+|2020-01-01T00:00:18.000+08:00|                     116.0|
+|2020-01-01T00:00:20.000+08:00|                     118.7|
+|2020-01-01T00:00:22.000+08:00|                     121.3|
+|2020-01-01T00:00:26.000+08:00|                     124.0|
+|2020-01-01T00:00:28.000+08:00|                     126.0|
+|2020-01-01T00:00:30.000+08:00|                     128.0|
++-----------------------------+--------------------------+
+```
+
+### Fill with the previous Method
+
+When `method` is 'previous', this function fills the gaps with the previous value.
+
+The input series is the same as above. The SQL for query is shown below:
+
+```sql
+select valuefill(s1,"method"="previous") from root.test.d2
+```
+
+Output series:
+
+```
++-----------------------------+-----------------------------------------------+
+|                         Time|valuefill(root.test.d2.s1, "method"="previous")|
++-----------------------------+-----------------------------------------------+
+|2020-01-01T00:00:02.000+08:00|                                            NaN|
+|2020-01-01T00:00:03.000+08:00|                                          101.0|
+|2020-01-01T00:00:04.000+08:00|                                          102.0|
+|2020-01-01T00:00:06.000+08:00|                                          104.0|
+|2020-01-01T00:00:08.000+08:00|                                          126.0|
+|2020-01-01T00:00:10.000+08:00|                                          108.0|
+|2020-01-01T00:00:14.000+08:00|                                          108.0|
+|2020-01-01T00:00:15.000+08:00|                                          113.0|
+|2020-01-01T00:00:16.000+08:00|                                          114.0|
+|2020-01-01T00:00:18.000+08:00|                                          116.0|
+|2020-01-01T00:00:20.000+08:00|                                          116.0|
+|2020-01-01T00:00:22.000+08:00|                                          116.0|
+|2020-01-01T00:00:26.000+08:00|                                          124.0|
+|2020-01-01T00:00:28.000+08:00|                                          126.0|
+|2020-01-01T00:00:30.000+08:00|                                          128.0|
++-----------------------------+-----------------------------------------------+
+```
+
+# ValueRepair
+
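+## Usage
+
+This function repairs the values of a time series. It currently supports two methods: **Screen** is based on speed thresholds and makes all speeds satisfy the thresholds with minimal changes; **LsGreedy** is based on the likelihood of speed changes, models speed changes as a Gaussian distribution, and uses a greedy algorithm to maximize the likelihood.
+
+**Name:** VALUEREPAIR
+
+**Input Series:** Only supports a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE.
+
+**Parameters:**
+
++ `method`: The repair method, either 'Screen' or 'LsGreedy'. By default, Screen is used.
++ `minSpeed`: Only valid with the Screen method. Speeds below this value are regarded as outliers and repaired. By default, it is the median minus three times the median absolute deviation.
++ `maxSpeed`: Only valid with the Screen method. Speeds above this value are regarded as outliers and repaired. By default, it is the median plus three times the median absolute deviation.
++ `center`: Only valid with the LsGreedy method. The center of the Gaussian model of the speed-change distribution. By default, it is 0.
++ `sigma`: Only valid with the LsGreedy method. The standard deviation of the Gaussian model of the speed-change distribution. By default, it is the median absolute deviation.
+
+**Output Series:** Outputs a single series of the same type as the input, namely the repaired input series.
+
+**Note:** `NaN` values in the input series are filled by linear interpolation before repairing.
+
+## Examples
+
+### Repair with Screen
+
+When `method` is not given or is 'Screen', this function repairs the values with the Screen method.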
+
+Input series:
+
+```
++-----------------------------+---------------+
+|                         Time|root.test.d2.s1|
++-----------------------------+---------------+
+|2020-01-01T00:00:02.000+08:00|          100.0|
+|2020-01-01T00:00:03.000+08:00|          101.0|
+|2020-01-01T00:00:04.000+08:00|          102.0|
+|2020-01-01T00:00:06.000+08:00|          104.0|
+|2020-01-01T00:00:08.000+08:00|          126.0|
+|2020-01-01T00:00:10.000+08:00|          108.0|
+|2020-01-01T00:00:14.000+08:00|          112.0|
+|2020-01-01T00:00:15.000+08:00|          113.0|
+|2020-01-01T00:00:16.000+08:00|          114.0|
+|2020-01-01T00:00:18.000+08:00|          116.0|
+|2020-01-01T00:00:20.000+08:00|          118.0|
+|2020-01-01T00:00:22.000+08:00|          100.0|
+|2020-01-01T00:00:26.000+08:00|          124.0|
+|2020-01-01T00:00:28.000+08:00|          126.0|
+|2020-01-01T00:00:30.000+08:00|            NaN|
++-----------------------------+---------------+
+```
+
+SQL for query:
+
+```sql
+select valuerepair(s1) from root.test.d2
+```
+
+Output series:
+
+```
++-----------------------------+----------------------------+
+|                         Time|valuerepair(root.test.d2.s1)|
++-----------------------------+----------------------------+
+|2020-01-01T00:00:02.000+08:00|                       100.0|
+|2020-01-01T00:00:03.000+08:00|                       101.0|
+|2020-01-01T00:00:04.000+08:00|                       102.0|
+|2020-01-01T00:00:06.000+08:00|                       104.0|
+|2020-01-01T00:00:08.000+08:00|                       106.0|
+|2020-01-01T00:00:10.000+08:00|                       108.0|
+|2020-01-01T00:00:14.000+08:00|                       112.0|
+|2020-01-01T00:00:15.000+08:00|                       113.0|
+|2020-01-01T00:00:16.000+08:00|                       114.0|
+|2020-01-01T00:00:18.000+08:00|                       116.0|
+|2020-01-01T00:00:20.000+08:00|                       118.0|
+|2020-01-01T00:00:22.000+08:00|                       120.0|
+|2020-01-01T00:00:26.000+08:00|                       124.0|
+|2020-01-01T00:00:28.000+08:00|                       126.0|
+|2020-01-01T00:00:30.000+08:00|                       128.0|
++-----------------------------+----------------------------+
+```
+
+### Repair with LsGreedy
+
+When `method` is 'LsGreedy', this function repairs the values with the LsGreedy method.
+
+The input series is the same as above. The SQL for query is shown below:
+
+```sql
+select valuerepair(s1,'method'='LsGreedy') from root.test.d2
+```
+
+Output series:
+
+```
++-----------------------------+-------------------------------------------------+
+|                         Time|valuerepair(root.test.d2.s1, "method"="LsGreedy")|
++-----------------------------+-------------------------------------------------+
+|2020-01-01T00:00:02.000+08:00|                                            100.0|
+|2020-01-01T00:00:03.000+08:00|                                            101.0|
+|2020-01-01T00:00:04.000+08:00|                                            102.0|
+|2020-01-01T00:00:06.000+08:00|                                            104.0|
+|2020-01-01T00:00:08.000+08:00|                                            106.0|
+|2020-01-01T00:00:10.000+08:00|                                            108.0|
+|2020-01-01T00:00:14.000+08:00|                                            112.0|
+|2020-01-01T00:00:15.000+08:00|                                            113.0|
+|2020-01-01T00:00:16.000+08:00|                                            114.0|
+|2020-01-01T00:00:18.000+08:00|                                            116.0|
+|2020-01-01T00:00:20.000+08:00|                                            118.0|
+|2020-01-01T00:00:22.000+08:00|                                            120.0|
+|2020-01-01T00:00:26.000+08:00|                                            124.0|
+|2020-01-01T00:00:28.000+08:00|                                            126.0|
+|2020-01-01T00:00:30.000+08:00|                                            128.0|
++-----------------------------+-------------------------------------------------+
+```
diff --git a/site/src/main/.vuepress/config.js b/site/src/main/.vuepress/config.js
index 08f28af..8d644ce 100644
--- a/site/src/main/.vuepress/config.js
+++ b/site/src/main/.vuepress/config.js
@@ -724,7 +724,8 @@ var config = {
 					    title: 'UDF Library',
 					    children: [
 					        ['Library-UDF/Get-Started', 'Get Started'],
-					        ['Library-UDF/Data-Quality', 'Data Quality']
+					        ['Library-UDF/Data-Quality', 'Data Quality'],
+					        ['Library-UDF/Data-Repair', 'Data Repairing']
 					    ]
 					},
 					{
@@ -1531,7 +1532,8 @@ var config = {
           				title: 'UDF 函数库',
           				children: [
           					['Library-UDF/Get-Started', '快速上手'],
-          					['Library-UDF/Data-Quality', '数据质量']
+          					['Library-UDF/Data-Quality', '数据质量'],
+          					['Library-UDF/Data-Repair', '数据修复']
           				]
 					},
 					{