You are viewing a plain text version of this content. The canonical link for it is here.

Posted to commits@dolphinscheduler.apache.org by GitBox <gi...@apache.org> on 2022/04/14 15:12:22 UTC

[GitHub] [dolphinscheduler] zixi0825 opened a new pull request, #9512: [Docs][DataQuality]: Add DataQuality Docs

zixi0825 opened a new pull request, #9512:
URL: https://github.com/apache/dolphinscheduler/pull/9512

   <!--Thanks very much for contributing to Apache DolphinScheduler. Please review https://dolphinscheduler.apache.org/en-us/community/development/pull-request.html before opening a pull request.-->
   
   
   ## Purpose of the pull request
   
   This pull request adds dataquality docs.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@dolphinscheduler.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [dolphinscheduler] zhongjiajie commented on a diff in pull request #9512: [Docs][DataQuality]: Add DataQuality Docs

Posted by GitBox <gi...@apache.org>.

zhongjiajie commented on code in PR #9512:
URL: https://github.com/apache/dolphinscheduler/pull/9512#discussion_r854811336


##########
docs/docs/en/guide/task/data-quality.md:
##########
@@ -0,0 +1,299 @@
+# 1 Overview
+## 1.1 Introduction
+
+The data quality task is used to check the data accuracy during the integration and processing of data. Data quality tasks in this release include single-table checking, single-table custom SQL checking, multi-table accuracy, and two-table value comparisons. The running environment of the data quality task is Spark 2.4.0, and other versions have not been verified, and users can verify by themselves.
+- The execution flow of the data quality task is as follows: 
+
+> The user defines the task in the interface, and the user input value is stored in `TaskParam`
+When running a task, `Master` will parse `TaskParam`, encapsulate the parameters required by `DataQualityTask` and send it to `Worker`.
+Worker runs the data quality task. After the data quality task finishes running, it writes the statistical results to the specified storage engine. The current data quality task result is stored in the `t_ds_dq_execute_result` table of `dolphinscheduler`
+`Worker` sends the task result to `Master`, after `Master` receives `TaskResponse`, it will judge whether the task type is `DataQualityTask`, if so, it will read the corresponding result from `t_ds_dq_execute_result` according to `taskInstanceId`, and then The result is judged according to the check mode, operator and threshold configured by the user. If the result is a failure, the corresponding operation, alarm or interruption will be performed according to the failure policy configured by the user.## 1.2 注意事项
+
+Add config : common.properties

Review Comment:
   ```suggestion
   Add config : `<server-name>/conf/common.properties`
   ```



##########
docs/docs/en/guide/task/data-quality.md:
##########
@@ -0,0 +1,299 @@
+# 1 Overview
+## 1.1 Introduction
+
+The data quality task is used to check the data accuracy during the integration and processing of data. Data quality tasks in this release include single-table checking, single-table custom SQL checking, multi-table accuracy, and two-table value comparisons. The running environment of the data quality task is Spark 2.4.0, and other versions have not been verified, and users can verify by themselves.
+- The execution flow of the data quality task is as follows: 
+
+> The user defines the task in the interface, and the user input value is stored in `TaskParam`
+When running a task, `Master` will parse `TaskParam`, encapsulate the parameters required by `DataQualityTask` and send it to `Worker`.
+Worker runs the data quality task. After the data quality task finishes running, it writes the statistical results to the specified storage engine. The current data quality task result is stored in the `t_ds_dq_execute_result` table of `dolphinscheduler`
+`Worker` sends the task result to `Master`, after `Master` receives `TaskResponse`, it will judge whether the task type is `DataQualityTask`, if so, it will read the corresponding result from `t_ds_dq_execute_result` according to `taskInstanceId`, and then The result is judged according to the check mode, operator and threshold configured by the user. If the result is a failure, the corresponding operation, alarm or interruption will be performed according to the failure policy configured by the user.## 1.2 注意事项
+
+Add config : common.properties
+> data-quality.jar.name=dolphinscheduler-data-quality-dev-SNAPSHOT.jar

Review Comment:
   ```suggestion
   
   ```properties
   data-quality.jar.name=dolphinscheduler-data-quality-dev-SNAPSHOT.jar
   ```
   ```
   



##########
docs/docs/en/guide/task/data-quality.md:
##########
@@ -0,0 +1,299 @@
+# 1 Overview
+## 1.1 Introduction
+
+The data quality task is used to check the data accuracy during the integration and processing of data. Data quality tasks in this release include single-table checking, single-table custom SQL checking, multi-table accuracy, and two-table value comparisons. The running environment of the data quality task is Spark 2.4.0, and other versions have not been verified, and users can verify by themselves.
+- The execution flow of the data quality task is as follows: 
+
+> The user defines the task in the interface, and the user input value is stored in `TaskParam`
+When running a task, `Master` will parse `TaskParam`, encapsulate the parameters required by `DataQualityTask` and send it to `Worker`.
+Worker runs the data quality task. After the data quality task finishes running, it writes the statistical results to the specified storage engine. The current data quality task result is stored in the `t_ds_dq_execute_result` table of `dolphinscheduler`
+`Worker` sends the task result to `Master`, after `Master` receives `TaskResponse`, it will judge whether the task type is `DataQualityTask`, if so, it will read the corresponding result from `t_ds_dq_execute_result` according to `taskInstanceId`, and then The result is judged according to the check mode, operator and threshold configured by the user. If the result is a failure, the corresponding operation, alarm or interruption will be performed according to the failure policy configured by the user.## 1.2 注意事项
+
+Add config : common.properties
+> data-quality.jar.name=dolphinscheduler-data-quality-dev-SNAPSHOT.jar
+
+Please fill in `data-quality.jar.name` according to the actual package name,
+If you package `data-quality` separately, remember to modify the package name to be consistent with `data-quality.jar.name`.
+If the old version is upgraded and used, you need to execute the `sql` update script to initialize the database before running.
+If you want to use `MySQL` data, you need to comment out the `scope` of `MySQL` in `pom.xml`
+Currently only `MySQL`, `PostgreSQL` and `HIVE` data sources have been tested, other data sources have not been tested yet
+`Spark` needs to be configured to read `Hive` metadata, `Spark` does not use `jdbc` to read `Hive`
+
+## 1.3 Detail
+
+- CheckMethod: [CheckFormula][Operator][Threshold], if the result is true, it indicates that the data does not meet expectations, and the failure strategy is executed.
+- CheckFormula：
+    - Expected-Actual
+    - Actual-Expected
+    - (Actual/Expected)x100%
+    - (Expected-Actual)/Expected x100%
+- Operator：=、>、>=、<、<=、!=
+- ExpectedValue
+    - FixValue
+    - DailyAvg
+    - WeeklyAvg
+    - MonthlyAvg
+    - Last7DayAvg
+    - Last30DayAvg
+    - SrcTableTotalRows
+    - TargetTableTotalRows
+    
+- eg
+    - CheckFormula：Expected-Actual
+    - Operator：>
+    - Threshold：0
+    - ExpectedValue：FixValue=9。
+    
+Assuming that the actual value is 10, the operator is >, and the expected value is 9, then the result 10 -9 > 0 is true, which means that the row data in the empty column has exceeded the threshold, and the task is judged to fail
+# 2 Guide
+## 2.1 NullCheck
+### 2.1.1 Introduction
+The goal of the null value check is to check the number of empty rows in the specified column. The number of empty rows can be compared with the total number of rows or a specified threshold. If it is greater than a certain threshold, it will be judged as failure.
+- Calculate the SQL statement that the specified column is empty as follows:
+    - SELECT COUNT(*) AS miss FROM ${src_table} WHERE (${src_field} is null or ${src_field} = '') AND (${src_filter})
+- The SQL to calculate the total number of rows in the table is as follows:
+    - SELECT COUNT(*) AS total FROM ${src_table} WHERE (${src_filter})

Review Comment:
   ```suggestion
        ```sql
        SELECT COUNT(*) AS total FROM ${src_table} WHERE (${src_filter})
        ```
   ```



##########
docs/docs/en/guide/task/data-quality.md:
##########
@@ -0,0 +1,299 @@
+# 1 Overview
+## 1.1 Introduction
+
+The data quality task is used to check the data accuracy during the integration and processing of data. Data quality tasks in this release include single-table checking, single-table custom SQL checking, multi-table accuracy, and two-table value comparisons. The running environment of the data quality task is Spark 2.4.0, and other versions have not been verified, and users can verify by themselves.
+- The execution flow of the data quality task is as follows: 
+
+> The user defines the task in the interface, and the user input value is stored in `TaskParam`
+When running a task, `Master` will parse `TaskParam`, encapsulate the parameters required by `DataQualityTask` and send it to `Worker`.
+Worker runs the data quality task. After the data quality task finishes running, it writes the statistical results to the specified storage engine. The current data quality task result is stored in the `t_ds_dq_execute_result` table of `dolphinscheduler`
+`Worker` sends the task result to `Master`, after `Master` receives `TaskResponse`, it will judge whether the task type is `DataQualityTask`, if so, it will read the corresponding result from `t_ds_dq_execute_result` according to `taskInstanceId`, and then The result is judged according to the check mode, operator and threshold configured by the user. If the result is a failure, the corresponding operation, alarm or interruption will be performed according to the failure policy configured by the user.## 1.2 注意事项

Review Comment:
   ```suggestion
   `Worker` sends the task result to `Master`, after `Master` receives `TaskResponse`, it will judge whether the task type is `DataQualityTask`, if so, it will read the corresponding result from `t_ds_dq_execute_result` according to `taskInstanceId`, and then The result is judged according to the check mode, operator and threshold configured by the user. If the result is a failure, the corresponding operation, alarm or interruption will be performed according to the failure policy configured by the user.
   ```
   ```suggestion
   `Worker` sends the task result to `Master`, after `Master` receives `TaskResponse`, it will judge whether the task type is `DataQualityTask`, if so, it will read the corresponding result from `t_ds_dq_execute_result` according to `taskInstanceId`, and then The result is judged according to the check mode, operator and threshold configured by the user. If the result is a failure, the corresponding operation, alarm or interruption will be performed according to the failure policy configured by the user.## 1.2 注意事项
   ```



##########
docs/docs/en/guide/task/data-quality.md:
##########
@@ -0,0 +1,299 @@
+# 1 Overview
+## 1.1 Introduction
+
+The data quality task is used to check the data accuracy during the integration and processing of data. Data quality tasks in this release include single-table checking, single-table custom SQL checking, multi-table accuracy, and two-table value comparisons. The running environment of the data quality task is Spark 2.4.0, and other versions have not been verified, and users can verify by themselves.
+- The execution flow of the data quality task is as follows: 
+
+> The user defines the task in the interface, and the user input value is stored in `TaskParam`
+When running a task, `Master` will parse `TaskParam`, encapsulate the parameters required by `DataQualityTask` and send it to `Worker`.
+Worker runs the data quality task. After the data quality task finishes running, it writes the statistical results to the specified storage engine. The current data quality task result is stored in the `t_ds_dq_execute_result` table of `dolphinscheduler`
+`Worker` sends the task result to `Master`, after `Master` receives `TaskResponse`, it will judge whether the task type is `DataQualityTask`, if so, it will read the corresponding result from `t_ds_dq_execute_result` according to `taskInstanceId`, and then The result is judged according to the check mode, operator and threshold configured by the user. If the result is a failure, the corresponding operation, alarm or interruption will be performed according to the failure policy configured by the user.## 1.2 注意事项
+
+Add config : common.properties
+> data-quality.jar.name=dolphinscheduler-data-quality-dev-SNAPSHOT.jar
+
+Please fill in `data-quality.jar.name` according to the actual package name,
+If you package `data-quality` separately, remember to modify the package name to be consistent with `data-quality.jar.name`.
+If the old version is upgraded and used, you need to execute the `sql` update script to initialize the database before running.
+If you want to use `MySQL` data, you need to comment out the `scope` of `MySQL` in `pom.xml`
+Currently only `MySQL`, `PostgreSQL` and `HIVE` data sources have been tested, other data sources have not been tested yet
+`Spark` needs to be configured to read `Hive` metadata, `Spark` does not use `jdbc` to read `Hive`
+
+## 1.3 Detail
+
+- CheckMethod: [CheckFormula][Operator][Threshold], if the result is true, it indicates that the data does not meet expectations, and the failure strategy is executed.
+- CheckFormula：
+    - Expected-Actual
+    - Actual-Expected
+    - (Actual/Expected)x100%
+    - (Expected-Actual)/Expected x100%
+- Operator：=、>、>=、<、<=、!=
+- ExpectedValue
+    - FixValue
+    - DailyAvg
+    - WeeklyAvg
+    - MonthlyAvg
+    - Last7DayAvg
+    - Last30DayAvg
+    - SrcTableTotalRows
+    - TargetTableTotalRows
+    
+- eg
+    - CheckFormula：Expected-Actual
+    - Operator：>
+    - Threshold：0
+    - ExpectedValue：FixValue=9。
+    
+Assuming that the actual value is 10, the operator is >, and the expected value is 9, then the result 10 -9 > 0 is true, which means that the row data in the empty column has exceeded the threshold, and the task is judged to fail
+# 2 Guide
+## 2.1 NullCheck
+### 2.1.1 Introduction
+The goal of the null value check is to check the number of empty rows in the specified column. The number of empty rows can be compared with the total number of rows or a specified threshold. If it is greater than a certain threshold, it will be judged as failure.
+- Calculate the SQL statement that the specified column is empty as follows:
+    - SELECT COUNT(*) AS miss FROM ${src_table} WHERE (${src_field} is null or ${src_field} = '') AND (${src_filter})

Review Comment:
   ```suggestion
   ```sql
   SELECT COUNT(*) AS miss FROM ${src_table} WHERE (${src_field} is null or ${src_field} = '') AND (${src_filter})
   ```
   ```



##########
docs/docs/en/guide/task/data-quality.md:
##########
@@ -0,0 +1,299 @@
+# 1 Overview
+## 1.1 Introduction
+
+The data quality task is used to check the data accuracy during the integration and processing of data. Data quality tasks in this release include single-table checking, single-table custom SQL checking, multi-table accuracy, and two-table value comparisons. The running environment of the data quality task is Spark 2.4.0, and other versions have not been verified, and users can verify by themselves.
+- The execution flow of the data quality task is as follows: 
+
+> The user defines the task in the interface, and the user input value is stored in `TaskParam`
+When running a task, `Master` will parse `TaskParam`, encapsulate the parameters required by `DataQualityTask` and send it to `Worker`.
+Worker runs the data quality task. After the data quality task finishes running, it writes the statistical results to the specified storage engine. The current data quality task result is stored in the `t_ds_dq_execute_result` table of `dolphinscheduler`
+`Worker` sends the task result to `Master`, after `Master` receives `TaskResponse`, it will judge whether the task type is `DataQualityTask`, if so, it will read the corresponding result from `t_ds_dq_execute_result` according to `taskInstanceId`, and then The result is judged according to the check mode, operator and threshold configured by the user. If the result is a failure, the corresponding operation, alarm or interruption will be performed according to the failure policy configured by the user.## 1.2 注意事项
+
+Add config : common.properties
+> data-quality.jar.name=dolphinscheduler-data-quality-dev-SNAPSHOT.jar
+
+Please fill in `data-quality.jar.name` according to the actual package name,
+If you package `data-quality` separately, remember to modify the package name to be consistent with `data-quality.jar.name`.
+If the old version is upgraded and used, you need to execute the `sql` update script to initialize the database before running.
+If you want to use `MySQL` data, you need to comment out the `scope` of `MySQL` in `pom.xml`
+Currently only `MySQL`, `PostgreSQL` and `HIVE` data sources have been tested, other data sources have not been tested yet
+`Spark` needs to be configured to read `Hive` metadata, `Spark` does not use `jdbc` to read `Hive`
+
+## 1.3 Detail
+
+- CheckMethod: [CheckFormula][Operator][Threshold], if the result is true, it indicates that the data does not meet expectations, and the failure strategy is executed.
+- CheckFormula：
+    - Expected-Actual
+    - Actual-Expected
+    - (Actual/Expected)x100%
+    - (Expected-Actual)/Expected x100%
+- Operator：=、>、>=、<、<=、!=
+- ExpectedValue
+    - FixValue
+    - DailyAvg
+    - WeeklyAvg
+    - MonthlyAvg
+    - Last7DayAvg
+    - Last30DayAvg
+    - SrcTableTotalRows
+    - TargetTableTotalRows
+    
+- eg
+    - CheckFormula：Expected-Actual
+    - Operator：>
+    - Threshold：0
+    - ExpectedValue：FixValue=9。
+    
+Assuming that the actual value is 10, the operator is >, and the expected value is 9, then the result 10 -9 > 0 is true, which means that the row data in the empty column has exceeded the threshold, and the task is judged to fail
+# 2 Guide
+## 2.1 NullCheck
+### 2.1.1 Introduction
+The goal of the null value check is to check the number of empty rows in the specified column. The number of empty rows can be compared with the total number of rows or a specified threshold. If it is greater than a certain threshold, it will be judged as failure.
+- Calculate the SQL statement that the specified column is empty as follows:
+    - SELECT COUNT(*) AS miss FROM ${src_table} WHERE (${src_field} is null or ${src_field} = '') AND (${src_filter})
+- The SQL to calculate the total number of rows in the table is as follows:
+    - SELECT COUNT(*) AS total FROM ${src_table} WHERE (${src_filter})

Review Comment:
   ```suggestion
        ```sql
        SELECT COUNT(*) AS total FROM ${src_table} WHERE (${src_filter})
        ```
   ```



##########
docs/docs/en/guide/task/data-quality.md:
##########
@@ -0,0 +1,299 @@
+# 1 Overview
+## 1.1 Introduction
+
+The data quality task is used to check the data accuracy during the integration and processing of data. Data quality tasks in this release include single-table checking, single-table custom SQL checking, multi-table accuracy, and two-table value comparisons. The running environment of the data quality task is Spark 2.4.0, and other versions have not been verified, and users can verify by themselves.
+- The execution flow of the data quality task is as follows: 
+
+> The user defines the task in the interface, and the user input value is stored in `TaskParam`
+When running a task, `Master` will parse `TaskParam`, encapsulate the parameters required by `DataQualityTask` and send it to `Worker`.
+Worker runs the data quality task. After the data quality task finishes running, it writes the statistical results to the specified storage engine. The current data quality task result is stored in the `t_ds_dq_execute_result` table of `dolphinscheduler`
+`Worker` sends the task result to `Master`, after `Master` receives `TaskResponse`, it will judge whether the task type is `DataQualityTask`, if so, it will read the corresponding result from `t_ds_dq_execute_result` according to `taskInstanceId`, and then The result is judged according to the check mode, operator and threshold configured by the user. If the result is a failure, the corresponding operation, alarm or interruption will be performed according to the failure policy configured by the user.## 1.2 注意事项
+
+Add config : common.properties
+> data-quality.jar.name=dolphinscheduler-data-quality-dev-SNAPSHOT.jar
+
+Please fill in `data-quality.jar.name` according to the actual package name,
+If you package `data-quality` separately, remember to modify the package name to be consistent with `data-quality.jar.name`.
+If the old version is upgraded and used, you need to execute the `sql` update script to initialize the database before running.
+If you want to use `MySQL` data, you need to comment out the `scope` of `MySQL` in `pom.xml`
+Currently only `MySQL`, `PostgreSQL` and `HIVE` data sources have been tested, other data sources have not been tested yet
+`Spark` needs to be configured to read `Hive` metadata, `Spark` does not use `jdbc` to read `Hive`
+
+## 1.3 Detail
+
+- CheckMethod: [CheckFormula][Operator][Threshold], if the result is true, it indicates that the data does not meet expectations, and the failure strategy is executed.
+- CheckFormula：
+    - Expected-Actual
+    - Actual-Expected
+    - (Actual/Expected)x100%
+    - (Expected-Actual)/Expected x100%
+- Operator：=、>、>=、<、<=、!=
+- ExpectedValue
+    - FixValue
+    - DailyAvg
+    - WeeklyAvg
+    - MonthlyAvg
+    - Last7DayAvg
+    - Last30DayAvg
+    - SrcTableTotalRows
+    - TargetTableTotalRows
+    
+- eg
+    - CheckFormula：Expected-Actual
+    - Operator：>
+    - Threshold：0
+    - ExpectedValue：FixValue=9。
+    
+Assuming that the actual value is 10, the operator is >, and the expected value is 9, then the result 10 -9 > 0 is true, which means that the row data in the empty column has exceeded the threshold, and the task is judged to fail
+# 2 Guide
+## 2.1 NullCheck
+### 2.1.1 Introduction
+The goal of the null value check is to check the number of empty rows in the specified column. The number of empty rows can be compared with the total number of rows or a specified threshold. If it is greater than a certain threshold, it will be judged as failure.
+- Calculate the SQL statement that the specified column is empty as follows:
+    - SELECT COUNT(*) AS miss FROM ${src_table} WHERE (${src_field} is null or ${src_field} = '') AND (${src_filter})

Review Comment:
   ```suggestion
       ```sql
       SELECT COUNT(*) AS miss FROM ${src_table} WHERE (${src_field} is null or ${src_field} = '') AND (${src_filter})
       ```
   ```



##########
docs/docs/en/guide/task/data-quality.md:
##########
@@ -0,0 +1,299 @@
+# 1 Overview
+## 1.1 Introduction
+
+The data quality task is used to check the data accuracy during the integration and processing of data. Data quality tasks in this release include single-table checking, single-table custom SQL checking, multi-table accuracy, and two-table value comparisons. The running environment of the data quality task is Spark 2.4.0, and other versions have not been verified, and users can verify by themselves.
+- The execution flow of the data quality task is as follows: 
+
+> The user defines the task in the interface, and the user input value is stored in `TaskParam`
+When running a task, `Master` will parse `TaskParam`, encapsulate the parameters required by `DataQualityTask` and send it to `Worker`.
+Worker runs the data quality task. After the data quality task finishes running, it writes the statistical results to the specified storage engine. The current data quality task result is stored in the `t_ds_dq_execute_result` table of `dolphinscheduler`
+`Worker` sends the task result to `Master`, after `Master` receives `TaskResponse`, it will judge whether the task type is `DataQualityTask`, if so, it will read the corresponding result from `t_ds_dq_execute_result` according to `taskInstanceId`, and then The result is judged according to the check mode, operator and threshold configured by the user. If the result is a failure, the corresponding operation, alarm or interruption will be performed according to the failure policy configured by the user.## 1.2 注意事项

Review Comment:
   ```suggestion
   `Worker` sends the task result to `Master`, after `Master` receives `TaskResponse`, it will judge whether the task type is `DataQualityTask`, if so, it will read the corresponding result from `t_ds_dq_execute_result` according to `taskInstanceId`, and then The result is judged according to the check mode, operator and threshold configured by the user. If the result is a failure, the corresponding operation, alarm or interruption will be performed according to the failure policy configured by the user.
   ```
   ```suggestion
   `Worker` sends the task result to `Master`, after `Master` receives `TaskResponse`, it will judge whether the task type is `DataQualityTask`, if so, it will read the corresponding result from `t_ds_dq_execute_result` according to `taskInstanceId`, and then The result is judged according to the check mode, operator and threshold configured by the user. If the result is a failure, the corresponding operation, alarm or interruption will be performed according to the failure policy configured by the user.## 1.2 注意事项
   ```



##########
docs/docs/en/guide/task/data-quality.md:
##########
@@ -0,0 +1,299 @@
+# 1 Overview
+## 1.1 Introduction
+
+The data quality task is used to check the data accuracy during the integration and processing of data. Data quality tasks in this release include single-table checking, single-table custom SQL checking, multi-table accuracy, and two-table value comparisons. The running environment of the data quality task is Spark 2.4.0, and other versions have not been verified, and users can verify by themselves.
+- The execution flow of the data quality task is as follows: 
+
+> The user defines the task in the interface, and the user input value is stored in `TaskParam`
+When running a task, `Master` will parse `TaskParam`, encapsulate the parameters required by `DataQualityTask` and send it to `Worker`.
+Worker runs the data quality task. After the data quality task finishes running, it writes the statistical results to the specified storage engine. The current data quality task result is stored in the `t_ds_dq_execute_result` table of `dolphinscheduler`
+`Worker` sends the task result to `Master`, after `Master` receives `TaskResponse`, it will judge whether the task type is `DataQualityTask`, if so, it will read the corresponding result from `t_ds_dq_execute_result` according to `taskInstanceId`, and then The result is judged according to the check mode, operator and threshold configured by the user. If the result is a failure, the corresponding operation, alarm or interruption will be performed according to the failure policy configured by the user.## 1.2 注意事项
+
+Add config : common.properties
+> data-quality.jar.name=dolphinscheduler-data-quality-dev-SNAPSHOT.jar

Review Comment:
   ```suggestion
   
   ```properties
   data-quality.jar.name=dolphinscheduler-data-quality-dev-SNAPSHOT.jar
   ```
   ```
   



##########
docs/docs/en/guide/task/data-quality.md:
##########
@@ -0,0 +1,299 @@
+# 1 Overview
+## 1.1 Introduction
+
+The data quality task is used to check the data accuracy during the integration and processing of data. Data quality tasks in this release include single-table checking, single-table custom SQL checking, multi-table accuracy, and two-table value comparisons. The running environment of the data quality task is Spark 2.4.0, and other versions have not been verified, and users can verify by themselves.
+- The execution flow of the data quality task is as follows: 
+
+> The user defines the task in the interface, and the user input value is stored in `TaskParam`
+When running a task, `Master` will parse `TaskParam`, encapsulate the parameters required by `DataQualityTask` and send it to `Worker`.
+Worker runs the data quality task. After the data quality task finishes running, it writes the statistical results to the specified storage engine. The current data quality task result is stored in the `t_ds_dq_execute_result` table of `dolphinscheduler`
+`Worker` sends the task result to `Master`, after `Master` receives `TaskResponse`, it will judge whether the task type is `DataQualityTask`, if so, it will read the corresponding result from `t_ds_dq_execute_result` according to `taskInstanceId`, and then The result is judged according to the check mode, operator and threshold configured by the user. If the result is a failure, the corresponding operation, alarm or interruption will be performed according to the failure policy configured by the user.## 1.2 注意事项
+
+Add config : common.properties
+> data-quality.jar.name=dolphinscheduler-data-quality-dev-SNAPSHOT.jar
+
+Please fill in `data-quality.jar.name` according to the actual package name,
+If you package `data-quality` separately, remember to modify the package name to be consistent with `data-quality.jar.name`.
+If the old version is upgraded and used, you need to execute the `sql` update script to initialize the database before running.
+If you want to use `MySQL` data, you need to comment out the `scope` of `MySQL` in `pom.xml`
+Currently only `MySQL`, `PostgreSQL` and `HIVE` data sources have been tested, other data sources have not been tested yet
+`Spark` needs to be configured to read `Hive` metadata, `Spark` does not use `jdbc` to read `Hive`
+
+## 1.3 Detail
+
+- CheckMethod: [CheckFormula][Operator][Threshold], if the result is true, it indicates that the data does not meet expectations, and the failure strategy is executed.
+- CheckFormula：
+    - Expected-Actual
+    - Actual-Expected
+    - (Actual/Expected)x100%
+    - (Expected-Actual)/Expected x100%
+- Operator：=、>、>=、<、<=、!=
+- ExpectedValue
+    - FixValue
+    - DailyAvg
+    - WeeklyAvg
+    - MonthlyAvg
+    - Last7DayAvg
+    - Last30DayAvg
+    - SrcTableTotalRows
+    - TargetTableTotalRows
+    
+- eg
+    - CheckFormula：Expected-Actual
+    - Operator：>
+    - Threshold：0
+    - ExpectedValue：FixValue=9。
+    
+Assuming that the actual value is 10, the operator is >, and the expected value is 9, then the result 10 -9 > 0 is true, which means that the row data in the empty column has exceeded the threshold, and the task is judged to fail
+# 2 Guide
+## 2.1 NullCheck
+### 2.1.1 Introduction
+The goal of the null value check is to check the number of empty rows in the specified column. The number of empty rows can be compared with the total number of rows or a specified threshold. If it is greater than a certain threshold, it will be judged as failure.
+- Calculate the SQL statement that the specified column is empty as follows:
+    - SELECT COUNT(*) AS miss FROM ${src_table} WHERE (${src_field} is null or ${src_field} = '') AND (${src_filter})

Review Comment:
   should use sql syntax



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@dolphinscheduler.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [dolphinscheduler] zhongjiajie commented on pull request #9512: [Docs][DataQuality]: Add DataQuality Docs

Posted by GitBox <gi...@apache.org>.

zhongjiajie commented on PR #9512:
URL: https://github.com/apache/dolphinscheduler/pull/9512#issuecomment-1106157189

   @Tianqi-Dotes  Sorry, I merged it directly because we released 3.0.0-alpha, and this doc must in our website


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@dolphinscheduler.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [dolphinscheduler] Tianqi-Dotes commented on a diff in pull request #9512: [Docs][DataQuality]: Add DataQuality Docs

Posted by GitBox <gi...@apache.org>.

Tianqi-Dotes commented on code in PR #9512:
URL: https://github.com/apache/dolphinscheduler/pull/9512#discussion_r854879423


##########
docs/docs/en/guide/task/data-quality.md:
##########
@@ -0,0 +1,305 @@
+# 1 Overview
+## 1.1 Introduction
+
+The data quality task is used to check the data accuracy during the integration and processing of data. Data quality tasks in this release include single-table checking, single-table custom SQL checking, multi-table accuracy, and two-table value comparisons. The running environment of the data quality task is Spark 2.4.0, and other versions have not been verified, and users can verify by themselves.

Review Comment:
   better add a blank line.



##########
docs/docs/en/guide/task/data-quality.md:
##########
@@ -0,0 +1,305 @@
+# 1 Overview
+## 1.1 Introduction
+
+The data quality task is used to check the data accuracy during the integration and processing of data. Data quality tasks in this release include single-table checking, single-table custom SQL checking, multi-table accuracy, and two-table value comparisons. The running environment of the data quality task is Spark 2.4.0, and other versions have not been verified, and users can verify by themselves.
+- The execution flow of the data quality task is as follows: 
+
+> The user defines the task in the interface, and the user input value is stored in `TaskParam`
+When running a task, `Master` will parse `TaskParam`, encapsulate the parameters required by `DataQualityTask` and send it to `Worker`.
+Worker runs the data quality task. After the data quality task finishes running, it writes the statistical results to the specified storage engine. The current data quality task result is stored in the `t_ds_dq_execute_result` table of `dolphinscheduler`

Review Comment:
   `t_ds_dq_execute_result` table of `dolphinscheduler`
   ->
   table  `t_ds_dq_execute_result` of the database `dolphinscheduler`.



##########
docs/docs/en/guide/task/data-quality.md:
##########
@@ -0,0 +1,305 @@
+# 1 Overview
+## 1.1 Introduction
+
+The data quality task is used to check the data accuracy during the integration and processing of data. Data quality tasks in this release include single-table checking, single-table custom SQL checking, multi-table accuracy, and two-table value comparisons. The running environment of the data quality task is Spark 2.4.0, and other versions have not been verified, and users can verify by themselves.
+- The execution flow of the data quality task is as follows: 
+
+> The user defines the task in the interface, and the user input value is stored in `TaskParam`
+When running a task, `Master` will parse `TaskParam`, encapsulate the parameters required by `DataQualityTask` and send it to `Worker`.
+Worker runs the data quality task. After the data quality task finishes running, it writes the statistical results to the specified storage engine. The current data quality task result is stored in the `t_ds_dq_execute_result` table of `dolphinscheduler`
+`Worker` sends the task result to `Master`, after `Master` receives `TaskResponse`, it will judge whether the task type is `DataQualityTask`, if so, it will read the corresponding result from `t_ds_dq_execute_result` according to `taskInstanceId`, and then The result is judged according to the check mode, operator and threshold configured by the user. If the result is a failure, the corresponding operation, alarm or interruption will be performed according to the failure policy configured by the user.
+
+Add config : `<server-name>/conf/common.properties`
+
+```properties
+data-quality.jar.name=dolphinscheduler-data-quality-dev-SNAPSHOT.jar
+
+Please fill in `data-quality.jar.name` according to the actual package name,
+If you package `data-quality` separately, remember to modify the package name to be consistent with `data-quality.jar.name`.
+If the old version is upgraded and used, you need to execute the `sql` update script to initialize the database before running.
+If you want to use `MySQL` data, you need to comment out the `scope` of `MySQL` in `pom.xml`
+Currently only `MySQL`, `PostgreSQL` and `HIVE` data sources have been tested, other data sources have not been tested yet
+`Spark` needs to be configured to read `Hive` metadata, `Spark` does not use `jdbc` to read `Hive`
+
+## 1.3 Detail
+
+- CheckMethod: [CheckFormula][Operator][Threshold], if the result is true, it indicates that the data does not meet expectations, and the failure strategy is executed.
+- CheckFormula：
+    - Expected-Actual
+    - Actual-Expected
+    - (Actual/Expected)x100%
+    - (Expected-Actual)/Expected x100%
+- Operator：=、>、>=、<、<=、!=
+- ExpectedValue
+    - FixValue
+    - DailyAvg
+    - WeeklyAvg
+    - MonthlyAvg
+    - Last7DayAvg
+    - Last30DayAvg
+    - SrcTableTotalRows
+    - TargetTableTotalRows
+    
+- eg
+    - CheckFormula：Expected-Actual
+    - Operator：>
+    - Threshold：0
+    - ExpectedValue：FixValue=9。

Review Comment:
    ExpectedValue：FixValue=9。
   ->
    ExpectedValue：FixValue=9



##########
docs/docs/en/guide/task/data-quality.md:
##########
@@ -0,0 +1,305 @@
+# 1 Overview
+## 1.1 Introduction
+
+The data quality task is used to check the data accuracy during the integration and processing of data. Data quality tasks in this release include single-table checking, single-table custom SQL checking, multi-table accuracy, and two-table value comparisons. The running environment of the data quality task is Spark 2.4.0, and other versions have not been verified, and users can verify by themselves.
+- The execution flow of the data quality task is as follows: 
+
+> The user defines the task in the interface, and the user input value is stored in `TaskParam`
+When running a task, `Master` will parse `TaskParam`, encapsulate the parameters required by `DataQualityTask` and send it to `Worker`.
+Worker runs the data quality task. After the data quality task finishes running, it writes the statistical results to the specified storage engine. The current data quality task result is stored in the `t_ds_dq_execute_result` table of `dolphinscheduler`
+`Worker` sends the task result to `Master`, after `Master` receives `TaskResponse`, it will judge whether the task type is `DataQualityTask`, if so, it will read the corresponding result from `t_ds_dq_execute_result` according to `taskInstanceId`, and then The result is judged according to the check mode, operator and threshold configured by the user. If the result is a failure, the corresponding operation, alarm or interruption will be performed according to the failure policy configured by the user.
+
+Add config : `<server-name>/conf/common.properties`
+
+```properties
+data-quality.jar.name=dolphinscheduler-data-quality-dev-SNAPSHOT.jar
+
+Please fill in `data-quality.jar.name` according to the actual package name,
+If you package `data-quality` separately, remember to modify the package name to be consistent with `data-quality.jar.name`.
+If the old version is upgraded and used, you need to execute the `sql` update script to initialize the database before running.
+If you want to use `MySQL` data, you need to comment out the `scope` of `MySQL` in `pom.xml`
+Currently only `MySQL`, `PostgreSQL` and `HIVE` data sources have been tested, other data sources have not been tested yet
+`Spark` needs to be configured to read `Hive` metadata, `Spark` does not use `jdbc` to read `Hive`
+
+## 1.3 Detail
+
+- CheckMethod: [CheckFormula][Operator][Threshold], if the result is true, it indicates that the data does not meet expectations, and the failure strategy is executed.

Review Comment:
   CheckMethod: [CheckFormula][Operator][Threshold]
   ->
   Check Formula:  [Check method][Operator][Threshold]



##########
docs/docs/en/guide/task/data-quality.md:
##########
@@ -0,0 +1,305 @@
+# 1 Overview
+## 1.1 Introduction
+
+The data quality task is used to check the data accuracy during the integration and processing of data. Data quality tasks in this release include single-table checking, single-table custom SQL checking, multi-table accuracy, and two-table value comparisons. The running environment of the data quality task is Spark 2.4.0, and other versions have not been verified, and users can verify by themselves.
+- The execution flow of the data quality task is as follows: 
+
+> The user defines the task in the interface, and the user input value is stored in `TaskParam`
+When running a task, `Master` will parse `TaskParam`, encapsulate the parameters required by `DataQualityTask` and send it to `Worker`.
+Worker runs the data quality task. After the data quality task finishes running, it writes the statistical results to the specified storage engine. The current data quality task result is stored in the `t_ds_dq_execute_result` table of `dolphinscheduler`
+`Worker` sends the task result to `Master`, after `Master` receives `TaskResponse`, it will judge whether the task type is `DataQualityTask`, if so, it will read the corresponding result from `t_ds_dq_execute_result` according to `taskInstanceId`, and then The result is judged according to the check mode, operator and threshold configured by the user. If the result is a failure, the corresponding operation, alarm or interruption will be performed according to the failure policy configured by the user.
+
+Add config : `<server-name>/conf/common.properties`
+
+```properties
+data-quality.jar.name=dolphinscheduler-data-quality-dev-SNAPSHOT.jar
+
+Please fill in `data-quality.jar.name` according to the actual package name,
+If you package `data-quality` separately, remember to modify the package name to be consistent with `data-quality.jar.name`.
+If the old version is upgraded and used, you need to execute the `sql` update script to initialize the database before running.
+If you want to use `MySQL` data, you need to comment out the `scope` of `MySQL` in `pom.xml`
+Currently only `MySQL`, `PostgreSQL` and `HIVE` data sources have been tested, other data sources have not been tested yet
+`Spark` needs to be configured to read `Hive` metadata, `Spark` does not use `jdbc` to read `Hive`
+
+## 1.3 Detail
+
+- CheckMethod: [CheckFormula][Operator][Threshold], if the result is true, it indicates that the data does not meet expectations, and the failure strategy is executed.
+- CheckFormula：
+    - Expected-Actual
+    - Actual-Expected
+    - (Actual/Expected)x100%
+    - (Expected-Actual)/Expected x100%
+- Operator：=、>、>=、<、<=、!=
+- ExpectedValue
+    - FixValue
+    - DailyAvg
+    - WeeklyAvg
+    - MonthlyAvg
+    - Last7DayAvg
+    - Last30DayAvg
+    - SrcTableTotalRows
+    - TargetTableTotalRows
+    
+- eg
+    - CheckFormula：Expected-Actual
+    - Operator：>
+    - Threshold：0
+    - ExpectedValue：FixValue=9。
+    
+Assuming that the actual value is 10, the operator is >, and the expected value is 9, then the result 10 -9 > 0 is true, which means that the row data in the empty column has exceeded the threshold, and the task is judged to fail
+# 2 Guide
+## 2.1 NullCheck
+### 2.1.1 Introduction
+The goal of the null value check is to check the number of empty rows in the specified column. The number of empty rows can be compared with the total number of rows or a specified threshold. If it is greater than a certain threshold, it will be judged as failure.
+- Calculate the SQL statement that the specified column is empty as follows:
+    ```sql
+    SELECT COUNT(*) AS miss FROM ${src_table} WHERE (${src_field} is null or ${src_field} = '') AND (${src_filter})
+    ```
+- The SQL to calculate the total number of rows in the table is as follows:
+     ```sql
+     SELECT COUNT(*) AS total FROM ${src_table} WHERE (${src_filter})
+     ```
+### 2.1.2 UI Guide
+![dataquality_null_check](/img/tasks/demo/null_check.png)
+- Src data type: select MySQL, PostgreSQL, etc.
+- Src data source: the corresponding data source under the source data type
+- Src data table: drop down to select the table where the validation data is located
+- Src filter conditions: such as the title, it will also be used when counting the total number of rows in the table, optional
+- Src table check column: drop down to select check column name
+- Check method:

Review Comment:
   Check method:
   ->
   Check methods:



##########
docs/docs/en/guide/task/data-quality.md:
##########
@@ -0,0 +1,310 @@
+# Overview
+## Introduction
+
+The data quality task is used to check the data accuracy during the integration and processing of data. Data quality tasks in this release include single-table checking, single-table custom SQL checking, multi-table accuracy, and two-table value comparisons. The running environment of the data quality task is Spark 2.4.0, and other versions have not been verified, and users can verify by themselves.
+- The execution flow of the data quality task is as follows: 
+
+> The user defines the task in the interface, and the user input value is stored in `TaskParam`
+When running a task, `Master` will parse `TaskParam`, encapsulate the parameters required by `DataQualityTask` and send it to `Worker`.
+Worker runs the data quality task. After the data quality task finishes running, it writes the statistical results to the specified storage engine. The current data quality task result is stored in the `t_ds_dq_execute_result` table of `dolphinscheduler`
+`Worker` sends the task result to `Master`, after `Master` receives `TaskResponse`, it will judge whether the task type is `DataQualityTask`, if so, it will read the corresponding result from `t_ds_dq_execute_result` according to `taskInstanceId`, and then The result is judged according to the check mode, operator and threshold configured by the user. If the result is a failure, the corresponding operation, alarm or interruption will be performed according to the failure policy configured by the user.
+
+Add config : `<server-name>/conf/common.properties`
+
+```properties
+data-quality.jar.name=dolphinscheduler-data-quality-dev-SNAPSHOT.jar
+```
+
+Please fill in `data-quality.jar.name` according to the actual package name,
+If you package `data-quality` separately, remember to modify the package name to be consistent with `data-quality.jar.name`.
+If the old version is upgraded and used, you need to execute the `sql` update script to initialize the database before running.
+If you want to use `MySQL` data, you need to comment out the `scope` of `MySQL` in `pom.xml`
+Currently only `MySQL`, `PostgreSQL` and `HIVE` data sources have been tested, other data sources have not been tested yet
+`Spark` needs to be configured to read `Hive` metadata, `Spark` does not use `jdbc` to read `Hive`
+
+## Detail
+
+- CheckMethod: [CheckFormula][Operator][Threshold], if the result is true, it indicates that the data does not meet expectations, and the failure strategy is executed.
+- CheckFormula：
+    - Expected-Actual
+    - Actual-Expected
+    - (Actual/Expected)x100%
+    - (Expected-Actual)/Expected x100%
+- Operator：=、>、>=、<、<=、!=
+- ExpectedValue
+    - FixValue
+    - DailyAvg
+    - WeeklyAvg
+    - MonthlyAvg
+    - Last7DayAvg
+    - Last30DayAvg
+    - SrcTableTotalRows
+    - TargetTableTotalRows
+    
+- example
+    - CheckFormula：Expected-Actual
+    - Operator：>
+    - Threshold：0
+    - ExpectedValue：FixValue=9。
+    
+Assuming that the actual value is 10, the operator is >, and the expected value is 9, then the result 10 -9 > 0 is true, which means that the row data in the empty column has exceeded the threshold, and the task is judged to fail
+# Guide
+## NullCheck
+### Introduction
+The goal of the null value check is to check the number of empty rows in the specified column. The number of empty rows can be compared with the total number of rows or a specified threshold. If it is greater than a certain threshold, it will be judged as failure.
+- Calculate the SQL statement that the specified column is empty as follows:
+
+  ```sql
+  SELECT COUNT(*) AS miss FROM ${src_table} WHERE (${src_field} is null or ${src_field} = '') AND (${src_filter})
+  ```
+
+- The SQL to calculate the total number of rows in the table is as follows:
+
+  ```sql
+  SELECT COUNT(*) AS total FROM ${src_table} WHERE (${src_filter})
+  ```
+
+### UI Guide
+![dataquality_null_check](/img/tasks/demo/null_check.png)
+- Source data type: select MySQL, PostgreSQL, etc.
+- Source data source: the corresponding data source under the source data type
+- Source data table: drop-down to select the table where the validation data is located
+- Src filter conditions: such as the title, it will also be used when counting the total number of rows in the table, optional
+- Src table check column: drop-down to select the check column name
+- Check method:
+    - [Expected-Actual]
+    - [Actual-Expected]
+    - [Actual/Expected]x100%
+    - [(Expected-Actual)/Expected]x100%
+- Check operators: =, >, >=, <, <=, ! =
+- Threshold: The value used in the formula for comparison
+- Failure strategy
+    - Alert: The data quality task failed, the DolphinScheduler task result is successful, and an alert is sent
+    - Blocking: The data quality task fails, the DolphinScheduler task result is failed, and an alarm is sent
+- Expected value type: select the desired type from the drop-down menu
+
+## Timeliness Check
+### Introduction
+The timeliness check is used to check whether the data is processed within the expected time. The start time and end time can be specified to define the time range. If the amount of data within the time range does not reach the set threshold, the check task will be judged as fail
+### UI Guide
+![dataquality_timeliness_check](/img/tasks/demo/timeliness_check.png)
+- Source data type: select MySQL, PostgreSQL, etc.
+- Source data source: the corresponding data source under the source data type
+- Source data table: drop-down to select the table where the validation data is located
+- Src filter conditions: such as the title, it will also be used when counting the total number of rows in the table, optional
+- Src table check column: drop-down to select check column name
+- start time: the start time of a time range

Review Comment:
   S



##########
docs/docs/zh/guide/task/data-quality.md:
##########
@@ -0,0 +1,313 @@
+# 概述
+## 任务类型介绍
+
+数据质量任务是用于检查数据在集成、处理过程中的数据准确性。本版本的数据质量任务包括单表检查、单表自定义SQL检查、多表准确性以及两表值比对。数据质量任务的运行环境为Spark2.4.0，其他版本尚未进行过验证，用户可自行验证。

Review Comment:
   多表准确性以及两表值比对
   ->maybe to
   多表准确性检查以及两表值比对检查
   



##########
docs/docs/en/guide/task/data-quality.md:
##########
@@ -0,0 +1,305 @@
+# 1 Overview
+## 1.1 Introduction
+
+The data quality task is used to check the data accuracy during the integration and processing of data. Data quality tasks in this release include single-table checking, single-table custom SQL checking, multi-table accuracy, and two-table value comparisons. The running environment of the data quality task is Spark 2.4.0, and other versions have not been verified, and users can verify by themselves.
+- The execution flow of the data quality task is as follows: 

Review Comment:
   The execution flow of
   ->
   The execution logic of
   or
   The execution code logic of



##########
docs/docs/en/guide/task/data-quality.md:
##########
@@ -0,0 +1,305 @@
+# 1 Overview
+## 1.1 Introduction
+
+The data quality task is used to check the data accuracy during the integration and processing of data. Data quality tasks in this release include single-table checking, single-table custom SQL checking, multi-table accuracy, and two-table value comparisons. The running environment of the data quality task is Spark 2.4.0, and other versions have not been verified, and users can verify by themselves.
+- The execution flow of the data quality task is as follows: 
+
+> The user defines the task in the interface, and the user input value is stored in `TaskParam`
+When running a task, `Master` will parse `TaskParam`, encapsulate the parameters required by `DataQualityTask` and send it to `Worker`.
+Worker runs the data quality task. After the data quality task finishes running, it writes the statistical results to the specified storage engine. The current data quality task result is stored in the `t_ds_dq_execute_result` table of `dolphinscheduler`
+`Worker` sends the task result to `Master`, after `Master` receives `TaskResponse`, it will judge whether the task type is `DataQualityTask`, if so, it will read the corresponding result from `t_ds_dq_execute_result` according to `taskInstanceId`, and then The result is judged according to the check mode, operator and threshold configured by the user. If the result is a failure, the corresponding operation, alarm or interruption will be performed according to the failure policy configured by the user.
+
+Add config : `<server-name>/conf/common.properties`
+
+```properties
+data-quality.jar.name=dolphinscheduler-data-quality-dev-SNAPSHOT.jar
+
+Please fill in `data-quality.jar.name` according to the actual package name,
+If you package `data-quality` separately, remember to modify the package name to be consistent with `data-quality.jar.name`.
+If the old version is upgraded and used, you need to execute the `sql` update script to initialize the database before running.
+If you want to use `MySQL` data, you need to comment out the `scope` of `MySQL` in `pom.xml`
+Currently only `MySQL`, `PostgreSQL` and `HIVE` data sources have been tested, other data sources have not been tested yet
+`Spark` needs to be configured to read `Hive` metadata, `Spark` does not use `jdbc` to read `Hive`
+
+## 1.3 Detail
+
+- CheckMethod: [CheckFormula][Operator][Threshold], if the result is true, it indicates that the data does not meet expectations, and the failure strategy is executed.
+- CheckFormula：
+    - Expected-Actual
+    - Actual-Expected
+    - (Actual/Expected)x100%
+    - (Expected-Actual)/Expected x100%
+- Operator：=、>、>=、<、<=、!=
+- ExpectedValue
+    - FixValue
+    - DailyAvg
+    - WeeklyAvg
+    - MonthlyAvg
+    - Last7DayAvg
+    - Last30DayAvg
+    - SrcTableTotalRows
+    - TargetTableTotalRows
+    
+- eg
+    - CheckFormula：Expected-Actual
+    - Operator：>
+    - Threshold：0
+    - ExpectedValue：FixValue=9。
+    
+Assuming that the actual value is 10, the operator is >, and the expected value is 9, then the result 10 -9 > 0 is true, which means that the row data in the empty column has exceeded the threshold, and the task is judged to fail
+# 2 Guide

Review Comment:
   add blank lines between titles



##########
docs/docs/en/guide/task/data-quality.md:
##########
@@ -0,0 +1,305 @@
+# 1 Overview
+## 1.1 Introduction
+
+The data quality task is used to check the data accuracy during the integration and processing of data. Data quality tasks in this release include single-table checking, single-table custom SQL checking, multi-table accuracy, and two-table value comparisons. The running environment of the data quality task is Spark 2.4.0, and other versions have not been verified, and users can verify by themselves.
+- The execution flow of the data quality task is as follows: 
+
+> The user defines the task in the interface, and the user input value is stored in `TaskParam`
+When running a task, `Master` will parse `TaskParam`, encapsulate the parameters required by `DataQualityTask` and send it to `Worker`.
+Worker runs the data quality task. After the data quality task finishes running, it writes the statistical results to the specified storage engine. The current data quality task result is stored in the `t_ds_dq_execute_result` table of `dolphinscheduler`
+`Worker` sends the task result to `Master`, after `Master` receives `TaskResponse`, it will judge whether the task type is `DataQualityTask`, if so, it will read the corresponding result from `t_ds_dq_execute_result` according to `taskInstanceId`, and then The result is judged according to the check mode, operator and threshold configured by the user. If the result is a failure, the corresponding operation, alarm or interruption will be performed according to the failure policy configured by the user.
+
+Add config : `<server-name>/conf/common.properties`
+
+```properties
+data-quality.jar.name=dolphinscheduler-data-quality-dev-SNAPSHOT.jar
+
+Please fill in `data-quality.jar.name` according to the actual package name,

Review Comment:
   ,
   ->
   .
   if these 2 sentence is the same sentence, should use ',' at the end of the first one ,and 'if you ....' lower case at the second one (When we write a long sentence, we always enter a newline at about 35 words)
   if these 2 are 2 sentence, should use '.' and the first sentence and upper case at the second sentence.



##########
docs/docs/en/guide/task/data-quality.md:
##########
@@ -0,0 +1,305 @@
+# 1 Overview

Review Comment:
   better remove all the numbers in the title
   `markdown` doesn't need title number counters
   instead use `#` to separate different levels of titles.



##########
docs/docs/en/guide/task/data-quality.md:
##########
@@ -0,0 +1,305 @@
+# 1 Overview
+## 1.1 Introduction
+
+The data quality task is used to check the data accuracy during the integration and processing of data. Data quality tasks in this release include single-table checking, single-table custom SQL checking, multi-table accuracy, and two-table value comparisons. The running environment of the data quality task is Spark 2.4.0, and other versions have not been verified, and users can verify by themselves.
+- The execution flow of the data quality task is as follows: 
+
+> The user defines the task in the interface, and the user input value is stored in `TaskParam`
+When running a task, `Master` will parse `TaskParam`, encapsulate the parameters required by `DataQualityTask` and send it to `Worker`.
+Worker runs the data quality task. After the data quality task finishes running, it writes the statistical results to the specified storage engine. The current data quality task result is stored in the `t_ds_dq_execute_result` table of `dolphinscheduler`
+`Worker` sends the task result to `Master`, after `Master` receives `TaskResponse`, it will judge whether the task type is `DataQualityTask`, if so, it will read the corresponding result from `t_ds_dq_execute_result` according to `taskInstanceId`, and then The result is judged according to the check mode, operator and threshold configured by the user. If the result is a failure, the corresponding operation, alarm or interruption will be performed according to the failure policy configured by the user.

Review Comment:
   and then The result
   ->
   and then the result



##########
docs/docs/en/guide/task/data-quality.md:
##########
@@ -0,0 +1,305 @@
+# 1 Overview
+## 1.1 Introduction
+
+The data quality task is used to check the data accuracy during the integration and processing of data. Data quality tasks in this release include single-table checking, single-table custom SQL checking, multi-table accuracy, and two-table value comparisons. The running environment of the data quality task is Spark 2.4.0, and other versions have not been verified, and users can verify by themselves.
+- The execution flow of the data quality task is as follows: 
+
+> The user defines the task in the interface, and the user input value is stored in `TaskParam`
+When running a task, `Master` will parse `TaskParam`, encapsulate the parameters required by `DataQualityTask` and send it to `Worker`.
+Worker runs the data quality task. After the data quality task finishes running, it writes the statistical results to the specified storage engine. The current data quality task result is stored in the `t_ds_dq_execute_result` table of `dolphinscheduler`
+`Worker` sends the task result to `Master`, after `Master` receives `TaskResponse`, it will judge whether the task type is `DataQualityTask`, if so, it will read the corresponding result from `t_ds_dq_execute_result` according to `taskInstanceId`, and then The result is judged according to the check mode, operator and threshold configured by the user. If the result is a failure, the corresponding operation, alarm or interruption will be performed according to the failure policy configured by the user.
+
+Add config : `<server-name>/conf/common.properties`
+
+```properties
+data-quality.jar.name=dolphinscheduler-data-quality-dev-SNAPSHOT.jar
+
+Please fill in `data-quality.jar.name` according to the actual package name,
+If you package `data-quality` separately, remember to modify the package name to be consistent with `data-quality.jar.name`.
+If the old version is upgraded and used, you need to execute the `sql` update script to initialize the database before running.
+If you want to use `MySQL` data, you need to comment out the `scope` of `MySQL` in `pom.xml`
+Currently only `MySQL`, `PostgreSQL` and `HIVE` data sources have been tested, other data sources have not been tested yet
+`Spark` needs to be configured to read `Hive` metadata, `Spark` does not use `jdbc` to read `Hive`
+
+## 1.3 Detail
+
+- CheckMethod: [CheckFormula][Operator][Threshold], if the result is true, it indicates that the data does not meet expectations, and the failure strategy is executed.
+- CheckFormula：
+    - Expected-Actual
+    - Actual-Expected
+    - (Actual/Expected)x100%
+    - (Expected-Actual)/Expected x100%
+- Operator：=、>、>=、<、<=、!=
+- ExpectedValue
+    - FixValue
+    - DailyAvg
+    - WeeklyAvg
+    - MonthlyAvg
+    - Last7DayAvg
+    - Last30DayAvg
+    - SrcTableTotalRows
+    - TargetTableTotalRows
+    
+- eg
+    - CheckFormula：Expected-Actual
+    - Operator：>
+    - Threshold：0
+    - ExpectedValue：FixValue=9。
+    
+Assuming that the actual value is 10, the operator is >, and the expected value is 9, then the result 10 -9 > 0 is true, which means that the row data in the empty column has exceeded the threshold, and the task is judged to fail
+# 2 Guide
+## 2.1 NullCheck
+### 2.1.1 Introduction
+The goal of the null value check is to check the number of empty rows in the specified column. The number of empty rows can be compared with the total number of rows or a specified threshold. If it is greater than a certain threshold, it will be judged as failure.
+- Calculate the SQL statement that the specified column is empty as follows:
+    ```sql
+    SELECT COUNT(*) AS miss FROM ${src_table} WHERE (${src_field} is null or ${src_field} = '') AND (${src_filter})
+    ```
+- The SQL to calculate the total number of rows in the table is as follows:
+     ```sql
+     SELECT COUNT(*) AS total FROM ${src_table} WHERE (${src_filter})
+     ```
+### 2.1.2 UI Guide

Review Comment:
   add blank lines between titles



##########
docs/docs/en/guide/task/data-quality.md:
##########
@@ -0,0 +1,305 @@
+# 1 Overview
+## 1.1 Introduction
+
+The data quality task is used to check the data accuracy during the integration and processing of data. Data quality tasks in this release include single-table checking, single-table custom SQL checking, multi-table accuracy, and two-table value comparisons. The running environment of the data quality task is Spark 2.4.0, and other versions have not been verified, and users can verify by themselves.
+- The execution flow of the data quality task is as follows: 
+
+> The user defines the task in the interface, and the user input value is stored in `TaskParam`
+When running a task, `Master` will parse `TaskParam`, encapsulate the parameters required by `DataQualityTask` and send it to `Worker`.
+Worker runs the data quality task. After the data quality task finishes running, it writes the statistical results to the specified storage engine. The current data quality task result is stored in the `t_ds_dq_execute_result` table of `dolphinscheduler`
+`Worker` sends the task result to `Master`, after `Master` receives `TaskResponse`, it will judge whether the task type is `DataQualityTask`, if so, it will read the corresponding result from `t_ds_dq_execute_result` according to `taskInstanceId`, and then The result is judged according to the check mode, operator and threshold configured by the user. If the result is a failure, the corresponding operation, alarm or interruption will be performed according to the failure policy configured by the user.
+
+Add config : `<server-name>/conf/common.properties`
+
+```properties
+data-quality.jar.name=dolphinscheduler-data-quality-dev-SNAPSHOT.jar
+
+Please fill in `data-quality.jar.name` according to the actual package name,
+If you package `data-quality` separately, remember to modify the package name to be consistent with `data-quality.jar.name`.
+If the old version is upgraded and used, you need to execute the `sql` update script to initialize the database before running.
+If you want to use `MySQL` data, you need to comment out the `scope` of `MySQL` in `pom.xml`
+Currently only `MySQL`, `PostgreSQL` and `HIVE` data sources have been tested, other data sources have not been tested yet
+`Spark` needs to be configured to read `Hive` metadata, `Spark` does not use `jdbc` to read `Hive`
+
+## 1.3 Detail
+
+- CheckMethod: [CheckFormula][Operator][Threshold], if the result is true, it indicates that the data does not meet expectations, and the failure strategy is executed.
+- CheckFormula：
+    - Expected-Actual
+    - Actual-Expected
+    - (Actual/Expected)x100%
+    - (Expected-Actual)/Expected x100%
+- Operator：=、>、>=、<、<=、!=
+- ExpectedValue
+    - FixValue
+    - DailyAvg
+    - WeeklyAvg
+    - MonthlyAvg
+    - Last7DayAvg
+    - Last30DayAvg
+    - SrcTableTotalRows
+    - TargetTableTotalRows
+    
+- eg
+    - CheckFormula：Expected-Actual
+    - Operator：>
+    - Threshold：0
+    - ExpectedValue：FixValue=9。
+    
+Assuming that the actual value is 10, the operator is >, and the expected value is 9, then the result 10 -9 > 0 is true, which means that the row data in the empty column has exceeded the threshold, and the task is judged to fail

Review Comment:
   better `` data and formula to help reading



##########
docs/docs/en/guide/task/data-quality.md:
##########
@@ -0,0 +1,305 @@
+# 1 Overview
+## 1.1 Introduction
+
+The data quality task is used to check the data accuracy during the integration and processing of data. Data quality tasks in this release include single-table checking, single-table custom SQL checking, multi-table accuracy, and two-table value comparisons. The running environment of the data quality task is Spark 2.4.0, and other versions have not been verified, and users can verify by themselves.
+- The execution flow of the data quality task is as follows: 
+
+> The user defines the task in the interface, and the user input value is stored in `TaskParam`
+When running a task, `Master` will parse `TaskParam`, encapsulate the parameters required by `DataQualityTask` and send it to `Worker`.
+Worker runs the data quality task. After the data quality task finishes running, it writes the statistical results to the specified storage engine. The current data quality task result is stored in the `t_ds_dq_execute_result` table of `dolphinscheduler`
+`Worker` sends the task result to `Master`, after `Master` receives `TaskResponse`, it will judge whether the task type is `DataQualityTask`, if so, it will read the corresponding result from `t_ds_dq_execute_result` according to `taskInstanceId`, and then The result is judged according to the check mode, operator and threshold configured by the user. If the result is a failure, the corresponding operation, alarm or interruption will be performed according to the failure policy configured by the user.
+
+Add config : `<server-name>/conf/common.properties`
+
+```properties
+data-quality.jar.name=dolphinscheduler-data-quality-dev-SNAPSHOT.jar
+
+Please fill in `data-quality.jar.name` according to the actual package name,
+If you package `data-quality` separately, remember to modify the package name to be consistent with `data-quality.jar.name`.
+If the old version is upgraded and used, you need to execute the `sql` update script to initialize the database before running.
+If you want to use `MySQL` data, you need to comment out the `scope` of `MySQL` in `pom.xml`
+Currently only `MySQL`, `PostgreSQL` and `HIVE` data sources have been tested, other data sources have not been tested yet
+`Spark` needs to be configured to read `Hive` metadata, `Spark` does not use `jdbc` to read `Hive`
+
+## 1.3 Detail
+
+- CheckMethod: [CheckFormula][Operator][Threshold], if the result is true, it indicates that the data does not meet expectations, and the failure strategy is executed.
+- CheckFormula：
+    - Expected-Actual
+    - Actual-Expected
+    - (Actual/Expected)x100%
+    - (Expected-Actual)/Expected x100%
+- Operator：=、>、>=、<、<=、!=
+- ExpectedValue
+    - FixValue
+    - DailyAvg
+    - WeeklyAvg
+    - MonthlyAvg
+    - Last7DayAvg
+    - Last30DayAvg
+    - SrcTableTotalRows
+    - TargetTableTotalRows
+    
+- eg
+    - CheckFormula：Expected-Actual
+    - Operator：>
+    - Threshold：0
+    - ExpectedValue：FixValue=9。
+    
+Assuming that the actual value is 10, the operator is >, and the expected value is 9, then the result 10 -9 > 0 is true, which means that the row data in the empty column has exceeded the threshold, and the task is judged to fail
+# 2 Guide
+## 2.1 NullCheck
+### 2.1.1 Introduction
+The goal of the null value check is to check the number of empty rows in the specified column. The number of empty rows can be compared with the total number of rows or a specified threshold. If it is greater than a certain threshold, it will be judged as failure.
+- Calculate the SQL statement that the specified column is empty as follows:
+    ```sql
+    SELECT COUNT(*) AS miss FROM ${src_table} WHERE (${src_field} is null or ${src_field} = '') AND (${src_filter})
+    ```
+- The SQL to calculate the total number of rows in the table is as follows:
+     ```sql
+     SELECT COUNT(*) AS total FROM ${src_table} WHERE (${src_filter})
+     ```
+### 2.1.2 UI Guide
+![dataquality_null_check](/img/tasks/demo/null_check.png)
+- Src data type: select MySQL, PostgreSQL, etc.
+- Src data source: the corresponding data source under the source data type
+- Src data table: drop down to select the table where the validation data is located
+- Src filter conditions: such as the title, it will also be used when counting the total number of rows in the table, optional
+- Src table check column: drop down to select check column name
+- Check method:
+    - [Expected-Actual]
+    - [Actual-Expected]
+    - [Actual/Expected]x100%
+    - [(Expected-Actual)/Expected]x100%
+- Check operators: =, >, >=, <, <=, ! =
+- Threshold: The value used in the formula for comparison

Review Comment:
   The value used in the formula for comparison
   ->
   The value comparison for the formula output



##########
docs/docs/en/guide/task/data-quality.md:
##########
@@ -0,0 +1,305 @@
+# 1 Overview

Review Comment:
   And additionally, please check all the ends of lines, to decide whether set a '.' or not.
   When you decide to use  '.' or not at the end of lines, please keep consistency throughout the whole document.



##########
docs/docs/en/guide/task/data-quality.md:
##########
@@ -0,0 +1,305 @@
+# 1 Overview
+## 1.1 Introduction
+
+The data quality task is used to check the data accuracy during the integration and processing of data. Data quality tasks in this release include single-table checking, single-table custom SQL checking, multi-table accuracy, and two-table value comparisons. The running environment of the data quality task is Spark 2.4.0, and other versions have not been verified, and users can verify by themselves.
+- The execution flow of the data quality task is as follows: 
+
+> The user defines the task in the interface, and the user input value is stored in `TaskParam`
+When running a task, `Master` will parse `TaskParam`, encapsulate the parameters required by `DataQualityTask` and send it to `Worker`.
+Worker runs the data quality task. After the data quality task finishes running, it writes the statistical results to the specified storage engine. The current data quality task result is stored in the `t_ds_dq_execute_result` table of `dolphinscheduler`
+`Worker` sends the task result to `Master`, after `Master` receives `TaskResponse`, it will judge whether the task type is `DataQualityTask`, if so, it will read the corresponding result from `t_ds_dq_execute_result` according to `taskInstanceId`, and then The result is judged according to the check mode, operator and threshold configured by the user. If the result is a failure, the corresponding operation, alarm or interruption will be performed according to the failure policy configured by the user.
+
+Add config : `<server-name>/conf/common.properties`
+
+```properties
+data-quality.jar.name=dolphinscheduler-data-quality-dev-SNAPSHOT.jar
+
+Please fill in `data-quality.jar.name` according to the actual package name,
+If you package `data-quality` separately, remember to modify the package name to be consistent with `data-quality.jar.name`.
+If the old version is upgraded and used, you need to execute the `sql` update script to initialize the database before running.
+If you want to use `MySQL` data, you need to comment out the `scope` of `MySQL` in `pom.xml`
+Currently only `MySQL`, `PostgreSQL` and `HIVE` data sources have been tested, other data sources have not been tested yet
+`Spark` needs to be configured to read `Hive` metadata, `Spark` does not use `jdbc` to read `Hive`
+
+## 1.3 Detail
+
+- CheckMethod: [CheckFormula][Operator][Threshold], if the result is true, it indicates that the data does not meet expectations, and the failure strategy is executed.
+- CheckFormula：
+    - Expected-Actual
+    - Actual-Expected
+    - (Actual/Expected)x100%
+    - (Expected-Actual)/Expected x100%
+- Operator：=、>、>=、<、<=、!=

Review Comment:
   use` , `to instead of `、`



##########
docs/docs/en/guide/task/data-quality.md:
##########
@@ -0,0 +1,310 @@
+# Overview
+## Introduction
+
+The data quality task is used to check the data accuracy during the integration and processing of data. Data quality tasks in this release include single-table checking, single-table custom SQL checking, multi-table accuracy, and two-table value comparisons. The running environment of the data quality task is Spark 2.4.0, and other versions have not been verified, and users can verify by themselves.
+- The execution flow of the data quality task is as follows: 
+
+> The user defines the task in the interface, and the user input value is stored in `TaskParam`
+When running a task, `Master` will parse `TaskParam`, encapsulate the parameters required by `DataQualityTask` and send it to `Worker`.
+Worker runs the data quality task. After the data quality task finishes running, it writes the statistical results to the specified storage engine. The current data quality task result is stored in the `t_ds_dq_execute_result` table of `dolphinscheduler`
+`Worker` sends the task result to `Master`, after `Master` receives `TaskResponse`, it will judge whether the task type is `DataQualityTask`, if so, it will read the corresponding result from `t_ds_dq_execute_result` according to `taskInstanceId`, and then The result is judged according to the check mode, operator and threshold configured by the user. If the result is a failure, the corresponding operation, alarm or interruption will be performed according to the failure policy configured by the user.
+
+Add config : `<server-name>/conf/common.properties`
+
+```properties
+data-quality.jar.name=dolphinscheduler-data-quality-dev-SNAPSHOT.jar
+```
+
+Please fill in `data-quality.jar.name` according to the actual package name,
+If you package `data-quality` separately, remember to modify the package name to be consistent with `data-quality.jar.name`.
+If the old version is upgraded and used, you need to execute the `sql` update script to initialize the database before running.
+If you want to use `MySQL` data, you need to comment out the `scope` of `MySQL` in `pom.xml`
+Currently only `MySQL`, `PostgreSQL` and `HIVE` data sources have been tested, other data sources have not been tested yet
+`Spark` needs to be configured to read `Hive` metadata, `Spark` does not use `jdbc` to read `Hive`
+
+## Detail
+
+- CheckMethod: [CheckFormula][Operator][Threshold], if the result is true, it indicates that the data does not meet expectations, and the failure strategy is executed.
+- CheckFormula：
+    - Expected-Actual
+    - Actual-Expected
+    - (Actual/Expected)x100%
+    - (Expected-Actual)/Expected x100%
+- Operator：=、>、>=、<、<=、!=
+- ExpectedValue
+    - FixValue
+    - DailyAvg
+    - WeeklyAvg
+    - MonthlyAvg
+    - Last7DayAvg
+    - Last30DayAvg
+    - SrcTableTotalRows
+    - TargetTableTotalRows
+    
+- example
+    - CheckFormula：Expected-Actual
+    - Operator：>
+    - Threshold：0
+    - ExpectedValue：FixValue=9。
+    
+Assuming that the actual value is 10, the operator is >, and the expected value is 9, then the result 10 -9 > 0 is true, which means that the row data in the empty column has exceeded the threshold, and the task is judged to fail
+# Guide
+## NullCheck
+### Introduction
+The goal of the null value check is to check the number of empty rows in the specified column. The number of empty rows can be compared with the total number of rows or a specified threshold. If it is greater than a certain threshold, it will be judged as failure.
+- Calculate the SQL statement that the specified column is empty as follows:
+
+  ```sql
+  SELECT COUNT(*) AS miss FROM ${src_table} WHERE (${src_field} is null or ${src_field} = '') AND (${src_filter})
+  ```
+
+- The SQL to calculate the total number of rows in the table is as follows:
+
+  ```sql
+  SELECT COUNT(*) AS total FROM ${src_table} WHERE (${src_filter})
+  ```
+
+### UI Guide
+![dataquality_null_check](/img/tasks/demo/null_check.png)
+- Source data type: select MySQL, PostgreSQL, etc.
+- Source data source: the corresponding data source under the source data type
+- Source data table: drop-down to select the table where the validation data is located
+- Src filter conditions: such as the title, it will also be used when counting the total number of rows in the table, optional
+- Src table check column: drop-down to select the check column name
+- Check method:
+    - [Expected-Actual]
+    - [Actual-Expected]
+    - [Actual/Expected]x100%
+    - [(Expected-Actual)/Expected]x100%
+- Check operators: =, >, >=, <, <=, ! =
+- Threshold: The value used in the formula for comparison
+- Failure strategy
+    - Alert: The data quality task failed, the DolphinScheduler task result is successful, and an alert is sent
+    - Blocking: The data quality task fails, the DolphinScheduler task result is failed, and an alarm is sent
+- Expected value type: select the desired type from the drop-down menu
+
+## Timeliness Check
+### Introduction
+The timeliness check is used to check whether the data is processed within the expected time. The start time and end time can be specified to define the time range. If the amount of data within the time range does not reach the set threshold, the check task will be judged as fail
+### UI Guide
+![dataquality_timeliness_check](/img/tasks/demo/timeliness_check.png)
+- Source data type: select MySQL, PostgreSQL, etc.
+- Source data source: the corresponding data source under the source data type
+- Source data table: drop-down to select the table where the validation data is located
+- Src filter conditions: such as the title, it will also be used when counting the total number of rows in the table, optional
+- Src table check column: drop-down to select check column name
+- start time: the start time of a time range
+- end time: the end time of a time range
+- Time Format: Set the corresponding time format
+- Check method:
+    - [Expected-Actual]
+    - [Actual-Expected]
+    - [Actual/Expected]x100%
+    - [(Expected-Actual)/Expected]x100%
+- Check operators: =, >, >=, <, <=, ! =
+- Threshold: The value used in the formula for comparison

Review Comment:
   The value used 
   ->
   the value



##########
docs/docs/en/guide/task/data-quality.md:
##########
@@ -0,0 +1,310 @@
+# Overview
+## Introduction
+
+The data quality task is used to check the data accuracy during the integration and processing of data. Data quality tasks in this release include single-table checking, single-table custom SQL checking, multi-table accuracy, and two-table value comparisons. The running environment of the data quality task is Spark 2.4.0, and other versions have not been verified, and users can verify by themselves.
+- The execution flow of the data quality task is as follows: 
+
+> The user defines the task in the interface, and the user input value is stored in `TaskParam`
+When running a task, `Master` will parse `TaskParam`, encapsulate the parameters required by `DataQualityTask` and send it to `Worker`.
+Worker runs the data quality task. After the data quality task finishes running, it writes the statistical results to the specified storage engine. The current data quality task result is stored in the `t_ds_dq_execute_result` table of `dolphinscheduler`
+`Worker` sends the task result to `Master`, after `Master` receives `TaskResponse`, it will judge whether the task type is `DataQualityTask`, if so, it will read the corresponding result from `t_ds_dq_execute_result` according to `taskInstanceId`, and then The result is judged according to the check mode, operator and threshold configured by the user. If the result is a failure, the corresponding operation, alarm or interruption will be performed according to the failure policy configured by the user.
+
+Add config : `<server-name>/conf/common.properties`
+
+```properties
+data-quality.jar.name=dolphinscheduler-data-quality-dev-SNAPSHOT.jar
+```
+
+Please fill in `data-quality.jar.name` according to the actual package name,
+If you package `data-quality` separately, remember to modify the package name to be consistent with `data-quality.jar.name`.
+If the old version is upgraded and used, you need to execute the `sql` update script to initialize the database before running.
+If you want to use `MySQL` data, you need to comment out the `scope` of `MySQL` in `pom.xml`
+Currently only `MySQL`, `PostgreSQL` and `HIVE` data sources have been tested, other data sources have not been tested yet
+`Spark` needs to be configured to read `Hive` metadata, `Spark` does not use `jdbc` to read `Hive`
+
+## Detail
+
+- CheckMethod: [CheckFormula][Operator][Threshold], if the result is true, it indicates that the data does not meet expectations, and the failure strategy is executed.
+- CheckFormula：

Review Comment:
   CheckMethod:



##########
docs/docs/en/guide/task/data-quality.md:
##########
@@ -0,0 +1,310 @@
+# Overview
+## Introduction
+
+The data quality task is used to check the data accuracy during the integration and processing of data. Data quality tasks in this release include single-table checking, single-table custom SQL checking, multi-table accuracy, and two-table value comparisons. The running environment of the data quality task is Spark 2.4.0, and other versions have not been verified, and users can verify by themselves.
+- The execution flow of the data quality task is as follows: 
+
+> The user defines the task in the interface, and the user input value is stored in `TaskParam`
+When running a task, `Master` will parse `TaskParam`, encapsulate the parameters required by `DataQualityTask` and send it to `Worker`.
+Worker runs the data quality task. After the data quality task finishes running, it writes the statistical results to the specified storage engine. The current data quality task result is stored in the `t_ds_dq_execute_result` table of `dolphinscheduler`
+`Worker` sends the task result to `Master`, after `Master` receives `TaskResponse`, it will judge whether the task type is `DataQualityTask`, if so, it will read the corresponding result from `t_ds_dq_execute_result` according to `taskInstanceId`, and then The result is judged according to the check mode, operator and threshold configured by the user. If the result is a failure, the corresponding operation, alarm or interruption will be performed according to the failure policy configured by the user.
+
+Add config : `<server-name>/conf/common.properties`
+
+```properties
+data-quality.jar.name=dolphinscheduler-data-quality-dev-SNAPSHOT.jar
+```
+
+Please fill in `data-quality.jar.name` according to the actual package name,
+If you package `data-quality` separately, remember to modify the package name to be consistent with `data-quality.jar.name`.
+If the old version is upgraded and used, you need to execute the `sql` update script to initialize the database before running.
+If you want to use `MySQL` data, you need to comment out the `scope` of `MySQL` in `pom.xml`
+Currently only `MySQL`, `PostgreSQL` and `HIVE` data sources have been tested, other data sources have not been tested yet
+`Spark` needs to be configured to read `Hive` metadata, `Spark` does not use `jdbc` to read `Hive`
+
+## Detail
+
+- CheckMethod: [CheckFormula][Operator][Threshold], if the result is true, it indicates that the data does not meet expectations, and the failure strategy is executed.
+- CheckFormula：
+    - Expected-Actual
+    - Actual-Expected
+    - (Actual/Expected)x100%
+    - (Expected-Actual)/Expected x100%
+- Operator：=、>、>=、<、<=、!=
+- ExpectedValue
+    - FixValue
+    - DailyAvg
+    - WeeklyAvg
+    - MonthlyAvg
+    - Last7DayAvg
+    - Last30DayAvg
+    - SrcTableTotalRows
+    - TargetTableTotalRows
+    
+- example
+    - CheckFormula：Expected-Actual
+    - Operator：>
+    - Threshold：0
+    - ExpectedValue：FixValue=9。
+    
+Assuming that the actual value is 10, the operator is >, and the expected value is 9, then the result 10 -9 > 0 is true, which means that the row data in the empty column has exceeded the threshold, and the task is judged to fail
+# Guide
+## NullCheck
+### Introduction
+The goal of the null value check is to check the number of empty rows in the specified column. The number of empty rows can be compared with the total number of rows or a specified threshold. If it is greater than a certain threshold, it will be judged as failure.
+- Calculate the SQL statement that the specified column is empty as follows:
+
+  ```sql
+  SELECT COUNT(*) AS miss FROM ${src_table} WHERE (${src_field} is null or ${src_field} = '') AND (${src_filter})
+  ```
+
+- The SQL to calculate the total number of rows in the table is as follows:
+
+  ```sql
+  SELECT COUNT(*) AS total FROM ${src_table} WHERE (${src_filter})
+  ```
+
+### UI Guide
+![dataquality_null_check](/img/tasks/demo/null_check.png)
+- Source data type: select MySQL, PostgreSQL, etc.
+- Source data source: the corresponding data source under the source data type
+- Source data table: drop-down to select the table where the validation data is located
+- Src filter conditions: such as the title, it will also be used when counting the total number of rows in the table, optional
+- Src table check column: drop-down to select the check column name
+- Check method:
+    - [Expected-Actual]
+    - [Actual-Expected]
+    - [Actual/Expected]x100%
+    - [(Expected-Actual)/Expected]x100%
+- Check operators: =, >, >=, <, <=, ! =
+- Threshold: The value used in the formula for comparison
+- Failure strategy
+    - Alert: The data quality task failed, the DolphinScheduler task result is successful, and an alert is sent
+    - Blocking: The data quality task fails, the DolphinScheduler task result is failed, and an alarm is sent
+- Expected value type: select the desired type from the drop-down menu
+
+## Timeliness Check
+### Introduction
+The timeliness check is used to check whether the data is processed within the expected time. The start time and end time can be specified to define the time range. If the amount of data within the time range does not reach the set threshold, the check task will be judged as fail
+### UI Guide
+![dataquality_timeliness_check](/img/tasks/demo/timeliness_check.png)
+- Source data type: select MySQL, PostgreSQL, etc.
+- Source data source: the corresponding data source under the source data type
+- Source data table: drop-down to select the table where the validation data is located
+- Src filter conditions: such as the title, it will also be used when counting the total number of rows in the table, optional
+- Src table check column: drop-down to select check column name
+- start time: the start time of a time range
+- end time: the end time of a time range
+- Time Format: Set the corresponding time format
+- Check method:
+    - [Expected-Actual]
+    - [Actual-Expected]
+    - [Actual/Expected]x100%
+    - [(Expected-Actual)/Expected]x100%
+- Check operators: =, >, >=, <, <=, ! =
+- Threshold: The value used in the formula for comparison
+- Failure strategy
+    - Alert: The data quality task failed, the DolphinScheduler task result is successful, and an alert is sent
+    - Blocking: The data quality task fails, the DolphinScheduler task result is failed, and an alarm is sent
+- Expected value type: select the desired type from the drop-down menu
+
+## Field Length Check
+### Introduction
+The goal of field length verification is to check whether the length of the selected field meets the expectations. If there is data that does not meet the requirements, and the number of rows exceeds the threshold, the task will be judged to fail
+### UI Guide
+![dataquality_length_check](/img/tasks/demo/field_length_check.png)
+- Source data type: select MySQL, PostgreSQL, etc.
+- Source data source: the corresponding data source under the source data type
+- Source data table: drop-down to select the table where the validation data is located
+- Src filter conditions: such as the title, it will also be used when counting the total number of rows in the table, optional
+- Src table check column: drop-down to select the check column name
+- Logical operators: =, >, >=, <, <=, ! =
+- Field length limit: like the title
+- Check method:
+    - [Expected-Actual]
+    - [Actual-Expected]
+    - [Actual/Expected]x100%
+    - [(Expected-Actual)/Expected]x100%
+- Check operators: =, >, >=, <, <=, ! =
+- Threshold: The value used in the formula for comparison
+- Failure strategy
+    - Alert: The data quality task failed, the DolphinScheduler task result is successful, and an alert is sent
+    - Blocking: The data quality task fails, the DolphinScheduler task result is failed, and an alarm is sent
+- Expected value type: select the desired type from the drop-down menu
+
+## Uniqueness Check
+### Introduction
+The goal of the uniqueness check is to check whether the field is duplicated. It is generally used to check whether the primary key is duplicated. If there is duplication and the threshold is reached, the check task will be judged to be failed.
+### UI Guide
+![dataquality_uniqueness_check](/img/tasks/demo/uniqueness_check.png)
+- Source data type: select MySQL, PostgreSQL, etc.
+- Source data source: the corresponding data source under the source data type
+- Source data table: drop-down to select the table where the validation data is located
+- Src filter conditions: such as the title, it will also be used when counting the total number of rows in the table, optional
+- Src table check column: drop-down to select the check column name
+- Check method:
+    - [Expected-Actual]
+    - [Actual-Expected]
+    - [Actual/Expected]x100%
+    - [(Expected-Actual)/Expected]x100%
+- Check operators: =, >, >=, <, <=, ! =
+- Threshold: The value used in the formula for comparison
+- Failure strategy
+    - Alert: The data quality task failed, the DolphinScheduler task result is successful, and an alert is sent
+    - Blocking: The data quality task fails, the DolphinScheduler task result is failed, and an alarm is sent
+- Expected value type: select the desired type from the drop-down menu
+
+## Regular Expression Check
+### Introduction
+The goal of regular expression verification is to check whether the format of the value of a field meets the requirements, such as time format, email format, ID card format, etc. If there is data that does not meet the format and exceeds the threshold, the task will be judged as failed.
+### UI Guide
+![dataquality_regex_check](/img/tasks/demo/regexp_check.png)
+- Source data type: select MySQL, PostgreSQL, etc.
+- Source data source: the corresponding data source under the source data type
+- Source data table: drop-down to select the table where the validation data is located
+- Src filter conditions: such as the title, it will also be used when counting the total number of rows in the table, optional
+- Src table check column: drop-down to select check column name
+- Regular expression: as title
+- Check method:
+    - [Expected-Actual]
+    - [Actual-Expected]
+    - [Actual/Expected]x100%
+    - [(Expected-Actual)/Expected]x100%
+- Check operators: =, >, >=, <, <=, ! =
+- Threshold: The value used in the formula for comparison
+- Failure strategy
+    - Alert: The data quality task failed, the DolphinScheduler task result is successful, and an alert is sent
+    - Blocking: The data quality task fails, the DolphinScheduler task result is failed, and an alarm is sent
+- Expected value type: select the desired type from the drop-down menu
+
+## Enumeration Check
+### Introduction
+The goal of enumeration value verification is to check whether the value of a field is within the range of enumeration values. If there is data that is not in the range of enumeration values and exceeds the threshold, the task will be judged to fail
+### UI Guide
+![dataquality_enum_check](/img/tasks/demo/enumeration_check.png)
+- Source data type: select MySQL, PostgreSQL, etc.
+- Source data source: the corresponding data source under the source data type
+- Source data table: drop-down to select the table where the validation data is located
+- Src table filter conditions: such as title, also used when counting the total number of rows in the table, optional
+- Src table check column: drop-down to select the check column name
+- List of enumeration values: separated by commas
+- Check method:
+    - [Expected-Actual]
+    - [Actual-Expected]
+    - [Actual/Expected]x100%
+    - [(Expected-Actual)/Expected]x100%
+- Check operators: =, >, >=, <, <=, ! =
+- Threshold: The value used in the formula for comparison
+- Failure strategy
+    - Alert: The data quality task failed, the DolphinScheduler task result is successful, and an alert is sent
+    - Blocking: The data quality task fails, the DolphinScheduler task result is failed, and an alarm is sent
+- Expected value type: select the desired type from the drop-down menu
+## Table Count Check
+### Introduction
+The goal of table row number verification is to check whether the number of rows in the table reaches the expected value. If the number of rows does not meet the standard, the task will be judged as failed.
+### UI Guide
+![dataquality_count_check](/img/tasks/demo/table_count_check.png)
+- Source data type: select MySQL, PostgreSQL, etc.
+- Source data source: the corresponding data source under the source data type
+- Source data table: drop-down to select the table where the validation data is located
+- Src filter conditions: such as the title, it will also be used when counting the total number of rows in the table, optional
+- Src table check column: drop-down to select the check column name
+- Check method:
+    - [Expected-Actual]
+    - [Actual-Expected]
+    - [Actual/Expected]x100%
+    - [(Expected-Actual)/Expected]x100%
+- Check operators: =, >, >=, <, <=, ! =
+- Threshold: The value used in the formula for comparison
+- Failure strategy
+    - Alert: The data quality task failed, the DolphinScheduler task result is successful, and an alert is sent
+    - Blocking: The data quality task fails, the DolphinScheduler task result is failed, and an alarm is sent
+- Expected value type: select the desired type from the drop-down menu
+
+## Custom SQL Check
+### Introduction
+### UI Guide
+![dataquality_custom_sql_check](/img/tasks/demo/custom_sql_check.png)
+- Source data type: select MySQL, PostgreSQL, etc.
+- Source data source: the corresponding data source under the source data type
+- Source data table: drop-down to select the table where the data to be verified is located
+- Actual value name: alias in SQL for statistical value calculation, such as max_num
+- Actual value calculation SQL: SQL for outputting actual values,
+    - Note: The SQL must be statistical SQL, such as counting the number of rows, calculating the maximum value, minimum value, etc.
+    - select max(a) as max_num from ${src_table}, the table name must be filled like this
+- Src filter conditions: such as the title, it will also be used when counting the total number of rows in the table, optional
+- Check method:

Review Comment:
   Check method:
   ->
   Check method: select a suitable check method.



##########
docs/docs/zh/guide/task/data-quality.md:
##########
@@ -0,0 +1,313 @@
+# 概述
+## 任务类型介绍
+
+数据质量任务是用于检查数据在集成、处理过程中的数据准确性。本版本的数据质量任务包括单表检查、单表自定义SQL检查、多表准确性以及两表值比对。数据质量任务的运行环境为Spark2.4.0，其他版本尚未进行过验证，用户可自行验证。
+
+- 数据质量任务的执行逻辑如下：
+
+> 用户在界面定义任务，用户输入值保存在`TaskParam`中
+运行任务时，`Master`会解析`TaskParam`，封装`DataQualityTask`所需要的参数下发至`Worker。
+Worker`运行数据质量任务，数据质量任务在运行结束之后将统计结果写入到指定的存储引擎中，当前数据质量任务结果存储在`dolphinscheduler`的`t_ds_dq_execute_result`表中
+`Worker`发送任务结果给`Master`，`Master`收到`TaskResponse`之后会判断任务类型是否为`DataQualityTask`，如果是的话会根据`taskInstanceId`从`t_ds_dq_execute_result`中读取相应的结果，然后根据用户配置好的检查方式，操作符和阈值进行结果判断，如果结果为失败的话，会根据用户配置好的的失败策略进行相应的操作，告警或者中断

Review Comment:
   然后根据用户配置好的检查方式，
   ->
   然后根据用户配置好的校验方式，



##########
docs/docs/en/guide/task/data-quality.md:
##########
@@ -0,0 +1,310 @@
+# Overview
+## Introduction
+
+The data quality task is used to check the data accuracy during the integration and processing of data. Data quality tasks in this release include single-table checking, single-table custom SQL checking, multi-table accuracy, and two-table value comparisons. The running environment of the data quality task is Spark 2.4.0, and other versions have not been verified, and users can verify by themselves.
+- The execution flow of the data quality task is as follows: 
+
+> The user defines the task in the interface, and the user input value is stored in `TaskParam`
+When running a task, `Master` will parse `TaskParam`, encapsulate the parameters required by `DataQualityTask` and send it to `Worker`.
+Worker runs the data quality task. After the data quality task finishes running, it writes the statistical results to the specified storage engine. The current data quality task result is stored in the `t_ds_dq_execute_result` table of `dolphinscheduler`
+`Worker` sends the task result to `Master`, after `Master` receives `TaskResponse`, it will judge whether the task type is `DataQualityTask`, if so, it will read the corresponding result from `t_ds_dq_execute_result` according to `taskInstanceId`, and then The result is judged according to the check mode, operator and threshold configured by the user. If the result is a failure, the corresponding operation, alarm or interruption will be performed according to the failure policy configured by the user.
+
+Add config : `<server-name>/conf/common.properties`
+
+```properties
+data-quality.jar.name=dolphinscheduler-data-quality-dev-SNAPSHOT.jar
+```
+
+Please fill in `data-quality.jar.name` according to the actual package name,
+If you package `data-quality` separately, remember to modify the package name to be consistent with `data-quality.jar.name`.
+If the old version is upgraded and used, you need to execute the `sql` update script to initialize the database before running.
+If you want to use `MySQL` data, you need to comment out the `scope` of `MySQL` in `pom.xml`
+Currently only `MySQL`, `PostgreSQL` and `HIVE` data sources have been tested, other data sources have not been tested yet
+`Spark` needs to be configured to read `Hive` metadata, `Spark` does not use `jdbc` to read `Hive`
+
+## Detail
+
+- CheckMethod: [CheckFormula][Operator][Threshold], if the result is true, it indicates that the data does not meet expectations, and the failure strategy is executed.
+- CheckFormula：
+    - Expected-Actual
+    - Actual-Expected
+    - (Actual/Expected)x100%
+    - (Expected-Actual)/Expected x100%
+- Operator：=、>、>=、<、<=、!=
+- ExpectedValue
+    - FixValue
+    - DailyAvg
+    - WeeklyAvg
+    - MonthlyAvg
+    - Last7DayAvg
+    - Last30DayAvg
+    - SrcTableTotalRows
+    - TargetTableTotalRows
+    
+- example
+    - CheckFormula：Expected-Actual
+    - Operator：>
+    - Threshold：0
+    - ExpectedValue：FixValue=9。
+    
+Assuming that the actual value is 10, the operator is >, and the expected value is 9, then the result 10 -9 > 0 is true, which means that the row data in the empty column has exceeded the threshold, and the task is judged to fail
+# Guide
+## NullCheck
+### Introduction
+The goal of the null value check is to check the number of empty rows in the specified column. The number of empty rows can be compared with the total number of rows or a specified threshold. If it is greater than a certain threshold, it will be judged as failure.
+- Calculate the SQL statement that the specified column is empty as follows:
+
+  ```sql
+  SELECT COUNT(*) AS miss FROM ${src_table} WHERE (${src_field} is null or ${src_field} = '') AND (${src_filter})
+  ```
+
+- The SQL to calculate the total number of rows in the table is as follows:
+
+  ```sql
+  SELECT COUNT(*) AS total FROM ${src_table} WHERE (${src_filter})
+  ```
+
+### UI Guide
+![dataquality_null_check](/img/tasks/demo/null_check.png)
+- Source data type: select MySQL, PostgreSQL, etc.
+- Source data source: the corresponding data source under the source data type
+- Source data table: drop-down to select the table where the validation data is located
+- Src filter conditions: such as the title, it will also be used when counting the total number of rows in the table, optional
+- Src table check column: drop-down to select the check column name
+- Check method:
+    - [Expected-Actual]
+    - [Actual-Expected]
+    - [Actual/Expected]x100%
+    - [(Expected-Actual)/Expected]x100%
+- Check operators: =, >, >=, <, <=, ! =
+- Threshold: The value used in the formula for comparison
+- Failure strategy
+    - Alert: The data quality task failed, the DolphinScheduler task result is successful, and an alert is sent
+    - Blocking: The data quality task fails, the DolphinScheduler task result is failed, and an alarm is sent
+- Expected value type: select the desired type from the drop-down menu
+
+## Timeliness Check
+### Introduction
+The timeliness check is used to check whether the data is processed within the expected time. The start time and end time can be specified to define the time range. If the amount of data within the time range does not reach the set threshold, the check task will be judged as fail
+### UI Guide
+![dataquality_timeliness_check](/img/tasks/demo/timeliness_check.png)
+- Source data type: select MySQL, PostgreSQL, etc.
+- Source data source: the corresponding data source under the source data type
+- Source data table: drop-down to select the table where the validation data is located
+- Src filter conditions: such as the title, it will also be used when counting the total number of rows in the table, optional
+- Src table check column: drop-down to select check column name
+- start time: the start time of a time range

Review Comment:
   use upper case at the start



##########
docs/docs/en/guide/task/data-quality.md:
##########
@@ -0,0 +1,305 @@
+# 1 Overview
+## 1.1 Introduction
+
+The data quality task is used to check the data accuracy during the integration and processing of data. Data quality tasks in this release include single-table checking, single-table custom SQL checking, multi-table accuracy, and two-table value comparisons. The running environment of the data quality task is Spark 2.4.0, and other versions have not been verified, and users can verify by themselves.
+- The execution flow of the data quality task is as follows: 
+
+> The user defines the task in the interface, and the user input value is stored in `TaskParam`
+When running a task, `Master` will parse `TaskParam`, encapsulate the parameters required by `DataQualityTask` and send it to `Worker`.
+Worker runs the data quality task. After the data quality task finishes running, it writes the statistical results to the specified storage engine. The current data quality task result is stored in the `t_ds_dq_execute_result` table of `dolphinscheduler`
+`Worker` sends the task result to `Master`, after `Master` receives `TaskResponse`, it will judge whether the task type is `DataQualityTask`, if so, it will read the corresponding result from `t_ds_dq_execute_result` according to `taskInstanceId`, and then The result is judged according to the check mode, operator and threshold configured by the user. If the result is a failure, the corresponding operation, alarm or interruption will be performed according to the failure policy configured by the user.

Review Comment:
   check mode
   ->
   check formula



##########
docs/docs/en/guide/task/data-quality.md:
##########
@@ -0,0 +1,310 @@
+# Overview
+## Introduction
+
+The data quality task is used to check the data accuracy during the integration and processing of data. Data quality tasks in this release include single-table checking, single-table custom SQL checking, multi-table accuracy, and two-table value comparisons. The running environment of the data quality task is Spark 2.4.0, and other versions have not been verified, and users can verify by themselves.
+- The execution flow of the data quality task is as follows: 
+
+> The user defines the task in the interface, and the user input value is stored in `TaskParam`
+When running a task, `Master` will parse `TaskParam`, encapsulate the parameters required by `DataQualityTask` and send it to `Worker`.
+Worker runs the data quality task. After the data quality task finishes running, it writes the statistical results to the specified storage engine. The current data quality task result is stored in the `t_ds_dq_execute_result` table of `dolphinscheduler`
+`Worker` sends the task result to `Master`, after `Master` receives `TaskResponse`, it will judge whether the task type is `DataQualityTask`, if so, it will read the corresponding result from `t_ds_dq_execute_result` according to `taskInstanceId`, and then The result is judged according to the check mode, operator and threshold configured by the user. If the result is a failure, the corresponding operation, alarm or interruption will be performed according to the failure policy configured by the user.
+
+Add config : `<server-name>/conf/common.properties`
+
+```properties
+data-quality.jar.name=dolphinscheduler-data-quality-dev-SNAPSHOT.jar
+```
+
+Please fill in `data-quality.jar.name` according to the actual package name,
+If you package `data-quality` separately, remember to modify the package name to be consistent with `data-quality.jar.name`.
+If the old version is upgraded and used, you need to execute the `sql` update script to initialize the database before running.
+If you want to use `MySQL` data, you need to comment out the `scope` of `MySQL` in `pom.xml`
+Currently only `MySQL`, `PostgreSQL` and `HIVE` data sources have been tested, other data sources have not been tested yet
+`Spark` needs to be configured to read `Hive` metadata, `Spark` does not use `jdbc` to read `Hive`
+
+## Detail
+
+- CheckMethod: [CheckFormula][Operator][Threshold], if the result is true, it indicates that the data does not meet expectations, and the failure strategy is executed.
+- CheckFormula：
+    - Expected-Actual
+    - Actual-Expected
+    - (Actual/Expected)x100%
+    - (Expected-Actual)/Expected x100%
+- Operator：=、>、>=、<、<=、!=
+- ExpectedValue
+    - FixValue
+    - DailyAvg
+    - WeeklyAvg
+    - MonthlyAvg
+    - Last7DayAvg
+    - Last30DayAvg
+    - SrcTableTotalRows
+    - TargetTableTotalRows
+    
+- example
+    - CheckFormula：Expected-Actual
+    - Operator：>
+    - Threshold：0
+    - ExpectedValue：FixValue=9。
+    
+Assuming that the actual value is 10, the operator is >, and the expected value is 9, then the result 10 -9 > 0 is true, which means that the row data in the empty column has exceeded the threshold, and the task is judged to fail
+# Guide
+## NullCheck
+### Introduction
+The goal of the null value check is to check the number of empty rows in the specified column. The number of empty rows can be compared with the total number of rows or a specified threshold. If it is greater than a certain threshold, it will be judged as failure.
+- Calculate the SQL statement that the specified column is empty as follows:
+
+  ```sql
+  SELECT COUNT(*) AS miss FROM ${src_table} WHERE (${src_field} is null or ${src_field} = '') AND (${src_filter})
+  ```
+
+- The SQL to calculate the total number of rows in the table is as follows:
+
+  ```sql
+  SELECT COUNT(*) AS total FROM ${src_table} WHERE (${src_filter})
+  ```
+
+### UI Guide
+![dataquality_null_check](/img/tasks/demo/null_check.png)
+- Source data type: select MySQL, PostgreSQL, etc.
+- Source data source: the corresponding data source under the source data type
+- Source data table: drop-down to select the table where the validation data is located
+- Src filter conditions: such as the title, it will also be used when counting the total number of rows in the table, optional
+- Src table check column: drop-down to select the check column name
+- Check method:
+    - [Expected-Actual]
+    - [Actual-Expected]
+    - [Actual/Expected]x100%
+    - [(Expected-Actual)/Expected]x100%
+- Check operators: =, >, >=, <, <=, ! =
+- Threshold: The value used in the formula for comparison
+- Failure strategy
+    - Alert: The data quality task failed, the DolphinScheduler task result is successful, and an alert is sent
+    - Blocking: The data quality task fails, the DolphinScheduler task result is failed, and an alarm is sent
+- Expected value type: select the desired type from the drop-down menu
+
+## Timeliness Check
+### Introduction
+The timeliness check is used to check whether the data is processed within the expected time. The start time and end time can be specified to define the time range. If the amount of data within the time range does not reach the set threshold, the check task will be judged as fail
+### UI Guide
+![dataquality_timeliness_check](/img/tasks/demo/timeliness_check.png)
+- Source data type: select MySQL, PostgreSQL, etc.
+- Source data source: the corresponding data source under the source data type
+- Source data table: drop-down to select the table where the validation data is located
+- Src filter conditions: such as the title, it will also be used when counting the total number of rows in the table, optional
+- Src table check column: drop-down to select check column name
+- start time: the start time of a time range
+- end time: the end time of a time range
+- Time Format: Set the corresponding time format

Review Comment:
   Time Format: Set the corresponding time format
   ->
   Time Format: set the corresponding time format



##########
docs/docs/en/guide/task/data-quality.md:
##########
@@ -0,0 +1,310 @@
+# Overview
+## Introduction
+
+The data quality task is used to check the data accuracy during the integration and processing of data. Data quality tasks in this release include single-table checking, single-table custom SQL checking, multi-table accuracy, and two-table value comparisons. The running environment of the data quality task is Spark 2.4.0, and other versions have not been verified, and users can verify by themselves.
+- The execution flow of the data quality task is as follows: 
+
+> The user defines the task in the interface, and the user input value is stored in `TaskParam`
+When running a task, `Master` will parse `TaskParam`, encapsulate the parameters required by `DataQualityTask` and send it to `Worker`.
+Worker runs the data quality task. After the data quality task finishes running, it writes the statistical results to the specified storage engine. The current data quality task result is stored in the `t_ds_dq_execute_result` table of `dolphinscheduler`
+`Worker` sends the task result to `Master`, after `Master` receives `TaskResponse`, it will judge whether the task type is `DataQualityTask`, if so, it will read the corresponding result from `t_ds_dq_execute_result` according to `taskInstanceId`, and then The result is judged according to the check mode, operator and threshold configured by the user. If the result is a failure, the corresponding operation, alarm or interruption will be performed according to the failure policy configured by the user.
+
+Add config : `<server-name>/conf/common.properties`
+
+```properties
+data-quality.jar.name=dolphinscheduler-data-quality-dev-SNAPSHOT.jar
+```
+
+Please fill in `data-quality.jar.name` according to the actual package name,
+If you package `data-quality` separately, remember to modify the package name to be consistent with `data-quality.jar.name`.
+If the old version is upgraded and used, you need to execute the `sql` update script to initialize the database before running.
+If you want to use `MySQL` data, you need to comment out the `scope` of `MySQL` in `pom.xml`
+Currently only `MySQL`, `PostgreSQL` and `HIVE` data sources have been tested, other data sources have not been tested yet
+`Spark` needs to be configured to read `Hive` metadata, `Spark` does not use `jdbc` to read `Hive`
+
+## Detail
+
+- CheckMethod: [CheckFormula][Operator][Threshold], if the result is true, it indicates that the data does not meet expectations, and the failure strategy is executed.
+- CheckFormula：
+    - Expected-Actual
+    - Actual-Expected
+    - (Actual/Expected)x100%
+    - (Expected-Actual)/Expected x100%
+- Operator：=、>、>=、<、<=、!=
+- ExpectedValue
+    - FixValue
+    - DailyAvg
+    - WeeklyAvg
+    - MonthlyAvg
+    - Last7DayAvg
+    - Last30DayAvg
+    - SrcTableTotalRows
+    - TargetTableTotalRows
+    
+- example
+    - CheckFormula：Expected-Actual
+    - Operator：>
+    - Threshold：0
+    - ExpectedValue：FixValue=9。
+    
+Assuming that the actual value is 10, the operator is >, and the expected value is 9, then the result 10 -9 > 0 is true, which means that the row data in the empty column has exceeded the threshold, and the task is judged to fail
+# Guide
+## NullCheck
+### Introduction
+The goal of the null value check is to check the number of empty rows in the specified column. The number of empty rows can be compared with the total number of rows or a specified threshold. If it is greater than a certain threshold, it will be judged as failure.
+- Calculate the SQL statement that the specified column is empty as follows:
+
+  ```sql
+  SELECT COUNT(*) AS miss FROM ${src_table} WHERE (${src_field} is null or ${src_field} = '') AND (${src_filter})
+  ```
+
+- The SQL to calculate the total number of rows in the table is as follows:
+
+  ```sql
+  SELECT COUNT(*) AS total FROM ${src_table} WHERE (${src_filter})
+  ```
+
+### UI Guide
+![dataquality_null_check](/img/tasks/demo/null_check.png)
+- Source data type: select MySQL, PostgreSQL, etc.
+- Source data source: the corresponding data source under the source data type
+- Source data table: drop-down to select the table where the validation data is located
+- Src filter conditions: such as the title, it will also be used when counting the total number of rows in the table, optional
+- Src table check column: drop-down to select the check column name
+- Check method:
+    - [Expected-Actual]
+    - [Actual-Expected]
+    - [Actual/Expected]x100%
+    - [(Expected-Actual)/Expected]x100%
+- Check operators: =, >, >=, <, <=, ! =
+- Threshold: The value used in the formula for comparison
+- Failure strategy
+    - Alert: The data quality task failed, the DolphinScheduler task result is successful, and an alert is sent
+    - Blocking: The data quality task fails, the DolphinScheduler task result is failed, and an alarm is sent
+- Expected value type: select the desired type from the drop-down menu
+
+## Timeliness Check
+### Introduction
+The timeliness check is used to check whether the data is processed within the expected time. The start time and end time can be specified to define the time range. If the amount of data within the time range does not reach the set threshold, the check task will be judged as fail
+### UI Guide
+![dataquality_timeliness_check](/img/tasks/demo/timeliness_check.png)
+- Source data type: select MySQL, PostgreSQL, etc.
+- Source data source: the corresponding data source under the source data type
+- Source data table: drop-down to select the table where the validation data is located
+- Src filter conditions: such as the title, it will also be used when counting the total number of rows in the table, optional
+- Src table check column: drop-down to select check column name
+- start time: the start time of a time range
+- end time: the end time of a time range
+- Time Format: Set the corresponding time format
+- Check method:
+    - [Expected-Actual]
+    - [Actual-Expected]
+    - [Actual/Expected]x100%
+    - [(Expected-Actual)/Expected]x100%
+- Check operators: =, >, >=, <, <=, ! =
+- Threshold: The value used in the formula for comparison
+- Failure strategy
+    - Alert: The data quality task failed, the DolphinScheduler task result is successful, and an alert is sent
+    - Blocking: The data quality task fails, the DolphinScheduler task result is failed, and an alarm is sent
+- Expected value type: select the desired type from the drop-down menu
+
+## Field Length Check
+### Introduction
+The goal of field length verification is to check whether the length of the selected field meets the expectations. If there is data that does not meet the requirements, and the number of rows exceeds the threshold, the task will be judged to fail
+### UI Guide
+![dataquality_length_check](/img/tasks/demo/field_length_check.png)
+- Source data type: select MySQL, PostgreSQL, etc.
+- Source data source: the corresponding data source under the source data type
+- Source data table: drop-down to select the table where the validation data is located
+- Src filter conditions: such as the title, it will also be used when counting the total number of rows in the table, optional
+- Src table check column: drop-down to select the check column name
+- Logical operators: =, >, >=, <, <=, ! =
+- Field length limit: like the title
+- Check method:
+    - [Expected-Actual]
+    - [Actual-Expected]
+    - [Actual/Expected]x100%
+    - [(Expected-Actual)/Expected]x100%
+- Check operators: =, >, >=, <, <=, ! =
+- Threshold: The value used in the formula for comparison
+- Failure strategy
+    - Alert: The data quality task failed, the DolphinScheduler task result is successful, and an alert is sent
+    - Blocking: The data quality task fails, the DolphinScheduler task result is failed, and an alarm is sent
+- Expected value type: select the desired type from the drop-down menu
+
+## Uniqueness Check
+### Introduction
+The goal of the uniqueness check is to check whether the field is duplicated. It is generally used to check whether the primary key is duplicated. If there is duplication and the threshold is reached, the check task will be judged to be failed.
+### UI Guide
+![dataquality_uniqueness_check](/img/tasks/demo/uniqueness_check.png)
+- Source data type: select MySQL, PostgreSQL, etc.
+- Source data source: the corresponding data source under the source data type
+- Source data table: drop-down to select the table where the validation data is located
+- Src filter conditions: such as the title, it will also be used when counting the total number of rows in the table, optional
+- Src table check column: drop-down to select the check column name
+- Check method:
+    - [Expected-Actual]
+    - [Actual-Expected]
+    - [Actual/Expected]x100%
+    - [(Expected-Actual)/Expected]x100%
+- Check operators: =, >, >=, <, <=, ! =
+- Threshold: The value used in the formula for comparison
+- Failure strategy
+    - Alert: The data quality task failed, the DolphinScheduler task result is successful, and an alert is sent
+    - Blocking: The data quality task fails, the DolphinScheduler task result is failed, and an alarm is sent
+- Expected value type: select the desired type from the drop-down menu
+
+## Regular Expression Check
+### Introduction
+The goal of regular expression verification is to check whether the format of the value of a field meets the requirements, such as time format, email format, ID card format, etc. If there is data that does not meet the format and exceeds the threshold, the task will be judged as failed.
+### UI Guide
+![dataquality_regex_check](/img/tasks/demo/regexp_check.png)
+- Source data type: select MySQL, PostgreSQL, etc.
+- Source data source: the corresponding data source under the source data type
+- Source data table: drop-down to select the table where the validation data is located
+- Src filter conditions: such as the title, it will also be used when counting the total number of rows in the table, optional
+- Src table check column: drop-down to select check column name
+- Regular expression: as title
+- Check method:
+    - [Expected-Actual]
+    - [Actual-Expected]
+    - [Actual/Expected]x100%
+    - [(Expected-Actual)/Expected]x100%
+- Check operators: =, >, >=, <, <=, ! =
+- Threshold: The value used in the formula for comparison
+- Failure strategy
+    - Alert: The data quality task failed, the DolphinScheduler task result is successful, and an alert is sent
+    - Blocking: The data quality task fails, the DolphinScheduler task result is failed, and an alarm is sent
+- Expected value type: select the desired type from the drop-down menu
+
+## Enumeration Check
+### Introduction
+The goal of enumeration value verification is to check whether the value of a field is within the range of enumeration values. If there is data that is not in the range of enumeration values and exceeds the threshold, the task will be judged to fail
+### UI Guide
+![dataquality_enum_check](/img/tasks/demo/enumeration_check.png)
+- Source data type: select MySQL, PostgreSQL, etc.
+- Source data source: the corresponding data source under the source data type
+- Source data table: drop-down to select the table where the validation data is located
+- Src table filter conditions: such as title, also used when counting the total number of rows in the table, optional
+- Src table check column: drop-down to select the check column name
+- List of enumeration values: separated by commas
+- Check method:
+    - [Expected-Actual]
+    - [Actual-Expected]
+    - [Actual/Expected]x100%
+    - [(Expected-Actual)/Expected]x100%
+- Check operators: =, >, >=, <, <=, ! =
+- Threshold: The value used in the formula for comparison
+- Failure strategy
+    - Alert: The data quality task failed, the DolphinScheduler task result is successful, and an alert is sent
+    - Blocking: The data quality task fails, the DolphinScheduler task result is failed, and an alarm is sent
+- Expected value type: select the desired type from the drop-down menu
+## Table Count Check
+### Introduction
+The goal of table row number verification is to check whether the number of rows in the table reaches the expected value. If the number of rows does not meet the standard, the task will be judged as failed.
+### UI Guide
+![dataquality_count_check](/img/tasks/demo/table_count_check.png)
+- Source data type: select MySQL, PostgreSQL, etc.
+- Source data source: the corresponding data source under the source data type
+- Source data table: drop-down to select the table where the validation data is located
+- Src filter conditions: such as the title, it will also be used when counting the total number of rows in the table, optional
+- Src table check column: drop-down to select the check column name
+- Check method:
+    - [Expected-Actual]
+    - [Actual-Expected]
+    - [Actual/Expected]x100%
+    - [(Expected-Actual)/Expected]x100%
+- Check operators: =, >, >=, <, <=, ! =
+- Threshold: The value used in the formula for comparison
+- Failure strategy
+    - Alert: The data quality task failed, the DolphinScheduler task result is successful, and an alert is sent
+    - Blocking: The data quality task fails, the DolphinScheduler task result is failed, and an alarm is sent
+- Expected value type: select the desired type from the drop-down menu
+
+## Custom SQL Check
+### Introduction
+### UI Guide
+![dataquality_custom_sql_check](/img/tasks/demo/custom_sql_check.png)
+- Source data type: select MySQL, PostgreSQL, etc.
+- Source data source: the corresponding data source under the source data type
+- Source data table: drop-down to select the table where the data to be verified is located
+- Actual value name: alias in SQL for statistical value calculation, such as max_num
+- Actual value calculation SQL: SQL for outputting actual values,
+    - Note: The SQL must be statistical SQL, such as counting the number of rows, calculating the maximum value, minimum value, etc.
+    - select max(a) as max_num from ${src_table}, the table name must be filled like this
+- Src filter conditions: such as the title, it will also be used when counting the total number of rows in the table, optional
+- Check method:
+- Check operators: =, >, >=, <, <=, ! =
+- Threshold: The value used in the formula for comparison
+- Failure strategy
+    - Alert: The data quality task failed, the DolphinScheduler task result is successful, and an alert is sent
+    - Blocking: The data quality task fails, the DolphinScheduler task result is failed, and an alarm is sent

Review Comment:
   what's the difference between alarm and alert



##########
docs/docs/en/guide/task/data-quality.md:
##########
@@ -0,0 +1,310 @@
+# Overview
+## Introduction
+
+The data quality task is used to check the data accuracy during the integration and processing of data. Data quality tasks in this release include single-table checking, single-table custom SQL checking, multi-table accuracy, and two-table value comparisons. The running environment of the data quality task is Spark 2.4.0, and other versions have not been verified, and users can verify by themselves.
+- The execution flow of the data quality task is as follows: 
+
+> The user defines the task in the interface, and the user input value is stored in `TaskParam`
+When running a task, `Master` will parse `TaskParam`, encapsulate the parameters required by `DataQualityTask` and send it to `Worker`.
+Worker runs the data quality task. After the data quality task finishes running, it writes the statistical results to the specified storage engine. The current data quality task result is stored in the `t_ds_dq_execute_result` table of `dolphinscheduler`
+`Worker` sends the task result to `Master`, after `Master` receives `TaskResponse`, it will judge whether the task type is `DataQualityTask`, if so, it will read the corresponding result from `t_ds_dq_execute_result` according to `taskInstanceId`, and then The result is judged according to the check mode, operator and threshold configured by the user. If the result is a failure, the corresponding operation, alarm or interruption will be performed according to the failure policy configured by the user.
+
+Add config : `<server-name>/conf/common.properties`
+
+```properties
+data-quality.jar.name=dolphinscheduler-data-quality-dev-SNAPSHOT.jar
+```
+
+Please fill in `data-quality.jar.name` according to the actual package name,
+If you package `data-quality` separately, remember to modify the package name to be consistent with `data-quality.jar.name`.
+If the old version is upgraded and used, you need to execute the `sql` update script to initialize the database before running.
+If you want to use `MySQL` data, you need to comment out the `scope` of `MySQL` in `pom.xml`
+Currently only `MySQL`, `PostgreSQL` and `HIVE` data sources have been tested, other data sources have not been tested yet
+`Spark` needs to be configured to read `Hive` metadata, `Spark` does not use `jdbc` to read `Hive`
+
+## Detail
+
+- CheckMethod: [CheckFormula][Operator][Threshold], if the result is true, it indicates that the data does not meet expectations, and the failure strategy is executed.
+- CheckFormula：
+    - Expected-Actual
+    - Actual-Expected
+    - (Actual/Expected)x100%
+    - (Expected-Actual)/Expected x100%
+- Operator：=、>、>=、<、<=、!=
+- ExpectedValue
+    - FixValue
+    - DailyAvg
+    - WeeklyAvg
+    - MonthlyAvg
+    - Last7DayAvg
+    - Last30DayAvg
+    - SrcTableTotalRows
+    - TargetTableTotalRows
+    
+- example
+    - CheckFormula：Expected-Actual
+    - Operator：>
+    - Threshold：0
+    - ExpectedValue：FixValue=9。
+    
+Assuming that the actual value is 10, the operator is >, and the expected value is 9, then the result 10 -9 > 0 is true, which means that the row data in the empty column has exceeded the threshold, and the task is judged to fail
+# Guide
+## NullCheck
+### Introduction
+The goal of the null value check is to check the number of empty rows in the specified column. The number of empty rows can be compared with the total number of rows or a specified threshold. If it is greater than a certain threshold, it will be judged as failure.
+- Calculate the SQL statement that the specified column is empty as follows:
+
+  ```sql
+  SELECT COUNT(*) AS miss FROM ${src_table} WHERE (${src_field} is null or ${src_field} = '') AND (${src_filter})
+  ```
+
+- The SQL to calculate the total number of rows in the table is as follows:
+
+  ```sql
+  SELECT COUNT(*) AS total FROM ${src_table} WHERE (${src_filter})
+  ```
+
+### UI Guide
+![dataquality_null_check](/img/tasks/demo/null_check.png)
+- Source data type: select MySQL, PostgreSQL, etc.
+- Source data source: the corresponding data source under the source data type
+- Source data table: drop-down to select the table where the validation data is located
+- Src filter conditions: such as the title, it will also be used when counting the total number of rows in the table, optional
+- Src table check column: drop-down to select the check column name
+- Check method:
+    - [Expected-Actual]
+    - [Actual-Expected]
+    - [Actual/Expected]x100%
+    - [(Expected-Actual)/Expected]x100%
+- Check operators: =, >, >=, <, <=, ! =
+- Threshold: The value used in the formula for comparison
+- Failure strategy
+    - Alert: The data quality task failed, the DolphinScheduler task result is successful, and an alert is sent
+    - Blocking: The data quality task fails, the DolphinScheduler task result is failed, and an alarm is sent
+- Expected value type: select the desired type from the drop-down menu
+
+## Timeliness Check
+### Introduction
+The timeliness check is used to check whether the data is processed within the expected time. The start time and end time can be specified to define the time range. If the amount of data within the time range does not reach the set threshold, the check task will be judged as fail
+### UI Guide
+![dataquality_timeliness_check](/img/tasks/demo/timeliness_check.png)
+- Source data type: select MySQL, PostgreSQL, etc.
+- Source data source: the corresponding data source under the source data type
+- Source data table: drop-down to select the table where the validation data is located
+- Src filter conditions: such as the title, it will also be used when counting the total number of rows in the table, optional
+- Src table check column: drop-down to select check column name
+- start time: the start time of a time range
+- end time: the end time of a time range

Review Comment:
   use upper case at the start



##########
docs/docs/en/guide/task/data-quality.md:
##########
@@ -0,0 +1,310 @@
+# Overview
+## Introduction
+
+The data quality task is used to check the data accuracy during the integration and processing of data. Data quality tasks in this release include single-table checking, single-table custom SQL checking, multi-table accuracy, and two-table value comparisons. The running environment of the data quality task is Spark 2.4.0, and other versions have not been verified, and users can verify by themselves.
+- The execution flow of the data quality task is as follows: 
+
+> The user defines the task in the interface, and the user input value is stored in `TaskParam`
+When running a task, `Master` will parse `TaskParam`, encapsulate the parameters required by `DataQualityTask` and send it to `Worker`.
+Worker runs the data quality task. After the data quality task finishes running, it writes the statistical results to the specified storage engine. The current data quality task result is stored in the `t_ds_dq_execute_result` table of `dolphinscheduler`
+`Worker` sends the task result to `Master`, after `Master` receives `TaskResponse`, it will judge whether the task type is `DataQualityTask`, if so, it will read the corresponding result from `t_ds_dq_execute_result` according to `taskInstanceId`, and then The result is judged according to the check mode, operator and threshold configured by the user. If the result is a failure, the corresponding operation, alarm or interruption will be performed according to the failure policy configured by the user.
+
+Add config : `<server-name>/conf/common.properties`
+
+```properties
+data-quality.jar.name=dolphinscheduler-data-quality-dev-SNAPSHOT.jar
+```
+
+Please fill in `data-quality.jar.name` according to the actual package name,
+If you package `data-quality` separately, remember to modify the package name to be consistent with `data-quality.jar.name`.
+If the old version is upgraded and used, you need to execute the `sql` update script to initialize the database before running.
+If you want to use `MySQL` data, you need to comment out the `scope` of `MySQL` in `pom.xml`
+Currently only `MySQL`, `PostgreSQL` and `HIVE` data sources have been tested, other data sources have not been tested yet
+`Spark` needs to be configured to read `Hive` metadata, `Spark` does not use `jdbc` to read `Hive`
+
+## Detail
+
+- CheckMethod: [CheckFormula][Operator][Threshold], if the result is true, it indicates that the data does not meet expectations, and the failure strategy is executed.
+- CheckFormula：
+    - Expected-Actual
+    - Actual-Expected
+    - (Actual/Expected)x100%
+    - (Expected-Actual)/Expected x100%
+- Operator：=、>、>=、<、<=、!=
+- ExpectedValue
+    - FixValue
+    - DailyAvg
+    - WeeklyAvg
+    - MonthlyAvg
+    - Last7DayAvg
+    - Last30DayAvg
+    - SrcTableTotalRows
+    - TargetTableTotalRows
+    
+- example
+    - CheckFormula：Expected-Actual
+    - Operator：>
+    - Threshold：0
+    - ExpectedValue：FixValue=9。
+    
+Assuming that the actual value is 10, the operator is >, and the expected value is 9, then the result 10 -9 > 0 is true, which means that the row data in the empty column has exceeded the threshold, and the task is judged to fail
+# Guide
+## NullCheck
+### Introduction
+The goal of the null value check is to check the number of empty rows in the specified column. The number of empty rows can be compared with the total number of rows or a specified threshold. If it is greater than a certain threshold, it will be judged as failure.
+- Calculate the SQL statement that the specified column is empty as follows:
+
+  ```sql
+  SELECT COUNT(*) AS miss FROM ${src_table} WHERE (${src_field} is null or ${src_field} = '') AND (${src_filter})
+  ```
+
+- The SQL to calculate the total number of rows in the table is as follows:
+
+  ```sql
+  SELECT COUNT(*) AS total FROM ${src_table} WHERE (${src_filter})
+  ```
+
+### UI Guide
+![dataquality_null_check](/img/tasks/demo/null_check.png)
+- Source data type: select MySQL, PostgreSQL, etc.
+- Source data source: the corresponding data source under the source data type
+- Source data table: drop-down to select the table where the validation data is located
+- Src filter conditions: such as the title, it will also be used when counting the total number of rows in the table, optional
+- Src table check column: drop-down to select the check column name
+- Check method:
+    - [Expected-Actual]
+    - [Actual-Expected]
+    - [Actual/Expected]x100%
+    - [(Expected-Actual)/Expected]x100%
+- Check operators: =, >, >=, <, <=, ! =
+- Threshold: The value used in the formula for comparison
+- Failure strategy
+    - Alert: The data quality task failed, the DolphinScheduler task result is successful, and an alert is sent
+    - Blocking: The data quality task fails, the DolphinScheduler task result is failed, and an alarm is sent
+- Expected value type: select the desired type from the drop-down menu
+
+## Timeliness Check
+### Introduction
+The timeliness check is used to check whether the data is processed within the expected time. The start time and end time can be specified to define the time range. If the amount of data within the time range does not reach the set threshold, the check task will be judged as fail
+### UI Guide
+![dataquality_timeliness_check](/img/tasks/demo/timeliness_check.png)
+- Source data type: select MySQL, PostgreSQL, etc.
+- Source data source: the corresponding data source under the source data type
+- Source data table: drop-down to select the table where the validation data is located
+- Src filter conditions: such as the title, it will also be used when counting the total number of rows in the table, optional
+- Src table check column: drop-down to select check column name
+- start time: the start time of a time range
+- end time: the end time of a time range

Review Comment:
   E



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@dolphinscheduler.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [dolphinscheduler] zhongjiajie commented on pull request #9512: [Docs][DataQuality]: Add DataQuality Docs

Posted by GitBox <gi...@apache.org>.

zhongjiajie commented on PR #9512:
URL: https://github.com/apache/dolphinscheduler/pull/9512#issuecomment-1099875910

   Will take a look this weekend


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@dolphinscheduler.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [dolphinscheduler] zhongjiajie merged pull request #9512: [Docs][DataQuality]: Add DataQuality Docs

Posted by GitBox <gi...@apache.org>.

zhongjiajie merged PR #9512:
URL: https://github.com/apache/dolphinscheduler/pull/9512


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@dolphinscheduler.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org