Posted to commits@hudi.apache.org by "andywalner (via GitHub)" <gi...@apache.org> on 2023/04/23 17:52:25 UTC

[GitHub] [hudi] andywalner opened a new pull request, #8552: update dq docs

andywalner opened a new pull request, #8552:
URL: https://github.com/apache/hudi/pull/8552

   ### Change Logs
   
   Improved the data quality docs
   
   ### Impact
   
   N/A
   
   ### Risk level (write none, low medium or high below)
   
   None
   
   ### Documentation Update
   
   None
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] yihua merged pull request #8552: [DOCS] Update data quality docs

Posted by "yihua (via GitHub)" <gi...@apache.org>.
yihua merged PR #8552:
URL: https://github.com/apache/hudi/pull/8552




[GitHub] [hudi] yihua commented on a diff in pull request #8552: [DOCS] Update data quality docs

Posted by "yihua (via GitHub)" <gi...@apache.org>.
yihua commented on code in PR #8552:
URL: https://github.com/apache/hudi/pull/8552#discussion_r1225597838


##########
website/docs/precommit_validator.md:
##########
@@ -17,13 +18,18 @@ spark.write.format("hudi")
 Today you can use any of these validators and even have the flexibility to extend your own:
 
 ## SQL Query Single Result
-Can be used to validate that a query on the table results in a specific value.
-- [org.apache.hudi.client.validator.SqlQuerySingleResultPreCommitValidator](https://github.com/apache/hudi/blob/bf5a52e51bbeaa089995335a0a4c55884792e505/hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/validator/SqlQuerySingleResultPreCommitValidator.java)
+[org.apache.hudi.client.validator.SqlQuerySingleResultPreCommitValidator](https://github.com/apache/hudi/blob/bf5a52e51bbeaa089995335a0a4c55884792e505/hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/validator/SqlQuerySingleResultPreCommitValidator.java)
+
+The SQL Query Single Result validator can be used to validate that a query on the table results in a specific value. This validator allows you to run a SQL command and abort the commit if it does not match the expected output.

Review Comment:
   ```suggestion
   The SQL Query Single Result validator can be used to validate that a query on the table results in a specific value. This validator allows you to run a SQL query and abort the commit if it does not match the expected output.
   ```



##########
website/docs/precommit_validator.md:
##########
@@ -17,13 +18,18 @@ spark.write.format("hudi")
 Today you can use any of these validators and even have the flexibility to extend your own:
 
 ## SQL Query Single Result
-Can be used to validate that a query on the table results in a specific value.
-- [org.apache.hudi.client.validator.SqlQuerySingleResultPreCommitValidator](https://github.com/apache/hudi/blob/bf5a52e51bbeaa089995335a0a4c55884792e505/hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/validator/SqlQuerySingleResultPreCommitValidator.java)
+[org.apache.hudi.client.validator.SqlQuerySingleResultPreCommitValidator](https://github.com/apache/hudi/blob/bf5a52e51bbeaa089995335a0a4c55884792e505/hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/validator/SqlQuerySingleResultPreCommitValidator.java)
+
+The SQL Query Single Result validator can be used to validate that a query on the table results in a specific value. This validator allows you to run a SQL command and abort the commit if it does not match the expected output.
 
-Multiple queries separated by ';' delimiter are supported.Expected result is included as part of query separated by '#'. Example query: `query1#result1;query2#result2`
+Multiple queries can be separated by `;` delimiter. Include the expected result as part of the query separated by `#`.
 
-Example, "expect exactly 0 null rows":
+Syntax: `query1#result1;query2#result2`
+
+Example:
 ```scala
+// In this example, we set up a validator that expects exactly 0 rows

Review Comment:
   ```suggestion
   // In this example, we set up a validator that expects no rows where the `col` column is `null`
   ```
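
   For illustration only, here is a minimal sketch of the `query#result` syntax discussed in this hunk. `singleResultQueries` is a hypothetical helper (not part of Hudi), and `col` and `id` are placeholder column names. The option keys and the `<TABLE_NAME>` placeholder are assumed from the Hudi configuration docs and should be checked against the Hudi version in use:

   ```scala
   // Hypothetical helper (not part of Hudi): joins (query, expectedResult) pairs
   // into the `query1#result1;query2#result2` syntax described in the docs.
   def singleResultQueries(checks: Seq[(String, String)]): String =
     checks.map { case (query, expected) => s"$query#$expected" }.mkString(";")

   // Two checks, each expecting a count of 0. `col` and `id` are placeholder columns.
   val queries: String = singleResultQueries(Seq(
     "select count(*) from <TABLE_NAME> where col is null" -> "0",
     "select count(*) from <TABLE_NAME> where id is null" -> "0"
   ))

   // Writer options; the keys are assumed from the Hudi config reference.
   val singleResultOptions: Map[String, String] = Map(
     "hoodie.precommit.validators" ->
       "org.apache.hudi.client.validator.SqlQuerySingleResultPreCommitValidator",
     "hoodie.precommit.validators.single.value.sql.queries" -> queries
   )
   ```

   The resulting map could then be passed to a Spark write with `.options(singleResultOptions)` alongside the usual Hudi table options.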



##########
website/docs/precommit_validator.md:
##########
@@ -49,11 +63,14 @@ df.write.format("hudi").mode(Overwrite).
 ```
 
 ## SQL Query Inequality
-Can be used to validate for inequality of rows before and after the commit.
-- [org.apache.hudi.client.validator.SqlQueryInequalityPreCommitValidator](https://github.com/apache/hudi/blob/bf5a52e51bbeaa089995335a0a4c55884792e505/hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/validator/SqlQueryInequalityPreCommitValidator.java)
+[org.apache.hudi.client.validator.SqlQueryInequalityPreCommitValidator](https://github.com/apache/hudi/blob/bf5a52e51bbeaa089995335a0a4c55884792e505/hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/validator/SqlQueryInequalityPreCommitValidator.java)
 
-Example, "expect there must be a change of null rows with this commit":
+The SQL Query Inquality validator runs a query before ingesting the data, then runs the same query after ingesting the data and confirms that both outputs DO NOT match. This allows you to validate for differences of rows before and after the commit.

Review Comment:
   ```suggestion
   The SQL Query Inequality validator runs a query before ingesting the data, then runs the same query after ingesting the data and confirms that both outputs DO NOT match. This allows you to confirm changes in the rows before and after the commit.
   ```
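
   To make the before/after comparison concrete, here is a rough sketch. `passesInequalityCheck` is a hypothetical stand-in for the comparison described above, not Hudi's implementation, and `col` is a placeholder column name. The option keys and the `<TABLE_NAME>` placeholder are assumed from the Hudi configuration docs and should be verified:

   ```scala
   // Conceptual model only (not Hudi code): the check passes when the query
   // result after the commit differs from the result before it.
   def passesInequalityCheck[A](before: A, after: A): Boolean = before != after

   // Writer options; the keys are assumed from the Hudi config reference.
   val inequalityOptions: Map[String, String] = Map(
     "hoodie.precommit.validators" ->
       "org.apache.hudi.client.validator.SqlQueryInequalityPreCommitValidator",
     // The same query is evaluated before and after the commit.
     "hoodie.precommit.validators.inequality.sql.queries" ->
       "select count(*) from <TABLE_NAME> where col is null"
   )

   // Example: the null-row count moves from 3 to 5, so the outputs differ.
   val changed: Boolean = passesInequalityCheck(3L, 5L)
   // Example: the count stays at 3, so this commit would be aborted.
   val unchanged: Boolean = passesInequalityCheck(3L, 3L)
   ```

   Under this model, a commit that leaves the query result unchanged fails validation, which matches the "DO NOT match" wording in the doc text.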


