Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2022/06/06 16:34:52 UTC

[GitHub] [airflow] denimalpaca commented on a diff in pull request #23915: Add new SQLCheckOperators

denimalpaca commented on code in PR #23915:
URL: https://github.com/apache/airflow/pull/23915#discussion_r890329827


##########
airflow/operators/sql.py:
##########
@@ -467,6 +467,269 @@ def push(self, meta_data):
         self.log.info("Log from %s:\n%s", self.dag_id, info)
 
 
+def _get_failed_tests(checks):
+    return [
+        f"\tCheck: {check}, " f"Check Values: {check_values}\n"
+        for check, check_values in checks.items()
+        if not check_values["success"]
+    ]
+
+
+class SQLColumnCheckOperator(BaseSQLOperator):
+    """
+    Performs one or more of the templated checks in the column_checks dictionary.
+    Checks are performed on a per-column basis specified by the column_mapping.
+
+    :param table: the table to run checks on.
+    :param column_mapping: the dictionary of columns and their associated checks, e.g.:
+    {
+        'col_name': {
+            'null_check': {
+                'equal_to': 0,
+            },
+            'min': {
+                'greater_than': 5,
+                'leq_than': 10,
+                'tolerance': 0.2,
+            },
+            'max': {
+                'less_than': 1000,
+                'geq_than': 10,
+                'tolerance': 0.01
+            }
+        }
+    }
+    :param conn_id: the connection ID used to connect to the database.
+    :param database: name of database which overwrite the defined one in connection
+    """
+
+    column_checks = {
+        # pass value should be number of acceptable nulls
+        "null_check": "SUM(CASE WHEN 'column' IS NULL THEN 1 ELSE 0 END) AS column_null_check",

Review Comment:
   Good question. I'm not 100% sure. When I tested on Snowflake, the version without the `'` quotes failed with a parse error, iirc. So it might be a dialect issue that can be resolved in the provider-specific versions.
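
For context on why the quoting matters, here is a minimal sketch using SQLite as a stand-in (not Snowflake, so it does not reproduce the parse error mentioned above). The table `t` and column `x` are hypothetical names chosen for illustration. It shows that, at least in SQLite, a single-quoted name inside the `CASE` expression is read as a string literal rather than a column reference, so the null count comes out as 0 regardless of the data; other dialects may behave differently.

```python
import sqlite3

# In-memory database with a hypothetical table `t` and column `x`.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")
# One non-null row and two NULL rows.
conn.executemany("INSERT INTO t (x) VALUES (?)", [(1,), (None,), (None,)])

# Single-quoted: 'x' is a string literal here, which is never NULL.
quoted = conn.execute(
    "SELECT SUM(CASE WHEN 'x' IS NULL THEN 1 ELSE 0 END) FROM t"
).fetchone()[0]

# Unquoted: x refers to the column, so the NULL rows are counted.
unquoted = conn.execute(
    "SELECT SUM(CASE WHEN x IS NULL THEN 1 ELSE 0 END) FROM t"
).fetchone()[0]

print(quoted, unquoted)
conn.close()
```

If the quoted form behaves this way on a given backend, a `null_check` with `equal_to: 0` would pass even when the column contains nulls, which is one reason per-dialect handling in the provider versions may be worth having.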


