You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2022/07/19 18:18:02 UTC

[GitHub] [airflow] denimalpaca opened a new issue, #25163: Common-SQL Operators Various Bugs

denimalpaca opened a new issue, #25163:
URL: https://github.com/apache/airflow/issues/25163

   ### Apache Airflow Provider(s)
   
   common-sql
   
   ### Versions of Apache Airflow Providers
   
   `apache-airflow-providers-common-sql==1.0.0`
   
   ### Apache Airflow version
   
   2.3.3 (latest released)
   
   ### Operating System
   
   macOS Monterey 12.3.1
   
   ### Deployment
   
   Astronomer
   
   ### Deployment details
   
   _No response_
   
   ### What happened
   
   - `SQLTableCheckOperator` builds multiple checks in such a way that if two or more checks are given, and one is not a fully aggregated statement, then the SQL fails as it is missing a `GROUP BY` clause.
   - `SQLColumnCheckOperator` provides only the last SQL query built from the columns, so when a check fails, it will only give the correct SQL in the exception statement by coincidence.
   
   ### What you think should happen instead
   
   - Multiple checks should not need a `GROUP BY` clause
   - Either the correct SQL statement, or no SQL statement, should be returned in the exception message.
   
   ### How to reproduce
   
   For the `SQLTableCheckOperator`, using the operator like so:
   ```
   table_cheforestfire_costs_table_checkscks = SQLTableCheckOperator(
               task_id="forestfire_costs_table_checks",
               table=SNOWFLAKE_FORESTFIRE_COST_TABLE,
               checks={
                   "row_count_check": {"check_statement": "COUNT(*) = 9"},
                   "total_cost_check": {"check_statement": "land_damage_cost + property_damage_cost + lost_profits_cost = total_cost"}
               }
           )
   ```
   
   For the `SQLColumnCheckOperator`, using the operator like so:
   ```
   cost_column_checks = SQLColumnCheckOperator(
       task_id="cost_column_checks",
       table=SNOWFLAKE_COST_TABLE,
       column_mapping={
           "ID": {"null_check": {"equal_to": 0}},
           "LAND_DAMAGE_COST": {"min": {"geq_to": 0}},
           "PROPERTY_DAMAGE_COST": {"min": {"geq_to": 0}},
           "LOST_PROFITS_COST": {"min": {"geq_to": 0}},
       }
   )
   ```
   and ensuring that any of the `ID`, `LAND_DAMAGE_COST`, or `PROPERTY_DAMAGE_COST` checks fail.
   
   An example DAG with the correct environment and data can be found [here](https://github.com/astronomer/airflow-data-quality-demo/blob/main/dags/snowflake_examples/complex_snowflake_transform.py).
   
   ### Anything else
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [X] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk closed issue #25163: Common-SQL Operators Various Bugs

Posted by GitBox <gi...@apache.org>.
potiuk closed issue #25163: Common-SQL Operators Various Bugs
URL: https://github.com/apache/airflow/issues/25163


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org