Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/06/26 16:49:54 UTC

[GitHub] [airflow] alexbonella opened a new pull request #16676: Sql query results with snowflake operator

alexbonella opened a new pull request #16676:
URL: https://github.com/apache/airflow/pull/16676


   <!--
   Thank you for contributing! Please make sure that your code changes
   are covered with tests. And in case of new features or big changes
   remember to adjust the documentation.
   
   Feel free to ping committers for the review!
   
   In case of existing issue, reference it using one of the following:
   
   closes: #ISSUE
   related: #ISSUE
   
   How to write a good git commit message:
   http://chris.beams.io/posts/git-commit/
   -->
   
   ---
   **^ Add meaningful description above**
   
   Read the **[Pull Request Guidelines](https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst#pull-request-guidelines)** for more information.
   In case of fundamental code change, Airflow Improvement Proposal ([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals)) is needed.
   In case of a new dependency, check compliance with the [ASF 3rd Party License Policy](https://www.apache.org/legal/resolved.html#category-x).
   In case of backwards incompatible changes please leave a note in [UPDATING.md](https://github.com/apache/airflow/blob/main/UPDATING.md).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on a change in pull request #16676: Sql query results with snowflake operator

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #16676:
URL: https://github.com/apache/airflow/pull/16676#discussion_r659207277



##########
File path: airflow/hooks/dbapi.py
##########
@@ -211,8 +211,13 @@ def _run_command(self, cur, sql_statement, parameters):
         self.log.info("Running statement: %s, parameters: %s", sql_statement, parameters)
         if parameters:
             cur.execute(sql_statement, parameters)
+            res = cur.fetchall()
+            result={'sql_statement': res,'headers': list(map(lambda t: t[0], cur.description))}
+            self.log.info("Query Results: %s", result)

Review comment:
    AFAIK - fetchall has no effect on DML/DDL queries :). But I agree that for DQL queries it can have disastrous consequences by fetching all data to memory. But IMHO this is not a problem - it's expected that when you debug, you can run out of memory or things will start running slower etc. I do not think fetchall changes the actual semantics of what happens with the results, so the memory/time needed to fetch the records should be the only side-effect (but that's also why it should only run when the debug level is effective for the logger):
   
   ```
   if self.log.isEnabledFor(logging.DEBUG):
   ```
   







[GitHub] [airflow] potiuk commented on a change in pull request #16676: Sql query results with snowflake operator

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #16676:
URL: https://github.com/apache/airflow/pull/16676#discussion_r659192505



##########
File path: airflow/hooks/dbapi.py
##########
@@ -211,8 +211,13 @@ def _run_command(self, cur, sql_statement, parameters):
         self.log.info("Running statement: %s, parameters: %s", sql_statement, parameters)
         if parameters:
             cur.execute(sql_statement, parameters)
+            res = cur.fetchall()
+            result={'sql_statement': res,'headers': list(map(lambda t: t[0], cur.description))}
+            self.log.info("Query Results: %s", result)

Review comment:
    This will actually impact ALL DB operators - not only Snowflake. I think info level here is quite a bit too much - debug would be better. Also, running fetchall potentially impacts performance, and it seems we only need it to print logging information.
   
   I'd change it to debug level and put `fetchall` under:
   
   ```
   if self.log.isEnabledFor(logging.DEBUG):
   ```
   
   to minimise side-effects
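
   The suggestion above can be sketched end-to-end. This is a hypothetical, standalone version (using `sqlite3` and a module-level logger rather than Airflow's actual `DbApiHook`), intended only to illustrate the debug-gated `fetchall` pattern being discussed:

   ```python
   import logging
   import sqlite3

   logging.basicConfig(level=logging.DEBUG)
   log = logging.getLogger("dbapi_sketch")


   def run_command(cur, sql_statement, parameters=None):
       # Hypothetical stand-in for DbApiHook._run_command, per the review suggestion.
       log.info("Running statement: %s, parameters: %s", sql_statement, parameters)
       if parameters:
           cur.execute(sql_statement, parameters)
       else:
           cur.execute(sql_statement)
       # Only pay the fetchall() memory/time cost when DEBUG is actually enabled,
       # and only for statements that produce a result set (cur.description is
       # None for DML/DDL in most DB-API drivers).
       if log.isEnabledFor(logging.DEBUG) and cur.description is not None:
           rows = cur.fetchall()
           result = {"rows": rows, "headers": [col[0] for col in cur.description]}
           log.debug("Query results: %s", result)
           return result
       return None


   conn = sqlite3.connect(":memory:")
   cur = conn.cursor()
   run_command(cur, "CREATE TABLE t (id INTEGER, name TEXT)")
   run_command(cur, "INSERT INTO t VALUES (?, ?)", (1, "a"))
   out = run_command(cur, "SELECT id, name FROM t WHERE id = ?", (1,))
   print(out)
   ```

   Note that `fetchall` consumes the cursor's result set, so a caller that also needs the rows would have to use the returned value rather than fetching again.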
   







[GitHub] [airflow] alexbonella commented on a change in pull request #16676: Sql query results with snowflake operator

Posted by GitBox <gi...@apache.org>.
alexbonella commented on a change in pull request #16676:
URL: https://github.com/apache/airflow/pull/16676#discussion_r659199840



##########
File path: airflow/hooks/dbapi.py
##########
@@ -211,8 +211,13 @@ def _run_command(self, cur, sql_statement, parameters):
         self.log.info("Running statement: %s, parameters: %s", sql_statement, parameters)
         if parameters:
             cur.execute(sql_statement, parameters)
+            res = cur.fetchall()
+            result={'sql_statement': res,'headers': list(map(lambda t: t[0], cur.description))}
+            self.log.info("Query Results: %s", result)

Review comment:
       Hi @eladkal 
   
    You're right that in ETLs most queries you run are DML (insert, update, delete, etc.), but sometimes we need to transform the query results in order to send them to S3.







[GitHub] [airflow] alexbonella commented on a change in pull request #16676: Sql query results with snowflake operator

Posted by GitBox <gi...@apache.org>.
alexbonella commented on a change in pull request #16676:
URL: https://github.com/apache/airflow/pull/16676#discussion_r659204467



##########
File path: airflow/hooks/dbapi.py
##########
@@ -211,8 +211,13 @@ def _run_command(self, cur, sql_statement, parameters):
         self.log.info("Running statement: %s, parameters: %s", sql_statement, parameters)
         if parameters:
             cur.execute(sql_statement, parameters)
+            res = cur.fetchall()
+            result={'sql_statement': res,'headers': list(map(lambda t: t[0], cur.description))}
+            self.log.info("Query Results: %s", result)

Review comment:
       Thanks for your feedback 










[GitHub] [airflow] potiuk commented on a change in pull request #16676: Sql query results with snowflake operator

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #16676:
URL: https://github.com/apache/airflow/pull/16676#discussion_r659203553



##########
File path: airflow/hooks/dbapi.py
##########
@@ -211,8 +211,13 @@ def _run_command(self, cur, sql_statement, parameters):
         self.log.info("Running statement: %s, parameters: %s", sql_statement, parameters)
         if parameters:
             cur.execute(sql_statement, parameters)
+            res = cur.fetchall()
+            result={'sql_statement': res,'headers': list(map(lambda t: t[0], cur.description))}
+            self.log.info("Query Results: %s", result)

Review comment:
    Just a comment on what @eladkal wrote - it's not entirely correct. I think DML is mostly valid for the operators, but we do not actually prevent people from running DQL queries via Hooks. It's no problem whatsoever to build a custom operator that uses two hooks and passes the DQL'ed data, with some manipulation, to another hook. This is what most transfer operators do.
   
   That's why I think having debug level to show the results is pretty useful when you iterate and test your custom operators.







[GitHub] [airflow] alexbonella commented on a change in pull request #16676: Sql query results with snowflake operator

Posted by GitBox <gi...@apache.org>.
alexbonella commented on a change in pull request #16676:
URL: https://github.com/apache/airflow/pull/16676#discussion_r659193226



##########
File path: airflow/hooks/dbapi.py
##########
@@ -211,8 +211,13 @@ def _run_command(self, cur, sql_statement, parameters):
         self.log.info("Running statement: %s, parameters: %s", sql_statement, parameters)
         if parameters:
             cur.execute(sql_statement, parameters)
+            res = cur.fetchall()
+            result={'sql_statement': res,'headers': list(map(lambda t: t[0], cur.description))}
+            self.log.info("Query Results: %s", result)

Review comment:
       Hi @potiuk  !! 
   
   something like that? 
   
   ```
   if parameters:
               cur.execute(sql_statement, parameters)
   
               if self.log.isEnabledFor(logging.DEBUG):
        
                   res = cur.fetchall()
                   result={'sql_statement': res,'headers': list(map(lambda t: t[0], cur.description))}
                   self.log.info("Query Results: %s", result)
   ```
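
   One small inconsistency in the snippet above: the fetch is gated on `logging.DEBUG`, but the message is still emitted with `self.log.info`, so logging with `debug` would make the level and the gate agree. A minimal standalone illustration (hypothetical names, not Airflow code) of why the gate matters - when the effective level is INFO, the fetch is skipped entirely:

   ```python
   import logging

   logging.basicConfig(level=logging.INFO)  # DEBUG is NOT enabled here
   log = logging.getLogger("gate_demo")

   fetch_calls = []


   def fake_fetchall():
       # Stand-in for cur.fetchall(); records whether a fetch actually happened.
       fetch_calls.append(True)
       return [("row",)]


   if log.isEnabledFor(logging.DEBUG):
       rows = fake_fetchall()
       log.debug("Query results: %s", rows)  # debug, not info, so gate and level agree

   print(len(fetch_calls))
   ```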
   
    







[GitHub] [airflow] github-actions[bot] commented on pull request #16676: Sql query results with snowflake operator

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #16676:
URL: https://github.com/apache/airflow/pull/16676#issuecomment-927201063


   This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions.





[GitHub] [airflow] github-actions[bot] closed pull request #16676: Sql query results with snowflake operator

Posted by GitBox <gi...@apache.org>.
github-actions[bot] closed pull request #16676:
URL: https://github.com/apache/airflow/pull/16676


   





[GitHub] [airflow] eladkal commented on a change in pull request #16676: Sql query results with snowflake operator

Posted by GitBox <gi...@apache.org>.
eladkal commented on a change in pull request #16676:
URL: https://github.com/apache/airflow/pull/16676#discussion_r659195337



##########
File path: airflow/hooks/dbapi.py
##########
@@ -211,8 +211,13 @@ def _run_command(self, cur, sql_statement, parameters):
         self.log.info("Running statement: %s, parameters: %s", sql_statement, parameters)
         if parameters:
             cur.execute(sql_statement, parameters)
+            res = cur.fetchall()
+            result={'sql_statement': res,'headers': list(map(lambda t: t[0], cur.description))}
+            self.log.info("Query Results: %s", result)

Review comment:
    What kind of query do you run that needs this change?
   In ETLs, most queries that you run are DMLs (insert, update, delete, etc.) and these statements return 1 row.
   I think you are running DQL (select), which is a bit odd?
   My question here is whether this change is needed and, if so, why not use fetchone?







[GitHub] [airflow] eladkal commented on a change in pull request #16676: Sql query results with snowflake operator

Posted by GitBox <gi...@apache.org>.
eladkal commented on a change in pull request #16676:
URL: https://github.com/apache/airflow/pull/16676#discussion_r659205722



##########
File path: airflow/hooks/dbapi.py
##########
@@ -211,8 +211,13 @@ def _run_command(self, cur, sql_statement, parameters):
         self.log.info("Running statement: %s, parameters: %s", sql_statement, parameters)
         if parameters:
             cur.execute(sql_statement, parameters)
+            res = cur.fetchall()
+            result={'sql_statement': res,'headers': list(map(lambda t: t[0], cur.description))}
+            self.log.info("Query Results: %s", result)

Review comment:
    I'm ok with the debug; my issue is mostly with the `fetchall`.
   It's important to remember that debug isn't for a specific DAG - it's a cluster-wide setting, so when you set debug, other DAGs' logs may get really big. But I guess that depends on how each organization works.







[GitHub] [airflow] boring-cyborg[bot] commented on pull request #16676: Sql query results with snowflake operator

Posted by GitBox <gi...@apache.org>.
boring-cyborg[bot] commented on pull request #16676:
URL: https://github.com/apache/airflow/pull/16676#issuecomment-869028795


    Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about anything, please check our Contribution Guide (https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst)
   Here are some useful points:
   - Pay attention to the quality of your code (flake8, pylint and type annotations). Our [pre-commits]( https://github.com/apache/airflow/blob/main/STATIC_CODE_CHECKS.rst#prerequisites-for-pre-commit-hooks) will help you with that.
   - In case of a new feature add useful documentation (in docstrings or in `docs/` directory). Adding a new operator? Check this short [guide](https://github.com/apache/airflow/blob/main/docs/apache-airflow/howto/custom-operator.rst) Consider adding an example DAG that shows how users should use it.
   - Consider using [Breeze environment](https://github.com/apache/airflow/blob/main/BREEZE.rst) for testing locally, it’s a heavy docker but it ships with a working Airflow and a lot of integrations.
   - Be patient and persistent. It might take some time to get a review or get the final approval from Committers.
   - Please follow [ASF Code of Conduct](https://www.apache.org/foundation/policies/conduct) for all communication including (but not limited to) comments on Pull Requests, Mailing list and Slack.
   - Be sure to read the [Airflow Coding style]( https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst#coding-style-and-best-practices).
   Apache Airflow is a community-driven project and together we are making it better 🚀.
   In case of doubts contact the developers at:
   Mailing List: dev@airflow.apache.org
   Slack: https://s.apache.org/airflow-slack
   











[GitHub] [airflow] uranusjr commented on pull request #16676: Sql query results with snowflake operator

Posted by GitBox <gi...@apache.org>.
uranusjr commented on pull request #16676:
URL: https://github.com/apache/airflow/pull/16676#issuecomment-869109950


   There are also some linter errors (code format issues, I think—didn’t check the linter output) to fix. It would be best if you could run the checks locally with pre-commit before pushing. Instructions are available in the CONTRIBUTING.rst file in the project root.





[GitHub] [airflow] JavierLopezT commented on a change in pull request #16676: Sql query results with snowflake operator

Posted by GitBox <gi...@apache.org>.
JavierLopezT commented on a change in pull request #16676:
URL: https://github.com/apache/airflow/pull/16676#discussion_r686680217



##########
File path: airflow/hooks/dbapi.py
##########
@@ -211,8 +211,13 @@ def _run_command(self, cur, sql_statement, parameters):
         self.log.info("Running statement: %s, parameters: %s", sql_statement, parameters)
         if parameters:
             cur.execute(sql_statement, parameters)
+            res = cur.fetchall()
+            result={'sql_statement': res,'headers': list(map(lambda t: t[0], cur.description))}
+            self.log.info("Query Results: %s", result)

Review comment:
    > Hi @eladkal
   > 
   > You're right that in ETLs most queries you run are DML (insert, update, delete, etc.), but sometimes we need to transform the query results in order to send them to S3.
   
   I don't understand your use case. Could you explain a little bit, please? Just curious



