Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/03/25 16:45:41 UTC

[GitHub] [airflow] nttdriva opened a new issue #15010: Allow PostgreSQL's operator to return the query result

nttdriva opened a new issue #15010:
URL: https://github.com/apache/airflow/issues/15010


   **Description**
   
   Right now, `PostgresOperator` cannot return any query results.
   
   **Use case / motivation**
   
   It should be able to, so that it can be used to retrieve useful data and pass it on to downstream tasks.
   
   **Are you willing to submit a PR?**
   
   Yup, of course!
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on issue #15010: Allow PostgreSQL's operator to return the query result

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #15010:
URL: https://github.com/apache/airflow/issues/15010#issuecomment-865307499


   I think not storing the results in XCom was a deliberate decision. It sends quite a powerful message to users: "DO NOT" use XCom to pass data. Sometimes a decision like this is made on purpose: to make some actions harder rather than easier, in order to discourage the easy (but incorrect) path.
   
   If you want to do something with the result of a query, it's fairly simple (if a little involved) to write a custom operator for it. That way you read the data from the DB and act on it in the same operation. This is what Hooks are for: you can use `PostgresHook` in your custom operator to read the data, and then use another Hook to write it somewhere else, or push your own XCom based on the results. What you are not encouraged to do is simply use the operator to run a `SELECT *` query and pass the result via XCom.
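A minimal sketch of the Hook-based custom operator described above, assuming Airflow 2 with the `apache-airflow-providers-postgres` package installed. `PostgresSummaryOperator`, `summarize_query`, and the shape of the summary are hypothetical names chosen for illustration, not part of Airflow. The query runs and is reduced inside a single task, and only a small result is returned (and therefore pushed to XCom). `get_records()` is the method `PostgresHook` inherits from `DbApiHook`; the import guard only exists so the helper can be used without Airflow present.

```python
from typing import Any, Protocol, Sequence


class RecordsHook(Protocol):
    """Anything exposing DbApiHook-style get_records(); PostgresHook does."""

    def get_records(self, sql: str, parameters: Any = None) -> Sequence[Sequence[Any]]: ...


def summarize_query(hook: RecordsHook, sql: str, parameters: Any = None) -> dict:
    """Read rows through the hook and keep only a small, XCom-friendly summary.

    The full result set never leaves this task; only the summary is returned.
    """
    rows = hook.get_records(sql, parameters=parameters)
    return {"row_count": len(rows), "first_row": list(rows[0]) if rows else None}


try:
    from airflow.models import BaseOperator
    from airflow.providers.postgres.hooks.postgres import PostgresHook
except ImportError:  # Airflow or the Postgres provider is missing: only the helper is usable
    BaseOperator = None

if BaseOperator is not None:

    class PostgresSummaryOperator(BaseOperator):
        """Hypothetical custom operator: query Postgres, return only a summary."""

        template_fields = ("sql",)

        def __init__(self, *, sql: str, postgres_conn_id: str = "postgres_default", **kwargs):
            super().__init__(**kwargs)
            self.sql = sql
            self.postgres_conn_id = postgres_conn_id

        def execute(self, context):
            hook = PostgresHook(postgres_conn_id=self.postgres_conn_id)
            # execute()'s return value is pushed to XCom when do_xcom_push is True (the default).
            return summarize_query(hook, self.sql)
```

A downstream task could then read the summary with `ti.xcom_pull(task_ids="...")`, using the task_id you gave the operator. The large result set stays out of the metadata database; only the reduced value is stored.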
   
   Passing the results of a SELECT query to XCom, which actually performs another write to the (Airflow metadata) DB, makes little sense in the Airflow world. Making it too easy leads people into bad usage patterns. So the decision was made (not by me; it was made long before I joined Airflow) to make that path difficult to follow.
   
   The operator is really intended to run DML or DDL statements rather than DQL queries, and I think it's better if it stays that way.





[GitHub] [airflow] fredthomsen edited a comment on issue #15010: Allow PostgreSQL's operator to return the query result

Posted by GitBox <gi...@apache.org>.
fredthomsen edited a comment on issue #15010:
URL: https://github.com/apache/airflow/issues/15010#issuecomment-863641384


   I am interested in this feature. Two things of note:
   - The example featured [here](https://airflow.apache.org/docs/apache-airflow-providers-postgres/stable/_modules/airflow/providers/postgres/example_dags/example_postgres.html) doesn't make a lot of sense without this feature. If an operator that pulls data can't pass its values via XCom or write them to a file, what's the point? Is the intent to inherit from `PostgresOperator` when using it for reads, in order to store the data where you need it? I suppose that's a question for the original author of the operator.
   - I've seen comments to the effect of "Airflow is not a data processing platform, so you shouldn't read from DBs". They miss the point: just because I am reading from a database doesn't mean I am trying to grab a ton of data. It may be just some data with a known schema, perhaps stored by a previous DAG.
   
   If this has gotten stale, I am happy to handle this issue.
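The second point, reading a small amount of data with a known schema, can be sketched with a Hook as well. This assumes a hypothetical `etl_watermark` table with simple integer and text columns, populated by a previous DAG; the table, column, and function names are illustrative only. It relies only on `get_first()`, which `PostgresHook` inherits from `DbApiHook` and which returns `None` when no row matches.

```python
from typing import Any, Optional, Protocol, Sequence


class FirstRowHook(Protocol):
    """Anything exposing DbApiHook-style get_first(); PostgresHook does."""

    def get_first(self, sql: str, parameters: Any = None) -> Optional[Sequence[Any]]: ...


# Hypothetical table written by an earlier DAG; its schema is known up front.
WATERMARK_SQL = (
    "SELECT last_loaded_id, status FROM etl_watermark "
    "WHERE job_name = %(job)s"
)
WATERMARK_COLUMNS = ("last_loaded_id", "status")


def read_watermark(hook: FirstRowHook, job: str) -> Optional[dict]:
    """Fetch a single small row and return it as a JSON-friendly dict.

    Returns None when the earlier DAG has not written a row for this job yet.
    """
    row = hook.get_first(WATERMARK_SQL, parameters={"job": job})
    if row is None:
        return None
    return dict(zip(WATERMARK_COLUMNS, row))
```

Inside a custom operator's `execute()`, you would pass `PostgresHook(postgres_conn_id=...)` as `hook` and return the dict, so only this one row reaches XCom. `%(job)s` is psycopg2's named-parameter placeholder style.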





[GitHub] [airflow] nttdriva commented on issue #15010: Allow PostgreSQL's operator to return the query result

Posted by GitBox <gi...@apache.org>.
nttdriva commented on issue #15010:
URL: https://github.com/apache/airflow/issues/15010#issuecomment-808023831


   Hi @AmarEL, thanks for the feedback.
   I agree with you: having more information in the documentation would be really helpful.





[GitHub] [airflow] potiuk closed issue #15010: Allow PostgreSQL's operator to return the query result

Posted by GitBox <gi...@apache.org>.
potiuk closed issue #15010:
URL: https://github.com/apache/airflow/issues/15010


   





[GitHub] [airflow] AmarEL commented on issue #15010: Allow PostgreSQL's operator to return the query result

Posted by GitBox <gi...@apache.org>.
AmarEL commented on issue #15010:
URL: https://github.com/apache/airflow/issues/15010#issuecomment-807872099


   Nice topic.
   A few months ago I faced a situation where I needed the query result and discovered that it was not possible.
   
   Btw, if the community accepts this, a doc update describing the behavior would be nice.
   A few months ago I recommended making this clear in the documentation, but I think it is still vague:
   https://github.com/apache/airflow/pull/13281#discussion_r548752923
   

