Posted to commits@airflow.apache.org by "JDarDagran (via GitHub)" <gi...@apache.org> on 2023/11/09 13:41:01 UTC

[I] openlineage: improve how sql utils parse table schemas [airflow]

JDarDagran opened a new issue, #35552:
URL: https://github.com/apache/airflow/issues/35552

   ### Apache Airflow version
   
   main (development)
   
   ### What happened
   
   For SQL-based operators, the `airflow.providers.openlineage.utils.sql` module is used by the `SQLParser` interface class.
   In short, it retrieves table schemas for the input and output datasets parsed from a SQL query.
   
   ### What you think should happen instead
   
   It should take into account whether the default database/schema from the connection setup appears in the information schema query result. If a matching table is found there, it should stop adding tables from other databases/schemas.
   
   ### How to reproduce
   
   The corner case is as follows:
   1. Use a database connection with a default database and/or schema set.
   2. Refer to a table by name only in the SQL query (e.g. `SELECT * FROM my_table` instead of `SELECT * FROM my_schema.my_table`).
   3. If a table with the same name exists in another database/schema (or database+schema combination, depending on the database), the OL integration will produce two datasets for the tables.
   
   For instance, if one uses Postgres with the search path set to the `public` schema, `SELECT * FROM my_table` reads from `public.my_table` even if there is another table with the same name in a different schema. The OL integration will report both `my_schema.my_table` and `public.my_table`.
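   
   A minimal sketch of the behaviour described above, in plain Python. It does not use the provider's actual code: `TableRow` and `resolve_tables` are hypothetical names, and the `rows` list stands in for an information schema query result. It only illustrates how preferring the connection's default schema would collapse the two candidates into one.

```python
from typing import List, NamedTuple, Optional


class TableRow(NamedTuple):
    """One row of a hypothetical information-schema query result."""

    schema: str
    table: str


def resolve_tables(
    rows: List[TableRow],
    table_name: str,
    default_schema: Optional[str] = None,
) -> List[TableRow]:
    """Return the rows that an unqualified table name should map to.

    Without a default schema, every row with a matching table name is kept,
    which mirrors the duplicate-dataset behaviour described in this issue.
    With a default schema, a match in that schema wins and other schemas
    are ignored.
    """
    matches = [row for row in rows if row.table == table_name]
    if default_schema is not None:
        preferred = [row for row in matches if row.schema == default_schema]
        if preferred:
            return preferred
    return matches


# Stand-in for the information schema result: same table name, two schemas.
rows = [TableRow("public", "my_table"), TableRow("my_schema", "my_table")]

# No default schema considered: both tables are reported.
ambiguous = resolve_tables(rows, "my_table")

# Default schema taken from the connection (assumed to be "public" here).
resolved = resolve_tables(rows, "my_table", default_schema="public")
```

   Falling back to all matches when the default schema has no such table is a design assumption of this sketch; the real fix may choose differently.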
   
   ### Operating System
   
   macOS
   
   ### Versions of Apache Airflow Providers
   
   apache-airflow-providers-openlineage==1.2.0
   
   ### Deployment
   
   Other Docker-based deployment
   
   ### Deployment details
   
   _No response_
   
   ### Anything else
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [X] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [I] openlineage: improve how sql utils parse table schemas [airflow]

Posted by "potiuk (via GitHub)" <gi...@apache.org>.
potiuk commented on issue #35552:
URL: https://github.com/apache/airflow/issues/35552#issuecomment-1803946585

   Good ideas!




Re: [I] openlineage: improve how sql utils parse table schemas [airflow]

Posted by "boring-cyborg[bot] (via GitHub)" <gi...@apache.org>.
boring-cyborg[bot] commented on issue #35552:
URL: https://github.com/apache/airflow/issues/35552#issuecomment-1803853595

   Thanks for opening your first issue here! Be sure to follow the issue template! If you are willing to raise a PR to address this issue, please do so; there is no need to wait for approval.
   

