You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/12/09 06:36:48 UTC

[GitHub] [airflow] david30907d opened a new issue #19507: Make BigQueryHook compatible with sqlalchemy

david30907d opened a new issue #19507:
URL: https://github.com/apache/airflow/issues/19507


   ### Description
   
   ```python
   hook = BigQueryHook('google_cloud_default', use_legacy_sql=False)
   hook.get_sqlalchemy_engine()
   ```
   
   this one would raise this exception 👇 
   ```bash
   [2021-11-10 08:26:06,927] {taskinstance.py:1501} ERROR - Task failed with exception
   Traceback (most recent call last):
     File "/home/airflow/.local/lib/python3.8/site-packages/airflow/models/taskinstance.py", line 1157, in _run_raw_task
       self._prepare_and_execute_task_with_callbacks(context, task)
     File "/home/airflow/.local/lib/python3.8/site-packages/airflow/models/taskinstance.py", line 1331, in _prepare_and_execute_task_with_callbacks
       result = self._execute_task(context, task_copy)
     File "/home/airflow/.local/lib/python3.8/site-packages/airflow/models/taskinstance.py", line 1361, in _execute_task
       result = task_copy.execute(context=context)
     File "/home/airflow/.local/lib/python3.8/site-packages/airflow/operators/python.py", line 150, in execute
       return_value = self.execute_callable()
     File "/home/airflow/.local/lib/python3.8/site-packages/airflow/operators/python.py", line 161, in execute_callable
       return self.python_callable(*self.op_args, **self.op_kwargs)
     File "/opt/airflow/dags/dags/utils/others/subscription_related.py", line 98, in wrapper
       return func(*args, **kwargs)
     File "/opt/airflow/dags/dags/ods/ingest_emails/udfs/ingest_emails.py", line 59, in extract_and_load_campaign_emails
       esp.extract_and_load_campaign_emails(context)
     File "/opt/airflow/dags/dags/utils/factories/esp_factory/klaviyo_esp.py", line 230, in extract_and_load_campaign_emails
       olap_session = engine2session(self.olap_hook.get_sqlalchemy_engine())
     File "/home/airflow/.local/lib/python3.8/site-packages/airflow/hooks/dbapi.py", line 100, in get_sqlalchemy_engine
       return create_engine(self.get_uri(), **engine_kwargs)
     File "/home/airflow/.local/lib/python3.8/site-packages/sqlalchemy/engine/__init__.py", line 525, in create_engine
       return strategy.create(*args, **kwargs)
     File "/home/airflow/.local/lib/python3.8/site-packages/sqlalchemy/engine/strategies.py", line 61, in create
       entrypoint = u._get_entrypoint()
     File "/home/airflow/.local/lib/python3.8/site-packages/sqlalchemy/engine/url.py", line 172, in _get_entrypoint
       cls = registry.load(name)
     File "/home/airflow/.local/lib/python3.8/site-packages/sqlalchemy/util/langhelpers.py", line 277, in load
       raise exc.NoSuchModuleError(
   sqlalchemy.exc.NoSuchModuleError: Can't load plugin: sqlalchemy.dialects:google_cloud_platform
   ```
   
   ### Use case/motivation
   
   I hope this sqlalchemy toy example works
   
   ```python
   from sqlalchemy.orm import sessionmaker, scoped_session
   engine = hook.get_sqlalchemy_engine()
   scoped_session(sessionmaker(bind=engine, autocommit=True))
   ```
   
   ### Related issues
   
   _No response_
   
   ### Are you willing to submit a PR?
   
   - [X] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] kazanzhy commented on issue #19507: Make BigQueryHook compatible with sqlalchemy

Posted by GitBox <gi...@apache.org>.
kazanzhy commented on issue #19507:
URL: https://github.com/apache/airflow/issues/19507#issuecomment-989735471


   Seems you're right @david30907d 
   So now I see three options: 
   - Rename `conn_type` from `gcpbigquery` to `bigquery` (harder)
   - Overwrite of `DbApiHook.get_uri` in `BigQueryHook` (easier).
   - Try not to use `BigQueryHook.get_uri()` and use a lot of other methods provided by hook (up to you)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] david30907d closed issue #19507: Make BigQueryHook compatible with sqlalchemy

Posted by GitBox <gi...@apache.org>.
david30907d closed issue #19507:
URL: https://github.com/apache/airflow/issues/19507


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] kazanzhy commented on issue #19507: Make BigQueryHook compatible with sqlalchemy

Posted by GitBox <gi...@apache.org>.
kazanzhy commented on issue #19507:
URL: https://github.com/apache/airflow/issues/19507#issuecomment-997206913


   @david30907d 
   I worry that Google integration is very sophisticated for me, especially BigQuery.
   There are dozen of deprecated methods there. I thought to remove a few deprecated methods but replaced them with other deprecated ones. https://github.com/apache/airflow/pull/20119
   
   I still don't have a point of view on how to improve this provider. But, seems like renaming `conn_type` from `gcpbigquery` to `bigquery` is a better solution. Almost all connection types are also protocols like `presto://`, `spark://`, `postgres://`
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] kazanzhy edited a comment on issue #19507: Make BigQueryHook compatible with sqlalchemy

Posted by GitBox <gi...@apache.org>.
kazanzhy edited a comment on issue #19507:
URL: https://github.com/apache/airflow/issues/19507#issuecomment-989735471


   Seems you're right @david30907d 
   So now I see three options: 
   - Rename `conn_type` from `gcpbigquery` to `bigquery` (harder)
   - Overwrite of `DbApiHook.get_uri` in `BigQueryHook` (easier).
   - `BigQueryHook.get_uri().replace('gcpbigquery', 'bigquery')` (will work for you, even if changes above will be implemented)
   - Try not to use `BigQueryHook.get_uri()` and use a lot of other methods provided by hook (up to you)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] kazanzhy edited a comment on issue #19507: Make BigQueryHook compatible with sqlalchemy

Posted by GitBox <gi...@apache.org>.
kazanzhy edited a comment on issue #19507:
URL: https://github.com/apache/airflow/issues/19507#issuecomment-989735471


   Seems you're right @david30907d 
   So now I see three options: 
   - Rename `conn_type` from `gcpbigquery` to `bigquery` (harder)
   - Overwrite of `DbApiHook.get_uri` in `BigQueryHook` (easier).
   - `BigQueryHook.get_uri().replace('gcpbigquery', 'bigquery ')` (will work for you, even if changes above will be implemented)
   - Try not to use `BigQueryHook.get_uri()` and use a lot of other methods provided by hook (up to you)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] david30907d commented on issue #19507: Make BigQueryHook compatible with sqlalchemy

Posted by GitBox <gi...@apache.org>.
david30907d commented on issue #19507:
URL: https://github.com/apache/airflow/issues/19507#issuecomment-989564695


   not sure I guess we still need to overwrite the `get_uri()` function like these 2 examples ?
   https://github.com/apache/airflow/blob/f99d0e7066d2c00ff6581fd566a4d0a9826a2fc3/airflow/providers/amazon/aws/hooks/redshift.py#L193
   https://github.com/apache/airflow/blob/f99d0e7066d2c00ff6581fd566a4d0a9826a2fc3/airflow/providers/apache/drill/hooks/drill.py#L62


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] david30907d commented on issue #19507: Make BigQueryHook compatible with sqlalchemy

Posted by GitBox <gi...@apache.org>.
david30907d commented on issue #19507:
URL: https://github.com/apache/airflow/issues/19507#issuecomment-989563954


   @kazanzhy I could be wrong but it still fails on my local (2.2.2)
   seems to me that [conn_type](https://github.com/apache/airflow/blob/main/airflow/hooks/dbapi.py#L107) here would return `gcpbigquery` instead of `bigquery`, so sqlalchemy cannot parse this URL
   
   ```bash
   Traceback (most recent call last):
     File "<stdin>", line 1, in <module>
     File "/home/airflow/.local/lib/python3.8/site-packages/airflow/hooks/dbapi.py", line 118, in get_sqlalchemy_engine
       return create_engine(self.get_uri(), **engine_kwargs)
     File "/home/airflow/.local/lib/python3.8/site-packages/sqlalchemy/engine/__init__.py", line 525, in create_engine
       return strategy.create(*args, **kwargs)
     File "/home/airflow/.local/lib/python3.8/site-packages/sqlalchemy/engine/strategies.py", line 54, in create
       u = url.make_url(name_or_url)
     File "/home/airflow/.local/lib/python3.8/site-packages/sqlalchemy/engine/url.py", line 229, in make_url
       return _parse_rfc1738_args(name_or_url)
     File "/home/airflow/.local/lib/python3.8/site-packages/sqlalchemy/engine/url.py", line 290, in _parse_rfc1738_args
       raise exc.ArgumentError(
   sqlalchemy.exc.ArgumentError: Could not parse rfc1738 URL from string 'gcpbigquery:'
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] david30907d commented on issue #19507: Make BigQueryHook compatible with sqlalchemy

Posted by GitBox <gi...@apache.org>.
david30907d commented on issue #19507:
URL: https://github.com/apache/airflow/issues/19507#issuecomment-989497534


   @kazanzhy sorry for the late reply
   good to know this new conn_type! I'm still using 2.1.2 so didn't realize it 😅 
   going to close this issue, thx~


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] david30907d edited a comment on issue #19507: Make BigQueryHook compatible with sqlalchemy

Posted by GitBox <gi...@apache.org>.
david30907d edited a comment on issue #19507:
URL: https://github.com/apache/airflow/issues/19507#issuecomment-997139792


   @kazanzhy sorry for the late reply~
   very thx for these inputs~ I'll give it a shot and hope I can finish this PR before 2022 😂 
   
   btw, you're available, maybe you can review my [PR](https://github.com/apache/airflow/pull/19508)? 🙏 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] david30907d commented on issue #19507: Make BigQueryHook compatible with sqlalchemy

Posted by GitBox <gi...@apache.org>.
david30907d commented on issue #19507:
URL: https://github.com/apache/airflow/issues/19507#issuecomment-997139792


   @kazanzhy sorry for the late reply~
   very thx for these inputs~ I'll give it a shot and hope I can finish this PR before 2022 😂 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] kazanzhy edited a comment on issue #19507: Make BigQueryHook compatible with sqlalchemy

Posted by GitBox <gi...@apache.org>.
kazanzhy edited a comment on issue #19507:
URL: https://github.com/apache/airflow/issues/19507#issuecomment-989735471


   Seems you're right @david30907d 
   So now I see next options: 
   - Rename `conn_type` from `gcpbigquery` to `bigquery` (harder)
   - Overwrite of `DbApiHook.get_uri` in `BigQueryHook` (easier).
   - `BigQueryHook.get_uri().replace('gcpbigquery', 'bigquery')` (will work for you, even if changes above will be implemented)
   - Try not to use `BigQueryHook.get_uri()` and use a lot of other methods provided by hook (up to you)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] david30907d commented on issue #19507: Make BigQueryHook compatible with sqlalchemy

Posted by GitBox <gi...@apache.org>.
david30907d commented on issue #19507:
URL: https://github.com/apache/airflow/issues/19507#issuecomment-964904161


   PR: https://github.com/apache/airflow/pull/19508


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] kazanzhy commented on issue #19507: Make BigQueryHook compatible with sqlalchemy

Posted by GitBox <gi...@apache.org>.
kazanzhy commented on issue #19507:
URL: https://github.com/apache/airflow/issues/19507#issuecomment-982789235


   I think now in works like that:
   ```
   conn_type == 'gcpbigquery' -> BigQueryHook
   conn_type == 'google_cloud_platform' -> GoogleBaseHook
   ```
   If you want to get the sqlalchemy_engine from the hook it should be an instance of DbApiHook. 
   Now only BigQueryHook inherits from DbApiHook


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org