You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/01/18 19:35:52 UTC
[GitHub] [airflow] omarismail94 opened a new issue #13750: Standard SQL sensor
omarismail94 opened a new issue #13750:
URL: https://github.com/apache/airflow/issues/13750
<!--
Welcome to Apache Airflow! For a smooth issue process, try to answer the following questions.
Don't worry if they're not all applicable; just try to include what you can :-)
If you need to include code snippets or logs, please put them in fenced code
blocks. If they're super-long, please use the details tag like
<details><summary>super-long log</summary> lots of stuff </details>
Please delete these comment blocks before submitting the issue.
-->
**Description**
A sql sensor which uses Standard SQL due to default one uses legacy sql
**Use case / motivation**
Currently (correct me if I am wrong!), the sql sensor only supports legacy sql. If I want to poke a BQ table, I do not think I can do that using standard sql right now.
**Are you willing to submit a PR?**
If community approves of this idea, sure!
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] mik-laj commented on issue #13750: Support Standard SQL in BigQuery Sensor
Posted by GitBox <gi...@apache.org>.
mik-laj commented on issue #13750:
URL: https://github.com/apache/airflow/issues/13750#issuecomment-762818262
Can you provide an example of the actual usage? What sesnors are you talking about?
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] lewis-anderson53 commented on issue #13750: Support Standard SQL in BigQuery Sensor
Posted by GitBox <gi...@apache.org>.
lewis-anderson53 commented on issue #13750:
URL: https://github.com/apache/airflow/issues/13750#issuecomment-892585708
I too quite like the hook_kwargs option, as it would solve other db hook default issues too, not just BigQuery one.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] sb7f9 edited a comment on issue #13750: Support Standard SQL in BigQuery Sensor
Posted by GitBox <gi...@apache.org>.
sb7f9 edited a comment on issue #13750:
URL: https://github.com/apache/airflow/issues/13750#issuecomment-766885112
@eladkal standard SqlSensor is not compatible with BigQuery due to by default it is using legacy sql which does not support many of the standard sql such as (querying partition tables etc..) There is no direct parameter like "use_legacy_sql" = False available that could overwrite the setting to enforce to use standard sql (BigQueryOperator has this functionality).
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] omarismail94 commented on issue #13750: Support Standard SQL in BigQuery Sensor
Posted by GitBox <gi...@apache.org>.
omarismail94 commented on issue #13750:
URL: https://github.com/apache/airflow/issues/13750#issuecomment-767026584
@mik-laj Yeah, you have to do something like this:
```
class BigQuerySqlSensor(BaseSensorOperator):
template_fields = ('sql',)
template_ext= ('.sql',)
@apply_defaults
def __init__(
self,
bigquery_conn_id = 'bigquery_conn_id',
delegate_to=None,
location='US',
sql=None,
use_legacy_sql=False,
**kwargs):
self.bigquery_conn_id = bigquery_conn_id
self.delegate_to = delegate_to
self.location = location
self.sql = sql
self.use_legacy_sql = use_legacy_sql
super().__init__(**kwargs)
def poke(self, context):
hook = BigQueryHook(bigquery_conn_id=self.bigquery_conn_id, delegate_to=self.delegate_to, location=self.location, use_legacy_sql= self.use_legacy_sql)
connection = hook.get_conn()
cursor = connection.cursor()
cursor.execute(self.sql)
for row in cursor.fetchall():
self.log.info("printing rows ...")
self.log.info(row)
row_count = row[0]
self.log.info("rows printed.")
if row_count > 0:
return True
return False
```
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] sb7f9 edited a comment on issue #13750: Support Standard SQL in BigQuery Sensor
Posted by GitBox <gi...@apache.org>.
sb7f9 edited a comment on issue #13750:
URL: https://github.com/apache/airflow/issues/13750#issuecomment-766885112
@eladkal standard SqlSensor is not compatible with BigQuery due to by default it is using legacy sql which does not support many of the standard sql such as (querying partition tables etc..) There is no direct parameter like "use_legacy_sql" = False available that could overwrite the setting to enforce to use standard sql (BigQueryOperator has this functionality).
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] dinigo edited a comment on issue #13750: Support Standard SQL in BigQuery Sensor
Posted by GitBox <gi...@apache.org>.
dinigo edited a comment on issue #13750:
URL: https://github.com/apache/airflow/issues/13750#issuecomment-893381710
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] omarismail94 commented on issue #13750: Support Standard SQL in BigQuery Sensor
Posted by GitBox <gi...@apache.org>.
omarismail94 commented on issue #13750:
URL: https://github.com/apache/airflow/issues/13750#issuecomment-767026584
@mik-laj Yeah, you have to do something like this:
```
class BigQuerySqlSensor(BaseSensorOperator):
template_fields = ('sql',)
template_ext= ('.sql',)
@apply_defaults
def __init__(
self,
bigquery_conn_id = 'bigquery_conn_id',
delegate_to=None,
location='US',
sql=None,
use_legacy_sql=False,
**kwargs):
self.bigquery_conn_id = bigquery_conn_id
self.delegate_to = delegate_to
self.location = location
self.sql = sql
self.use_legacy_sql = use_legacy_sql
super().__init__(**kwargs)
def poke(self, context):
hook = BigQueryHook(bigquery_conn_id=self.bigquery_conn_id, delegate_to=self.delegate_to, location=self.location, use_legacy_sql= self.use_legacy_sql)
connection = hook.get_conn()
cursor = connection.cursor()
cursor.execute(self.sql)
for row in cursor.fetchall():
self.log.info("printing rows ...")
self.log.info(row)
row_count = row[0]
self.log.info("rows printed.")
if row_count > 0:
return True
return False
```
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] lewis-anderson53 commented on issue #13750: Support Standard SQL in BigQuery Sensor
Posted by GitBox <gi...@apache.org>.
lewis-anderson53 commented on issue #13750:
URL: https://github.com/apache/airflow/issues/13750#issuecomment-892585708
I too quite like the hook_kwargs option, as it would solve other db hook default issues too, not just BigQuery one.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] lewis-anderson53 commented on issue #13750: Support Standard SQL in BigQuery Sensor
Posted by GitBox <gi...@apache.org>.
lewis-anderson53 commented on issue #13750:
URL: https://github.com/apache/airflow/issues/13750#issuecomment-797330539
I struggled with this a while ago, I ended up creating a custom sensor specifically using the BigQuery hook that allows me to specify the parameters, very similar to the code that @omarismail94 shared.
The "easy" option would be to change the default value of the [BigQueryHook](https://github.com/apache/airflow/blob/d2c2a2285c176ef232452e72a28e355667b8b50b/airflow/providers/google/cloud/hooks/bigquery.py#L67) for `use_legacy_sql` to `False` but I would guess a change like that needs some consideration.
I think there is justification for this: legacy SQL hasn't been the default in Google BigQuery since 2016. I'd expect 5 years on, the vast majority of BigQuery projects now use standard SQL.
If a change like that isn't viable, perhaps we could look into extending the behaviour of the [Connection.get_hook()](https://github.com/apache/airflow/blob/90c15b1982dc57d258cf7cb098776e0601cb9a5c/airflow/models/connection.py#L267) method, which is called in the SqlSensor to get the hook? Right now it just returns a hook with the default params, perhaps this could be more flexible and allow the instantiation of the SqlSensor to specify custom parameter which is then transferred onto the returned hook.
e.g.
```python
my_sensor = SqlSensor(
...
hook_parameters = Dict(use_legacy_sql = False),
...
)
```
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] uranusjr commented on issue #13750: Support Standard SQL in BigQuery Sensor
Posted by GitBox <gi...@apache.org>.
uranusjr commented on issue #13750:
URL: https://github.com/apache/airflow/issues/13750#issuecomment-896556433
Anyone fancy a pull request? Sounds not too complicated to me!
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] uranusjr commented on issue #13750: Support Standard SQL in BigQuery Sensor
Posted by GitBox <gi...@apache.org>.
uranusjr commented on issue #13750:
URL: https://github.com/apache/airflow/issues/13750#issuecomment-889624601
We should probably support passing arguments from a sensor to the underlying hook. Something like this?
```python
sense_data = SqlSensor(
task_id="sense_data",
conn_id="google_default",
hook_kwargs={"use_legacy_sql": False},
sql="select count(*) > 0 from `my_dataset.my_view`",
)
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] dinigo commented on issue #13750: Support Standard SQL in BigQuery Sensor
Posted by GitBox <gi...@apache.org>.
dinigo commented on issue #13750:
URL: https://github.com/apache/airflow/issues/13750#issuecomment-893381710
Ok, that's another option. It was suggested here with an example piece of code https://github.com/apache/airflow/issues/17315
What do you think then?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] dinigo commented on issue #13750: Support Standard SQL in BigQuery Sensor
Posted by GitBox <gi...@apache.org>.
dinigo commented on issue #13750:
URL: https://github.com/apache/airflow/issues/13750#issuecomment-889471613
In cae someone comes around looking for a solution for this. I solved it with this quick fix
```python
class BigQuerySqlSensor(SqlSensor):
""" Overwrites the use_legacy_sql when using SqlSensor with a BigQuery connection"""
def _get_hook(self):
hook = super()._get_hook()
hook.use_legacy_sql = False
hook.location = Variable.get('location')
return hook
sense_data = BigQuerySqlSensor(
task_id="sense_data",
conn_id="google_default",
sql="select count(*) > 0 from `my_dataset.my_view`",
)
```
Apart from this quick fix I see the point on defaulting to the "legacy" dialect to keep backwards compatibility. I see two solutions for this issue:
1. Expose and read the dialect version from the connection extras
2. Set the flag dynamically from the sql code if it starts with `#starndardSQL` like the [docs mention](https://cloud.google.com/bigquery/docs/reference/standard-sql/enabling-standard-sql#sql-prefix)
Both of them can be implemented. Any thoughts on what could be preferable?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] eladkal commented on issue #13750: Support Standard SQL in BigQuery Sensor
Posted by GitBox <gi...@apache.org>.
eladkal commented on issue #13750:
URL: https://github.com/apache/airflow/issues/13750#issuecomment-762856420
Isn't [SqlSeneor](https://github.com/apache/airflow/blob/c065d32189bfee80ab938d96ad74f6492e9c9b24/airflow/sensors/sql.py#L77) compatible with BigQuery?
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] sb7f9 commented on issue #13750: Support Standard SQL in BigQuery Sensor
Posted by GitBox <gi...@apache.org>.
sb7f9 commented on issue #13750:
URL: https://github.com/apache/airflow/issues/13750#issuecomment-766885112
@eladkal standard SqlSensor is not compatible with BigQuery due to by default it is using legacy sql queries which does not support many of the regular sql queries such as (querying partition tables etc..) There is no direct parameter like "use_legacy_sql" = False available that could overwrite the setting to enforce to use standard sql (BigQueryOperator has this functionality).
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] potiuk closed issue #13750: Support Standard SQL in BigQuery Sensor
Posted by GitBox <gi...@apache.org>.
potiuk closed issue #13750:
URL: https://github.com/apache/airflow/issues/13750
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] sb7f9 commented on issue #13750: Support Standard SQL in BigQuery Sensor
Posted by GitBox <gi...@apache.org>.
sb7f9 commented on issue #13750:
URL: https://github.com/apache/airflow/issues/13750#issuecomment-766885112
@eladkal standard SqlSensor is not compatible with BigQuery due to by default it is using legacy sql queries which does not support many of the regular sql queries such as (querying partition tables etc..) There is no direct parameter like "use_legacy_sql" = False available that could overwrite the setting to enforce to use standard sql (BigQueryOperator has this functionality).
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] boring-cyborg[bot] commented on issue #13750: Standard SQL sensor
Posted by GitBox <gi...@apache.org>.
boring-cyborg[bot] commented on issue #13750:
URL: https://github.com/apache/airflow/issues/13750#issuecomment-762436947
Thanks for opening your first issue here! Be sure to follow the issue template!
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] dinigo commented on issue #13750: Support Standard SQL in BigQuery Sensor
Posted by GitBox <gi...@apache.org>.
dinigo commented on issue #13750:
URL: https://github.com/apache/airflow/issues/13750#issuecomment-897528058
WIP, have something, just struggling with the tests. I'll open it so we can review "collectively"
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org