You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/01/18 19:35:52 UTC

[GitHub] [airflow] omarismail94 opened a new issue #13750: Standard SQL sensor

omarismail94 opened a new issue #13750:
URL: https://github.com/apache/airflow/issues/13750


   <!--
   
   Welcome to Apache Airflow!  For a smooth issue process, try to answer the following questions.
   Don't worry if they're not all applicable; just try to include what you can :-)
   
   If you need to include code snippets or logs, please put them in fenced code
   blocks.  If they're super-long, please use the details tag like
   <details><summary>super-long log</summary> lots of stuff </details>
   
   Please delete these comment blocks before submitting the issue.
   
   -->
   
   **Description**
   
   A sql sensor which uses Standard SQL due to default one uses legacy sql
   
   
   **Use case / motivation**
   
   Currently (correct me if I am wrong!), the sql sensor only supports legacy sql. If I want to poke a BQ table, I do not think I can do that using standard sql right now.
   
   
   **Are you willing to submit a PR?**
   
   If community approves of this idea, sure!
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] mik-laj commented on issue #13750: Support Standard SQL in BigQuery Sensor

Posted by GitBox <gi...@apache.org>.
mik-laj commented on issue #13750:
URL: https://github.com/apache/airflow/issues/13750#issuecomment-762818262


   Can you provide an example of the actual usage? What sesnors are you talking about?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] lewis-anderson53 commented on issue #13750: Support Standard SQL in BigQuery Sensor

Posted by GitBox <gi...@apache.org>.
lewis-anderson53 commented on issue #13750:
URL: https://github.com/apache/airflow/issues/13750#issuecomment-892585708


   I too quite like the hook_kwargs option, as it would solve other db hook default issues too, not just BigQuery one.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] sb7f9 edited a comment on issue #13750: Support Standard SQL in BigQuery Sensor

Posted by GitBox <gi...@apache.org>.
sb7f9 edited a comment on issue #13750:
URL: https://github.com/apache/airflow/issues/13750#issuecomment-766885112


   @eladkal  standard SqlSensor is not compatible with BigQuery due to by default it is using legacy sql which does not support many of the standard sql such as (querying partition tables etc..) There is no direct parameter like "use_legacy_sql" = False available that could overwrite the setting to enforce to use standard sql (BigQueryOperator has this functionality).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] omarismail94 commented on issue #13750: Support Standard SQL in BigQuery Sensor

Posted by GitBox <gi...@apache.org>.
omarismail94 commented on issue #13750:
URL: https://github.com/apache/airflow/issues/13750#issuecomment-767026584


   @mik-laj  Yeah, you have to do something like this:
   
   ```
   class BigQuerySqlSensor(BaseSensorOperator):
       template_fields = ('sql',)
       template_ext= ('.sql',)
   
   
       @apply_defaults
       def __init__(
           self, 
           bigquery_conn_id = 'bigquery_conn_id', 
           delegate_to=None,
           location='US',
           sql=None, 
           use_legacy_sql=False, 
           **kwargs):
   
           self.bigquery_conn_id = bigquery_conn_id
           self.delegate_to = delegate_to
           self.location = location
           self.sql = sql
           self.use_legacy_sql = use_legacy_sql
           super().__init__(**kwargs)
   
       def poke(self, context):
           hook = BigQueryHook(bigquery_conn_id=self.bigquery_conn_id, delegate_to=self.delegate_to, location=self.location, use_legacy_sql= self.use_legacy_sql)
           connection = hook.get_conn()
           cursor = connection.cursor()
           cursor.execute(self.sql)
   
           for row in cursor.fetchall():
               self.log.info("printing rows ...")
               self.log.info(row)
               row_count = row[0]
               self.log.info("rows printed.")
   
               if row_count > 0:
                   return True
   
           return False
   ```


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] sb7f9 edited a comment on issue #13750: Support Standard SQL in BigQuery Sensor

Posted by GitBox <gi...@apache.org>.
sb7f9 edited a comment on issue #13750:
URL: https://github.com/apache/airflow/issues/13750#issuecomment-766885112


   @eladkal  standard SqlSensor is not compatible with BigQuery due to by default it is using legacy sql which does not support many of the standard sql such as (querying partition tables etc..) There is no direct parameter like "use_legacy_sql" = False available that could overwrite the setting to enforce to use standard sql (BigQueryOperator has this functionality).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] dinigo edited a comment on issue #13750: Support Standard SQL in BigQuery Sensor

Posted by GitBox <gi...@apache.org>.
dinigo edited a comment on issue #13750:
URL: https://github.com/apache/airflow/issues/13750#issuecomment-893381710






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] omarismail94 commented on issue #13750: Support Standard SQL in BigQuery Sensor

Posted by GitBox <gi...@apache.org>.
omarismail94 commented on issue #13750:
URL: https://github.com/apache/airflow/issues/13750#issuecomment-767026584


   @mik-laj  Yeah, you have to do something like this:
   
   ```
   class BigQuerySqlSensor(BaseSensorOperator):
       template_fields = ('sql',)
       template_ext= ('.sql',)
   
   
       @apply_defaults
       def __init__(
           self, 
           bigquery_conn_id = 'bigquery_conn_id', 
           delegate_to=None,
           location='US',
           sql=None, 
           use_legacy_sql=False, 
           **kwargs):
   
           self.bigquery_conn_id = bigquery_conn_id
           self.delegate_to = delegate_to
           self.location = location
           self.sql = sql
           self.use_legacy_sql = use_legacy_sql
           super().__init__(**kwargs)
   
       def poke(self, context):
           hook = BigQueryHook(bigquery_conn_id=self.bigquery_conn_id, delegate_to=self.delegate_to, location=self.location, use_legacy_sql= self.use_legacy_sql)
           connection = hook.get_conn()
           cursor = connection.cursor()
           cursor.execute(self.sql)
   
           for row in cursor.fetchall():
               self.log.info("printing rows ...")
               self.log.info(row)
               row_count = row[0]
               self.log.info("rows printed.")
   
               if row_count > 0:
                   return True
   
           return False
   ```


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] lewis-anderson53 commented on issue #13750: Support Standard SQL in BigQuery Sensor

Posted by GitBox <gi...@apache.org>.
lewis-anderson53 commented on issue #13750:
URL: https://github.com/apache/airflow/issues/13750#issuecomment-892585708


   I too quite like the hook_kwargs option, as it would solve other db hook default issues too, not just BigQuery one.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] lewis-anderson53 commented on issue #13750: Support Standard SQL in BigQuery Sensor

Posted by GitBox <gi...@apache.org>.
lewis-anderson53 commented on issue #13750:
URL: https://github.com/apache/airflow/issues/13750#issuecomment-797330539


   I struggled with this a while ago, I ended up creating a custom sensor specifically using the BigQuery hook that allows me to specify the parameters, very similar to the code that @omarismail94 shared.
   
   The "easy" option would be to change the default value of the [BigQueryHook](https://github.com/apache/airflow/blob/d2c2a2285c176ef232452e72a28e355667b8b50b/airflow/providers/google/cloud/hooks/bigquery.py#L67) for `use_legacy_sql` to `False` but I would guess a change like that needs some consideration. 
   I think there is justification for this: legacy SQL hasn't been the default in Google BigQuery since 2016. I'd expect 5 years on, the vast majority of BigQuery projects now use standard SQL.
   
   If a change like that isn't viable, perhaps we could look into extending the behaviour of the [Connection.get_hook()](https://github.com/apache/airflow/blob/90c15b1982dc57d258cf7cb098776e0601cb9a5c/airflow/models/connection.py#L267) method, which is called in the SqlSensor to get the hook? Right now it just returns a hook with the default params, perhaps this could be more flexible and allow the instantiation of the SqlSensor to specify custom parameter which is then transferred onto the returned hook.
   
   e.g. 
   
   ```python
   my_sensor = SqlSensor(
       ...
       hook_parameters = Dict(use_legacy_sql = False),
       ...
   )
   ```


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] uranusjr commented on issue #13750: Support Standard SQL in BigQuery Sensor

Posted by GitBox <gi...@apache.org>.
uranusjr commented on issue #13750:
URL: https://github.com/apache/airflow/issues/13750#issuecomment-896556433


   Anyone fancy a pull request? Sounds not too complicated to me!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] uranusjr commented on issue #13750: Support Standard SQL in BigQuery Sensor

Posted by GitBox <gi...@apache.org>.
uranusjr commented on issue #13750:
URL: https://github.com/apache/airflow/issues/13750#issuecomment-889624601


   We should probably support passing arguments from a sensor to the underlying hook. Something like this?
   
   ```python
   sense_data = SqlSensor(
       task_id="sense_data",
       conn_id="google_default",
       hook_kwargs={"use_legacy_sql": False},
       sql="select count(*) > 0 from `my_dataset.my_view`",
   )
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] dinigo commented on issue #13750: Support Standard SQL in BigQuery Sensor

Posted by GitBox <gi...@apache.org>.
dinigo commented on issue #13750:
URL: https://github.com/apache/airflow/issues/13750#issuecomment-893381710


   Ok, that's another option. It was suggested here with an example piece of code https://github.com/apache/airflow/issues/17315
   
   What do you think then?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] dinigo commented on issue #13750: Support Standard SQL in BigQuery Sensor

Posted by GitBox <gi...@apache.org>.
dinigo commented on issue #13750:
URL: https://github.com/apache/airflow/issues/13750#issuecomment-889471613


   In cae someone comes around looking for a solution for this. I solved it with this quick fix
   
   ```python
   class BigQuerySqlSensor(SqlSensor):
       """ Overwrites the use_legacy_sql when using SqlSensor with a BigQuery connection"""
       def _get_hook(self):
           hook = super()._get_hook()
           hook.use_legacy_sql = False
           hook.location = Variable.get('location')
           return hook
   
   sense_data = BigQuerySqlSensor(
       task_id="sense_data",
       conn_id="google_default",
       sql="select count(*) > 0 from `my_dataset.my_view`",
   )
   ```
   
   Apart from this quick fix I see the point on defaulting to the "legacy" dialect to keep backwards compatibility. I see two solutions for this issue:
   1. Expose and read the dialect version from the connection extras
   2. Set the flag dynamically from the sql code if it starts with `#starndardSQL` like the [docs mention](https://cloud.google.com/bigquery/docs/reference/standard-sql/enabling-standard-sql#sql-prefix)
   
   Both of them can be implemented. Any thoughts on what could be preferable?
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] eladkal commented on issue #13750: Support Standard SQL in BigQuery Sensor

Posted by GitBox <gi...@apache.org>.
eladkal commented on issue #13750:
URL: https://github.com/apache/airflow/issues/13750#issuecomment-762856420


   Isn't [SqlSeneor](https://github.com/apache/airflow/blob/c065d32189bfee80ab938d96ad74f6492e9c9b24/airflow/sensors/sql.py#L77) compatible with BigQuery?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] sb7f9 commented on issue #13750: Support Standard SQL in BigQuery Sensor

Posted by GitBox <gi...@apache.org>.
sb7f9 commented on issue #13750:
URL: https://github.com/apache/airflow/issues/13750#issuecomment-766885112


   @eladkal  standard SqlSensor is not compatible with BigQuery due to by default it is using legacy sql queries which does not support many of the regular sql queries such as (querying partition tables etc..) There is no direct parameter like "use_legacy_sql" = False available that could overwrite the setting to enforce to use standard sql (BigQueryOperator has this functionality).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk closed issue #13750: Support Standard SQL in BigQuery Sensor

Posted by GitBox <gi...@apache.org>.
potiuk closed issue #13750:
URL: https://github.com/apache/airflow/issues/13750


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] sb7f9 commented on issue #13750: Support Standard SQL in BigQuery Sensor

Posted by GitBox <gi...@apache.org>.
sb7f9 commented on issue #13750:
URL: https://github.com/apache/airflow/issues/13750#issuecomment-766885112


   @eladkal  standard SqlSensor is not compatible with BigQuery due to by default it is using legacy sql queries which does not support many of the regular sql queries such as (querying partition tables etc..) There is no direct parameter like "use_legacy_sql" = False available that could overwrite the setting to enforce to use standard sql (BigQueryOperator has this functionality).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] boring-cyborg[bot] commented on issue #13750: Standard SQL sensor

Posted by GitBox <gi...@apache.org>.
boring-cyborg[bot] commented on issue #13750:
URL: https://github.com/apache/airflow/issues/13750#issuecomment-762436947


   Thanks for opening your first issue here! Be sure to follow the issue template!
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] dinigo commented on issue #13750: Support Standard SQL in BigQuery Sensor

Posted by GitBox <gi...@apache.org>.
dinigo commented on issue #13750:
URL: https://github.com/apache/airflow/issues/13750#issuecomment-897528058


   WIP, have something, just struggling with the tests. I'll open it so we can review "collectively"


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org