You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2022/03/21 08:28:21 UTC

[GitHub] [airflow] lwyszomi opened a new pull request #22395: Add timeout and retry to the BigQueryInsertJobOperator

lwyszomi opened a new pull request #22395:
URL: https://github.com/apache/airflow/pull/22395


   
   ---
   **^ Add meaningful description above**
   
   Read the **[Pull Request Guidelines](https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst#pull-request-guidelines)** for more information.
   In case of fundamental code change, Airflow Improvement Proposal ([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals)) is needed.
   In case of a new dependency, check compliance with the [ASF 3rd Party License Policy](https://www.apache.org/legal/resolved.html#category-x).
   In case of backwards incompatible changes please leave a note in [UPDATING.md](https://github.com/apache/airflow/blob/main/UPDATING.md).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] github-actions[bot] commented on pull request #22395: Add timeout and retry to the BigQueryInsertJobOperator

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #22395:
URL: https://github.com/apache/airflow/pull/22395#issuecomment-1073790590






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] eladkal commented on a change in pull request #22395: Add timeout and retry to the BigQueryInsertJobOperator

Posted by GitBox <gi...@apache.org>.
eladkal commented on a change in pull request #22395:
URL: https://github.com/apache/airflow/pull/22395#discussion_r830908835



##########
File path: airflow/providers/google/cloud/operators/bigquery.py
##########
@@ -2052,6 +2053,8 @@ class BigQueryInsertJobOperator(BaseOperator):
         Service Account Token Creator IAM role to the directly preceding identity, with first
         account from the list granting this role to the originating account (templated).
     :param cancel_on_kill: Flag which indicates whether cancel the hook's job or not, when on_kill is called
+    :param retry: Designation of what errors, if any, should be retried.

Review comment:
       To clarify my comment is on the Operator. We don't have this problem in the hook.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] eladkal commented on a change in pull request #22395: Add timeout and retry to the BigQueryInsertJobOperator

Posted by GitBox <gi...@apache.org>.
eladkal commented on a change in pull request #22395:
URL: https://github.com/apache/airflow/pull/22395#discussion_r830860848



##########
File path: airflow/providers/google/cloud/operators/bigquery.py
##########
@@ -2052,6 +2053,8 @@ class BigQueryInsertJobOperator(BaseOperator):
         Service Account Token Creator IAM role to the directly preceding identity, with first
         account from the list granting this role to the originating account (templated).
     :param cancel_on_kill: Flag which indicates whether cancel the hook's job or not, when on_kill is called
+    :param retry: Designation of what errors, if any, should be retried.

Review comment:
       This may be very confusing with `retries` from `BaseOprerator`.
   
   ```
   BigQueryInsertJobOperator(
   ...,
   retry=3,
   retries=10
   )
   ```
   
   We need better name for it - I think. Something that the user will know that `retry` refer to BigQuery "internal" capability.
   
   I would say that the same goes for the `timeout`.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] lwyszomi commented on a change in pull request #22395: Add timeout and retry to the BigQueryInsertJobOperator

Posted by GitBox <gi...@apache.org>.
lwyszomi commented on a change in pull request #22395:
URL: https://github.com/apache/airflow/pull/22395#discussion_r831076209



##########
File path: airflow/providers/google/cloud/hooks/bigquery.py
##########
@@ -1503,6 +1503,8 @@ def insert_job(
         project_id: Optional[str] = None,
         location: Optional[str] = None,
         nowait: bool = False,
+        retry: Optional[Retry] = None,

Review comment:
       Default value was `DEFAULT_RETRY` to keep the same behavior that we had previously I changed `None` to `DEFAULT_RETRY`




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] eladkal commented on a change in pull request #22395: Add timeout and retry to the BigQueryInsertJobOperator

Posted by GitBox <gi...@apache.org>.
eladkal commented on a change in pull request #22395:
URL: https://github.com/apache/airflow/pull/22395#discussion_r830921840



##########
File path: airflow/providers/google/cloud/operators/bigquery.py
##########
@@ -2052,6 +2053,8 @@ class BigQueryInsertJobOperator(BaseOperator):
         Service Account Token Creator IAM role to the directly preceding identity, with first
         account from the list granting this role to the originating account (templated).
     :param cancel_on_kill: Flag which indicates whether cancel the hook's job or not, when on_kill is called
+    :param retry: Designation of what errors, if any, should be retried.

Review comment:
       I appreciate that but note also that other functions in the hook has this parameter already defined as retry and also with default of `DEFAULT_RETRY`
   
   https://github.com/apache/airflow/blob/71c980a8ffb3563bf16d8a23a58de54c9e8cf556/airflow/providers/google/cloud/hooks/bigquery.py#L320
   
   https://github.com/apache/airflow/blob/71c980a8ffb3563bf16d8a23a58de54c9e8cf556/airflow/providers/google/cloud/hooks/bigquery.py#L478
   
   I would suggest to check how/if other operators utilize the retry in the other functions it should help to make a decision what is the best approach here. I don't feel strongly about any specific approach just want to make sure that we are consistent (or if we want to change conventions that is also ok - just require more work)




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] lwyszomi commented on a change in pull request #22395: Add timeout and retry to the BigQueryInsertJobOperator

Posted by GitBox <gi...@apache.org>.
lwyszomi commented on a change in pull request #22395:
URL: https://github.com/apache/airflow/pull/22395#discussion_r830917378



##########
File path: airflow/providers/google/cloud/operators/bigquery.py
##########
@@ -2052,6 +2053,8 @@ class BigQueryInsertJobOperator(BaseOperator):
         Service Account Token Creator IAM role to the directly preceding identity, with first
         account from the list granting this role to the originating account (templated).
     :param cancel_on_kill: Flag which indicates whether cancel the hook's job or not, when on_kill is called
+    :param retry: Designation of what errors, if any, should be retried.

Review comment:
       @eladkal I know but I wanted to keep the same names of parameters in both places to avoid problems with understanding.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] eladkal commented on a change in pull request #22395: Add timeout and retry to the BigQueryInsertJobOperator

Posted by GitBox <gi...@apache.org>.
eladkal commented on a change in pull request #22395:
URL: https://github.com/apache/airflow/pull/22395#discussion_r831005786



##########
File path: airflow/providers/google/cloud/hooks/bigquery.py
##########
@@ -1503,6 +1503,8 @@ def insert_job(
         project_id: Optional[str] = None,
         location: Optional[str] = None,
         nowait: bool = False,
+        retry: Optional[Retry] = None,

Review comment:
       Just to clarify what is the reason we use `None` rather than `DEFAULT_RETRY`?
   What is the default behavior or job.result() ? I assume there was some value for retry set in the underlying service regardless if we set it explicitly or not? (Just want to make sure that this is not a breaking change)




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] lwyszomi commented on a change in pull request #22395: Add timeout and retry to the BigQueryInsertJobOperator

Posted by GitBox <gi...@apache.org>.
lwyszomi commented on a change in pull request #22395:
URL: https://github.com/apache/airflow/pull/22395#discussion_r830898338



##########
File path: airflow/providers/google/cloud/operators/bigquery.py
##########
@@ -2052,6 +2053,8 @@ class BigQueryInsertJobOperator(BaseOperator):
         Service Account Token Creator IAM role to the directly preceding identity, with first
         account from the list granting this role to the originating account (templated).
     :param cancel_on_kill: Flag which indicates whether cancel the hook's job or not, when on_kill is called
+    :param retry: Designation of what errors, if any, should be retried.

Review comment:
       @eladkal updated




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] eladkal commented on a change in pull request #22395: Add timeout and retry to the BigQueryInsertJobOperator

Posted by GitBox <gi...@apache.org>.
eladkal commented on a change in pull request #22395:
URL: https://github.com/apache/airflow/pull/22395#discussion_r830860848



##########
File path: airflow/providers/google/cloud/operators/bigquery.py
##########
@@ -2052,6 +2053,8 @@ class BigQueryInsertJobOperator(BaseOperator):
         Service Account Token Creator IAM role to the directly preceding identity, with first
         account from the list granting this role to the originating account (templated).
     :param cancel_on_kill: Flag which indicates whether cancel the hook's job or not, when on_kill is called
+    :param retry: Designation of what errors, if any, should be retried.

Review comment:
       This may be very confusing with `retries` from `BaseOprerator`.
   
   ```
   BigQueryInsertJobOperator(
   ...,
   retry=3,
   retries=10
   )
   ```
   
   We need better name for it - I think. Something that the user will know that retries refer to BigQuery "internal" capability.
   
   I would say that the same goes for the `timeout`.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] lwyszomi commented on a change in pull request #22395: Add timeout and retry to the BigQueryInsertJobOperator

Posted by GitBox <gi...@apache.org>.
lwyszomi commented on a change in pull request #22395:
URL: https://github.com/apache/airflow/pull/22395#discussion_r830871310



##########
File path: airflow/providers/google/cloud/operators/bigquery.py
##########
@@ -2052,6 +2053,8 @@ class BigQueryInsertJobOperator(BaseOperator):
         Service Account Token Creator IAM role to the directly preceding identity, with first
         account from the list granting this role to the originating account (templated).
     :param cancel_on_kill: Flag which indicates whether cancel the hook's job or not, when on_kill is called
+    :param retry: Designation of what errors, if any, should be retried.

Review comment:
       👍 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] eladkal commented on pull request #22395: Add timeout and retry to the BigQueryInsertJobOperator

Posted by GitBox <gi...@apache.org>.
eladkal commented on pull request #22395:
URL: https://github.com/apache/airflow/pull/22395#issuecomment-1073954897


   thanks @lwyszomi 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] eladkal merged pull request #22395: Add timeout and retry to the BigQueryInsertJobOperator

Posted by GitBox <gi...@apache.org>.
eladkal merged pull request #22395:
URL: https://github.com/apache/airflow/pull/22395


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org