You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by "masterlittle (via GitHub)" <gi...@apache.org> on 2023/02/02 10:47:17 UTC

[GitHub] [airflow] masterlittle opened a new issue, #29307: Bigquery Operators in defferable mode fail if the location is not US.

masterlittle opened a new issue, #29307:
URL: https://github.com/apache/airflow/issues/29307

   ### Apache Airflow Provider(s)
   
   google
   
   ### Versions of Apache Airflow Providers
   
   apache-airflow-providers-google==8.8.0
   
   ### Apache Airflow version
   
   2.5.1
   
   ### Operating System
   
   Mac
   
   ### Deployment
   
   Astronomer
   
   ### Deployment details
   
   _No response_
   
   ### What happened
   
   While using BigqueryInsertJobOperator in defferable mode, it fails.
   Logs - 
   
   `airflow.exceptions.AirflowException: 404, message='Not Found: {\n  "error": {\n    "code": 404,\n    "message": "Not found: Job dev-data-platform-294611:airflow_derived_tables_all_tasks_group_temp_user_seg_all_create_stage_table_2023_02_01T04_30_00_00_00_0f2853ad8762909d41067023ddb3c6d8",\n    "errors": [\n      {\n        "message": "Not found: Job dev-data-platform-294611:airflow_derived_tables_all_tasks_group_temp_user_seg_all_create_stage_table_2023_02_01T04_30_00_00_00_0f2853ad8762909d41067023ddb3c6d8",\n        "domain": "global",\n        "reason": "notFound"\n      }\n    ],\n    "status": "NOT_FOUND"\n  }\n}\n', url=URL('https://www.googleapis.com/bigquery/v2/projects/dev-data-platform-294611/jobs/airflow_derived_tables_all_tasks_group_temp_user_seg_all_create_stage_table_2023_02_01T04_30_00_00_00_0f2853ad8762909d41067023ddb3c6d8')`
   
   ### What you think should happen instead
   
   The bigquery insert job should succeed.
   
   Debugged and the error happens because the google-aio lib does not pass in the location param when making the GET JOB api call to bigquery.
   
   Acc. to docs if using any region besides us and europe, this location should be passed.
   
   As seen in the logs, it uses the domain global instead of the location that is passed to it.
   
   ### How to reproduce
   
   In the BigqueryInsertJobOperator, give a location which is not in US (eg - asia-south1) and make deferrable=true
   
   ```
   BigQueryInsertJobOperator(
                               task_id="create_stage_table",
                               project_id=configuration.task_info.bigquery_info.billing_project_id,
                               location='asia-south1',
                               deferrable=True,
                               configuration={
                                   "query": {
                                       "query": f"{{{{ ti.xcom_pull(task_ids='all_tasks.group_"
                                       f"{configuration.task_info.table_name}.prepare_query') }}}}",
                                       "useLegacySql": False,
                                       "priority": "BATCH",
                                       "writeDisposition": "WRITE_TRUNCATE",
                                       "destinationTable": {
                                           "projectId": configuration.task_info.project_name,
                                           "datasetId": configuration.task_info.stage_dataset,
                                           "tableId": configuration.task_info.stage_table,
                                       },
                                       "clustering": {"fields": get_clustering_key(configuration)},
                                       "time_partitioning": {
                                           'field': configuration.task_info.table_info.partition_key,
                                           'type': 'DAY',
                                       },
                                   },
                                   "dryRun": "{{ var.value.get('derived_tables_dry_run', 0) }}",
                                   "labels": get_query_labels(configuration),
                               },
                           )
   ```
   
   
   ### Anything else
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] VladaZakharova commented on issue #29307: Bigquery Operators in defferable mode fail if the location is not US.

Posted by "VladaZakharova (via GitHub)" <gi...@apache.org>.
VladaZakharova commented on issue #29307:
URL: https://github.com/apache/airflow/issues/29307#issuecomment-1420954260

   Hi Team! 
   After some investigation, it looks like that it is not a problem of hook implementation, but the way it uses Job object from gcloud.aio.bigquery package. 
   For correct retrieving of the Job with specific job_id it is required to pass location, if you are using location other then US and you will get 404 error:
   ```
   curl -H "Authorization: Bearer $ACCESS_TOKEN"   "https://www.googleapis.com/bigquery/v2/projects/airflow-system-tests-303516/jobs/AAA_f9a0fb8126085cf06d9155cc320cbbfa"
   {
     "error": {
       "code": 404,
       "message": "Not found: Job airflow-system-tests-303516.AAA_f9a0fb8126085cf06d9155cc320cbbfa",
       "errors": [
         {
           "message": "Not found: Job airflow-system-tests-303516.AAA_f9a0fb8126085cf06d9155cc320cbbfa",
           "domain": "global",
           "reason": "notFound"
         }
       ],
       "status": "NOT_FOUND"
     }
   }
   ```
   But specifying the correct location of the job in the url solves this problem:
   
   ```
   curl -H "Authorization: Bearer $ACCESS_TOKEN"   "https://www.googleapis.com/bigquery/v2/projects/airflow-system-tests-303516/jobs/AAA_f9a0fb8126085cf06d9155cc320cbbfa?location=asia-south1"
   {
     "kind": "bigquery#job",
     "etag": "7Ut/s/Jhqvgpbb8miRNbYg==",
     "id": "airflow-system-tests-303516:asia-south1.AAA_f9a0fb8126085cf06d9155cc320cbbfa",
     "selfLink": "https://www.googleapis.com/bigquery/v2/projects/airflow-system-tests-303516/jobs/AAA_f9a0fb8126085cf06d9155cc320cbbfa?location=asia-south1",
     "user_email": "vlada-system-test@airflow-system-tests-303516.iam.gserviceaccount.com",
     "configuration": {
       "query": {...
   ```
   
   Current implementation of the methods for Job object from gcloud.aio.bigquery package doesn't support adding location as a parameter for the methods:
   https://github.com/talkiq/gcloud-aio/blob/c64b370bdae5aa0bd72373d1e4ef1c4a4b55a3c7/bigquery/gcloud/aio/bigquery/job.py#L60
   But on the other hand, sync implementation of this method for QueryJob object takes location as a parameter when constructing url for the call:
   https://github.com/googleapis/python-bigquery/blob/beab7c2b27c27d8e824cbc66b290be8158da7abf/google/cloud/bigquery/client.py#L188
   
   I have created an issue in GitHub repo for the gcloud.aio.bigquery package with the example and reproduction steps of the problem, however it may take some time to covert this issue from their side.
   As a workaround, I can implement async methods for BigQuery calls and don't use the Job object from gcloud.aio.bigquery package, but it will require to change all other classes that use current implementation of BigQueryAsyncHook with calling Job object methods: BigQueryIntervalCheckTrigger, BigQueryValueCheckTrigger, BigQueryCheckTrigger, BigQueryGetDataTrigger and all operators that use those triggers. 
   
   Please, let me know what option you prefer more, thanks :)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] eladkal closed issue #29307: Bigquery Operators in defferable mode fail if the location is not US.

Posted by "eladkal (via GitHub)" <gi...@apache.org>.
eladkal closed issue #29307: Bigquery Operators in defferable mode fail if the location is not US.
URL: https://github.com/apache/airflow/issues/29307


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] eladkal commented on issue #29307: Bigquery Operators in defferable mode fail if the location is not US.

Posted by "eladkal (via GitHub)" <gi...@apache.org>.
eladkal commented on issue #29307:
URL: https://github.com/apache/airflow/issues/29307#issuecomment-1414215349

   @lwyszomi can you take a look?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] VladaZakharova commented on issue #29307: Bigquery Operators in defferable mode fail if the location is not US.

Posted by "VladaZakharova (via GitHub)" <gi...@apache.org>.
VladaZakharova commented on issue #29307:
URL: https://github.com/apache/airflow/issues/29307#issuecomment-1445885540

   @eladkal Yes, you are right, you can close it. I have placed the solution o the comment and the issue.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] VladaZakharova commented on issue #29307: Bigquery Operators in defferable mode fail if the location is not US.

Posted by "VladaZakharova (via GitHub)" <gi...@apache.org>.
VladaZakharova commented on issue #29307:
URL: https://github.com/apache/airflow/issues/29307#issuecomment-1416002202

   Hi Team! 
   Thank you for detailed description, i will take a look on it and let you know about any updates


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] eladkal commented on issue #29307: Bigquery Operators in defferable mode fail if the location is not US.

Posted by "eladkal (via GitHub)" <gi...@apache.org>.
eladkal commented on issue #29307:
URL: https://github.com/apache/airflow/issues/29307#issuecomment-1421084833

   I don't think Airflow should workaround bugs/issues of upstream packages.
   
   I would suggest to link here the bug report of upstream repo, users who will hit this issue can find the workaround as you suggested and comment in the upstream package repo to increase awareness for the bug.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] eladkal commented on issue #29307: Bigquery Operators in defferable mode fail if the location is not US.

Posted by "eladkal (via GitHub)" <gi...@apache.org>.
eladkal commented on issue #29307:
URL: https://github.com/apache/airflow/issues/29307#issuecomment-1443939295

   @VladaZakharova so if i'm reading this right we don't have a task on this issue and this one should be fixed upstream? if I think we can close this one?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] boring-cyborg[bot] commented on issue #29307: Bigquery Operators in defferable mode fail if the location is not US.

Posted by "boring-cyborg[bot] (via GitHub)" <gi...@apache.org>.
boring-cyborg[bot] commented on issue #29307:
URL: https://github.com/apache/airflow/issues/29307#issuecomment-1413536222

   Thanks for opening your first issue here! Be sure to follow the issue template!
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] VladaZakharova commented on issue #29307: Bigquery Operators in defferable mode fail if the location is not US.

Posted by "VladaZakharova (via GitHub)" <gi...@apache.org>.
VladaZakharova commented on issue #29307:
URL: https://github.com/apache/airflow/issues/29307#issuecomment-1416002416

   Hi Team! 
   Thank you for detailed description, i will take a look on it and let you know about any updates


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] VladaZakharova commented on issue #29307: Bigquery Operators in defferable mode fail if the location is not US.

Posted by "VladaZakharova (via GitHub)" <gi...@apache.org>.
VladaZakharova commented on issue #29307:
URL: https://github.com/apache/airflow/issues/29307#issuecomment-1422174770

   Okay, sure. I have updated previous comment with the link to created issue.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org