Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2020/10/19 12:39:04 UTC

[GitHub] [airflow] BhuviTheDataGuy opened a new issue #11660: GCSToBigQueryOperator - Not generating the Unique BQ Job Name

BhuviTheDataGuy opened a new issue #11660:
URL: https://github.com/apache/airflow/issues/11660


   I was using `GoogleCloudStorageToBigQueryOperator` and wanted to switch to `GCSToBigQueryOperator`. When I run parallel data exports from GCS to BQ (I'm generating the tasks dynamically in a for loop), it generates the BQ job name `test-composer:us-west2.airflow_1603109319` (I think it's taking the node name + current timestamp) as the job ID for all the tasks.
   
   **Error**
   ```
   ERROR - 409 POST https://bigquery.googleapis.com/bigquery/v2/projects/centili-prod/jobs: Already Exists: Job test-composer:us-west2.airflow_1603109319
   Traceback (most recent call last)
   ```
   This prevents the 2nd table from being imported; it has to wait for a minute (the retry delay in the DAG), and then it's imported.
   
   But the older operator generates a proper job ID like (Job_someUUID).
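
The collision described above can be illustrated with a minimal sketch. It assumes, as the reporter guesses, that the job ID is a fixed prefix plus the current time in whole seconds. `timestamp_job_id` and `uuid_job_id` are hypothetical helpers written for this illustration, not functions from Airflow or the BigQuery client.

```python
import time
import uuid


def timestamp_job_id(now=None):
    """Hypothetical: build a job ID from a second-resolution timestamp."""
    now = time.time() if now is None else now
    return f"airflow_{int(now)}"


def uuid_job_id():
    """Hypothetical: build a job ID from a random UUID instead."""
    return f"job_{uuid.uuid4().hex}"


# Three tasks that start within the same second share one timestamp-based ID.
same_second = 1603109319.0
ts_ids = {timestamp_job_id(same_second + offset) for offset in (0.1, 0.4, 0.9)}

# UUID-based IDs stay distinct no matter how close together the tasks start.
uuid_ids = {uuid_job_id() for _ in range(3)}

print(len(ts_ids), len(uuid_ids))
```

Under that assumption, every task submitted in the same second asks BigQuery to create the same job, which is consistent with the `409 ... Already Exists` response shown in the error.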


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] BhuviTheDataGuy commented on issue #11660: GCSToBigQueryOperator - Not generating the Unique BQ Job Name

Posted by GitBox <gi...@apache.org>.
BhuviTheDataGuy commented on issue #11660:
URL: https://github.com/apache/airflow/issues/11660#issuecomment-723734679


   Even the BigQuery hook has the same issue. I'm not a coder :) so I'm not able to find the exact cause.
   
   When we use `contrib.bigquery` there is no issue; the new `providers.bigquery` has this problem across the board (the hook, BigQuery operator, GCS to BQ, empty table creator, BQ to BQ, etc.).





[GitHub] [airflow] mik-laj commented on issue #11660: GCSToBigQueryOperator - Not generating the Unique BQ Job Name

Posted by GitBox <gi...@apache.org>.
mik-laj commented on issue #11660:
URL: https://github.com/apache/airflow/issues/11660#issuecomment-724381558


   Can you try the latest Composer image, `composer-1.13.0-airflow-1.10.12`, and the latest backport provider, `apache-airflow-backport-providers-google==2020.10.29`?





[GitHub] [airflow] potiuk edited a comment on issue #11660: GCSToBigQueryOperator - Not generating the Unique BQ Job Name

Posted by GitBox <gi...@apache.org>.
potiuk edited a comment on issue #11660:
URL: https://github.com/apache/airflow/issues/11660#issuecomment-724588604


   @muscovitebob -> I believe it is indeed a dependency problem that is very likely to be addressed in the latest version of the Composer image. From what we know, the Composer team keeps the images updated with the releases of Apache Airflow and the providers, and the next image will even include the latest Google providers baked in, but you should try to install the latest provider there. Note that there is a new version of the Google backport provider as a release candidate (voting on it finishes on Thursday), so you might even try to install this version instead:
   
   https://pypi.org/project/apache-airflow-backport-providers-google/2020.11.13rc1/
   
   It has even more fixes:
   
   Commit | Committed | Subject
   -- | -- | --
   b2a28d159 | 2020-11-09 | Moves provider packages scripts to dev (#12082)
   fcb6b00ef | 2020-11-08 | Add authentication to AWS with Google credentials (#12079)
   2ef3b7ef8 | 2020-11-08 | Fix ERROR - Object of type 'bytes' is not JSON serializable when using store_to_xcom_key parameter (#12172)
   0caec9fd3 | 2020-11-06 | Dataflow - add waiting for successful job cancel (#11501)
   cf9437d79 | 2020-11-06 | Simplify string expressions (#12123)
   91a64db50 | 2020-11-04 | Format all files (without excepions) by black (#12091)
   fd3db778e | 2020-11-04 | Add server side cursor support for postgres to GCS operator (#11793)
   f1f194026 | 2020-11-04 | Add DataflowStartSQLQuery operator (#8553)
   41bf172c1 | 2020-11-04 | Simplify string expressions (#12093)
   5f5244b74 | 2020-11-04 | Add template fields renderers to Biguery and Dataproc operators (#12067)
   4e8f9cc8d | 2020-11-03 | Enable Black - Python Auto Formmatter (#9550)
   8c42cf1b0 | 2020-11-03 | Use PyUpgrade to use Python 3.6 features (#11447)
   45ae145c2 | 2020-11-03 | Log BigQuery job id in insert method of BigQueryHook (#12056)
   e324b37a6 | 2020-11-03 | Add job name and progress logs to Cloud Storage Transfer Hook (#12014)
   6071fdd58 | 2020-11-02 | Improve handling server errors in DataprocSubmitJobOperator (#11947)
   2f703df12 | 2020-10-30 | Add SalesforceToGcsOperator (#10760)
   e5713e00b | 2020-10-29 | Add drain option when canceling Dataflow pipelines (#11374)
   37eaac3c5 | 2020-10-29 | The PRs which are not approved run subset of tests (#11828)
   79cb77199 | 2020-10-28 | Fixing re pattern and changing to use a single character class. (#11857)
   5a439e84e | 2020-10-26 | Prepare providers release 0.0.2a1 (#11855)
   240c7d4d7 | 2020-10-26 | Google Memcached hooks - improve protobuf messages handling (#11743)
   8afdb6ac6 | 2020-10-26 | Fix spellings (#11825)
   872b1566a | 2020-10-25 | Generated backport providers readmes/setup for 2020.10.29 (#11826)
   b680bbc0b | 2020-10-24 | Generated backport providers readmes/setup for 2020.10.29
   
   





[GitHub] [airflow] potiuk commented on issue #11660: GCSToBigQueryOperator - Not generating the Unique BQ Job Name

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #11660:
URL: https://github.com/apache/airflow/issues/11660#issuecomment-724588604


   @muscovitebob -> I believe it is indeed a dependency problem that is very likely to be addressed in the latest version of the Composer image. From what we know, the Composer team keeps the images updated with the releases of Apache Airflow and the providers, and the next image will even include the latest Google providers baked in, but you should try to install the latest provider there. Note that there is a new version of the Google provider as a release candidate (voting on it finishes on Thursday), so you might even try to install this version instead:
   
   https://pypi.org/project/apache-airflow-backport-providers-google/2020.11.13rc1/
   





[GitHub] [airflow] jensenity commented on issue #11660: GCSToBigQueryOperator - Not generating the Unique BQ Job Name

Posted by GitBox <gi...@apache.org>.
jensenity commented on issue #11660:
URL: https://github.com/apache/airflow/issues/11660#issuecomment-723727680


   I'm using `apache-airflow-backport-providers-google==2020.10.5` and seem to get a similar error as well.
   
   ```
   [2020-11-08 23:31:58,200] {taskinstance.py:1150} ERROR - 409 POST https://bigquery.googleapis.com/bigquery/v2/projects/banksalad/jobs: Already Exists: Job test:US.airflow_1604878317
   ```





[GitHub] [airflow] BhuviTheDataGuy commented on issue #11660: GCSToBigQueryOperator - Not generating the Unique BQ Job Name

Posted by GitBox <gi...@apache.org>.
BhuviTheDataGuy commented on issue #11660:
URL: https://github.com/apache/airflow/issues/11660#issuecomment-714340956


   Yeah, maybe a bug. Try the previous version (`GoogleCloudStorageToBigQueryOperator`); it works.





[GitHub] [airflow] muscovitebob commented on issue #11660: GCSToBigQueryOperator - Not generating the Unique BQ Job Name

Posted by GitBox <gi...@apache.org>.
muscovitebob commented on issue #11660:
URL: https://github.com/apache/airflow/issues/11660#issuecomment-714337338


   I just started using this operator via the backports package a few days ago and I hit this at least one in ten times I invoke the operator, making it unusable without manual supervision. I do not use a dynamic dag but I do have a few `GCSToBigQueryOperator`s.
   
   ```
   [2020-10-22 08:40:58,732] {base_task_runner.py:113} INFO - Job 59262: Subtask TASKNAME [2020-10-22 08:40:58,730] {taskinstance.py:1135} ERROR - 409 POST https://bigquery.googleapis.com/bigquery/v2/projects/PROJECTNAME/jobs: Already Exists: Job PROJECTNAME:EU.airflow_1603356058@-@{"workflow": "DAGNAME", "task-id": "TASKNAME", "execution-date": "2020-10-22T08:29:42.258293+00:00"}
   ```
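
The numeric suffix in that job name can be checked against the log timestamp with a short sketch. `job_id_timestamp` is a hypothetical helper written for this illustration, and the sample string abbreviates the JSON tail of the logged job name. It assumes the digits after `airflow_` are Unix epoch seconds.

```python
from datetime import datetime, timezone


def job_id_timestamp(job_id):
    """Return the UTC datetime encoded by the digits after 'airflow_'."""
    prefix = "airflow_"
    start = job_id.index(prefix) + len(prefix)
    digits = ""
    for ch in job_id[start:]:
        if not ch.isdigit():
            break
        digits += ch
    return datetime.fromtimestamp(int(digits), tz=timezone.utc)


# Job name from the log above, with the JSON tail shortened.
decoded = job_id_timestamp("PROJECTNAME:EU.airflow_1603356058@-@{...}")
print(decoded.isoformat())  # 2020-10-22T08:40:58+00:00
```

The decoded value falls in the same second as the `08:40:58` log entry, which supports the guess that the ID only has one-second resolution, so two load jobs submitted in the same second would clash.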





[GitHub] [airflow] muscovitebob commented on issue #11660: GCSToBigQueryOperator - Not generating the Unique BQ Job Name

Posted by GitBox <gi...@apache.org>.
muscovitebob commented on issue #11660:
URL: https://github.com/apache/airflow/issues/11660#issuecomment-724996955


   > Can you try the latest Composer image, `composer-1.13.0-airflow-1.10.12`, and the latest backport provider, `apache-airflow-backport-providers-google==2020.10.29`?
   
   The upgrade with these went well, thanks much! I did not realise there was a new release.





[GitHub] [airflow] BhuviTheDataGuy closed issue #11660: GCSToBigQueryOperator - Not generating the Unique BQ Job Name

Posted by GitBox <gi...@apache.org>.
BhuviTheDataGuy closed issue #11660:
URL: https://github.com/apache/airflow/issues/11660


   





[GitHub] [airflow] BhuviTheDataGuy commented on issue #11660: GCSToBigQueryOperator - Not generating the Unique BQ Job Name

Posted by GitBox <gi...@apache.org>.
BhuviTheDataGuy commented on issue #11660:
URL: https://github.com/apache/airflow/issues/11660#issuecomment-723760157


   Awesome!!! So I'm closing this now.
   
   https://github.com/apache/airflow/issues/11282









[GitHub] [airflow] jensenity commented on issue #11660: GCSToBigQueryOperator - Not generating the Unique BQ Job Name

Posted by GitBox <gi...@apache.org>.
jensenity commented on issue #11660:
URL: https://github.com/apache/airflow/issues/11660#issuecomment-723758959


   Apparently they fixed it in version 2020.10.29 of the Airflow backport provider.
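
A rough way to tell whether an installed backport provider version is at or past that release is to compare the date-style version strings as integer tuples. This is only a sketch: `parse_calver` and `has_fix` are hypothetical helpers, and the cut-off `2020.10.29` is taken from this thread rather than from release notes.

```python
def parse_calver(version):
    """Parse 'YYYY.MM.DD' (optionally followed by 'rcN') into a tuple of ints."""
    base = version.split("rc")[0]
    return tuple(int(part) for part in base.split("."))


# Release reported in this thread as containing the fix (an assumption here).
FIXED_IN = parse_calver("2020.10.29")


def has_fix(installed_version):
    """Return True if the given provider version is at or after FIXED_IN."""
    return parse_calver(installed_version) >= FIXED_IN


for version in ("2020.10.5", "2020.10.29", "2020.11.13rc1"):
    print(version, has_fix(version))
```

On Python 3.8 or newer, the installed version string could be obtained with `importlib.metadata.version("apache-airflow-backport-providers-google")` and passed to `has_fix`.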





[GitHub] [airflow] jensenity commented on issue #11660: GCSToBigQueryOperator - Not generating the Unique BQ Job Name

Posted by GitBox <gi...@apache.org>.
jensenity commented on issue #11660:
URL: https://github.com/apache/airflow/issues/11660#issuecomment-723759055


   https://github.com/apache/airflow/commit/47b05a87f004dc273a4757ba49f03808a86f77e7





[GitHub] [airflow] muscovitebob commented on issue #11660: GCSToBigQueryOperator - Not generating the Unique BQ Job Name

Posted by GitBox <gi...@apache.org>.
muscovitebob commented on issue #11660:
URL: https://github.com/apache/airflow/issues/11660#issuecomment-723876029


   For Cloud Composer users like me, trying to use `2020.10.29` with the `composer-1.12.5-airflow-1.10.10` image will error out in the Cloud Build step with a very cryptic message:
   ```
   The command '/bin/sh -c bash installer.sh $COMPOSER_PYTHON_VERSION  fail' returned a non-zero code: 1
   ERROR
   ERROR: build step 0 "gcr.io/cloud-builders/docker" failed: step exited with non-zero status: 1
   ```
   The warning from earlier in the log may provide some hints as to why this happens:
   
   ```
   + python3 -m pipdeptree --warn fail
   Warning!!! Possibly conflicting dependencies found:
   * google-cloud-memcache==0.2.0
    - google-api-core [required: >=1.17.0,<2.0.0dev, installed: 1.16.0]
   ```
   
   So for now at least, Cloud Composer users are stuck with either using the old non-buggy operator or waiting for Google to patch this.
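
The `pipdeptree` warning above amounts to a failed range check, which can be sketched as follows. `parse` and `satisfies` are hypothetical helpers, and the upper bound `<2.0.0dev` is simplified to `<2.0.0`. Real installers apply the full PEP 440 rules, which this sketch does not attempt.

```python
def parse(version):
    """Turn a plain 'X.Y.Z' version string into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def satisfies(installed, lower, upper):
    """Return True if lower <= installed < upper (simplified range check)."""
    return parse(lower) <= parse(installed) < parse(upper)


# google-cloud-memcache==0.2.0 requires google-api-core >=1.17.0,<2.0.0dev,
# but the image has google-api-core 1.16.0 installed.
print(satisfies("1.16.0", "1.17.0", "2.0.0"))  # False
```

The installed `google-api-core` 1.16.0 sits below the required lower bound, which matches the conflict reported in the build log.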





[GitHub] [airflow] muscovitebob commented on issue #11660: GCSToBigQueryOperator - Not generating the Unique BQ Job Name

Posted by GitBox <gi...@apache.org>.
muscovitebob commented on issue #11660:
URL: https://github.com/apache/airflow/issues/11660#issuecomment-714344018


   [This](https://github.com/apache/airflow/blob/1da8379c913843834353b44861c62f332a461bdf/airflow/providers/google/cloud/hooks/bigquery.py#L1442) seems to be the method that incorrectly generates the job IDs, but the format does not seem to match exactly what I get in the logs.
   Meanwhile, I will indeed switch to the older operator since you report it always generates job IDs correctly, thanks!

