Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2020/10/16 05:56:31 UTC

[GitHub] [airflow] QuinRiva opened a new issue #11568: GCSToBigQueryOperator fails: job Id is not unique

QuinRiva opened a new issue #11568:
URL: https://github.com/apache/airflow/issues/11568


   **Apache Airflow version**: 1.10.12
   
   **Environment**:  CentOS 7 host
   
   - **Cloud provider or hardware configuration**: 
   - **OS** (e.g. from /etc/os-release): Ubuntu 20.04 Docker Container
   - **Kernel**: Linux c6c6e8230c17 3.10.0-1127.18.2.el7.x86_64 #1 SMP Sun Jul 26 15:27:06 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
   - **Install tools**: Docker Compose
   - **Others** ?
   
   **What happened**:
   
   The BigQuery job is not created because a job with the same ID already exists:
   `ERROR - 409 POST https://bigquery.googleapis.com/bigquery/v2/projects/project_id/jobs?prettyPrint=false: Already Exists: Job project_id:US.airflow_1602826997`
   
   **What you expected to happen**:
   It appears that the job ID takes the following form: `<project_id>:<location>.airflow_<epoch_timestamp>`
   
   This causes failures when GCSToBigQueryOperator tasks are launched in parallel or in a loop, because multiple jobs can start within the same second and therefore receive the same job ID.
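A minimal sketch of why the IDs collide, assuming (as the error message suggests) that the job ID is built only from the whole-second epoch timestamp. `timestamp_job_id` is a hypothetical reconstruction for illustration, not Airflow's actual code:

```python
def timestamp_job_id(epoch: float) -> str:
    """Hypothetical reconstruction of the job ID seen in the error message.

    Only whole seconds are kept, so any two calls within the same second
    return the same ID.
    """
    return f"airflow_{int(epoch)}"


# Three loads started a few hundred milliseconds apart, as can happen when
# the tasks created in the loop are scheduled in parallel.
start = 1602826997.1
job_ids = [timestamp_job_id(start + offset) for offset in (0.0, 0.3, 0.6)]
print(job_ids)
# → ['airflow_1602826997', 'airflow_1602826997', 'airflow_1602826997']
print(len(set(job_ids)))  # → 1: all three jobs request the same ID
```

The second and third requests would then be rejected with the same `409 Already Exists` error as above.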
   
   A solution would be to append either a UUID or the task_id to the BigQuery job ID.
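A minimal sketch of the proposed fix, assuming the ID is still prefixed with `airflow_` and the epoch timestamp. `unique_job_id` is a hypothetical helper, not part of Airflow. It combines both suggestions and sanitizes the task_id, since BigQuery job IDs may only contain letters, digits, underscores and dashes:

```python
import re
import uuid


def unique_job_id(task_id: str, epoch: float) -> str:
    """Hypothetical helper: build a BigQuery job ID that does not collide.

    Characters not allowed in a BigQuery job ID (anything other than
    letters, digits, underscores and dashes, e.g. a dot) are replaced
    with an underscore.
    """
    safe_task_id = re.sub(r"[^a-zA-Z0-9_-]", "_", task_id)
    # The task_id separates jobs from different tasks; the random UUID also
    # separates two runs of the same task started within the same second.
    return f"airflow_{safe_task_id}_{int(epoch)}_{uuid.uuid4().hex}"


start = 1602826997.1
job_ids = [
    unique_job_id(f"load_{table}", start + offset)
    for table, offset in zip(["table_1", "table_2", "table_3"], (0.0, 0.3, 0.6))
]
for job_id in job_ids:
    print(job_id)
print(len(set(job_ids)))  # → 3
```

In the reproduction case either component alone would be enough, since each task has a distinct task_id; the UUID additionally covers retries or re-runs of the same task within one second.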
   
   **How to reproduce it**:
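   ```python
   from airflow import DAG
   # On Airflow 1.10.x this requires the apache-airflow-backport-providers-google package.
   from airflow.providers.google.cloud.transfers.gcs_to_bigquery import GCSToBigQueryOperator
   from airflow.utils.dates import days_ago

   # Placeholder values for the reproduction; substitute your own.
   DATASET_NAME = "my_dataset"
   args = {"start_date": days_ago(1)}

   tables = ["table_1", "table_2", "table_3"]

   with DAG(dag_id="dag_name", default_args=args, is_paused_upon_creation=False) as dag:
       # Each iteration creates an independent task, so all three loads can be
       # scheduled at once and start within the same second.
       for table in tables:
           load_op = GCSToBigQueryOperator(
               task_id=f"load_{table}",
               bucket="my_bucket",
               source_objects=[f"{table}_data_*.json"],
               destination_project_dataset_table=f"{DATASET_NAME}.{table}",
               schema_object=f"{table}_schema.json",
               write_disposition="WRITE_TRUNCATE",
               source_format="NEWLINE_DELIMITED_JSON",
               dag=dag,
           )
   ```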


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] QuinRiva closed issue #11568: GCSToBigQueryOperator fails: job Id is not unique

Posted by GitBox <gi...@apache.org>.
QuinRiva closed issue #11568:
URL: https://github.com/apache/airflow/issues/11568


   

