Posted to commits@airflow.apache.org by "Yohei Onishi (JIRA)" <ji...@apache.org> on 2018/12/27 00:49:00 UTC

[jira] [Created] (AIRFLOW-3571) GoogleCloudStorageToBigQueryOperator succeeds in uploading CSV file from GCS to BigQuery but the task fails

Yohei Onishi created AIRFLOW-3571:
-------------------------------------

             Summary: GoogleCloudStorageToBigQueryOperator succeeds in uploading CSV file from GCS to BigQuery but the task fails
                 Key: AIRFLOW-3571
                 URL: https://issues.apache.org/jira/browse/AIRFLOW-3571
             Project: Apache Airflow
          Issue Type: Bug
          Components: contrib
    Affects Versions: 1.10.0
            Reporter: Yohei Onishi


I am using the following services, all in the asia-northeast1-c zone:
 * GCS: asia-northeast1-c
 * BigQuery dataset and table: asia-northeast1-c
 * Composer: asia-northeast1-c
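
For reference, the task is created along these lines (a minimal sketch; the bucket, object, dataset, table, and schema names are hypothetical placeholders, and only the task_id matches the log below):
{code:python}
# Minimal sketch of the failing task. All names except task_id are
# hypothetical placeholders, not the actual DAG.
from datetime import datetime

from airflow import DAG
from airflow.contrib.operators.gcs_to_bq import GoogleCloudStorageToBigQueryOperator

dag = DAG('gcs_to_bq_example', start_date=datetime(2018, 12, 1),
          schedule_interval=None)

load_task = GoogleCloudStorageToBigQueryOperator(
    task_id='bq_load_data_into_dest_table_from_gcs',
    bucket='my-asia-northeast1-bucket',   # GCS bucket in asia-northeast1
    source_objects=['path/to/data.csv'],
    destination_project_dataset_table='my-project.my_dataset.my_table',
    schema_fields=[{'name': 'col1', 'type': 'STRING', 'mode': 'NULLABLE'}],
    source_format='CSV',
    write_disposition='WRITE_TRUNCATE',
    dag=dag,
)
{code}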

My task created by GoogleCloudStorageToBigQueryOperator successfully loaded the CSV file from a GCS bucket into the BigQuery table, but the task itself was marked as failed due to the following error.
 
{code:java}
[2018-12-26 21:35:47,464] {base_task_runner.py:107} INFO - Job 146: Subtask bq_load_data_into_dest_table_from_gcs [2018-12-26 21:35:47,464] {discovery.py:871} INFO - URL being requested: GET https://www.googleapis.com/bigquery/v2/projects/fr-stg-datalake/jobs/job_QQE9TDEu88mfdw_fJHHEo9FtjXja?alt=json
[2018-12-26 21:35:47,931] {models.py:1736} ERROR - ('BigQuery job status check failed. Final error was: %s', 404)
Traceback (most recent call last):
  File "/usr/local/lib/airflow/airflow/contrib/hooks/bigquery_hook.py", line 981, in run_with_configuration
    jobId=self.running_job_id).execute()
  File "/usr/local/lib/python3.6/site-packages/googleapiclient/_helpers.py", line 130, in positional_wrapper
    return wrapped(*args, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/googleapiclient/http.py", line 851, in execute
    raise HttpError(resp, content, uri=self.uri)
googleapiclient.errors.HttpError: <HttpError 404 when requesting https://www.googleapis.com/bigquery/v2/projects/my-project/jobs/job_abc123?alt=json returned "Not found: Job my-project:job_abc123">

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/airflow/airflow/models.py", line 1633, in _run_raw_task
    result = task_copy.execute(context=context)
  File "/usr/local/lib/airflow/airflow/contrib/operators/gcs_to_bq.py", line 237, in execute
    time_partitioning=self.time_partitioning)
  File "/usr/local/lib/airflow/airflow/contrib/hooks/bigquery_hook.py", line 951, in run_load
    return self.run_with_configuration(configuration)
  File "/usr/local/lib/airflow/airflow/contrib/hooks/bigquery_hook.py", line 1003, in run_with_configuration
    err.resp.status)
Exception: ('BigQuery job status check failed. Final error was: %s', 404)
{code}
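Reading the traceback, the 404 comes from the job status poll in {{run_with_configuration}} ({{bigquery_hook.py}}, line 981), which requests the job by project ID and job ID only, roughly as below (a simplified reconstruction, not the exact hook source):
{code:python}
# Rough reconstruction of the poll at bigquery_hook.py:981 (simplified
# sketch). The request carries no location, so BigQuery cannot resolve
# a job that ran in asia-northeast1 and answers 404.
job = self.service.jobs().get(
    projectId=self.project_id,
    jobId=self.running_job_id).execute()
{code}
For BigQuery jobs outside the US and EU multi-regions, {{jobs.get}} must also be given the job's location; otherwise it returns 404.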
The task failed to find the job {color:#FF0000}my-project:job_abc123{color}, but the correct job ID is {color:#FF0000}my-project:asia-northeast1:job_abc123{color}, which includes the location. (Note: these are example IDs, not the actual ones.)
I suppose the operator does not handle the job location (region) properly when it checks the job status.
 
{code:java}
$ bq show -j my-project:asia-northeast1:job_abc123
Job my-project:asia-northeast1:job_abc123

 Job Type   State     Start Time        Duration   User Email                    Bytes Processed   Bytes Billed   Billing Tier   Labels
 ---------- --------- ----------------- ---------- ----------------------------- ----------------- -------------- -------------- --------
 load       SUCCESS   27 Dec 05:35:47   0:00:01    my-service-account-id-email
{code}
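The BigQuery v2 API's {{jobs.get}} method does accept a {{location}} parameter, so a plausible fix is to thread the dataset location through the hook into the status poll. A minimal sketch, assuming a hypothetical {{self.location}} attribute on the hook (my assumption, not a committed patch):
{code:python}
# Sketch of a possible fix (assumed, not a merged patch): pass the job
# location through to jobs.get. "location" is a real BigQuery v2 API
# field; wiring it through the hook is the assumed part.
get_kwargs = dict(projectId=self.project_id, jobId=self.running_job_id)
if getattr(self, 'location', None):        # e.g. 'asia-northeast1'
    get_kwargs['location'] = self.location
job = self.service.jobs().get(**get_kwargs).execute()
{code}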


