Posted to commits@airflow.apache.org by "Chris Riccomini (JIRA)" <ji...@apache.org> on 2017/10/23 21:12:00 UTC

[jira] [Commented] (AIRFLOW-1750) GoogleCloudStorageToBigQueryOperator 404 HttpError

    [ https://issues.apache.org/jira/browse/AIRFLOW-1750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16215848#comment-16215848 ] 

Chris Riccomini commented on AIRFLOW-1750:
------------------------------------------

It looks to me like the project_id is not being set properly. Have you checked your hook/connection definition, service account, etc.? The URL in the stack trace has two consecutive slashes after `projects` (`.../projects//jobs`), indicating that no project_id was set.
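
For reference, here's a minimal sketch of how I'd expect the task to be wired up so the hook can resolve a project_id. The connection id {{my_gcp_conn}}, the project name, and the extras key names below are placeholders/assumptions based on the contrib GCP base hook, not a verified configuration:

{code:python}
from airflow.contrib.operators.gcs_to_bq import GoogleCloudStorageToBigQueryOperator

# Assumption: an Airflow connection 'my_gcp_conn' of type Google Cloud Platform
# exists, with its extras carrying the project and key file, e.g.
#   {"extra__google_cloud_platform__project": "my-gcp-project",
#    "extra__google_cloud_platform__key_path": "/path/to/key.json"}
# The BigQuery hook takes its project_id from that connection; without it the
# job URL ends up as .../projects//jobs, which is the 404 seen below.

t3 = GoogleCloudStorageToBigQueryOperator(
    task_id='move_' + source + '_from_gcs_to_bq',
    bucket='mybucket',
    source_objects=['news/latest_headline_' + source + '.json'],
    destination_project_dataset_table='my-gcp-project.mydataset.latest_news_headlines',
    schema_object='news/latest_headline_' + source + '.json',
    source_format='NEWLINE_DELIMITED_JSON',
    write_disposition='WRITE_APPEND',
    bigquery_conn_id='my_gcp_conn',             # hook resolves project_id from this connection
    google_cloud_storage_conn_id='my_gcp_conn',
    dag=dag)
{code}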

> GoogleCloudStorageToBigQueryOperator 404 HttpError
> --------------------------------------------------
>
>                 Key: AIRFLOW-1750
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-1750
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: gcp
>    Affects Versions: Airflow 1.8
>         Environment: Python 2.7.13
>            Reporter: Mark Secada
>             Fix For: Airflow 1.8
>
>
> I'm trying to write a DAG that uploads JSON files to Google Cloud Storage and then loads them into BigQuery. The upload to Google Cloud Storage works, but when I run the second task I get a 404 HttpError. The error looks like this:
> {code:bash}
> ERROR - <HttpError 404 when requesting https://www.googleapis.com/bigquery/v2/projects//jobs?alt=json returned "Not Found">
> Traceback (most recent call last):
>   File "/Users/myname/anaconda/lib/python2.7/site-packages/airflow/models.py", line 1374, in run
>     result = task_copy.execute(context=context)
>   File "/Users/myname/anaconda/lib/python2.7/site-packages/airflow/contrib/operators/gcs_to_bq.py", line 153, in execute
>     schema_update_options=self.schema_update_options)
>   File "/Users/myname/anaconda/lib/python2.7/site-packages/airflow/contrib/hooks/bigquery_hook.py", line 476, in run_load
>     return self.run_with_configuration(configuration)
>   File "/Users/myname/anaconda/lib/python2.7/site-packages/airflow/contrib/hooks/bigquery_hook.py", line 498, in run_with_configuration
>     .insert(projectId=self.project_id, body=job_data) \
>   File "/Users/myname/anaconda/lib/python2.7/site-packages/oauth2client/util.py", line 135, in positional_wrapper
>     return wrapped(*args, **kwargs)
>   File "/Users/myname/anaconda/lib/python2.7/site-packages/googleapiclient/http.py", line 838, in execute
>     raise HttpError(resp, content, uri=self.uri)
> {code}
> My code for the task is here:
> {code:python}
> # Some comments here
> t3 = GoogleCloudStorageToBigQueryOperator(
>         task_id='move_'+source+'_from_gcs_to_bq',
>         bucket='mybucket',
>         source_objects=['news/latest_headline_'+source+'.json'],
>         destination_project_dataset_table='mydataset.latest_news_headlines',
>         schema_object='news/latest_headline_'+source+'.json',
>         source_format='NEWLINE_DELIMITED_JSON',
>         write_disposition='WRITE_APPEND',
>         dag=dag)
> {code}


