Posted to dev@airflow.apache.org by zi...@vo.yoo.ro on 2016/07/06 17:10:53 UTC

Help BigQueryToCloudStorageOperator

Hello

I'm trying to use BigQueryToCloudStorageOperator but unfortunately it 
doesn't work:

     [2016-07-06 16:30:43,496] {discovery.py:810} INFO - URL being 
requested: POST 
https://www.googleapis.com/bigquery/v2/projects/xxxx/jobs?alt=json
     [2016-07-06 16:30:43,496] {client.py:570} INFO - Attempting refresh to 
obtain initial access_token
     [2016-07-06 16:30:43,496] {client.py:872} INFO - Refreshing 
access_token
     [2016-07-06 16:30:43,848] {models.py:1286} ERROR - <HttpError 400 when 
requesting 
https://www.googleapis.com/bigquery/v2/projects/xxxx/jobs?alt=json returned 
"Required parameter is missing">
     Traceback (most recent call last):
       File 
"/home/airflow/.local/lib/python2.7/site-packages/airflow/models.py", line 
1242, in run
         result = task_copy.execute(context=context)
       File 
"/home/airflow/.local/lib/python2.7/site-packages/airflow/contrib/operators/bigquery_to_gcs.py", 
line 79, in execute
         self.print_header)
       File 
"/home/airflow/.local/lib/python2.7/site-packages/airflow/contrib/hooks/bigquery_hook.py", 
line 261, in run_extract
         return self.run_with_configuration(configuration)
       File 
"/home/airflow/.local/lib/python2.7/site-packages/airflow/contrib/hooks/bigquery_hook.py", 
line 438, in run_with_configuration
         .insert(projectId=self.project_id, body=job_data) \
       File 
"/home/airflow/.local/lib/python2.7/site-packages/oauth2client/util.py", 
line 140, in positional_wrapper
         return wrapped(*args, **kwargs)
       File 
"/home/airflow/.local/lib/python2.7/site-packages/googleapiclient/http.py", 
line 729, in execute
         raise HttpError(resp, content, uri=self.uri)



When I print the configuration JSON that is used by 
job.insert(projectId=self.project_id, body=job_data), I get:

     {'extract': {'compression': 'GZIP', 'fieldDelimiter': ',', 
'destinationFormat': 'CSV', 'printHeader': True, 'destinationUris': 
u'gs://xxxx/airflow_tmp', 'sourceTable': {'projectId': u'xxxx', 'tableId': 
u'xxxx', 'datasetId': u'xxxx'}}}


According to the docs, all required parameters are set.

Does anyone have any idea that could help me? How could I debug this to get 
more hints?
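
One thing I plan to try next is replaying the same request body directly 
against the BigQuery API outside of Airflow, roughly along these lines 
(untested sketch; it assumes application default credentials and that the 
hook wraps the printed dict in a top-level 'configuration' key):

    # Untested sketch: replay the jobs.insert call outside Airflow to see
    # what the BigQuery API itself returns for this body.
    import httplib2
    from oauth2client.client import GoogleCredentials
    from googleapiclient.discovery import build

    credentials = GoogleCredentials.get_application_default()
    service = build('bigquery', 'v2', http=credentials.authorize(httplib2.Http()))

    job_data = {
        'configuration': {
            'extract': {
                'sourceTable': {'projectId': 'xxxx', 'datasetId': 'xxxx', 'tableId': 'xxxx'},
                'destinationUris': u'gs://xxxx/airflow_tmp',  # same value Airflow built
                'destinationFormat': 'CSV',
                'fieldDelimiter': ',',
                'printHeader': True,
                'compression': 'GZIP',
            }
        }
    }

    print(service.jobs().insert(projectId='xxxx', body=job_data).execute())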

Thanks for your help

Re: Help BigQueryToCloudStorageOperator

Posted by Chris Riccomini <cr...@apache.org>.
Notes from https://cloud.google.com/bigquery/docs/reference/v2/jobs :

1) destinationUris should be a list:

[Pick one] A list of fully-qualified Google Cloud Storage URIs where the
extracted table should be written.

2) This is how we run it:

BigQueryToCloudStorageOperator(
    task_id='copy_data_to_gcs',
    source_project_dataset_table=results_table,
    destination_cloud_storage_uris=[cloud_storage_uri],
    dag=dag)
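
Following up on (1): the configuration printed in the original message should 
come out like this once the URI is wrapped in a list (a sketch only; the 
'xxxx' placeholders stand for the real project, dataset, table and bucket):

    # Corrected extract configuration: destinationUris is a list of gs:// URIs.
    configuration = {
        'extract': {
            'sourceTable': {
                'projectId': 'xxxx',
                'datasetId': 'xxxx',
                'tableId': 'xxxx',
            },
            'destinationUris': ['gs://xxxx/airflow_tmp'],  # list, not a bare string
            'destinationFormat': 'CSV',
            'fieldDelimiter': ',',
            'printHeader': True,
            'compression': 'GZIP',
        }
    }

In other words, the operator is most likely being given 
destination_cloud_storage_uris='gs://xxxx/airflow_tmp' instead of 
destination_cloud_storage_uris=['gs://xxxx/airflow_tmp'].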


