Posted to dev@airflow.apache.org by zi...@vo.yoo.ro on 2016/07/06 17:10:53 UTC
Help BigQueryToCloudStorageOperator
Hello,

I'm trying to use BigQueryToCloudStorageOperator, but unfortunately it
doesn't work:
[2016-07-06 16:30:43,496] {discovery.py:810} INFO - URL being
requested: POST
https://www.googleapis.com/bigquery/v2/projects/xxxx/jobs?alt=json
[2016-07-06 16:30:43,496] {client.py:570} INFO - Attempting refresh to
obtain initial access_token
[2016-07-06 16:30:43,496] {client.py:872} INFO - Refreshing
access_token
[2016-07-06 16:30:43,848] {models.py:1286} ERROR - <HttpError 400 when
requesting
https://www.googleapis.com/bigquery/v2/projects/xxxx/jobs?alt=json returned
"Required parameter is missing">
Traceback (most recent call last):
File
"/home/airflow/.local/lib/python2.7/site-packages/airflow/models.py", line
1242, in run
result = task_copy.execute(context=context)
File
"/home/airflow/.local/lib/python2.7/site-packages/airflow/contrib/operators/bigquery_to_gcs.py",
line 79, in execute
self.print_header)
File
"/home/airflow/.local/lib/python2.7/site-packages/airflow/contrib/hooks/bigquery_hook.py",
line 261, in run_extract
return self.run_with_configuration(configuration)
File
"/home/airflow/.local/lib/python2.7/site-packages/airflow/contrib/hooks/bigquery_hook.py",
line 438, in run_with_configuration
.insert(projectId=self.project_id, body=job_data) \
File
"/home/airflow/.local/lib/python2.7/site-packages/oauth2client/util.py",
line 140, in positional_wrapper
return wrapped(*args, **kwargs)
File
"/home/airflow/.local/lib/python2.7/site-packages/googleapiclient/http.py",
line 729, in execute
raise HttpError(resp, content, uri=self.uri)
When I print the configuration JSON that is passed to
job.insert(projectId=self.project_id, body=job_data), I get:
{'extract': {'compression': 'GZIP', 'fieldDelimiter': ',',
'destinationFormat': 'CSV', 'printHeader': True, 'destinationUris':
u'gs://xxxx/airflow_tmp', 'sourceTable': {'projectId': u'xxxx', 'tableId':
u'xxxx', 'datasetId': u'xxxx'}}}
According to the docs, all required parameters are set.
Does anyone have any idea that could help me? How could I debug this to
get more hints?
Thanks for your help
Re: Help BigQueryToCloudStorageOperator
Posted by Chris Riccomini <cr...@apache.org>.
Notes from https://cloud.google.com/bigquery/docs/reference/v2/jobs:
1) destinationUris should be a list:
[Pick one] A list of fully-qualified Google Cloud Storage URIs where the
extracted table should be written.
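To make the difference concrete, here is a sketch of the corrected 'extract' job configuration, using the same placeholder project/dataset/table/bucket values that appear in the log above. The only change from the printed configuration is that destinationUris is a list of URIs rather than a bare string:

```python
# Sketch of the corrected job configuration (placeholder values).
# The key fix: 'destinationUris' must be a list of GCS URIs, not a string.
configuration = {
    'extract': {
        'sourceTable': {
            'projectId': 'xxxx',  # placeholder project
            'datasetId': 'xxxx',  # placeholder dataset
            'tableId': 'xxxx',    # placeholder table
        },
        'destinationUris': ['gs://xxxx/airflow_tmp'],  # list, not string
        'destinationFormat': 'CSV',
        'fieldDelimiter': ',',
        'printHeader': True,
        'compression': 'GZIP',
    }
}
```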
2) This is how we run it:
from airflow.contrib.operators.bigquery_to_gcs import BigQueryToCloudStorageOperator

BigQueryToCloudStorageOperator(
    task_id='copy_data_to_gcs',
    source_project_dataset_table=results_table,
    destination_cloud_storage_uris=[cloud_storage_uri],  # note: a list
    dag=dag)
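If callers sometimes hand the operator a single URI string, one defensive option is to normalize the argument at the call site. This helper is purely hypothetical (it is not part of Airflow or the BigQuery API), just an illustration of the list-vs-string distinction:

```python
def normalize_uris(uris):
    """Accept a single GCS URI string or an iterable of URIs;
    always return a list, as the BigQuery extract config expects."""
    if isinstance(uris, str):
        return [uris]
    return list(uris)
```

Passing normalize_uris(cloud_storage_uri) as destination_cloud_storage_uris would then work whether the caller supplied a string or a list.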