Posted to commits@airflow.apache.org by "jack (JIRA)" <ji...@apache.org> on 2018/12/30 07:20:00 UTC
[jira] [Commented] (AIRFLOW-3316) GCS to BQ operator leaves
schema_fields operator unset when autodetect=True
[ https://issues.apache.org/jira/browse/AIRFLOW-3316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16730901#comment-16730901 ]
jack commented on AIRFLOW-3316:
-------------------------------
I'm unable to reproduce this issue.
First, schema_fields is an optional field. You don't need to assign None to it; if there is no schema, just don't specify the field.
Second, even if you specify schema_fields=None, it makes no difference, since None is the default value of schema_fields.
The block
{code:java}
if not self.schema_fields:{code}
is there for cases where schema_fields needs to be overwritten. After this block, it will either have a value or be None.
Please provide your DAG for us to test.
> GCS to BQ operator leaves schema_fields operator unset when autodetect=True
> ---------------------------------------------------------------------------
>
> Key: AIRFLOW-3316
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3316
> Project: Apache Airflow
> Issue Type: Bug
> Components: operators
> Affects Versions: 1.10.1
> Reporter: Conrad Lee
> Assignee: Conrad Lee
> Priority: Minor
>
> When I use the GoogleCloudStorageToBigQueryOperator to load data from Parquet into BigQuery, I leave the schema_fields argument set to 'None' and set autodetect=True.
>
> This causes the following error:
>
> {code:java}
> [2018-11-08 09:42:03,690] {models.py:1736} ERROR - local variable 'schema_fields' referenced before assignment
> Traceback (most recent call last):
>   File "/usr/local/lib/airflow/airflow/models.py", line 1633, in _run_raw_task
>     result = task_copy.execute(context=context)
>   File "/home/airflow/gcs/plugins/bq_operator_updated.py", line 2018, in execute
>     schema_fields=schema_fields
> UnboundLocalError: local variable 'schema_fields' referenced before assignment
> {code}
>
> The problem is that the set of checks that assigns the schema_fields variable does not cover all cases:
> {code:java}
> if not self.schema_fields:
>     if self.schema_object and self.source_format != 'DATASTORE_BACKUP':
>         gcs_hook = GoogleCloudStorageHook(
>             google_cloud_storage_conn_id=self.google_cloud_storage_conn_id,
>             delegate_to=self.delegate_to)
>         schema_fields = json.loads(gcs_hook.download(
>             self.bucket,
>             self.schema_object).decode("utf-8"))
>     elif self.schema_object is None and self.autodetect is False:
>         raise ValueError('At least one of `schema_fields`, `schema_object`, '
>                          'or `autodetect` must be passed.')
> else:
>     schema_fields = self.schema_fields
> {code}
> After the `elif` we need to handle the case where autodetect is set to True. This can be done by simply adding two lines:
> {code:java}
> if not self.schema_fields:
>     if self.schema_object and self.source_format != 'DATASTORE_BACKUP':
>         gcs_hook = GoogleCloudStorageHook(
>             google_cloud_storage_conn_id=self.google_cloud_storage_conn_id,
>             delegate_to=self.delegate_to)
>         schema_fields = json.loads(gcs_hook.download(
>             self.bucket,
>             self.schema_object).decode("utf-8"))
>     elif self.schema_object is None and self.autodetect is False:
>         raise ValueError('At least one of `schema_fields`, `schema_object`, '
>                          'or `autodetect` must be passed.')
>     else:
>         schema_fields = None
> else:
>     schema_fields = self.schema_fields
> {code}
>
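For anyone trying to reproduce this outside Airflow, here is a minimal standalone sketch of the branch structure quoted above. It is not the operator's code: the function names, the argument defaults, and the stand-in schema returned in place of the GCS download are all assumptions made for illustration. It only shows what happens to a local variable when no branch assigns it.

```python
# Standalone sketch of the quoted control flow. Nothing here is an Airflow API:
# resolve_schema_original, resolve_schema_fixed, and the stand-in schema list
# are hypothetical names and values chosen for illustration.

def resolve_schema_original(schema_fields=None, schema_object=None,
                            source_format="PARQUET", autodetect=False):
    if not schema_fields:
        if schema_object and source_format != "DATASTORE_BACKUP":
            # Stand-in for downloading and parsing the schema file from GCS.
            resolved = [{"name": "from_schema_object", "type": "STRING"}]
        elif schema_object is None and autodetect is False:
            raise ValueError("At least one of `schema_fields`, `schema_object`, "
                             "or `autodetect` must be passed.")
        # With autodetect=True and no schema_object, neither branch runs,
        # so `resolved` is never assigned.
    else:
        resolved = schema_fields
    return resolved


def resolve_schema_fixed(schema_fields=None, schema_object=None,
                         source_format="PARQUET", autodetect=False):
    if not schema_fields:
        if schema_object and source_format != "DATASTORE_BACKUP":
            resolved = [{"name": "from_schema_object", "type": "STRING"}]
        elif schema_object is None and autodetect is False:
            raise ValueError("At least one of `schema_fields`, `schema_object`, "
                             "or `autodetect` must be passed.")
        else:
            # The two added lines: let autodetect infer the schema.
            resolved = None
    else:
        resolved = schema_fields
    return resolved


if __name__ == "__main__":
    try:
        resolve_schema_original(autodetect=True)
    except UnboundLocalError as exc:
        print("original:", exc)
    print("fixed:", resolve_schema_fixed(autodetect=True))
```

In this sketch the attribute self.schema_fields being None is not the problem; the error comes from the separate local variable that the method reads afterwards, which stays unassigned on the autodetect path.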