You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by "Hongyi Wang (Jira)" <ji...@apache.org> on 2020/04/01 19:35:00 UTC

[jira] [Created] (AIRFLOW-7118) Honor schema type for mysql to gcs data pre-process

Hongyi Wang created AIRFLOW-7118:
------------------------------------

             Summary: Honor schema type for mysql to gcs data pre-process
                 Key: AIRFLOW-7118
                 URL: https://issues.apache.org/jira/browse/AIRFLOW-7118
             Project: Apache Airflow
          Issue Type: Improvement
          Components: operators
    Affects Versions: 1.10.9, 1.10.8, 1.10.7, 1.10.6, 1.10.5
            Reporter: Hongyi Wang
            Assignee: Hongyi Wang


After AIRFLOW-7117 is done, we re-gain the ability to override (BigQuery) schema with customize [self.schema|https://github.com/apache/airflow/blob/master/airflow/providers/google/cloud/operators/sql_to_gcs.py#L61-L65](if set). However, we still pre-process sql value based on default schema rather than [override schema|https://github.com/apache/airflow/blob/master/airflow/providers/google/cloud/operators/mysql_to_gcs.py#L112-L113]. 

For example, [mysql_to_gcs default to map (mysql)`FIELD_TYPE.DATE` to (bigquery)`TIMESTAMP`|https://github.com/apache/airflow/blob/master/airflow/providers/google/cloud/operators/mysql_to_gcs.py#L49]. Suppose we want to load such field to BigQuery as `DATE`, we would provide [our own schema|https://github.com/apache/airflow/blob/master/airflow/providers/google/cloud/operators/sql_to_gcs.py#L61-L65]. Unfortunately, since [convert_type()|https://github.com/apache/airflow/blob/master/airflow/providers/google/cloud/operators/mysql_to_gcs.py#L102] still pre-process sql value based on default schema, (python)`date` will be converted timestamp|[https://github.com/stoynov96/airflow/blob/master/airflow/operators/mysql_to_gcs.py#L118-L119]]. As a result, BigQuery API returns error:
{code:java}
Could not convert non-string JSON value to DATE type{code}
To fix that, we need to update [convert_type()|https://github.com/stoynov96/airflow/blob/master/airflow/operators/mysql_to_gcs.py#L103] to always honor `schema_type` for data pre-preprocess. e.g.
{code:java}
if schema_type == "DATE" and isinstance(value, date):
    return value.isoformat(){code}
 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)