Posted to commits@airflow.apache.org by "Yun Xu (JIRA)" <ji...@apache.org> on 2019/07/26 19:00:00 UTC

[jira] [Created] (AIRFLOW-5053) Add support for configuring under-the-hood csv writer in MySqlToHiveTransfer Operator

Yun Xu created AIRFLOW-5053:
-------------------------------

             Summary: Add support for configuring under-the-hood csv writer in MySqlToHiveTransfer Operator
                 Key: AIRFLOW-5053
                 URL: https://issues.apache.org/jira/browse/AIRFLOW-5053
             Project: Apache Airflow
          Issue Type: Bug
          Components: operators
    Affects Versions: 1.10.3
            Reporter: Yun Xu


[https://github.com/apache/airflow/blob/master/airflow/operators/mysql_to_hive.py#L125]

MySqlToHiveTransfer uses csv.writer under the hood. When the MySQL table includes JSON columns, the writer by default doubles any embedded quotechar (or other special characters) it encounters, but the doubled quotes ("") are not converted back when the data is loaded into Hive, leaving invalid JSON payloads in the Hive columns.



e.g. '["true"]' (MySql) => '[""true""]' (Hive, invalid json payload)

In our case, we fixed this by creating a customized MySqlToHiveTransfer operator that overrides the original class's execute method and replaces the csv writer with our own configuration:
{code:python}
# Configure the csv writer to support JSON columns. Note that the operator
# imports unicodecsv as csv, which is why writer() accepts an encoding kwarg.
csv_writer = csv.writer(f, delimiter=self.delimiter,
                        quoting=csv.QUOTE_NONE,
                        quotechar='',
                        escapechar='@',
                        encoding="utf-8")
{code}
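With QUOTE_NONE the writer never quotes fields; any character that would otherwise require quoting is prefixed with the escapechar instead, so embedded quotes in the JSON payload pass through intact. The trade-off is that the escapechar (here '@') must be a character that does not otherwise occur in the data.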
It would be good if the operator at least exposed these csv writer configs.
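
One possible shape for that, as a hypothetical sketch only (the csv_writer_kwargs parameter below does not exist in Airflow and is purely illustrative):
{code:python}
# Hypothetical usage sketch: csv_writer_kwargs is an illustrative parameter
# name, not an existing MySqlToHiveTransfer argument.
import csv

from airflow.operators.mysql_to_hive import MySqlToHiveTransfer

transfer = MySqlToHiveTransfer(
    task_id='mysql_to_hive_json_safe',
    sql='SELECT * FROM source_table',
    hive_table='target_table',
    # Passed straight through to csv.writer inside execute().
    csv_writer_kwargs={
        'quoting': csv.QUOTE_NONE,
        'escapechar': '@',
    },
)
{code}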


