Posted to commits@airflow.apache.org by "ASF subversion and git services (JIRA)" <ji...@apache.org> on 2017/11/27 18:57:03 UTC
[jira] [Commented] (AIRFLOW-1613) Make MySqlToGoogleCloudStorageOperator compatible with python3
[ https://issues.apache.org/jira/browse/AIRFLOW-1613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16267253#comment-16267253 ]
ASF subversion and git services commented on AIRFLOW-1613:
----------------------------------------------------------
Commit 2f79610a3ef726e88dec238de000d9295ae7d2a9 in incubator-airflow's branch refs/heads/master from Devon Peticolas
[ https://git-wip-us.apache.org/repos/asf?p=incubator-airflow.git;h=2f79610 ]
[AIRFLOW-1613] make mysql_to_gcs_operator py3 compatible
Uses `__future__.unicode_literals` and replaces calling `json.dump`
with `json.dumps` followed by `tmp_file_handle.write` to write json lines
to the ndjson file. When using python3, `json.dumps` will return a
unicode string instead of a byte string, so we encode the unicode
string to `utf-8`, which is compatible with bigquery (see:
https://cloud.google.com/bigquery/docs/loading-data#loading_encoded_data).
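The write pattern described in the commit message can be sketched roughly as follows. This is a minimal illustration for Python 3, not the operator's actual code: the helper name `write_ndjson_rows` and the sample rows are hypothetical, and the temporary file is assumed to be opened in its default binary mode.

```python
import json
from tempfile import NamedTemporaryFile


def write_ndjson_rows(rows, file_handle):
    """Write each dict in rows as one JSON line to a binary file handle.

    Hypothetical helper for illustration; not part of the operator.
    """
    for row in rows:
        # On Python 3, json.dumps returns str, so it must be encoded
        # before it can be written to a handle opened in binary mode.
        line = json.dumps(row, sort_keys=True)
        file_handle.write(line.encode("utf-8"))
        # Newline-delimited JSON: one object per line.
        file_handle.write(b"\n")


# NamedTemporaryFile defaults to binary mode ('w+b').
with NamedTemporaryFile(delete=True) as tmp_file_handle:
    write_ndjson_rows([{"id": 1, "name": "a"}, {"id": 2, "name": "b"}],
                      tmp_file_handle)
    tmp_file_handle.seek(0)
    written = tmp_file_handle.read()
```

Encoding explicitly keeps the file in binary mode, so the bytes on disk are UTF-8 regardless of the platform's default text encoding.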
> Make MySqlToGoogleCloudStorageOperator compatible with python3
> --------------------------------------------------------------
>
> Key: AIRFLOW-1613
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1613
> Project: Apache Airflow
> Issue Type: Bug
> Components: contrib
> Reporter: Joy Gao
> Assignee: Joy Gao
> Fix For: 1.9.0
>
>
> 1.
> In Python 3, map(...) returns an iterator, which can only be iterated over once.
> Therefore, in the current implementation, schema is exhausted after the first row and yields nothing for every later row:
> {code}
> schema = map(lambda schema_tuple: schema_tuple[0], cursor.description)
> file_no = 0
> tmp_file_handle = NamedTemporaryFile(delete=True)
> tmp_file_handles = {self.filename.format(file_no): tmp_file_handle}
> for row in cursor:
>     # Convert datetime objects to utc seconds, and decimals to floats
>     row = map(self.convert_types, row)
>     row_dict = dict(zip(schema, row))
> {code}
> 2.
> The file is opened in binary mode, but strings are written to it, which raises the error `a bytes-like object is required, not 'str'`. Use mode='w' instead.
> 3.
> The operator currently does not support binary columns in MySQL. We should support uploading binary columns from MySQL to Cloud Storage, as it is a pretty common use case.
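The iterator problem from item 1 of the issue can be reproduced in isolation. The sketch below assumes Python 3 and uses hypothetical stand-ins (`description`, `rows`) for `cursor.description` and the fetched rows; materializing the column names as a list is one possible fix and is not necessarily the one the commit applied.

```python
# Hypothetical stand-ins for cursor.description and the rows from the cursor.
description = [("id", None), ("name", None)]
rows = [(1, "a"), (2, "b")]

# Python 3: map() returns a one-shot iterator, so zip() drains it
# while building the dict for the first row.
lazy_schema = map(lambda schema_tuple: schema_tuple[0], description)
broken = [dict(zip(lazy_schema, row)) for row in rows]

# Materializing the column names as a list lets every row reuse them.
schema = [schema_tuple[0] for schema_tuple in description]
fixed = [dict(zip(schema, row)) for row in rows]
```

On Python 3, `broken` is `[{'id': 1, 'name': 'a'}, {}]`: only the first row keeps its column names. `fixed` contains a complete dict for every row.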