Posted to commits@airflow.apache.org by "ASF subversion and git services (JIRA)" <ji...@apache.org> on 2017/11/27 18:57:03 UTC

[jira] [Commented] (AIRFLOW-1613) Make MySqlToGoogleCloudStorageOperator compatible with python3

    [ https://issues.apache.org/jira/browse/AIRFLOW-1613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16267253#comment-16267253 ] 

ASF subversion and git services commented on AIRFLOW-1613:
----------------------------------------------------------

Commit 2f79610a3ef726e88dec238de000d9295ae7d2a9 in incubator-airflow's branch refs/heads/master from Devon Peticolas
[ https://git-wip-us.apache.org/repos/asf?p=incubator-airflow.git;h=2f79610 ]

[AIRFLOW-1613] make mysql_to_gcs_operator py3 compatible

Uses `__future__.unicode_literals` and replaces calling `json.dump`
with `json.dumps` followed by `tmp_file_handle.write` to write JSON lines
to the ndjson file. When using Python 3, `json.dumps` returns a
unicode string instead of a byte string, so we encode the unicode
string to UTF-8, which is compatible with BigQuery (see:
https://cloud.google.com/bigquery/docs/loading-data#loading_encoded_data).
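
The write pattern this commit describes (serialize each row with `json.dumps`, encode the resulting `str` to UTF-8, then write the bytes to the temporary file) can be sketched as below. This is a minimal standalone illustration, not the operator's actual code: `write_ndjson` and the sample rows are hypothetical, and the file is assumed to be opened in `NamedTemporaryFile`'s default binary mode.

```python
import json
from tempfile import NamedTemporaryFile


def write_ndjson(rows, file_handle):
    """Hypothetical helper: write each dict in `rows` as one JSON line.

    On Python 3, json.dumps returns a str, so each line is encoded to
    UTF-8 bytes before being written to the binary file handle.
    """
    for row_dict in rows:
        line = json.dumps(row_dict)
        file_handle.write(line.encode('utf-8'))
        file_handle.write(b'\n')


# NamedTemporaryFile defaults to mode 'w+b', so it only accepts bytes.
with NamedTemporaryFile(delete=True) as tmp_file_handle:
    write_ndjson([{'id': 1, 'name': 'caf\u00e9'}, {'id': 2, 'name': 'b'}],
                 tmp_file_handle)
    tmp_file_handle.seek(0)
    contents = tmp_file_handle.read()

print(contents.decode('utf-8'))
```

Writing `str` directly to this handle would raise the `TypeError` described in the issue below; encoding each line keeps the file binary while producing UTF-8 output.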


> Make MySqlToGoogleCloudStorageOperator compatible with python3
> --------------------------------------------------------------
>
>                 Key: AIRFLOW-1613
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-1613
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: contrib
>            Reporter: Joy Gao
>            Assignee: Joy Gao
>             Fix For: 1.9.0
>
>
> 1. 
> In Python 3, map(...) returns an iterator, which can only be iterated over once.
> The current implementation therefore exhausts `schema` in the first zip() call, so every row after the first becomes an empty dict:
> {code}
>         schema = map(lambda schema_tuple: schema_tuple[0], cursor.description)
>         file_no = 0
>         tmp_file_handle = NamedTemporaryFile(delete=True)
>         tmp_file_handles = {self.filename.format(file_no): tmp_file_handle}
>         for row in cursor:
>             # Convert datetime objects to utc seconds, and decimals to floats
>             row = map(self.convert_types, row)
>             row_dict = dict(zip(schema, row))
> {code}
> 2.
> The file is opened in binary mode, but strings are written to it, which fails with `TypeError: a bytes-like object is required, not 'str'`. Use mode='w' instead.
> 3.
> The operator currently does not support binary columns in MySQL. We should support uploading binary columns from MySQL to Cloud Storage, as it's a pretty common use case.
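
The three problems listed in the issue can be illustrated with a small standalone sketch. `convert_value`, `rows_to_ndjson`, the fake `description` list and the sample rows are all hypothetical stand-ins for the operator and a DB-API cursor. Base64-encoding `bytes` values is only one possible answer to item 3, assumed here on the basis that the downstream loader accepts binary data as base64 text.

```python
import base64
import json
from tempfile import NamedTemporaryFile

# Stand-in for cursor.description: (name, type_code, ...) tuples.
description = [('id', None), ('payload', None)]
rows = [(1, b'\x00\x01'), (2, b'abc')]

# Item 1, the pitfall: a map object can only be consumed once.
names = map(lambda column: column[0], description)
first = dict(zip(names, rows[0]))   # {'id': 1, 'payload': b'\x00\x01'}
second = dict(zip(names, rows[1]))  # {} because `names` is already exhausted


def convert_value(value):
    """Hypothetical converter making a MySQL value JSON-serializable.

    Item 3 (assumed strategy): binary values are base64-encoded into
    ASCII text; all other values pass through unchanged.
    """
    if isinstance(value, bytes):
        return base64.b64encode(value).decode('ascii')
    return value


def rows_to_ndjson(description, rows, file_handle):
    # Item 1, the fix: materialize the column names as a list so the
    # schema can be reused by zip() for every row.
    schema = [column[0] for column in description]
    for row in rows:
        row_dict = dict(zip(schema, map(convert_value, row)))
        # Item 2: the handle is in text mode, so writing str is accepted.
        file_handle.write(json.dumps(row_dict) + '\n')


with NamedTemporaryFile(mode='w+', encoding='utf-8', delete=True) as tmp_file_handle:
    rows_to_ndjson(description, rows, tmp_file_handle)
    tmp_file_handle.seek(0)
    lines = tmp_file_handle.read().splitlines()

print(lines)  # ['{"id": 1, "payload": "AAE="}', '{"id": 2, "payload": "YWJj"}']
```

Opening the file in text mode follows the issue's suggestion for item 2; the committed fix instead keeps the default binary mode and encodes each line to UTF-8, and either approach avoids the `TypeError`.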



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)