Posted to issues@flink.apache.org by "Dian Fu (Jira)" <ji...@apache.org> on 2020/04/09 09:05:00 UTC

[jira] [Comment Edited] (FLINK-17062) Fix the conversion from Java row type to Python row type

    [ https://issues.apache.org/jira/browse/FLINK-17062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17079084#comment-17079084 ] 

Dian Fu edited comment on FLINK-17062 at 4/9/20, 9:04 AM:
----------------------------------------------------------

[~f.pompermaier] Thanks a lot for the suggestions!

The conversion here refers to the conversion between Java data types and Python data types, not to the conversion between Java objects and Python objects. It is needed for two reasons (see the sketch after this list):
 - Python type to Java type: the result type of a Python UDF needs to be converted to a Java data type so that it fits into the existing type system of the table module, e.g. for type inference.
 - Java type to Python type: currently only used to retrieve the schema of a Table (via Table.get_schema().get_field_data_types()), e.g. when users inspect the schema of a table.
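To make the two directions concrete, here is a minimal sketch against the PyFlink 1.10-style API (the table data and function name are made up for illustration):

{code:python}
from pyflink.dataset import ExecutionEnvironment
from pyflink.table import BatchTableEnvironment, DataTypes
from pyflink.table.udf import udf

t_env = BatchTableEnvironment.create(ExecutionEnvironment.get_execution_environment())

# Python type -> Java type: the declared result type DataTypes.BIGINT() is
# converted to a Java data type so that the table module can do type
# inference for the expression add_one(a).
add_one = udf(lambda i: i + 1,
              input_types=[DataTypes.BIGINT()],
              result_type=DataTypes.BIGINT())
t_env.register_function("add_one", add_one)

table = t_env.from_elements([(1, 'hi'), (2, 'hello')], ['a', 'b'])
result = table.select("add_one(a), b")

# Java type -> Python type: the Java schema of the result table is converted
# back into Python DataTypes for inspection on the Python side.
print(result.get_schema().get_field_data_types())
{code}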

Regarding the Python/Java object conversion, you are right: Arrow is already used as the data exchange format between the Java process and the Python process for [vectorized Python UDF|https://cwiki.apache.org/confluence/display/FLINK/FLIP-97%3A+Support+Scalar+Vectorized+Python+UDF+in+PyFlink] (which takes pandas.Series as input and output).
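For reference, a vectorized UDF would look roughly like the sketch below, assuming the decorator flag proposed in FLIP-97 (udf_type="pandas" at the time of writing; the exact parameter name may differ between releases):

{code:python}
from pyflink.table import DataTypes
from pyflink.table.udf import udf

@udf(input_types=[DataTypes.BIGINT(), DataTypes.BIGINT()],
     result_type=DataTypes.BIGINT(),
     udf_type="pandas")
def vectorized_add(i, j):
    # i and j arrive as pandas.Series; the whole batch is shipped between the
    # Java and Python processes in Arrow format, and a pandas.Series is returned.
    return i + j
{code}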


> Fix the conversion from Java row type to Python row type
> --------------------------------------------------------
>
>                 Key: FLINK-17062
>                 URL: https://issues.apache.org/jira/browse/FLINK-17062
>             Project: Flink
>          Issue Type: Bug
>          Components: API / Python
>    Affects Versions: 1.9.0
>            Reporter: Dian Fu
>            Assignee: Dian Fu
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.9.3, 1.10.1, 1.11.0
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> The conversion from Java row type to Python row type iterates over the result of FieldsDataType.getFieldDataTypes. The resulting field order is non-deterministic because FieldsDataType.getFieldDataTypes returns a Map.
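A hypothetical illustration of the problem and of one way to keep the field order deterministic (this is not the actual PyFlink code; the helper names are made up):

{code:python}
def to_python_row_fields_buggy(j_fields_data_type):
    # Iterating the Map returned by getFieldDataTypes() directly: the field
    # order depends on the Map implementation and is not deterministic.
    return [(name, j_type)
            for name, j_type in j_fields_data_type.getFieldDataTypes().items()]

def to_python_row_fields_fixed(j_fields_data_type):
    # Take the field order from the logical RowType, which preserves the
    # declaration order, and only use the Map for the type lookup.
    field_names = list(j_fields_data_type.getLogicalType().getFieldNames())
    j_type_map = j_fields_data_type.getFieldDataTypes()
    return [(name, j_type_map[name]) for name in field_names]
{code}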



--
This message was sent by Atlassian Jira
(v8.3.4#803005)