Posted to issues@arrow.apache.org by "Bryan Cutler (Jira)" <ji...@apache.org> on 2019/10/16 16:41:00 UTC

[jira] [Commented] (ARROW-3850) [Python] Support MapType and StructType for enhanced PySpark integration

    [ https://issues.apache.org/jira/browse/ARROW-3850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16952988#comment-16952988 ] 

Bryan Cutler commented on ARROW-3850:
-------------------------------------

I made ARROW-6904 to add MapArray to Arrow Python. Once that is done, MapType support can be implemented in PySpark, and we can close this issue once it passes the Spark integration tests. Nested structs require some other issues to be worked out, and there are other JIRAs for that.
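
For context, here is a rough sketch of the layout a MapArray would wrap. The Arrow columnar format stores a map column as a list of key/item entries: one offsets buffer plus flat keys and items child arrays. The pure-Python helper below only illustrates that flattening; flatten_maps is a hypothetical name, not part of pyarrow, and the code does not use the API added under ARROW-6904.

```python
from typing import Dict, List, Tuple


def flatten_maps(rows: List[Dict[str, int]]) -> Tuple[List[int], List[str], List[int]]:
    """Flatten a column of maps into (offsets, keys, items).

    Hypothetical helper for illustration only; it mimics the
    offsets + keys + items layout of an Arrow map column.
    """
    offsets = [0]            # offsets[i] marks where row i's entries start
    keys: List[str] = []     # all keys, concatenated row after row
    items: List[int] = []    # all values, aligned with keys
    for row in rows:
        for key, value in row.items():
            keys.append(key)
            items.append(value)
        # An empty map repeats the previous offset.
        offsets.append(len(keys))
    return offsets, keys, items


rows = [{"a": 1, "b": 2}, {}, {"c": 3}]
offsets, keys, items = flatten_maps(rows)
print(offsets)  # [0, 2, 2, 3]
print(keys)     # ['a', 'b', 'c']
print(items)    # [1, 2, 3]
```

The entries of row i are keys[offsets[i]:offsets[i + 1]] paired with the same slice of items, so an empty map costs nothing beyond a repeated offset.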

> [Python] Support MapType and StructType for enhanced PySpark integration
> ------------------------------------------------------------------------
>
>                 Key: ARROW-3850
>                 URL: https://issues.apache.org/jira/browse/ARROW-3850
>             Project: Apache Arrow
>          Issue Type: New Feature
>          Components: Python
>    Affects Versions: 0.11.1
>            Reporter: Florian Wilhelm
>            Priority: Major
>             Fix For: 1.0.0
>
>
> It would be great to support MapType and (nested) StructType in Arrow so that PySpark can make use of it.
>  
>  Quite often, as in my use case, complex types are also stored in Hive table cells. Currently it's not possible to use the new {{[pandas_udf|https://spark.apache.org/docs/latest/api/python/pyspark.sql.html?highlight=explode#pyspark.sql.functions.pandas_udf]}} decorator, which internally uses Arrow, to generate a UDF for columns with complex types.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)