Posted to issues@spark.apache.org by "Florian Wilhelm (JIRA)" <ji...@apache.org> on 2019/04/19 11:19:00 UTC

[jira] [Commented] (SPARK-21187) Complete support for remaining Spark data types in Arrow Converters

    [ https://issues.apache.org/jira/browse/SPARK-21187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16821849#comment-16821849 ] 

Florian Wilhelm commented on SPARK-21187:
-----------------------------------------

I know this doesn't actually help resolve the issue, but for the time being I wrote up a small workaround for using Spark's `pandas_udf` and Arrow with Spark dataframes that contain complex types. I hope it's of some use to PySpark users until this issue is fixed. [https://florianwilhelm.info/2019/04/more_efficient_udfs_with_pyspark/]
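
For readers who want a rough idea of what such a workaround can look like, here is a minimal sketch. It assumes the approach of serializing a complex column to a JSON string before it enters the `pandas_udf` and parsing it back afterwards, so that Arrow only has to transfer strings. The names `add_total`, `with_total`, and `payload`, and the struct layout, are hypothetical and only for illustration; the linked post has the actual details.

```python
import json


def add_total(record_json):
    """Hypothetical row logic: decode a JSON-encoded struct, add a field, re-encode."""
    record = json.loads(record_json)
    record["total"] = sum(record["values"])
    return json.dumps(record)


def with_total(df, column="payload"):
    """Apply add_total to a struct column of a Spark dataframe via pandas_udf.

    Assumes `column` is a struct with a single field `values: array<bigint>`.
    PySpark is imported lazily so add_total can be tried without Spark.
    """
    from pyspark.sql import functions as F
    from pyspark.sql.types import (
        ArrayType, LongType, StringType, StructField, StructType,
    )

    # The UDF only ever receives and returns plain strings.
    add_total_udf = F.pandas_udf(
        lambda s: s.apply(add_total), returnType=StringType()
    )

    # Schema of the struct after the UDF has added the "total" field.
    result_schema = StructType([
        StructField("values", ArrayType(LongType())),
        StructField("total", LongType()),
    ])

    encoded = F.to_json(F.col(column))    # struct -> JSON string
    processed = add_total_udf(encoded)    # transferred as a string column
    return df.withColumn(column, F.from_json(processed, result_schema))
```

The JSON round trip costs extra serialization work per row, so this is a stopgap rather than a substitute for native complex type support in the Arrow converters.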

> Complete support for remaining Spark data types in Arrow Converters
> -------------------------------------------------------------------
>
>                 Key: SPARK-21187
>                 URL: https://issues.apache.org/jira/browse/SPARK-21187
>             Project: Spark
>          Issue Type: Umbrella
>          Components: PySpark, SQL
>    Affects Versions: 2.3.0
>            Reporter: Bryan Cutler
>            Assignee: Bryan Cutler
>            Priority: Major
>
> This is to track adding the remaining type support in Arrow Converters. Currently, only primitive data types are supported.
> Remaining types:
>  * -*Date*-
>  * -*Timestamp*-
>  * *Complex*: Struct, -Array-, Arrays of Date/Timestamps, Map
>  * -*Decimal*-
>  * -*Binary*-
>  * Categorical when converting from Pandas
> Some things to do before closing this out:
>  * -Look to upgrading to Arrow 0.7 for better Decimal support (can now write values as BigDecimal)-
>  * -Need to add some user docs-
>  * -Make sure Python tests are thorough-
>  * Check into complex type support mentioned in comments by [~leif]; should we support multi-indexing?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
