Posted to issues@spark.apache.org by "Hyukjin Kwon (JIRA)" <ji...@apache.org> on 2019/01/29 06:20:00 UTC

[jira] [Resolved] (SPARK-26566) Upgrade apache/arrow to 0.12.0

     [ https://issues.apache.org/jira/browse/SPARK-26566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-26566.
----------------------------------
       Resolution: Fixed
    Fix Version/s: 3.0.0

Issue resolved by pull request 23657
[https://github.com/apache/spark/pull/23657]

> Upgrade apache/arrow to 0.12.0
> ------------------------------
>
>                 Key: SPARK-26566
>                 URL: https://issues.apache.org/jira/browse/SPARK-26566
>             Project: Spark
>          Issue Type: Improvement
>          Components: PySpark
>    Affects Versions: 2.4.0
>            Reporter: Bryan Cutler
>            Assignee: Bryan Cutler
>            Priority: Major
>             Fix For: 3.0.0
>
>
> Version 0.12.0 includes the following selected fixes/improvements relevant to Spark users:
> * Safe cast fails from numpy float64 array with nans to integer, ARROW-4258
> * Java, Reduce heap usage for variable width vectors, ARROW-4147
> * Binary identity cast not implemented, ARROW-4101
> * pyarrow open_stream deprecated, use ipc.open_stream, ARROW-4098 (see the migration sketch after this list)
> * conversion to date object no longer needed, ARROW-3910
> * Error reading IPC file with no record batches, ARROW-3894
> * Signed to unsigned integer cast yields incorrect results when type sizes are the same, ARROW-3790
> * from_pandas gives incorrect results when converting floating point to bool, ARROW-3428
> * Import pyarrow fails if scikit-learn is installed from conda (boost-cpp / libboost issue), ARROW-3048
> * Java update to official Flatbuffers version 1.9.0, ARROW-3175
> The complete list of fixes is available [here|https://issues.apache.org/jira/issues/?jql=project%20%3D%20ARROW%20AND%20status%20in%20(Resolved%2C%20Closed)%20AND%20fixVersion%20%3D%200.12.0].
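> A minimal sketch of the open_stream migration, assuming PyArrow 0.12.0; the in-memory round trip is only illustrative and not taken from the issue:
> {code:python}
> import pyarrow as pa
>
> # Build a small record batch and write it to an in-memory IPC stream
> batch = pa.RecordBatch.from_arrays([pa.array([1, 2, 3])], names=["x"])
> sink = pa.BufferOutputStream()
> writer = pa.RecordBatchStreamWriter(sink, batch.schema)
> writer.write_batch(batch)
> writer.close()
>
> # pa.open_stream(...) is deprecated as of 0.12; use the ipc module instead
> reader = pa.ipc.open_stream(sink.getvalue())
> table = reader.read_all()
> {code}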
> PySpark requires the following fixes to work with PyArrow 0.12.0:
> * Encrypted pyspark worker fails because ChunkedStream is missing a closed property
> * pyarrow now converts dates to objects by default, which causes an error because the type is assumed to be datetime64 (see the sketch after this list)
> * ArrowTests fail due to a difference in the raised error message
> * pyarrow.open_stream is deprecated
> * tests fail because groupby adds an index column with a duplicate name
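> A minimal sketch of the date handling change, assuming PyArrow 0.12.0 (where to_pandas defaults to date_as_object=True); the column name is only illustrative:
> {code:python}
> import datetime
> import pandas as pd
> import pyarrow as pa
>
> table = pa.Table.from_pandas(pd.DataFrame({"d": [datetime.date(2019, 1, 1)]}))
>
> # 0.12 returns datetime.date values by default, giving an object-dtype column ...
> df_objects = table.to_pandas()
>
> # ... while earlier releases returned datetime64[ns]; the previous behavior
> # can still be requested explicitly
> df_datetime64 = table.to_pandas(date_as_object=False)
> {code}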
>  


