Posted to issues@spark.apache.org by "Hyukjin Kwon (JIRA)" <ji...@apache.org> on 2019/05/21 04:00:37 UTC

[jira] [Updated] (SPARK-21465) array('L') support might lead to overflow error

     [ https://issues.apache.org/jira/browse/SPARK-21465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon updated SPARK-21465:
---------------------------------
    Labels: bulk-closed  (was: )

> array('L') support might lead to overflow error
> -----------------------------------------------
>
>                 Key: SPARK-21465
>                 URL: https://issues.apache.org/jira/browse/SPARK-21465
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 2.2.0
>            Reporter: Xiang Gao
>            Priority: Major
>              Labels: bulk-closed
>
> Currently, the behavior of PySpark's support for the different {{array.array}} typecodes is not clearly defined.
> As a result, in Python 3, trying to create a {{DataFrame}} from {{array('L')}} raises an exception, while in Python 2 the same code raises no exception and instead converts 'L' to a smaller integer type. This Python 2 behavior might lead to an overflow error if the input data is large enough.
> To avoid this unexpected behavior, we should either throw an exception in Python 2 for {{array('L')}} telling the user that it is not supported, or support it using a larger data type in the JVM, such as BigInt.
> See discussions starting from https://github.com/apache/spark/pull/18444#discussion_r128132584
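
The overflow risk described above can be illustrated in plain Python, without Spark. This is only a sketch under stated assumptions: it assumes the "smaller integer" is a signed 32-bit value, and the helper names wrap_to_int32 and check_fits_int32 are hypothetical, not part of PySpark. It shows how an unsigned 'L' value above 2**31 - 1 changes when forced into 32 signed bits, and what an explicit guard of the kind proposed in the issue might look like.

```python
from array import array
import ctypes

INT32_MAX = 2**31 - 1  # largest value of a signed 32-bit integer


def wrap_to_int32(value):
    """Reinterpret the low 32 bits of value as a signed 32-bit integer.

    ctypes integer types do no overflow checking, so this mimics a
    silent narrowing conversion (an assumption made for illustration).
    """
    return ctypes.c_int32(value).value


def check_fits_int32(values):
    """Hypothetical guard: raise instead of silently narrowing."""
    for v in values:
        if v > INT32_MAX:
            raise OverflowError(
                "array('L') value %d does not fit in a signed 32-bit integer" % v
            )
    return True


# 'L' is an unsigned long of at least 4 bytes, so 2**32 - 1 is always storable.
data = array('L', [1, INT32_MAX, 2**31, 2**32 - 1])

# Silent narrowing: the last two values come out negative.
wrapped = [wrap_to_int32(v) for v in data]
print(wrapped)

# Explicit guard: fails loudly on the first out-of-range value.
try:
    check_fits_int32(data)
    guard_raised = False
except OverflowError as exc:
    guard_raised = True
    print("rejected:", exc)
```

With a guard like this, out-of-range input is rejected up front rather than being stored as a wrong value; mapping 'L' to a wider JVM type, as the issue also suggests, would avoid the need for the guard.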



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
