Posted to issues@spark.apache.org by "Hyukjin Kwon (JIRA)" <ji...@apache.org> on 2019/05/21 04:00:37 UTC
[jira] [Updated] (SPARK-21465) array('L') support might lead to overflow error
[ https://issues.apache.org/jira/browse/SPARK-21465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hyukjin Kwon updated SPARK-21465:
---------------------------------
Labels: bulk-closed (was: )
> array('L') support might lead to overflow error
> -----------------------------------------------
>
> Key: SPARK-21465
> URL: https://issues.apache.org/jira/browse/SPARK-21465
> Project: Spark
> Issue Type: Bug
> Components: PySpark
> Affects Versions: 2.2.0
> Reporter: Xiang Gao
> Priority: Major
> Labels: bulk-closed
>
> Currently, PySpark does not clearly define how it handles the different {{array.array}} typecodes.
> As a result, in Python 3, trying to create a {{DataFrame}} from {{array('L')}} raises an exception, while in Python 2 the same code does not raise an exception but instead converts 'L' to a smaller integer type. This Python 2 behavior might lead to an overflow error if the input values are large enough.
> To avoid this unexpected behavior, we should either throw an exception in Python 2 for {{array('L')}} telling the user that it is not supported, or support it by using a larger data type in the JVM, such as BigInt.
> See discussions starting from https://github.com/apache/spark/pull/18444#discussion_r128132584
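The overflow risk described above can be illustrated without Spark, using only the standard library. This is a minimal sketch, not PySpark's actual conversion code: `wrap_to_signed` and `check_fits_signed` are hypothetical helper names, and the 32-bit width is an assumption chosen for illustration. It shows how an unsigned 'L' value changes when it is reinterpreted as a narrower signed integer, and what an explicit "not supported" guard might look like.

```python
from array import array


def wrap_to_signed(value, bits):
    """Reinterpret a non-negative integer as a two's-complement signed value."""
    value &= (1 << bits) - 1            # keep only the low `bits` bits
    if value >= 1 << (bits - 1):        # top bit set: negative when read as signed
        value -= 1 << bits
    return value


def check_fits_signed(arr, bits=64):
    """Hypothetical guard: raise if any element exceeds the signed range."""
    limit = (1 << (bits - 1)) - 1
    for v in arr:
        if v > limit:
            raise OverflowError(
                "array(%r) value %d does not fit in a signed %d-bit integer"
                % (arr.typecode, v, bits)
            )
    return arr


data = array('L', [1, 2**31])           # 2**31 fits in 'L' on every platform
print(data.itemsize)                    # 4 or 8 bytes, depending on the platform
print([wrap_to_signed(v, 32) for v in data])   # [1, -2147483648]
```

Calling `check_fits_signed(data, 32)` raises `OverflowError`, which corresponds to the first option proposed in the issue (fail loudly), while widening the target type corresponds to the second.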