Posted to issues@spark.apache.org by "Marco Gaido (JIRA)" <ji...@apache.org> on 2019/08/03 08:08:00 UTC
[jira] [Created] (SPARK-28610) Support larger buffer for sum of long
Marco Gaido created SPARK-28610:
-----------------------------------
Summary: Support larger buffer for sum of long
Key: SPARK-28610
URL: https://issues.apache.org/jira/browse/SPARK-28610
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 3.0.0
Reporter: Marco Gaido
The sum of a long field currently uses a buffer of type long.
When the flag that makes arithmetic operations throw on overflow is turned on, this causes a problem whenever an intermediate sum overflows but later rows bring the total back into range. In that case we throw an exception, even though the final result is representable as a long value. This can be reproduced by running:
{code}
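import spark.implicits._ // toDF and $"col"; spark-shell predefines sc and spark
import org.apache.spark.sql.functions.sum
val df = sc.parallelize(Seq(100L, Long.MaxValue, -1000L)).toDF("a")
df.select(sum($"a")).show()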
{code}
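To make the failure mode concrete outside Spark, here is a minimal plain-Scala sketch. It assumes `Math.addExact` as a stand-in for an overflow-checked long addition and `BigInt` as the wider buffer; `SumBufferDemo` and its methods are hypothetical names, not Spark code.

```scala
import scala.util.Try

// Hypothetical illustration, not Spark's implementation: compares an
// overflow-checked sum kept in a Long buffer with one kept in a BigInt buffer.
object SumBufferDemo {

  // Long buffer: Math.addExact throws ArithmeticException on any intermediate
  // overflow, even if later values would bring the total back into range.
  def sumWithLongBuffer(values: Seq[Long]): Long =
    values.foldLeft(0L)((acc, v) => Math.addExact(acc, v))

  // Wider buffer: intermediate sums cannot overflow; the range check
  // happens only once, when the final total is converted back to Long.
  def sumWithWideBuffer(values: Seq[Long]): Long = {
    val total = values.foldLeft(BigInt(0))((acc, v) => acc + BigInt(v))
    if (!total.isValidLong) throw new ArithmeticException(s"long overflow: $total")
    total.toLong
  }

  def main(args: Array[String]): Unit = {
    val rows = Seq(100L, Long.MaxValue, -1000L)
    println(Try(sumWithLongBuffer(rows))) // Failure(java.lang.ArithmeticException: long overflow)
    println(Try(sumWithWideBuffer(rows))) // Success(9223372036854774907)
  }
}
```

The wide buffer still reports a genuine overflow, because the check is deferred to the end rather than dropped.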
Following [~cloud_fan]'s suggestion in https://github.com/apache/spark/pull/21599, we should introduce a config that lets users choose a wider data type for the sum buffer, so that the issue above can be fixed.
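A config-selected buffer could look roughly like the sketch below. Everything here is hypothetical: the config key `example.sum.useWideBuffer`, the `SumBuffer` types, and the use of `BigInt` as the wider type are illustrative assumptions, not the design adopted in Spark. The sketch also models per-partition partial sums followed by a merge, since an intermediate overflow can occur at either step.

```scala
// Hypothetical sketch: names and the config key are illustrative, not Spark APIs.
object ConfigurableSum {

  // Hypothetical config key; not an actual Spark SQL setting.
  val WideBufferKey = "example.sum.useWideBuffer"

  sealed trait SumBuffer {
    def update(v: Long): SumBuffer
    def merge(other: SumBuffer): SumBuffer
    def result: Long
  }

  // Narrow buffer: every update and merge is overflow-checked.
  final case class LongBuffer(acc: Long) extends SumBuffer {
    def update(v: Long): SumBuffer = LongBuffer(Math.addExact(acc, v))
    def merge(other: SumBuffer): SumBuffer = other match {
      case LongBuffer(o) => LongBuffer(Math.addExact(acc, o))
      case _             => throw new IllegalArgumentException("buffer type mismatch")
    }
    def result: Long = acc
  }

  // Wide buffer: intermediate values never overflow; range check only at the end.
  final case class WideBuffer(acc: BigInt) extends SumBuffer {
    def update(v: Long): SumBuffer = WideBuffer(acc + BigInt(v))
    def merge(other: SumBuffer): SumBuffer = other match {
      case WideBuffer(o) => WideBuffer(acc + o)
      case _             => throw new IllegalArgumentException("buffer type mismatch")
    }
    def result: Long =
      if (acc.isValidLong) acc.toLong
      else throw new ArithmeticException(s"long overflow: $acc")
  }

  // The buffer type is chosen once, from the (hypothetical) config.
  def emptyBuffer(conf: Map[String, String]): SumBuffer =
    if (conf.getOrElse(WideBufferKey, "false").toBoolean) WideBuffer(BigInt(0))
    else LongBuffer(0L)

  // Partial sums per partition, then a final merge of the partial buffers.
  def sum(partitions: Seq[Seq[Long]], conf: Map[String, String]): Long =
    partitions
      .map(_.foldLeft(emptyBuffer(conf))((buf, v) => buf.update(v)))
      .foldLeft(emptyBuffer(conf))(_ merge _)
      .result
}
```

For example, `ConfigurableSum.sum(Seq(Seq(100L, Long.MaxValue), Seq(-1000L)), Map(ConfigurableSum.WideBufferKey -> "true"))` returns `Long.MaxValue - 900`, while the same call with an empty config throws because the first partition overflows. Keeping the long buffer as the default avoids paying for the wider type when users do not need it.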
--
This message was sent by Atlassian JIRA
(v7.6.14#76016)