Posted to issues@spark.apache.org by "Marco Gaido (JIRA)" <ji...@apache.org> on 2019/08/03 08:08:00 UTC

[jira] [Created] (SPARK-28610) Support larger buffer for sum of long

Marco Gaido created SPARK-28610:
-----------------------------------

             Summary: Support larger buffer for sum of long
                 Key: SPARK-28610
                 URL: https://issues.apache.org/jira/browse/SPARK-28610
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 3.0.0
            Reporter: Marco Gaido


The sum of a long field currently uses a buffer of type long.

When the flag for throwing exceptions on overflow in arithmetic operations is turned on, this is a problem if an intermediate sum overflows and later rows would bring it back into range. In that case we throw an exception even though the final result is representable as a long value. An example of this issue can be seen by running:

{code}
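// Intended for spark-shell, where sc, toDF, $ and sum are already in scope.
// 100 + Long.MaxValue overflows a long buffer, yet the full sum (Long.MaxValue - 900) fits in a long.
val df = sc.parallelize(Seq(100L, Long.MaxValue, -1000L)).toDF("a")
df.select(sum($"a")).show()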
{code}
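
The failure mode can also be illustrated without Spark. The following is only a minimal sketch in plain Scala: it assumes that the overflow check behaves like Math.addExact and that the rows are folded in the listed order, as in a single partition. The names IntermediateOverflow and sumExact are hypothetical and are not part of Spark.

```scala
import scala.util.Try

object IntermediateOverflow {
  // Hypothetical helper: sums with a long buffer and checks every addition,
  // mimicking (by assumption) the behavior when the overflow flag is on.
  def sumExact(values: Seq[Long]): Long =
    values.foldLeft(0L)((acc, v) => Math.addExact(acc, v))

  def main(args: Array[String]): Unit = {
    val rows = Seq(100L, Long.MaxValue, -1000L)

    // 100 + Long.MaxValue overflows before -1000 can bring the sum back in range,
    // so the checked sum fails with an ArithmeticException.
    val checked = Try(sumExact(rows))
    println(s"checked sum failed: ${checked.isFailure}")

    // Computed with arbitrary precision, the exact sum fits in a long.
    val exact = rows.map(v => BigInt(v)).sum
    println(s"exact sum fits in a long: ${exact.isValidLong}")
  }
}
```

The exact sum here is Long.MaxValue - 900, so only the intermediate value is out of range.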

According to [~cloud_fan]'s suggestion in https://github.com/apache/spark/pull/21599, we should introduce a config flag that lets users choose a wider data type for the sum buffer, so that the above issue can be fixed.
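
As a rough illustration of the idea, and not of how Spark would implement it, the sketch below switches between a long buffer and a wider one. The parameter useWiderBuffer is a hypothetical stand-in for the proposed config, BigInt is assumed as the wider type purely for illustration, and SumBuffer and sumLongs are made-up names.

```scala
object SumBuffer {
  // Hypothetical sketch: `useWiderBuffer` stands in for the proposed config flag.
  def sumLongs(values: Seq[Long], useWiderBuffer: Boolean): Long =
    if (useWiderBuffer) {
      // Wider buffer (BigInt assumed here): intermediate sums cannot overflow,
      // so the range is checked only once, on the final result.
      val total = values.foldLeft(BigInt(0))((acc, v) => acc + BigInt(v))
      if (total.isValidLong) total.toLong
      else throw new ArithmeticException("long overflow")
    } else {
      // Long buffer: every intermediate addition is checked and may throw.
      values.foldLeft(0L)((acc, v) => Math.addExact(acc, v))
    }
}
```

With the wider buffer, SumBuffer.sumLongs(Seq(100L, Long.MaxValue, -1000L), useWiderBuffer = true) returns Long.MaxValue - 900, while the same call with useWiderBuffer = false throws an ArithmeticException. A sum whose final value does not fit in a long still fails in both modes.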


