Posted to issues@spark.apache.org by "Gengliang Wang (Jira)" <ji...@apache.org> on 2019/09/03 14:25:00 UTC

[jira] [Commented] (SPARK-28610) Support larger buffer for sum of long

    [ https://issues.apache.org/jira/browse/SPARK-28610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16921459#comment-16921459 ] 

Gengliang Wang commented on SPARK-28610:
----------------------------------------

Hi [~mgaido]
I tried the following SQL on PostgreSQL and got the error "integer out of range":
postgres=# select sum(1+2147483647-1);

> Support larger buffer for sum of long
> -------------------------------------
>
>                 Key: SPARK-28610
>                 URL: https://issues.apache.org/jira/browse/SPARK-28610
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Marco Gaido
>            Priority: Major
>
> The sum of a long field currently uses a buffer of type long.
> When the flag for throwing exceptions on overflow for arithmetic operations is turned on, this is a problem when intermediate overflows are later cancelled out by other rows. In such a case we throw an exception even though the final result is representable as a long. An example of this issue can be seen running:
> {code}
> val df = sc.parallelize(Seq(100L, Long.MaxValue, -1000L)).toDF("a")
> df.select(sum($"a")).show()
> {code}
> According to [~cloud_fan]'s suggestion in https://github.com/apache/spark/pull/21599, we should introduce a config flag that lets users opt into a wider datatype for the sum buffer, so that the above issue can be fixed.
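
The failure mode described above can be reproduced without Spark. A minimal Scala sketch, using Math.addExact to model the overflow-checking sum and BigInt as a stand-in wide buffer type (the actual buffer type Spark would use is an open design question in this ticket, not necessarily BigInt):

{code}
import scala.util.Try

// The same rows as in Marco's example.
val values = Seq(100L, Long.MaxValue, -1000L)

// With a Long buffer and overflow checking, the intermediate sum
// 100 + Long.MaxValue overflows, so the whole aggregation fails...
val checkedLongSum = Try(values.reduce(Math.addExact))
assert(checkedLongSum.isFailure)

// ...even though the final result, Long.MaxValue - 900, fits in a Long.
// A wider buffer tolerates the intermediate overflow:
val wideSum = values.map(BigInt(_)).sum
assert(wideSum.isValidLong)
val result = wideSum.toLong // == Long.MaxValue - 900
{code}

This is only an illustration of why the buffer type matters, not of how Spark's SumAggregate would implement it.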



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org