You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2020/03/03 18:54:53 UTC

[GitHub] [spark] skambha commented on issue #27627: [WIP][SPARK-28067][SQL] Fix incorrect results for decimal aggregate sum by returning null on decimal overflow

skambha commented on issue #27627: [WIP][SPARK-28067][SQL] Fix incorrect results for decimal aggregate sum by returning null on decimal overflow
URL: https://github.com/apache/spark/pull/27627#issuecomment-594110207
 
 
   > > In case of the whole stage codegen, you can see the decimal ‘+’ expressions will at some point be larger than what can be contained in dec 38,18 but it gets written out as null. This messes up the end result of the sum and you get wrong results.
   > 
   > Still a bit confused. If it gets written out as null, then null + decimal always return null and the final result is null?
   
   So if we look into the aggregate Sum, we have **coalesce** in  updateExpressions and mergeExpressions, so it is not purely only a null + decimal only expression.   For e.g, in updateExpressions if the intermediate sum becomes null because of overflow, then in the next iteration of the updateExpressions, coalesce will be used to make the sum become 0 for that and the sum will be updated to 0 + decimal.   
   https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Sum.scala#L76
   ```
    override lazy val updateExpressions: Seq[Expression]
   ...
           /* sum = */
           coalesce(coalesce(sum, zero) + child.cast(sumDataType), sum)
   ```
   Please let me know if this helps. 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org