Posted to issues@spark.apache.org by "Karuppayya (Jira)" <ji...@apache.org> on 2020/06/12 00:17:00 UTC

[jira] [Created] (SPARK-31973) Add ability to disable Sort,Spill in Partial aggregation

Karuppayya created SPARK-31973:
----------------------------------

             Summary: Add ability to disable Sort,Spill in Partial aggregation 
                 Key: SPARK-31973
                 URL: https://issues.apache.org/jira/browse/SPARK-31973
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 3.0.0
            Reporter: Karuppayya


In HashAggregation, a partial aggregation (update) is performed first, followed by a final aggregation (merge).
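
To make the two phases concrete, here is a minimal sketch in plain Python of an update step per partition followed by a merge step across partitions. It is a toy model, not Spark's implementation: the function names `partial_aggregate` and `final_aggregate`, the sample data, and the choice of a sum aggregate are all assumptions made for illustration.

```python
from collections import defaultdict


def partial_aggregate(partition):
    """Update phase: fold each (key, value) row of one partition into a per-key buffer."""
    buffers = defaultdict(int)
    for key, value in partition:
        buffers[key] += value
    return dict(buffers)


def final_aggregate(partials):
    """Merge phase: combine the per-partition buffers into one result per key."""
    merged = defaultdict(int)
    for buffers in partials:
        for key, partial_sum in buffers.items():
            merged[key] += partial_sum
    return dict(merged)


# Hypothetical input: two partitions of (grouping key, value) rows.
partitions = [
    [("a", 1), ("b", 2), ("a", 3)],
    [("b", 4), ("c", 5)],
]

partials = [partial_aggregate(p) for p in partitions]
result = final_aggregate(partials)
```

The partial phase pays off only when it shrinks the data handed to the merge phase, which depends on how often keys repeat within a partition.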

During partial aggregation, we sort and spill to disk every time the fast map (when enabled) and the UnsafeFixedWidthAggregationMap are exhausted.

*When the cardinality of the grouping column is close to the total number of records being processed*, sorting the data and spilling it to disk is not required: the partial aggregation is essentially a no-op, and the rows can be passed directly to the final aggregation.
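
The effect of cardinality can be illustrated with a small Python simulation. This is only a sketch under stated assumptions, not Spark code: the bounded dict stands in for the aggregation map, `capacity` is a hypothetical limit on the number of keys it can hold, and "spilling" is modelled as appending a sorted run to a list.

```python
def partial_aggregate_with_spill(rows, capacity):
    """Toy model: sum values per key in a bounded map. When a new key arrives
    and the map is full, sort the entries by key and move them to a new run."""
    current = {}
    runs = []
    for key, value in rows:
        if key not in current and len(current) >= capacity:
            runs.append(sorted(current.items()))  # sort, then "spill"
            current = {}
        current[key] = current.get(key, 0) + value
    if current:
        # Whatever is left in the map at the end forms the last run.
        runs.append(sorted(current.items()))
    return runs


def rows_emitted(runs):
    """Total number of rows handed to the final aggregation."""
    return sum(len(run) for run in runs)


n = 1000
low_cardinality = [(i % 10, 1) for i in range(n)]  # 10 distinct keys
high_cardinality = [(i, 1) for i in range(n)]      # every key is distinct

low_runs = partial_aggregate_with_spill(low_cardinality, capacity=100)
high_runs = partial_aggregate_with_spill(high_cardinality, capacity=100)

low_rows = rows_emitted(low_runs)    # 10 rows reach the final aggregation
high_rows = rows_emitted(high_runs)  # all 1000 rows reach the final aggregation
```

In the high-cardinality case the model sorts and writes out every input row without combining any of them, so the final aggregation receives exactly as many rows as it would have without the partial step.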

A user who is aware of the nature of the data currently has no way to disable this sort and spill operation.

This is similar to the following issues in Hive:

https://issues.apache.org/jira/browse/HIVE-223

https://issues.apache.org/jira/browse/HIVE-291


