Posted to issues@spark.apache.org by "Qifan Pu (JIRA)" <ji...@apache.org> on 2016/07/13 07:50:20 UTC
[jira] [Created] (SPARK-16523) Support Row Based Aggregation HashMap
Qifan Pu created SPARK-16523:
--------------------------------
Summary: Support Row Based Aggregation HashMap
Key: SPARK-16523
URL: https://issues.apache.org/jira/browse/SPARK-16523
Project: Spark
Issue Type: Story
Components: SQL
Reporter: Qifan Pu
For hash aggregation in Spark SQL, we use a fast aggregation hashmap as a "cache" to boost aggregation performance. Until now, the hashmap has been backed by a `ColumnarBatch`. This performs poorly when the aggregation table has a wide schema (a large number of key fields or value fields).
In this JIRA, we add a second implementation of the fast hashmap, backed by a `RowBatch`. We then automatically pick between the two implementations based on certain knobs.
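
To make the difference between the two layouts concrete, here is a minimal Python sketch. It is not Spark's implementation: the class names `ColumnarBackedMap` and `RowBackedMap`, the function `pick_fast_map`, and the `max_columnar_fields` threshold are all hypothetical, and the issue does not say which knobs are actually used. The sketch only assumes integer keys and sum-style aggregation buffers.

```python
from typing import Dict, List, Tuple

Key = Tuple[int, ...]


class ColumnarBackedMap:
    """Toy map that stores every key/value field in its own column (hypothetical)."""

    def __init__(self, num_keys: int, num_values: int) -> None:
        self.num_keys = num_keys
        self.columns: List[List[int]] = [[] for _ in range(num_keys + num_values)]
        self.index: Dict[Key, int] = {}

    def find_or_insert(self, key: Key) -> int:
        row = self.index.get(key)
        if row is None:
            row = len(self.index)
            self.index[key] = row
            # One append per field: the work per new key grows with schema width.
            for i, column in enumerate(self.columns):
                column.append(key[i] if i < self.num_keys else 0)
        return row

    def add(self, key: Key, deltas: Tuple[int, ...]) -> None:
        row = self.find_or_insert(key)
        # Each value field lives in a different column, so an update
        # touches as many separate lists as there are value fields.
        for j, delta in enumerate(deltas):
            self.columns[self.num_keys + j][row] += delta

    def result(self) -> Dict[Key, Tuple[int, ...]]:
        values = self.columns[self.num_keys:]
        return {k: tuple(col[r] for col in values) for k, r in self.index.items()}


class RowBackedMap:
    """Toy map that stores each key and its aggregation buffer as one row (hypothetical)."""

    def __init__(self, num_keys: int, num_values: int) -> None:
        self.num_keys = num_keys
        self.num_values = num_values
        self.rows: List[List[int]] = []
        self.index: Dict[Key, int] = {}

    def find_or_insert(self, key: Key) -> int:
        row = self.index.get(key)
        if row is None:
            row = len(self.rows)
            self.index[key] = row
            # A single append stores the whole record contiguously.
            self.rows.append(list(key) + [0] * self.num_values)
        return row

    def add(self, key: Key, deltas: Tuple[int, ...]) -> None:
        record = self.rows[self.find_or_insert(key)]
        for j, delta in enumerate(deltas):
            record[self.num_keys + j] += delta

    def result(self) -> Dict[Key, Tuple[int, ...]]:
        return {k: tuple(self.rows[r][self.num_keys:]) for k, r in self.index.items()}


def pick_fast_map(num_keys: int, num_values: int, max_columnar_fields: int = 4):
    """Choose a layout from the schema width.

    `max_columnar_fields` is an assumed knob for illustration only.
    """
    if num_keys + num_values <= max_columnar_fields:
        return ColumnarBackedMap(num_keys, num_values)
    return RowBackedMap(num_keys, num_values)
```

Both classes expose the same `add` and `result` methods, so a caller could switch layouts without changing the aggregation loop. With the assumed default threshold, `pick_fast_map(1, 1)` yields the columnar layout and `pick_fast_map(3, 4)` yields the row layout.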
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)