You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@hive.apache.org by "Carl Steinbach (JIRA)" <ji...@apache.org> on 2011/04/06 00:52:06 UTC

[jira] [Updated] (HIVE-535) Memory-efficient hash-based Aggregation

     [ https://issues.apache.org/jira/browse/HIVE-535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Carl Steinbach updated HIVE-535:
--------------------------------

    Component/s: Query Processor
         Labels: optimization  (was: )

> Memory-efficient hash-based Aggregation
> ---------------------------------------
>
>                 Key: HIVE-535
>                 URL: https://issues.apache.org/jira/browse/HIVE-535
>             Project: Hive
>          Issue Type: Improvement
>          Components: Query Processor
>    Affects Versions: 0.4.0
>            Reporter: Zheng Shao
>              Labels: optimization
>
> Currently there are a lot of memory overhead in the hash-based aggregation in GroupByOperator.
> The net result is that GroupByOperator won't be able to store many entries in its HashTable, and flushes frequently, and won't be able to achieve very good partial aggregation result.
> Here are some initial thoughts (some of them are from Joydeep long time ago):
> A1. Serialize the key of the HashTable. This will eliminate the 16-byte per-object overhead of Java in keys (depending on how many objects there are in the key, the saving can be substantial).
> A2. Use more memory-efficient hash tables - java.util.HashMap has about 64 bytes of overhead per entry.
> A3. Use primitive array to store aggregation results. Basically, the UDAF should manage the array of aggregation results, so UDAFCount should manage a long[], UDAFAvg should manage a double[] and a long[]. The external code should pass an index to iterate/merge/terminal an aggregation result. This will eliminate the 16-byte per-object overhead of Java.
> More ideas are welcome.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira