Posted to issues@hive.apache.org by "Matt McCline (JIRA)" <ji...@apache.org> on 2017/08/03 05:02:00 UTC
[jira] [Comment Edited] (HIVE-12369) Native Vector GroupBy
[ https://issues.apache.org/jira/browse/HIVE-12369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16112186#comment-16112186 ]
Matt McCline edited comment on HIVE-12369 at 8/3/17 5:01 AM:
-------------------------------------------------------------
Yes, I think you should continue reviewing. The path that is implemented is One Long Key and groupByMode == HASH. There are UNDONEs for *subsequent* JIRAs that will later add Aggregation of non-Long data types, Fixed Length Keys / Variable Length Keys, and the other groupByModes, and later still Grouping Sets and Empty Aggregation (i.e., a GroupBy on a key with no aggregations, which performs duplicate key elimination).
was (Author: mmccline):
Yes, I think you continue reviewing. The path that is implemented is One Long Key and groupByMode == HASH. There are UNDONEs for *subsequent* JIRAs that later adds Aggregation of non-Long data types, Fixed Length Keys / Variable Length Keys, and the other groupByModes. And later adds Grouping Sets, Empty Aggregation (i.e. GroupBy on key that has no aggregations that does duplicate key elimination), too.
> Native Vector GroupBy
> ---------------------
>
> Key: HIVE-12369
> URL: https://issues.apache.org/jira/browse/HIVE-12369
> Project: Hive
> Issue Type: Bug
> Components: Hive
> Reporter: Matt McCline
> Assignee: Matt McCline
> Priority: Critical
> Attachments: HIVE-12369.01.patch, HIVE-12369.02.patch, HIVE-12369.05.patch, HIVE-12369.06.patch
>
>
> Implement Native Vector GroupBy using fast hash table technology developed for Native Vector MapJoin, etc.
> The patch is currently limited to a single Long key, aggregation on Long columns, and no more than 31 columns.
> Three new classes are introduced that store the count in the slot table and do not allocate hash elements:
> {noformat}
> COUNT(column)   VectorGroupByHashOneLongKeyCountColumnOperator
> COUNT(key)      VectorGroupByHashOneLongKeyCountKeyOperator
> COUNT(*)        VectorGroupByHashOneLongKeyCountStarOperator
> {noformat}
> And a new class that aggregates a single Long key:
> {noformat}
> VectorGroupByHashOneLongKeyOperator
> {noformat}
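
Editorial illustration (not part of the patch): the description says the count operators keep the count in the slot table itself and allocate no hash elements. A minimal sketch of that general idea, assuming an open-addressing table with linear probing over two parallel long arrays, might look like the following. The class name LongKeyCountTable, its methods, the hash function, and the load factor are all hypothetical and are not taken from the Hive source; the sketch only models COUNT(*), where a stored count of 0 can double as the "empty slot" marker.

```java
// Hypothetical sketch only; names and layout are assumptions, not Hive code.
// Counts live directly in the slot arrays, so no per-key object is allocated.
final class LongKeyCountTable {
    private long[] keys;    // key held by each slot
    private long[] counts;  // count held by the same slot; 0 marks an empty slot
    private int mask;       // capacity - 1 (capacity is a power of two)
    private int size;       // number of distinct keys stored

    LongKeyCountTable(int capacity) {
        if (capacity <= 0 || Integer.bitCount(capacity) != 1) {
            throw new IllegalArgumentException("capacity must be a power of two");
        }
        keys = new long[capacity];
        counts = new long[capacity];
        mask = capacity - 1;
    }

    // COUNT(*) style update: one more row seen for this key.
    void increment(long key) {
        if ((size + 1) * 2 > keys.length) {
            grow(); // keep the table at most half full so probing terminates
        }
        int slot = findSlot(keys, counts, mask, key);
        if (counts[slot] == 0) { // first time this key is seen
            keys[slot] = key;
            size++;
        }
        counts[slot]++;
    }

    // Returns the count for the key, or 0 if the key was never seen.
    long count(long key) {
        return counts[findSlot(keys, counts, mask, key)];
    }

    int distinctKeys() {
        return size;
    }

    // Linear probe until the key or an empty slot is found.
    private static int findSlot(long[] keys, long[] counts, int mask, long key) {
        int slot = hash(key) & mask;
        while (counts[slot] != 0 && keys[slot] != key) {
            slot = (slot + 1) & mask;
        }
        return slot;
    }

    // Simple multiplicative mix; an arbitrary choice for this sketch.
    private static int hash(long key) {
        return (int) ((key * 0x9E3779B97F4A7C15L) >>> 32);
    }

    // Double the capacity and reinsert every occupied slot.
    private void grow() {
        long[] oldKeys = keys;
        long[] oldCounts = counts;
        int newCapacity = oldKeys.length * 2;
        long[] newKeys = new long[newCapacity];
        long[] newCounts = new long[newCapacity];
        int newMask = newCapacity - 1;
        for (int i = 0; i < oldKeys.length; i++) {
            if (oldCounts[i] != 0) {
                int slot = findSlot(newKeys, newCounts, newMask, oldKeys[i]);
                newKeys[slot] = oldKeys[i];
                newCounts[slot] = oldCounts[i];
            }
        }
        keys = newKeys;
        counts = newCounts;
        mask = newMask;
    }
}
```

In this sketch a batch of long keys would be processed by calling increment once per row and reading the results back with count. A COUNT(column) variant would need a separate "occupied" marker, because a key whose column values are all NULL legitimately has a count of 0.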