Posted to issues@hive.apache.org by "Jesus Camacho Rodriguez (JIRA)" <ji...@apache.org> on 2018/10/07 04:33:00 UTC

[jira] [Comment Edited] (HIVE-17043) Remove non unique columns from group by keys if not referenced later

    [ https://issues.apache.org/jira/browse/HIVE-17043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16640968#comment-16640968 ] 

Jesus Camacho Rodriguez edited comment on HIVE-17043 at 10/7/18 4:32 AM:
-------------------------------------------------------------------------

[~vgarg], latest patch seems to have unrelated changes: {{VectorizedOrcAcidRowBatchReader}} and {{TestVectorizedOrcAcidRowBatchReader}}?


was (Author: jcamachorodriguez):
[~vgarg], latest patch seems to have unrelated changes: {{VectorizedOrcAcidRowBatchReader}}.

> Remove non unique columns from group by keys if not referenced later
> --------------------------------------------------------------------
>
>                 Key: HIVE-17043
>                 URL: https://issues.apache.org/jira/browse/HIVE-17043
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Logical Optimizer
>    Affects Versions: 3.0.0
>            Reporter: Ashutosh Chauhan
>            Assignee: Vineet Garg
>            Priority: Major
>         Attachments: HIVE-17043.1.patch, HIVE-17043.10.patch, HIVE-17043.11.patch, HIVE-17043.12.patch, HIVE-17043.13.patch, HIVE-17043.2.patch, HIVE-17043.3.patch, HIVE-17043.4.patch, HIVE-17043.5.patch, HIVE-17043.6.patch, HIVE-17043.7.patch, HIVE-17043.8.patch, HIVE-17043.9.patch
>
>
> Group by keys may be a mix of unique (or primary) keys and regular columns. In such cases, the presence of the regular columns won't alter the cardinality of the groups. So, if the regular columns are not referenced later, they can be dropped from the group by keys. In the best case, depending on the operator tree, this may result in those columns not being read from disk at all. In the worst case, we still avoid shuffling and sorting the regular columns from mapper to reducer, which could be a substantial CPU and network saving.
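
The idea in the description can be illustrated with a small standalone sketch. This is not Hive's optimizer code: the function names (prune_group_by_keys, group_count), the column names, and the sample rows are all hypothetical, and the sketch assumes that id is a unique key that determines name. It only shows that dropping an unreferenced regular column from the keys leaves the groups unchanged.

```python
from collections import defaultdict


def prune_group_by_keys(group_keys, unique_keys, referenced_later):
    """Drop regular group-by columns that nothing downstream reads.

    Hypothetical helper: pruning is applied only when the group-by keys
    contain the whole unique key; otherwise the keys are returned unchanged.
    """
    unique = set(unique_keys)
    if not unique or not unique <= set(group_keys):
        return list(group_keys)
    return [k for k in group_keys if k in unique or k in referenced_later]


def group_count(rows, keys):
    """Count rows per group, grouping on the given key columns."""
    groups = defaultdict(int)
    for row in rows:
        groups[tuple(row[k] for k in keys)] += 1
    return dict(groups)


# Sample data (assumed): "id" is unique in its source table, so it determines "name".
rows = [
    {"id": 1, "name": "a"},
    {"id": 1, "name": "a"},
    {"id": 2, "name": "b"},
    {"id": 3, "name": "a"},
]

original_keys = ["id", "name"]
pruned_keys = prune_group_by_keys(original_keys, {"id"}, referenced_later=set())

full = group_count(rows, original_keys)
pruned = group_count(rows, pruned_keys)

# Same number of groups and the same counts per id, with one column fewer to shuffle.
print(pruned_keys, len(full) == len(pruned))
```

If name were referenced by a later operator, it would be passed in referenced_later and kept in the keys.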



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)