Posted to dev@hive.apache.org by "Remus Rusanu (JIRA)" <ji...@apache.org> on 2014/04/09 16:07:14 UTC

[jira] [Updated] (HIVE-6873) DISTINCT clause in aggregates is handled incorrectly by vectorized execution

     [ https://issues.apache.org/jira/browse/HIVE-6873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Remus Rusanu updated HIVE-6873:
-------------------------------

    Attachment: HIVE-6873.1.patch

> DISTINCT clause in aggregates is handled incorrectly by vectorized execution
> ----------------------------------------------------------------------------
>
>                 Key: HIVE-6873
>                 URL: https://issues.apache.org/jira/browse/HIVE-6873
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 0.13.0, 0.14.0
>            Reporter: Remus Rusanu
>            Assignee: Remus Rusanu
>         Attachments: HIVE-6873.1.patch
>
>
> The vectorized aggregates ignore the DISTINCT clause. This causes incorrect results. Due to how GroupByOperatorDesc adds the DISTINCT keys to the overall aggregate keys, the vectorized aggregates do account for the extra key, but they do not process the data correctly for that key. The reduce side then aggregates the input from the vectorized map side into results that are only sometimes correct, but mostly incorrect. HIVE-4607 tracks the proper fix, but in the meantime I'm filing a bug to disable vectorized execution if DISTINCT is present. The fix is trivial.
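
The failure mode described above can be illustrated with a toy model. This is only a sketch under simplifying assumptions, not Hive's actual code: it assumes a query of the form SELECT k, COUNT(DISTINCT x) ... GROUP BY k, a map side that groups on the extended key (k, x) but counts rows as a plain COUNT would, and a reduce side that simply sums the partial counts per k. All function names are hypothetical.

```python
from collections import defaultdict


def count_distinct_reference(rows):
    """Expected result of SELECT k, COUNT(DISTINCT x) ... GROUP BY k."""
    seen = defaultdict(set)
    for k, x in rows:
        seen[k].add(x)
    return {k: len(xs) for k, xs in seen.items()}


def map_side_ignoring_distinct(rows):
    """Toy map side (assumption): the DISTINCT column is part of the
    grouping key, but the aggregate counts every row, as plain COUNT would."""
    partial = defaultdict(int)
    for k, x in rows:
        partial[(k, x)] += 1
    return dict(partial)


def reduce_side_merge(partials):
    """Toy reduce side (assumption): sums the partial counts per group key k."""
    merged = defaultdict(int)
    for (k, _x), count in partials.items():
        merged[k] += count
    return dict(merged)


rows_with_duplicates = [("a", 1), ("a", 1), ("a", 2), ("b", 3)]
rows_without_duplicates = [("a", 1), ("a", 2), ("b", 3)]

for rows in (rows_with_duplicates, rows_without_duplicates):
    expected = count_distinct_reference(rows)
    actual = reduce_side_merge(map_side_ignoring_distinct(rows))
    print(expected, actual, expected == actual)
```

In this model the two results agree only when the input happens to contain no duplicate (k, x) pairs, which is one way to read "only sometimes correct but mostly incorrect".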



--
This message was sent by Atlassian JIRA
(v6.2#6252)