Posted to issues@hive.apache.org by "Vihang Karajgaonkar (JIRA)" <ji...@apache.org> on 2018/02/03 02:49:00 UTC

[jira] [Commented] (HIVE-18421) Vectorized execution handles overflows in a different manner than non-vectorized execution

    [ https://issues.apache.org/jira/browse/HIVE-18421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16351212#comment-16351212 ] 

Vihang Karajgaonkar commented on HIVE-18421:
--------------------------------------------

I updated the Jira description accordingly. This patch does not add code to handle overflows in the "right" manner. It only makes vectorized execution handle overflows in a manner "consistent" with non-vectorized execution.

I feel that handling overflows according to a configurable policy (wraparound, warn, error, set to null) would be a good addition, and it can be taken up as a separate effort.
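
To make the four policies named above concrete, here is a minimal Java sketch of what a policy-driven tinyint subtraction could look like. It is an illustration only: `OverflowPolicyDemo`, `OverflowPolicy`, and `subtract` are hypothetical names, not Hive classes or configuration values, and nothing like this is part of the attached patch.

```java
public class OverflowPolicyDemo {
    // Hypothetical policy names mirroring the list above; not a Hive API.
    enum OverflowPolicy { WRAPAROUND, WARN, ERROR, SET_TO_NULL }

    // Subtracts two tinyint (byte) values under the given policy.
    // Returns null only when the policy is SET_TO_NULL and the result overflows.
    static Byte subtract(byte a, byte b, OverflowPolicy policy) {
        int wide = a - b; // int arithmetic cannot overflow for byte inputs
        boolean overflow = wide < Byte.MIN_VALUE || wide > Byte.MAX_VALUE;
        if (!overflow) {
            return Byte.valueOf((byte) wide);
        }
        switch (policy) {
            case ERROR:
                throw new ArithmeticException("tinyint overflow: " + a + " - " + b);
            case SET_TO_NULL:
                return null;
            case WARN:
                System.err.println("warning: tinyint overflow: " + a + " - " + b);
                return Byte.valueOf((byte) wide); // still wraps after warning
            case WRAPAROUND:
            default:
                return Byte.valueOf((byte) wide); // two's-complement wraparound
        }
    }

    public static void main(String[] args) {
        for (OverflowPolicy p : OverflowPolicy.values()) {
            try {
                System.out.println(p + ": " + subtract((byte) -104, (byte) 25, p));
            } catch (ArithmeticException e) {
                System.out.println(p + ": threw " + e.getMessage());
            }
        }
    }
}
```

Detecting the overflow in a wider type first and only then deciding what to do keeps the policy choice in one place, which is one reason a follow-up effort like this could stay separate from the consistency fix.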

Attaching the first version of this patch. I reviewed almost all the vector expressions (phew ;)) and found a subset of expressions which do not handle overflows in a consistent manner. As discussed above, the patch introduces a new config which enables the use of checked expressions when they are available.

> Vectorized execution handles overflows in a different manner than non-vectorized execution
> ------------------------------------------------------------------------------------------
>
>                 Key: HIVE-18421
>                 URL: https://issues.apache.org/jira/browse/HIVE-18421
>             Project: Hive
>          Issue Type: Bug
>          Components: Vectorization
>    Affects Versions: 2.1.1, 2.2.0, 3.0.0, 2.3.2
>            Reporter: Vihang Karajgaonkar
>            Assignee: Vihang Karajgaonkar
>            Priority: Major
>         Attachments: HIVE-18421.01.patch
>
>
> In vectorized execution, arithmetic operations that cause integer overflows can give wrong results. The issue is reproducible with both ORC and Parquet.
> A simple test case to reproduce this issue:
> {noformat}
> set hive.vectorized.execution.enabled=true;
> create table parquettable (t1 tinyint, t2 tinyint) stored as parquet;
> insert into parquettable values (-104, 25), (-112, 24), (54, 9);
> select t1, t2, (t1-t2) as diff from parquettable where (t1-t2) < 50 order by diff desc;
> +-------+-----+-------+
> |  t1   | t2  | diff  |
> +-------+-----+-------+
> | -104  | 25  | 127   |
> | -112  | 24  | 120   |
> | 54    | 9   | 45    |
> +-------+-----+-------+
> {noformat}
> When vectorization is turned off, the same query produces only one row.
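
The numbers in the quoted reproduction can be checked with plain Java arithmetic. The sketch below assumes, as one plausible reading of the output, that the vectorized path compares the difference in a wider integer type and only narrows it to tinyint when projecting, while the non-vectorized path narrows first. The class and method names are hypothetical and are not taken from the Hive code base.

```java
public class TinyintOverflowDemo {
    // Narrow-first: the difference wraps to a byte before the comparison,
    // which is the behavior assumed here for non-vectorized execution.
    static boolean passesFilterNarrowFirst(byte t1, byte t2) {
        byte diff = (byte) (t1 - t2);
        return diff < 50;
    }

    // Wide: the difference is compared as a long without wrapping,
    // which is the behavior assumed here for vectorized execution.
    static boolean passesFilterWide(byte t1, byte t2) {
        long diff = (long) t1 - (long) t2;
        return diff < 50;
    }

    // Projected column value: always narrowed to tinyint (byte).
    static byte project(byte t1, byte t2) {
        return (byte) (t1 - t2);
    }

    public static void main(String[] args) {
        // Same rows as the insert statement in the reproduction.
        byte[][] rows = { { -104, 25 }, { -112, 24 }, { 54, 9 } };
        for (byte[] r : rows) {
            System.out.printf("t1=%d t2=%d diff=%d narrowFirst=%b wide=%b%n",
                    r[0], r[1], project(r[0], r[1]),
                    passesFilterNarrowFirst(r[0], r[1]),
                    passesFilterWide(r[0], r[1]));
        }
    }
}
```

Under this assumption, the wide comparison keeps all three rows (the unwrapped differences are -129, -136, and 45) but displays the wrapped values 127, 120, and 45, which matches the table above. The narrow-first comparison keeps only the row with diff 45, which matches the single row reported with vectorization turned off.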



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)