Posted to issues@hive.apache.org by "Aihua Xu (JIRA)" <ji...@apache.org> on 2018/02/08 17:51:01 UTC

[jira] [Comment Edited] (HIVE-18421) Vectorized execution handles overflows in a different manner than non-vectorized execution

    [ https://issues.apache.org/jira/browse/HIVE-18421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16356455#comment-16356455 ] 

Aihua Xu edited comment on HIVE-18421 at 2/8/18 5:50 PM:
---------------------------------------------------------

[~vihangk1] Sorry for the late reply. I left a comment in RB (Review Board). Basically, I don't follow why we need both CHECKED and UNCHECKED implementations. It seems we should have only the CHECKED one if the UNCHECKED one can generate incorrect results. The user would get incorrect results without any notice, right?

Of course, even if we want to support the UNCHECKED implementation, we should still error out (fail the query) if there is an overflow, so that the user knows to set the flag to true. BTW, how much of a performance impact does this have, and why? (I don't exactly follow the previous discussion.)
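The three options raised in this comment (unchecked results, checked results, and failing the query on overflow) can be made concrete for a single tinyint subtraction. This is a hypothetical Java sketch, not Hive code: `OverflowPolicies`, `Policy`, and `minusTinyint` are invented names, and it assumes that "CHECKED" means narrowing the exact result back to tinyint so that it matches non-vectorized execution.

```java
// Hypothetical sketch of the overflow policies discussed in the comment (not Hive code).
class OverflowPolicies {

    // Invented names for this sketch; these are not Hive configuration values.
    enum Policy { UNCHECKED, CHECKED, FAIL_ON_OVERFLOW }

    static long minusTinyint(byte a, byte b, Policy policy) {
        // Computed in long arithmetic, so the difference of two bytes is always exact.
        long exact = (long) a - b;
        switch (policy) {
            case UNCHECKED:
                // Returned as-is; may fall outside the tinyint range [-128, 127].
                return exact;
            case CHECKED:
                // Assumed meaning: narrow to tinyint so the result matches row mode.
                return (byte) exact;
            case FAIL_ON_OVERFLOW:
                // Report the overflow instead of returning a silently wrong value.
                if (exact < Byte.MIN_VALUE || exact > Byte.MAX_VALUE) {
                    throw new ArithmeticException(
                        "tinyint overflow: " + a + " - " + b + " = " + exact);
                }
                return exact;
            default:
                throw new IllegalArgumentException("unknown policy: " + policy);
        }
    }

    public static void main(String[] args) {
        byte t1 = -104, t2 = 25;
        System.out.println(minusTinyint(t1, t2, Policy.UNCHECKED)); // -129
        System.out.println(minusTinyint(t1, t2, Policy.CHECKED));   // 127
        try {
            minusTinyint(t1, t2, Policy.FAIL_ON_OVERFLOW);
        } catch (ArithmeticException e) {
            System.out.println(e.getMessage()); // tinyint overflow: -104 - 25 = -129
        }
    }
}
```

In this sketch, UNCHECKED skips any per-value work but silently diverges from row mode, while FAIL_ON_OVERFLOW pays for one range check per value in exchange for making the problem visible to the user.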



> Vectorized execution handles overflows in a different manner than non-vectorized execution
> ------------------------------------------------------------------------------------------
>
>                 Key: HIVE-18421
>                 URL: https://issues.apache.org/jira/browse/HIVE-18421
>             Project: Hive
>          Issue Type: Bug
>          Components: Vectorization
>    Affects Versions: 2.1.1, 2.2.0, 3.0.0, 2.3.2
>            Reporter: Vihang Karajgaonkar
>            Assignee: Vihang Karajgaonkar
>            Priority: Major
>         Attachments: HIVE-18421.01.patch, HIVE-18421.02.patch, HIVE-18421.03.patch, HIVE-18421.04.patch, HIVE-18421.05.patch, HIVE-18421.06.patch, HIVE-18421.07.patch
>
>
> In vectorized execution, arithmetic operations that cause integer overflows can give wrong results. The issue is reproducible with both ORC and Parquet.
> A simple test case to reproduce the issue:
> {noformat}
> set hive.vectorized.execution.enabled=true;
> create table parquettable (t1 tinyint, t2 tinyint) stored as parquet;
> insert into parquettable values (-104, 25), (-112, 24), (54, 9);
> select t1, t2, (t1-t2) as diff from parquettable where (t1-t2) < 50 order by diff desc;
> +-------+-----+-------+
> |  t1   | t2  | diff  |
> +-------+-----+-------+
> | -104  | 25  | 127   |
> | -112  | 24  | 120   |
> | 54    | 9   | 45    |
> +-------+-----+-------+
> {noformat}
> When vectorization is turned off, the same query produces only one row.
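The disagreement in the repro can be modeled with plain Java arithmetic. This is a simplified sketch, not Hive source: it assumes non-vectorized execution narrows `t1 - t2` to tinyint before the filter runs (so -104 - 25 = -129 wraps to 127 and fails `< 50`), while unchecked vectorized execution compares the exact value held in a 64-bit long and narrows only when the tinyint result is displayed. `TinyintOverflowDemo` and `runQuery` are hypothetical names, and ORDER BY is not modeled.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of the HIVE-18421 repro (not Hive source code).
class TinyintOverflowDemo {

    // Non-vectorized model: tinyint - tinyint is narrowed back to a byte,
    // so an out-of-range result wraps around (-129 becomes 127).
    static byte rowModeMinus(byte t1, byte t2) {
        return (byte) (t1 - t2);
    }

    // Unchecked vectorized model: the difference stays in a 64-bit long
    // (Hive's LongColumnVector stores tinyint values as longs) and is not
    // narrowed before the filter compares it.
    static long vectorizedMinus(byte t1, byte t2) {
        return (long) t1 - t2;
    }

    // Evaluates WHERE (t1 - t2) < 50 and returns the diff column as displayed.
    // ORDER BY is not modeled; rows keep their insertion order.
    static List<Byte> runQuery(byte[][] rows, boolean vectorized) {
        List<Byte> diffs = new ArrayList<>();
        for (byte[] row : rows) {
            long diff = vectorized ? vectorizedMinus(row[0], row[1])
                                   : rowModeMinus(row[0], row[1]);
            if (diff < 50) {
                // The output column is still tinyint, so the value is narrowed here.
                diffs.add((byte) diff);
            }
        }
        return diffs;
    }

    public static void main(String[] args) {
        byte[][] rows = { {-104, 25}, {-112, 24}, {54, 9} };
        System.out.println("non-vectorized: " + runQuery(rows, false)); // [45]
        System.out.println("vectorized:     " + runQuery(rows, true));  // [127, 120, 45]
    }
}
```

Under these assumptions the model reproduces both outcomes in the report: the filter sees -129 and -136 in vectorized mode (both pass), but the displayed tinyint values wrap to 127 and 120, whereas row mode filters on the already-wrapped values and keeps only the row with diff 45.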



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)