Posted to issues@hive.apache.org by "Amruth S (JIRA)" <ji...@apache.org> on 2016/09/14 05:11:20 UTC

[jira] [Commented] (HIVE-14741) Incorrect results on boolean col when vectorization is ON

    [ https://issues.apache.org/jira/browse/HIVE-14741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15489441#comment-15489441 ] 

Amruth S commented on HIVE-14741:
---------------------------------

The issue is probably not related to boolean. I think it's related to the use of case/IF with primitives that have nulls.

Some more observations --

select sum(if((bool_col), 1, 0)) from bool_vect_issue;
708206

select sum(if((bool_col == True), 1, 0)) from bool_vect_issue;
697966

select sum(if((bool_col is null), 1, 0)) from bool_vect_issue;
868512

select sum(if(coalesce(bool_col, false), 1, 0)) from bool_vect_issue;
231

select a.x, count(*) from (select bool_col as x from bool_vect_issue) a group by a.x;
NULL	868512
true	231

select a.x, count(*) from (select if(bool_col, true, false) x from bool_vect_issue) a group by a.x;
false	160537
true	708206
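
For reference, here is a minimal Python sketch of the result one would expect from sum(if(bool_col, 1, 0)), assuming a NULL condition takes the else branch (as Hive's IF is documented to behave). The helpers hive_if and expected_sum are hypothetical names used only for illustration, not Hive APIs; the row counts are copied from the outputs above. It also checks that the two GROUP BY outputs account for the same number of rows, which suggests rows are being misclassified rather than lost or duplicated.

```python
# Row counts copied from the plain GROUP BY output in this comment.
NULL_ROWS = 868512
TRUE_ROWS = 231


def hive_if(cond, when_true, otherwise):
    """Illustrative model of IF(): only a TRUE condition selects the first
    branch; NULL (None) and FALSE both fall through to the second."""
    return when_true if cond is True else otherwise


def expected_sum(null_rows, true_rows):
    """Expected value of sum(if(bool_col, 1, 0)) for a column holding only
    NULL and TRUE values (assumption: no FALSE values, per the GROUP BY)."""
    return hive_if(None, 1, 0) * null_rows + hive_if(True, 1, 0) * true_rows


total_rows = NULL_ROWS + TRUE_ROWS

print(expected_sum(NULL_ROWS, TRUE_ROWS))  # 231
# The GROUP BY over if(bool_col, true, false) reports the same row total.
print(total_rows == 160537 + 708206)       # True
```

Under these assumptions the expected sum is 231, which matches the coalesce query and the count of true values, not the 708206 reported for the plain IF.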

> Incorrect results on boolean col when vectorization is ON
> ---------------------------------------------------------
>
>                 Key: HIVE-14741
>                 URL: https://issues.apache.org/jira/browse/HIVE-14741
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 2.0.0, 2.1.0
>            Reporter: Amruth S
>              Labels: orc, vectorization
>         Attachments: 000000_0
>
>
> I have attached the ORC part file on which the issue manifests. It has just one boolean column (mostly nulls, with 231 true values; verified using the ORC file dump utility).
> 1) Create external table on the part file attached
> CREATE EXTERNAL TABLE bool_vect_issue (
> `bool_col` BOOLEAN)
> ROW FORMAT SERDE
> 'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
> STORED AS INPUTFORMAT
> 'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
> OUTPUTFORMAT
> 'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
> LOCATION
> '<loc to which the part file is copied>';
> 2) 
> set hive.vectorized.execution.enabled = true;
> select sum(if((bool_col) , 1, 0)) from bool_vect_issue;
> gives
> 708206
> 3) 
> set hive.vectorized.execution.enabled = false;
> select sum(if((bool_col) , 1, 0)) from bool_vect_issue;
> gives
> 231
> The issue seems to have the same impact as HIVE-12435.


