Posted to dev@hive.apache.org by "Tony Hill (JIRA)" <ji...@apache.org> on 2017/05/12 14:01:04 UTC

[jira] [Created] (HIVE-16661) Parquet storage does not handle 'or' statement properly

Tony Hill created HIVE-16661:
--------------------------------

             Summary: Parquet storage does not handle 'or' statement properly
                 Key: HIVE-16661
                 URL: https://issues.apache.org/jira/browse/HIVE-16661
             Project: Hive
          Issue Type: Bug
          Components: Hive
    Affects Versions: 1.1.0
            Reporter: Tony Hill


A query on a Parquet-backed table returns different results depending on the value of hive.optimize.ppd.storage.

Steps to reproduce:

CREATE TABLE `test_table`(
`some_value` int)
PARTITIONED BY (
`date` string,
`id` int)
ROW FORMAT SERDE
'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat';


set hive.exec.dynamic.partition.mode=nonstrict;

insert into test_table PARTITION (date, id) VALUES (12, '2017-04-09', 16), (13, '2017-04-09', 32), (NULL, '2017-04-09', 51), (23, '2017-04-09', 51), (66, '2017-04-09', 16), (17, '2017-04-09', 32), (NULL, '2017-04-09', 32);


SELECT distinct id from test_table WHERE id IN (16, 32, 51) AND date = '2017-04-09' AND (id!=32 OR some_value IS NULL);
+-----+--+
| id |
+-----+--+
| 32 |
| 51 |
+-----+--+
(incorrect: 16 is missing)

The correct result is returned after disabling storage-level predicate pushdown:
set hive.optimize.ppd.storage=false;

+-----+--+
| id |
+-----+--+
| 16 |
| 32 |
| 51 |
+-----+--+
(correct)
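To confirm which output is correct, the sketch below evaluates the WHERE clause by hand over the seven inserted rows. It assumes standard SQL three-valued logic (None stands for UNKNOWN) and that the VALUES tuples map to (some_value, date, id), with the dynamic partition columns following the regular column. The helper names (sql_and, sql_or, sql_ne, where, expected_ids) are hypothetical and are not part of Hive.

```python
from typing import Optional

# Rows from the INSERT: (some_value, date, id). The dynamic partition
# columns (date, id) follow the regular column in each VALUES tuple.
ROWS = [
    (12, "2017-04-09", 16),
    (13, "2017-04-09", 32),
    (None, "2017-04-09", 51),
    (23, "2017-04-09", 51),
    (66, "2017-04-09", 16),
    (17, "2017-04-09", 32),
    (None, "2017-04-09", 32),
]

def sql_and(a: Optional[bool], b: Optional[bool]) -> Optional[bool]:
    # FALSE dominates AND; otherwise any UNKNOWN makes the result UNKNOWN.
    if a is False or b is False:
        return False
    if a is None or b is None:
        return None
    return True

def sql_or(a: Optional[bool], b: Optional[bool]) -> Optional[bool]:
    # TRUE dominates OR; otherwise any UNKNOWN makes the result UNKNOWN.
    if a is True or b is True:
        return True
    if a is None or b is None:
        return None
    return False

def sql_ne(a: Optional[int], b: Optional[int]) -> Optional[bool]:
    # Comparing with NULL yields UNKNOWN.
    if a is None or b is None:
        return None
    return a != b

def where(some_value: Optional[int], date: str, id_: int) -> Optional[bool]:
    # id IN (16, 32, 51) AND date = '2017-04-09'
    #   AND (id != 32 OR some_value IS NULL)
    in_list = id_ in (16, 32, 51)  # id is never NULL in this data
    date_ok = date == "2017-04-09"
    return sql_and(sql_and(in_list, date_ok),
                   sql_or(sql_ne(id_, 32), some_value is None))

def expected_ids() -> list:
    # WHERE keeps a row only when the predicate is TRUE (not FALSE/UNKNOWN).
    return sorted({id_ for (v, d, id_) in ROWS if where(v, d, id_) is True})

print(expected_ids())  # [16, 32, 51]
```

Every row with id 16 satisfies id!=32, so 16 must appear in the result; only the (NULL, '2017-04-09', 32) row keeps 32. This agrees with the output obtained with hive.optimize.ppd.storage=false.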

The correct result is also returned by ..... (id!=32 OR some_value IS NULL)=true;
Replacing OR with AND also avoids the problem.
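The `=true` workaround should not change which rows the query keeps. Under standard SQL three-valued logic (assumed here), a WHERE clause keeps a row only when its predicate is TRUE, and `p = TRUE` is TRUE exactly when `p` is TRUE (it is UNKNOWN when `p` is UNKNOWN). The sketch below checks this for all three truth values. The helper names are hypothetical, not Hive APIs.

```python
from typing import Optional

def sql_eq(a: Optional[bool], b: Optional[bool]) -> Optional[bool]:
    # Equality with NULL (None) yields UNKNOWN (None).
    if a is None or b is None:
        return None
    return a == b

def where_keeps(predicate: Optional[bool]) -> bool:
    # A WHERE clause keeps the row only when the predicate is TRUE.
    return predicate is True

def equivalent_filters() -> bool:
    # Compare "WHERE p" with "WHERE (p) = TRUE" for TRUE, FALSE and UNKNOWN.
    return all(where_keeps(p) == where_keeps(sql_eq(p, True))
               for p in (True, False, None))

print(equivalent_filters())  # True
```

Because the rewritten filter is logically identical, a change in the result suggests that the difference comes from how the predicate is handed to the Parquet reader rather than from the query's meaning.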





--
This message was sent by Atlassian JIRA
(v6.3.15#6346)