Posted to dev@hive.apache.org by "Tony Hill (JIRA)" <ji...@apache.org> on 2017/05/12 14:01:04 UTC
[jira] [Created] (HIVE-16661) Parquet storage does not handle 'or' statement properly
Tony Hill created HIVE-16661:
--------------------------------
Summary: Parquet storage does not handle 'or' statement properly
Key: HIVE-16661
URL: https://issues.apache.org/jira/browse/HIVE-16661
Project: Hive
Issue Type: Bug
Components: Hive
Affects Versions: 1.1.0
Reporter: Tony Hill
A query on a Parquet-backed table returns different results depending on the value of hive.optimize.ppd.storage.
Steps to reproduce:
CREATE TABLE `test_table`(
`some_value` int)
PARTITIONED BY (
`date` string,
`id` int)
ROW FORMAT SERDE
'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat';
set hive.exec.dynamic.partition.mode=nonstrict;
insert into test_table PARTITION (date, id) VALUES (12, '2017-04-09', 16), (13, '2017-04-09', 32), (NULL, '2017-04-09', 51), (23, '2017-04-09', 51), (66, '2017-04-09', 16), (17, '2017-04-09', 32), (NULL, '2017-04-09', 32);
SELECT distinct id from test_table WHERE id IN (16, 32, 51) AND date = '2017-04-09' AND (id!=32 OR some_value IS NULL);
+-----+--+
| id |
+-----+--+
| 32 |
| 51 |
+-----+--+
(incorrect: id 16 is missing)
The correct result is returned after setting:
set hive.optimize.ppd.storage=false;
+-----+--+
| id |
+-----+--+
| 16 |
| 32 |
| 51 |
+-----+--+
(correct)
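As a cross-check of which output is correct, here is a minimal sketch that runs the same predicate under standard SQL NULL semantics using Python's built-in sqlite3 module. It is a stand-in, not Hive: SQLite has no partitions, so `date` and `id` become ordinary columns, and the ORDER BY is added only to make the output deterministic. Id 16 qualifies because 16 != 32 is true for both of its rows, and id 32 qualifies only through its row with a NULL some_value.

```python
import sqlite3

# Rows from the report's INSERT statement, as (some_value, date, id).
ROWS = [
    (12, "2017-04-09", 16),
    (13, "2017-04-09", 32),
    (None, "2017-04-09", 51),
    (23, "2017-04-09", 51),
    (66, "2017-04-09", 16),
    (17, "2017-04-09", 32),
    (None, "2017-04-09", 32),
]

# The report's SELECT; ORDER BY is an addition so the result order is stable.
QUERY = """
SELECT DISTINCT id FROM test_table
WHERE id IN (16, 32, 51) AND "date" = '2017-04-09'
  AND (id != 32 OR some_value IS NULL)
ORDER BY id
"""


def expected_ids():
    """Evaluate the query with standard SQL NULL semantics in an in-memory SQLite DB."""
    conn = sqlite3.connect(":memory:")
    try:
        # Partition columns are modeled as ordinary columns (assumption: SQLite stand-in).
        conn.execute('CREATE TABLE test_table (some_value INTEGER, "date" TEXT, id INTEGER)')
        conn.executemany("INSERT INTO test_table VALUES (?, ?, ?)", ROWS)
        return [row[0] for row in conn.execute(QUERY)]
    finally:
        conn.close()


print(expected_ids())  # [16, 32, 51]
```

This matches the result returned with hive.optimize.ppd.storage=false.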
The correct result is also returned when the disjunction is compared with true: ..... (id!=32 OR some_value IS NULL)=true;
Replacing OR with AND also avoids the problem.
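Under SQL three-valued logic, `WHERE p` and `WHERE p = true` should keep exactly the same rows: `p = true` is TRUE only when `p` is TRUE, while FALSE stays FALSE and NULL stays NULL, and WHERE discards both FALSE and NULL. The sketch below models that truth table in Python, using None for NULL; the helper names are hypothetical and it models standard SQL semantics, not Hive's implementation. Because the `=true` rewrite should not change the query's meaning, the differing results presumably come from the storage-level pushdown path rather than from the predicate itself.

```python
from typing import Optional

# SQL three-valued logic: True, False, or None standing in for UNKNOWN (NULL).
Tri = Optional[bool]


def sql_or(a: Tri, b: Tri) -> Tri:
    """SQL OR: TRUE if either side is TRUE, FALSE if both are FALSE, else UNKNOWN."""
    if a is True or b is True:
        return True
    if a is False and b is False:
        return False
    return None


def sql_eq(a: Tri, b: Tri) -> Tri:
    """SQL '=' on booleans: UNKNOWN if either side is NULL."""
    if a is None or b is None:
        return None
    return a == b


def disjunction(id_: Optional[int], some_value: Optional[int]) -> Tri:
    """(id != 32 OR some_value IS NULL); IS NULL itself never yields UNKNOWN."""
    ne = None if id_ is None else id_ != 32
    return sql_or(ne, some_value is None)


def where_keeps(pred: Tri) -> bool:
    """A WHERE clause keeps a row only when its predicate is TRUE."""
    return pred is True


# For every truth value of p, "WHERE p" and "WHERE p = true" agree.
for p in (True, False, None):
    assert where_keeps(p) == where_keeps(sql_eq(p, True))

# Report rows as (id, some_value); the other conjuncts hold for all of them.
ROWS = [(16, 12), (32, 13), (51, None), (51, 23), (16, 66), (32, 17), (32, None)]
kept = sorted({i for i, v in ROWS if where_keeps(sql_eq(disjunction(i, v), True))})
print(kept)  # [16, 32, 51]
```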
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)