Posted to issues@hive.apache.org by "Chaoyu Tang (JIRA)" <ji...@apache.org> on 2016/03/09 14:57:40 UTC

[jira] [Commented] (HIVE-12678) BETWEEN relational operator sometimes returns incorrect results against PARQUET tables

    [ https://issues.apache.org/jira/browse/HIVE-12678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15187118#comment-15187118 ] 

Chaoyu Tang commented on HIVE-12678:
------------------------------------

Tested upstream with the queries that previously returned incorrect results:
{code}
hive> set hive.optimize.index.filter=true;
hive> set hive.optimize.ppd.storage=true;
hive> select count(*) from t where c between '2015-12-09' and '2015-12-11';
Total MapReduce CPU Time Spent: 0 msec
OK
3

hive> select count(*) from t where c between '2015-12-10' and '2015-12-10';
OK
1
{code}
With hive.compute.query.using.stats disabled, these queries also return the correct results. The issue appears to have already been fixed upstream, although it is not clear which change fixed it.
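
For anyone who wants to repeat this check, here is a minimal sketch of the session. It assumes the table t from the issue description still exists with its three rows, and that hive.compute.query.using.stats is switched off at session level so the counts come from scanning the Parquet data rather than from table statistics. The exact sequence used in the test above was not recorded, so treat this as an illustration only.
{code}
-- Assumption: table t from the issue description exists and holds its three rows.
set hive.optimize.index.filter=true;
set hive.optimize.ppd.storage=true;
-- Compute count(*) from the data itself rather than answering it from table statistics.
set hive.compute.query.using.stats=false;
select count(*) from t where c between '2015-12-09' and '2015-12-11';
select count(*) from t where c between '2015-12-10' and '2015-12-10';
{code}
If the fix is in place, the two counts should match the 3 and 1 reported above.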

> BETWEEN relational operator sometimes returns incorrect results against PARQUET tables
> --------------------------------------------------------------------------------------
>
>                 Key: HIVE-12678
>                 URL: https://issues.apache.org/jira/browse/HIVE-12678
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 1.1.0, 1.2.1
>            Reporter: Nicholas Brenwald
>            Assignee: Chaoyu Tang
>
> When querying a Parquet table, the BETWEEN relational operator returns incorrect results when hive.optimize.index.filter and hive.optimize.ppd.storage are both enabled.
> Create a parquet table:
> {code}
> create table t(c string) stored as parquet;
> {code}
> Insert some strings representing dates:
> {code}
> insert into t select '2015-12-09' from default.dual limit 1;
> insert into t select '2015-12-10' from default.dual limit 1;
> insert into t select '2015-12-11' from default.dual limit 1;
> {code}
> h3. Example 1
> This query correctly returns 3:
> {code}
> set hive.optimize.index.filter=true;
> set hive.optimize.ppd.storage=true;
> select count(*) from t where c >= '2015-12-09' and c <= '2015-12-11';
> +------+--+
> | _c0  |
> +------+--+
> | 3    |
> +------+--+
> {code}
> This query incorrectly returns 1:
> {code}
> set hive.optimize.index.filter=true;
> set hive.optimize.ppd.storage=true;
> select count(*) from t where c between '2015-12-09' and '2015-12-11';
> +------+--+
> | _c0  |
> +------+--+
> | 1    |
> +------+--+
> {code}
> Disabling hive.optimize.index.filter resolves the problem. This query now correctly returns 3:
> {code}
> set hive.optimize.index.filter=false;
> set hive.optimize.ppd.storage=true;
> select count(*) from t where c between '2015-12-09' and '2015-12-11';
> +------+--+
> | _c0  |
> +------+--+
> | 3    |
> +------+--+
> {code}
> Disabling hive.optimize.ppd.storage resolves the problem. This query now correctly returns 3:
> {code}
> set hive.optimize.index.filter=true;
> set hive.optimize.ppd.storage=false;
> select count(*) from t where c between '2015-12-09' and '2015-12-11';
> +------+--+
> | _c0  |
> +------+--+
> | 3    |
> +------+--+
> {code}
> h3. Example 2
> This query correctly returns 1:
> {code}
> set hive.optimize.index.filter=true;
> set hive.optimize.ppd.storage=true;
> select count(*) from t where c >=  '2015-12-10' and c <= '2015-12-10';
> +------+--+
> | _c0  |
> +------+--+
> | 1    |
> +------+--+
> {code}
> This query incorrectly returns 0:
> {code}
> set hive.optimize.index.filter=true;
> set hive.optimize.ppd.storage=true;
> select count(*) from t where c between '2015-12-10' and '2015-12-10';
> +------+--+
> | _c0  |
> +------+--+
> | 0    |
> +------+--+
> {code}
> Disabling hive.optimize.index.filter resolves the problem. This query now correctly returns 1:
> {code}
> set hive.optimize.index.filter=false;
> set hive.optimize.ppd.storage=true;
> select count(*) from t where c >= '2015-12-10' and c <= '2015-12-10';
> +------+--+
> | _c0  |
> +------+--+
> | 1    |
> +------+--+
> {code}
> Disabling hive.optimize.ppd.storage resolves the problem. This query now correctly returns 1:
> {code}
> set hive.optimize.index.filter=true;
> set hive.optimize.ppd.storage=false;
> select count(*) from t where c >= '2015-12-10' and c <= '2015-12-10';
> +------+--+
> | _c0  |
> +------+--+
> | 1    |
> +------+--+
> {code}


