Posted to issues@drill.apache.org by "Rahul Challapalli (JIRA)" <ji...@apache.org> on 2017/02/03 18:22:51 UTC
[jira] [Reopened] (DRILL-5002) Using hive's date functions on top of date column in parquet gives wrong results
[ https://issues.apache.org/jira/browse/DRILL-5002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Rahul Challapalli reopened DRILL-5002:
--------------------------------------
There was a misunderstanding. We are still seeing wrong results. See the case below, run through the Hive plugin. The month should have been 2.
{code}
select l_shipdate, `month`(l_shipdate) from hive.lineitem where l_shipdate = date '1994-02-01' limit 2;
+-------------+---------+
| l_shipdate | EXPR$1 |
+-------------+---------+
| 1994-02-01 | 1 |
| 1994-02-01 | 1 |
+-------------+---------+
2 rows selected (0.28 seconds)
{code}
The same query run from Hive against the same Parquet file returns the correct results:
{code}
select l_shipdate, `month`(l_shipdate) from lineitem where l_shipdate = date '1994-02-01' limit 2;
OK
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
1994-02-01 2
1994-02-01 2
Time taken: 0.536 seconds, Fetched: 2 row(s)
{code}
I attached the Parquet file used. The Hive DDL is below:
{code}
create external table if not exists lineitem (
l_orderkey int,
l_partkey int,
l_suppkey int,
l_linenumber int,
l_quantity double,
l_extendedprice double,
l_discount double,
l_tax double,
l_returnflag string,
l_linestatus string,
l_shipdate date,
l_commitdate date,
l_receiptdate date,
l_shipinstruct string,
l_shipmode string,
l_comment string
)
STORED AS PARQUET
LOCATION '/drill/testdata/lineitem';
{code}
> Using hive's date functions on top of date column in parquet gives wrong results
> --------------------------------------------------------------------------------
>
> Key: DRILL-5002
> URL: https://issues.apache.org/jira/browse/DRILL-5002
> Project: Apache Drill
> Issue Type: Bug
> Components: Functions - Hive, Storage - Parquet
> Reporter: Rahul Challapalli
> Assignee: Vitalii Diravka
> Priority: Critical
>
> git.commit.id.abbrev=190d5d4
> Wrong Result 1 :
> {code}
> select l_shipdate, `month`(l_shipdate) from cp.`tpch/lineitem.parquet` where l_shipdate = date '1994-02-01' limit 2;
> +-------------+---------+
> | l_shipdate | EXPR$1 |
> +-------------+---------+
> | 1994-02-01 | 1 |
> | 1994-02-01 | 1 |
> +-------------+---------+
> {code}
> Wrong Result 2 :
> {code}
> select l_shipdate, `day`(l_shipdate) from cp.`tpch/lineitem.parquet` where l_shipdate = date '1998-06-02' limit 2;
> +-------------+---------+
> | l_shipdate | EXPR$1 |
> +-------------+---------+
> | 1998-06-02 | 1 |
> | 1998-06-02 | 1 |
> +-------------+---------+
> {code}
> Correct Result :
> {code}
> select l_shipdate, `month`(l_shipdate) from cp.`tpch/lineitem.parquet` where l_shipdate = date '1998-06-02' limit 2;
> +-------------+---------+
> | l_shipdate | EXPR$1 |
> +-------------+---------+
> | 1998-06-02 | 6 |
> | 1998-06-02 | 6 |
> +-------------+---------+
> {code}
> It looks like we are getting wrong results when the 'day' is '01'. I only tried the month and day Hive functions, but I wouldn't be surprised if the other Hive date functions have similar issues too.
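As a side note on the observations quoted above: all three results are arithmetically consistent with the date being moved back by one day before the field is extracted. That is only a hypothesis for narrowing the search; the report does not identify a cause. The sketch below is plain Python date arithmetic, not Drill or Hive code, and the helper names (`field_after_shift`, `consistent_with_shift`) are made up for illustration.

```python
from datetime import date, timedelta

# Observations copied from the report:
# (stored date, field extracted, value Drill returned).
OBSERVED = [
    (date(1994, 2, 1), "month", 1),  # Wrong Result 1 (also the Hive plugin case)
    (date(1998, 6, 2), "day", 1),    # Wrong Result 2
    (date(1998, 6, 2), "month", 6),  # Correct Result
]


def field_after_shift(d, field, shift_days=-1):
    """Extract `field` ("month" or "day") after moving `d` by `shift_days`.

    The default one-day backward shift is an assumption being tested,
    not a confirmed description of what Drill does.
    """
    shifted = d + timedelta(days=shift_days)
    return getattr(shifted, field)


def consistent_with_shift(observations, shift_days=-1):
    """Return True if every observed value matches the shifted extraction."""
    return all(
        field_after_shift(d, field, shift_days) == value
        for d, field, value in observations
    )


if __name__ == "__main__":
    for d, field, value in OBSERVED:
        print(d, field, "reported:", value,
              "after one-day shift:", field_after_shift(d, field))
    print("all consistent with a one-day shift:",
          consistent_with_shift(OBSERVED))
```

If the hypothesis holds, month would only look wrong for dates on the first of a month, while day would be off for every date, which matches Wrong Result 2 having a day of '02'.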