You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@hive.apache.org by "Sudip Hazra Choudhury (JIRA)" <ji...@apache.org> on 2018/03/09 11:23:00 UTC

[jira] [Created] (HIVE-18924) Wrong column selected on insert overwrite of partitioned table

Sudip Hazra Choudhury created HIVE-18924:
--------------------------------------------

             Summary: Wrong column selected on insert overwrite of partitioned table 
                 Key: HIVE-18924
                 URL: https://issues.apache.org/jira/browse/HIVE-18924
             Project: Hive
          Issue Type: Bug
    Affects Versions: 1.2.1
            Reporter: Sudip Hazra Choudhury
         Attachments: dataset.csv, queries_submitted.hql

 

Issue ::

I have table A with dt as the partition column, start_date and end_date of type timestamp as other columns.

eq,
{code:java}
Table A { start_date timestamp, end_date timestamp & PARTITIONED BY ( dt string ) }{code}
 

I also have another table B which has dt & hr as partition column and start_date and end_date of type timestamp as other columns.

eq,
{code:java}
 Table B { start_date timestamp, end_date timestamp & PARTITIONED BY ( dt string, hr string ) }{code}
 

Now I do insert overwrite to table B, selecting from table A. While selecting from table A, I am using data format function on start_date column and naming it as dt.  

eq,
{code:java}
insert overwrite table B
PARTITION(dt, hr)
select
 UNIX_TIMESTAMP(a.start_date, "yyyy-MM-dd'T'hh:mm:ss'Z'") * 1000 as start,
 UNIX_TIMESTAMP(a.end_date, "yyyy-MM-dd'T'hh:mm:ss'Z'") * 1000 as end,
 date_format(cast(UNIX_TIMESTAMP(a.start_date, "yyyy-MM-dd'T'hh:mm:ss'Z'") * 1000 as timestamp),"yyyy-MM-dd") as dt,
 date_format(cast(UNIX_TIMESTAMP(a.start_date, "yyyy-MM-dd'T'hh:mm:ss'Z'") * 1000 as timestamp),"HH") as hr
from A a
 where a.dt = '2018-03-10';{code}
 

 

We expect the value of date_format(cast(UNIX_TIMESTAMP(a.start_date, "yyyy-MM-dd'T'hh:mm:ss'Z'") * 1000 as timestamp),"yyyy-MM-dd") 

to go as the value of the partition column dt in Table B.

Instead the partition column dt of table B takes the value of partition column dt of table A, which 2018-03-10 in this example.

 

But in case of only select query from table A *without the insert overwrite* to table B, we are getting the value as expected.

 

 

*Steps to reproduce ::*

 
{code:java}
hadoop fs -mkdir -p /user/sudip.hc/hive-bug/stage/dt=2018-03-10/
hadoop fs -mkdir -p /user/sudip.hc/hive-bug/prod/
hadoop fs -put dataset.csv /user/sudip.hc/hive-bug/stage/dt=2018-03-10/
hive -f queries.hql{code}
 

 

Data ::

 
{code:java}
2016-03-01T06:08:44Z,2017-02-01T07:08:44Z
2016-03-01T06:04:46Z,2017-02-01T07:04:46Z
2016-03-01T06:10:34Z,2017-02-01T07:10:34Z
2016-03-01T06:04:46Z,2017-02-01T07:04:46Z
2016-03-01T06:04:45Z,2017-02-01T07:04:45Z{code}
 

 

Now Execute to check the differences::

1. hive -e "set hive.cli.print.header=true; select * from bug_stage;"
{code:java}
bug_stage.start_date bug_stage.end_date bug_stage.dt
2016-03-01T06:08:44Z 2017-02-01T07:08:44Z 2018-03-10
2016-03-01T06:04:46Z 2017-02-01T07:04:46Z 2018-03-10
2016-03-01T06:10:34Z 2017-02-01T07:10:34Z 2018-03-10
2016-03-01T06:04:46Z 2017-02-01T07:04:46Z 2018-03-10
2016-03-01T06:04:45Z 2017-02-01T07:04:45Z 2018-03-10{code}
2. hive -e "set hive.cli.print.header=true; select start_date, end_date, date_format(cast(UNIX_TIMESTAMP(start_date, "yyyy-MM-dd'T'hh:mm:ss'Z'") * 1000 as timestamp),"yyyy-MM-dd") as dt from bug_stage;"
{code:java}
start_date end_date dt
2016-03-01T06:08:44Z 2017-02-01T07:08:44Z 2016-03-01
2016-03-01T06:04:46Z 2017-02-01T07:04:46Z 2016-03-01
2016-03-01T06:10:34Z 2017-02-01T07:10:34Z 2016-03-01
2016-03-01T06:04:46Z 2017-02-01T07:04:46Z 2016-03-01
2016-03-01T06:04:45Z 2017-02-01T07:04:45Z 2016-03-01{code}
3. hive -e "set hive.cli.print.header=true; select * from bug_prod;"
{code:java}
bug_prod.start_date bug_prod.end_date bug_prod.dt bug_prod.hr
2016-03-01 06:08:44 2017-02-01 07:08:44 2018-03-10 06
2016-03-01 06:04:46 2017-02-01 07:04:46 2018-03-10 06
2016-03-01 06:10:34 2017-02-01 07:10:34 2018-03-10 06
2016-03-01 06:04:46 2017-02-01 07:04:46 2018-03-10 06
2016-03-01 06:04:45 2017-02-01 07:04:45 2018-03-10 06{code}
 

 

 
{code:java}
 {code}
 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)