You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@hive.apache.org by "Zoltan Haindrich (JIRA)" <ji...@apache.org> on 2018/03/14 09:05:00 UTC
[jira] [Commented] (HIVE-18924) Wrong column selected on insert overwrite of partitioned table

    [ https://issues.apache.org/jira/browse/HIVE-18924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16398294#comment-16398294 ] 

Zoltan Haindrich commented on HIVE-18924:
-----------------------------------------

this bug doesn't seems to be present on the current master

{code}
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.exec.dynamic.partition=true;

drop table if exists t_stage;
drop table if exists t_prod;

CREATE TABLE IF NOT EXISTS t_stage (
        start_date timestamp,
        end_date timestamp
)
PARTITIONED BY (
        dt string
);

create table if not exists t_prod (
        start_date timestamp,
        end_date timestamp
)
PARTITIONED BY (
        dt string,
        hr string
)
STORED as ORC;

insert into t_stage partition(dt='2016-03-10') values
        ('2016-03-01 06:08:44','2017-02-01 07:08:44');

insert overwrite table t_prod
PARTITION(dt, hr)
select
      	UNIX_TIMESTAMP(stg.start_date, "yyyy-MM-dd'T'hh:mm:ss'Z'") * 1000 as start_date,
        UNIX_TIMESTAMP(stg.end_date, "yyyy-MM-dd'T'hh:mm:ss'Z'") * 1000 as end_date,
        date_format(cast(UNIX_TIMESTAMP(stg.start_date, "yyyy-MM-dd'T'hh:mm:ss'Z'") * 1000 as timestamp),"yyyy-MM-dd") as dt,
        date_format(cast(UNIX_TIMESTAMP(stg.start_date, "yyyy-MM-dd'T'hh:mm:ss'Z'") * 1000 as timestamp),"HH") as hr
from t_stage stg
        where stg.dt = '2016-03-10';

select
      	UNIX_TIMESTAMP(stg.start_date, "yyyy-MM-dd'T'hh:mm:ss'Z'") * 1000 as start_date,
        UNIX_TIMESTAMP(stg.end_date, "yyyy-MM-dd'T'hh:mm:ss'Z'") * 1000 as end_date,
        date_format(cast(UNIX_TIMESTAMP(stg.start_date, "yyyy-MM-dd'T'hh:mm:ss'Z'") * 1000 as timestamp),"yyyy-MM-dd") as dt,
        date_format(cast(UNIX_TIMESTAMP(stg.start_date, "yyyy-MM-dd'T'hh:mm:ss'Z'") * 1000 as timestamp),"HH") as hr
from t_stage stg
        where stg.dt = '2016-03-10';

select * from t_prod;

{code}



> Wrong column selected on insert overwrite of partitioned table 
> ---------------------------------------------------------------
>
>                 Key: HIVE-18924
>                 URL: https://issues.apache.org/jira/browse/HIVE-18924
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 1.2.1
>            Reporter: Sudip Hazra Choudhury
>            Priority: Major
>         Attachments: dataset.csv, queries_submitted.hql
>
>
>  
> Issue ::
> I have table A with dt as the partition column, start_date and end_date of type timestamp as other columns.
> eq,
> {code:java}
> Table A { start_date timestamp, end_date timestamp & PARTITIONED BY ( dt string ) }{code}
>  
> I also have another table B which has dt & hr as partition column and start_date and end_date of type timestamp as other columns.
> eq,
> {code:java}
>  Table B { start_date timestamp, end_date timestamp & PARTITIONED BY ( dt string, hr string ) }{code}
>  
> Now I do insert overwrite to table B, selecting from table A. While selecting from table A, I am using data format function on start_date column and naming it as dt.  
> eq,
> {code:java}
> insert overwrite table B
> PARTITION(dt, hr)
> select
>  UNIX_TIMESTAMP(a.start_date, "yyyy-MM-dd'T'hh:mm:ss'Z'") * 1000 as start,
>  UNIX_TIMESTAMP(a.end_date, "yyyy-MM-dd'T'hh:mm:ss'Z'") * 1000 as end,
>  date_format(cast(UNIX_TIMESTAMP(a.start_date, "yyyy-MM-dd'T'hh:mm:ss'Z'") * 1000 as timestamp),"yyyy-MM-dd") as dt,
>  date_format(cast(UNIX_TIMESTAMP(a.start_date, "yyyy-MM-dd'T'hh:mm:ss'Z'") * 1000 as timestamp),"HH") as hr
> from A a
>  where a.dt = '2018-03-10';{code}
>  
>  
> We expect the value of date_format(cast(UNIX_TIMESTAMP(a.start_date, "yyyy-MM-dd'T'hh:mm:ss'Z'") * 1000 as timestamp),"yyyy-MM-dd") 
> to go as the value of the partition column dt in Table B.
> Instead the partition column dt of table B takes the value of partition column dt of table A, which 2018-03-10 in this example.
>  
> But in case of only select query from table A *without the insert overwrite* to table B, we are getting the value as expected.
>  
>  
> *Steps to reproduce ::*
>  
> {code:java}
> hadoop fs -mkdir -p /user/sudip.hc/hive-bug/stage/dt=2018-03-10/
> hadoop fs -mkdir -p /user/sudip.hc/hive-bug/prod/
> hadoop fs -put dataset.csv /user/sudip.hc/hive-bug/stage/dt=2018-03-10/
> hive -f queries.hql{code}
>  
>  
> Data ::
>  
> {code:java}
> 2016-03-01T06:08:44Z,2017-02-01T07:08:44Z
> 2016-03-01T06:04:46Z,2017-02-01T07:04:46Z
> 2016-03-01T06:10:34Z,2017-02-01T07:10:34Z
> 2016-03-01T06:04:46Z,2017-02-01T07:04:46Z
> 2016-03-01T06:04:45Z,2017-02-01T07:04:45Z{code}
>  
>  
> Now Execute to check the differences::
> 1. hive -e "set hive.cli.print.header=true; select * from bug_stage;"
> {code:java}
> bug_stage.start_date bug_stage.end_date bug_stage.dt
> 2016-03-01T06:08:44Z 2017-02-01T07:08:44Z 2018-03-10
> 2016-03-01T06:04:46Z 2017-02-01T07:04:46Z 2018-03-10
> 2016-03-01T06:10:34Z 2017-02-01T07:10:34Z 2018-03-10
> 2016-03-01T06:04:46Z 2017-02-01T07:04:46Z 2018-03-10
> 2016-03-01T06:04:45Z 2017-02-01T07:04:45Z 2018-03-10{code}
> 2. hive -e "set hive.cli.print.header=true; select start_date, end_date, date_format(cast(UNIX_TIMESTAMP(start_date, "yyyy-MM-dd'T'hh:mm:ss'Z'") * 1000 as timestamp),"yyyy-MM-dd") as dt from bug_stage;"
> {code:java}
> start_date end_date dt
> 2016-03-01T06:08:44Z 2017-02-01T07:08:44Z 2016-03-01
> 2016-03-01T06:04:46Z 2017-02-01T07:04:46Z 2016-03-01
> 2016-03-01T06:10:34Z 2017-02-01T07:10:34Z 2016-03-01
> 2016-03-01T06:04:46Z 2017-02-01T07:04:46Z 2016-03-01
> 2016-03-01T06:04:45Z 2017-02-01T07:04:45Z 2016-03-01{code}
> 3. hive -e "set hive.cli.print.header=true; select * from bug_prod;"
> {code:java}
> bug_prod.start_date bug_prod.end_date bug_prod.dt bug_prod.hr
> 2016-03-01 06:08:44 2017-02-01 07:08:44 2018-03-10 06
> 2016-03-01 06:04:46 2017-02-01 07:04:46 2018-03-10 06
> 2016-03-01 06:10:34 2017-02-01 07:10:34 2018-03-10 06
> 2016-03-01 06:04:46 2017-02-01 07:04:46 2018-03-10 06
> 2016-03-01 06:04:45 2017-02-01 07:04:45 2018-03-10 06{code}
>  
>  
>  
> {code:java}
>  {code}
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)