You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@hive.apache.org by "Zoltan Haindrich (JIRA)" <ji...@apache.org> on 2018/03/14 09:05:00 UTC
[jira] [Commented] (HIVE-18924) Wrong column selected on insert
overwrite of partitioned table
[ https://issues.apache.org/jira/browse/HIVE-18924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16398294#comment-16398294 ]
Zoltan Haindrich commented on HIVE-18924:
-----------------------------------------
this bug doesn't seems to be present on the current master
{code}
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.exec.dynamic.partition=true;
drop table if exists t_stage;
drop table if exists t_prod;
CREATE TABLE IF NOT EXISTS t_stage (
start_date timestamp,
end_date timestamp
)
PARTITIONED BY (
dt string
);
create table if not exists t_prod (
start_date timestamp,
end_date timestamp
)
PARTITIONED BY (
dt string,
hr string
)
STORED as ORC;
insert into t_stage partition(dt='2016-03-10') values
('2016-03-01 06:08:44','2017-02-01 07:08:44');
insert overwrite table t_prod
PARTITION(dt, hr)
select
UNIX_TIMESTAMP(stg.start_date, "yyyy-MM-dd'T'hh:mm:ss'Z'") * 1000 as start_date,
UNIX_TIMESTAMP(stg.end_date, "yyyy-MM-dd'T'hh:mm:ss'Z'") * 1000 as end_date,
date_format(cast(UNIX_TIMESTAMP(stg.start_date, "yyyy-MM-dd'T'hh:mm:ss'Z'") * 1000 as timestamp),"yyyy-MM-dd") as dt,
date_format(cast(UNIX_TIMESTAMP(stg.start_date, "yyyy-MM-dd'T'hh:mm:ss'Z'") * 1000 as timestamp),"HH") as hr
from t_stage stg
where stg.dt = '2016-03-10';
select
UNIX_TIMESTAMP(stg.start_date, "yyyy-MM-dd'T'hh:mm:ss'Z'") * 1000 as start_date,
UNIX_TIMESTAMP(stg.end_date, "yyyy-MM-dd'T'hh:mm:ss'Z'") * 1000 as end_date,
date_format(cast(UNIX_TIMESTAMP(stg.start_date, "yyyy-MM-dd'T'hh:mm:ss'Z'") * 1000 as timestamp),"yyyy-MM-dd") as dt,
date_format(cast(UNIX_TIMESTAMP(stg.start_date, "yyyy-MM-dd'T'hh:mm:ss'Z'") * 1000 as timestamp),"HH") as hr
from t_stage stg
where stg.dt = '2016-03-10';
select * from t_prod;
{code}
> Wrong column selected on insert overwrite of partitioned table
> ---------------------------------------------------------------
>
> Key: HIVE-18924
> URL: https://issues.apache.org/jira/browse/HIVE-18924
> Project: Hive
> Issue Type: Bug
> Affects Versions: 1.2.1
> Reporter: Sudip Hazra Choudhury
> Priority: Major
> Attachments: dataset.csv, queries_submitted.hql
>
>
>
> Issue ::
> I have table A with dt as the partition column, start_date and end_date of type timestamp as other columns.
> eq,
> {code:java}
> Table A { start_date timestamp, end_date timestamp & PARTITIONED BY ( dt string ) }{code}
>
> I also have another table B which has dt & hr as partition column and start_date and end_date of type timestamp as other columns.
> eq,
> {code:java}
> Table B { start_date timestamp, end_date timestamp & PARTITIONED BY ( dt string, hr string ) }{code}
>
> Now I do insert overwrite to table B, selecting from table A. While selecting from table A, I am using data format function on start_date column and naming it as dt.
> eq,
> {code:java}
> insert overwrite table B
> PARTITION(dt, hr)
> select
> UNIX_TIMESTAMP(a.start_date, "yyyy-MM-dd'T'hh:mm:ss'Z'") * 1000 as start,
> UNIX_TIMESTAMP(a.end_date, "yyyy-MM-dd'T'hh:mm:ss'Z'") * 1000 as end,
> date_format(cast(UNIX_TIMESTAMP(a.start_date, "yyyy-MM-dd'T'hh:mm:ss'Z'") * 1000 as timestamp),"yyyy-MM-dd") as dt,
> date_format(cast(UNIX_TIMESTAMP(a.start_date, "yyyy-MM-dd'T'hh:mm:ss'Z'") * 1000 as timestamp),"HH") as hr
> from A a
> where a.dt = '2018-03-10';{code}
>
>
> We expect the value of date_format(cast(UNIX_TIMESTAMP(a.start_date, "yyyy-MM-dd'T'hh:mm:ss'Z'") * 1000 as timestamp),"yyyy-MM-dd")
> to go as the value of the partition column dt in Table B.
> Instead the partition column dt of table B takes the value of partition column dt of table A, which 2018-03-10 in this example.
>
> But in case of only select query from table A *without the insert overwrite* to table B, we are getting the value as expected.
>
>
> *Steps to reproduce ::*
>
> {code:java}
> hadoop fs -mkdir -p /user/sudip.hc/hive-bug/stage/dt=2018-03-10/
> hadoop fs -mkdir -p /user/sudip.hc/hive-bug/prod/
> hadoop fs -put dataset.csv /user/sudip.hc/hive-bug/stage/dt=2018-03-10/
> hive -f queries.hql{code}
>
>
> Data ::
>
> {code:java}
> 2016-03-01T06:08:44Z,2017-02-01T07:08:44Z
> 2016-03-01T06:04:46Z,2017-02-01T07:04:46Z
> 2016-03-01T06:10:34Z,2017-02-01T07:10:34Z
> 2016-03-01T06:04:46Z,2017-02-01T07:04:46Z
> 2016-03-01T06:04:45Z,2017-02-01T07:04:45Z{code}
>
>
> Now Execute to check the differences::
> 1. hive -e "set hive.cli.print.header=true; select * from bug_stage;"
> {code:java}
> bug_stage.start_date bug_stage.end_date bug_stage.dt
> 2016-03-01T06:08:44Z 2017-02-01T07:08:44Z 2018-03-10
> 2016-03-01T06:04:46Z 2017-02-01T07:04:46Z 2018-03-10
> 2016-03-01T06:10:34Z 2017-02-01T07:10:34Z 2018-03-10
> 2016-03-01T06:04:46Z 2017-02-01T07:04:46Z 2018-03-10
> 2016-03-01T06:04:45Z 2017-02-01T07:04:45Z 2018-03-10{code}
> 2. hive -e "set hive.cli.print.header=true; select start_date, end_date, date_format(cast(UNIX_TIMESTAMP(start_date, "yyyy-MM-dd'T'hh:mm:ss'Z'") * 1000 as timestamp),"yyyy-MM-dd") as dt from bug_stage;"
> {code:java}
> start_date end_date dt
> 2016-03-01T06:08:44Z 2017-02-01T07:08:44Z 2016-03-01
> 2016-03-01T06:04:46Z 2017-02-01T07:04:46Z 2016-03-01
> 2016-03-01T06:10:34Z 2017-02-01T07:10:34Z 2016-03-01
> 2016-03-01T06:04:46Z 2017-02-01T07:04:46Z 2016-03-01
> 2016-03-01T06:04:45Z 2017-02-01T07:04:45Z 2016-03-01{code}
> 3. hive -e "set hive.cli.print.header=true; select * from bug_prod;"
> {code:java}
> bug_prod.start_date bug_prod.end_date bug_prod.dt bug_prod.hr
> 2016-03-01 06:08:44 2017-02-01 07:08:44 2018-03-10 06
> 2016-03-01 06:04:46 2017-02-01 07:04:46 2018-03-10 06
> 2016-03-01 06:10:34 2017-02-01 07:10:34 2018-03-10 06
> 2016-03-01 06:04:46 2017-02-01 07:04:46 2018-03-10 06
> 2016-03-01 06:04:45 2017-02-01 07:04:45 2018-03-10 06{code}
>
>
>
> {code:java}
> {code}
>
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)