Posted to dev@hive.apache.org by "Junichi Oda (JIRA)" <ji...@apache.org> on 2018/01/29 09:18:00 UTC

[jira] [Created] (HIVE-18563) "Load data into table" behavior is different between 1.2.1 and 1.2.1000

Junichi Oda created HIVE-18563:
----------------------------------

             Summary: "Load data into table" behavior is different between 1.2.1 and 1.2.1000
                 Key: HIVE-18563
                 URL: https://issues.apache.org/jira/browse/HIVE-18563
             Project: Hive
          Issue Type: Bug
          Components: Hive, HiveServer2
         Environment: * OS : CentOS6
 * JDK : 1.8.0_152(Oracle)
 * HDP : 2.3.2.0 and 2.6.2.0
 * Hive : 1.2.1.2.3.2.0-2950 and 1.2.1000.2.6.2.0-205
            Reporter: Junichi Oda


After upgrading HDP from 2.3.2.0 to 2.6.2.0, the "load data into table" behavior changed.

Data arrives hourly, and all files have the same name.

{code:java}
/user/user1/logs/yyyymmdd/00/part-r-00000.gz
/user/user1/logs/yyyymmdd/01/part-r-00000.gz
/user/user1/logs/yyyymmdd/02/part-r-00000.gz
/user/user1/logs/yyyymmdd/03/part-r-00000.gz
...
/user/user1/logs/yyyymmdd/22/part-r-00000.gz
/user/user1/logs/yyyymmdd/23/part-r-00000.gz
{code}

Before upgrade (HDP 2.3.2.0)

{code:java}
HQL
hive> load data inpath '/user/user1/logs/yyyymmdd/*/*.gz' into table sample_db.sample_tbl partition (dt='yyyymmdd');
 
 
Result
/hive/warehouse/sample_db.db/sample_tbl/dt=yyyymmdd
/hive/warehouse/sample_db.db/sample_tbl/dt=yyyymmdd/part-r-00000.gz
/hive/warehouse/sample_db.db/sample_tbl/dt=yyyymmdd/part-r-00000_copy_1.gz
/hive/warehouse/sample_db.db/sample_tbl/dt=yyyymmdd/part-r-00000_copy_10.gz
/hive/warehouse/sample_db.db/sample_tbl/dt=yyyymmdd/part-r-00000_copy_11.gz
/hive/warehouse/sample_db.db/sample_tbl/dt=yyyymmdd/part-r-00000_copy_12.gz
/hive/warehouse/sample_db.db/sample_tbl/dt=yyyymmdd/part-r-00000_copy_13.gz
/hive/warehouse/sample_db.db/sample_tbl/dt=yyyymmdd/part-r-00000_copy_14.gz
/hive/warehouse/sample_db.db/sample_tbl/dt=yyyymmdd/part-r-00000_copy_15.gz
/hive/warehouse/sample_db.db/sample_tbl/dt=yyyymmdd/part-r-00000_copy_16.gz
/hive/warehouse/sample_db.db/sample_tbl/dt=yyyymmdd/part-r-00000_copy_17.gz
/hive/warehouse/sample_db.db/sample_tbl/dt=yyyymmdd/part-r-00000_copy_18.gz
/hive/warehouse/sample_db.db/sample_tbl/dt=yyyymmdd/part-r-00000_copy_19.gz
/hive/warehouse/sample_db.db/sample_tbl/dt=yyyymmdd/part-r-00000_copy_2.gz
/hive/warehouse/sample_db.db/sample_tbl/dt=yyyymmdd/part-r-00000_copy_20.gz
/hive/warehouse/sample_db.db/sample_tbl/dt=yyyymmdd/part-r-00000_copy_21.gz
/hive/warehouse/sample_db.db/sample_tbl/dt=yyyymmdd/part-r-00000_copy_22.gz
/hive/warehouse/sample_db.db/sample_tbl/dt=yyyymmdd/part-r-00000_copy_23.gz
/hive/warehouse/sample_db.db/sample_tbl/dt=yyyymmdd/part-r-00000_copy_3.gz
/hive/warehouse/sample_db.db/sample_tbl/dt=yyyymmdd/part-r-00000_copy_4.gz
/hive/warehouse/sample_db.db/sample_tbl/dt=yyyymmdd/part-r-00000_copy_5.gz
/hive/warehouse/sample_db.db/sample_tbl/dt=yyyymmdd/part-r-00000_copy_6.gz
/hive/warehouse/sample_db.db/sample_tbl/dt=yyyymmdd/part-r-00000_copy_7.gz
/hive/warehouse/sample_db.db/sample_tbl/dt=yyyymmdd/part-r-00000_copy_8.gz
/hive/warehouse/sample_db.db/sample_tbl/dt=yyyymmdd/part-r-00000_copy_9.gz
{code}
All 24 files were loaded. The first kept the name part-r-00000.gz, and the others were renamed to part-r-00000_copy_*.gz.
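The renaming seen in the HDP 2.3.2.0 result can be sketched as follows. This is a hypothetical reconstruction of the observed naming pattern, not Hive's actual implementation. The class and method names are made up, and splitting the file name at the last dot is an assumption.

{code:java}
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the "_copy_N" renaming observed on HDP 2.3.2.0.
// Not Hive source code; names and the last-dot split are assumptions.
final class CopySuffixNaming {

    // Returns fileName if it is free, otherwise the first free "<base>_copy_<n><ext>".
    static String resolveName(String fileName, Set<String> existing) {
        if (!existing.contains(fileName)) {
            return fileName;
        }
        int dot = fileName.lastIndexOf('.');
        String base = dot < 0 ? fileName : fileName.substring(0, dot);
        String ext = dot < 0 ? "" : fileName.substring(dot);
        for (int n = 1; ; n++) {
            String candidate = base + "_copy_" + n + ext;
            if (!existing.contains(candidate)) {
                return candidate;
            }
        }
    }

    // Moves `count` same-named files into an empty destination, one after another.
    static List<String> load(int count, String fileName) {
        Set<String> destination = new HashSet<>();
        List<String> loaded = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            String name = resolveName(fileName, destination);
            destination.add(name);
            loaded.add(name);
        }
        return loaded;
    }

    public static void main(String[] args) {
        List<String> names = load(24, "part-r-00000.gz");
        System.out.println(names.get(0));  // part-r-00000.gz
        System.out.println(names.get(1));  // part-r-00000_copy_1.gz
        System.out.println(names.get(23)); // part-r-00000_copy_23.gz
    }
}
{code}

Sorting the 24 generated names as strings gives the same order as the listing above (part-r-00000.gz, then copy_1, copy_10, ..., copy_19, copy_2, ...), because the listing is lexicographic rather than numeric.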

After upgrade (HDP 2.6.2.0)
{code:java}
HQL
hive> load data inpath '/user/user1/logs/yyyymmdd/*/*.gz' into table sample_db.sample_tbl partition (dt='yyyymmdd');
 
Result
/hive/warehouse/sample_db.db/sample_tbl/dt=yyyymmdd
/hive/warehouse/sample_db.db/sample_tbl/dt=yyyymmdd/part-r-00000.gz
{code}
Only part-r-00000.gz exists.

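Its content was identical to part-r-00000_copy_23.gz from the HDP 2.3.2.0 result, so the files appear to have overwritten one another instead of being renamed.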
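If the 24 files are processed in hour order and each one replaces the file already at the destination, only the last one survives, which is consistent with the observation above. The following is a hypothetical in-memory simulation of that hypothesis, not Hive code. The processing order (00 to 23) and the class and method names are assumptions.

{code:java}
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the HDP 2.6.2.0 symptom: every file is written under its
// original name, so each load replaces the previous file. Assumes hours 00..23
// are processed in order; class and method names are made up.
final class OverwriteSimulation {

    // Maps destination file name -> hour directory its content came from.
    static Map<String, String> loadWithOverwrite(int hours, String fileName) {
        Map<String, String> destination = new HashMap<>();
        for (int h = 0; h < hours; h++) {
            // put() replaces any existing entry for the same name, like an overwrite.
            destination.put(fileName, String.format("%02d", h));
        }
        return destination;
    }

    public static void main(String[] args) {
        Map<String, String> destination = loadWithOverwrite(24, "part-r-00000.gz");
        System.out.println(destination.size());                 // 1
        System.out.println(destination.get("part-r-00000.gz")); // 23
    }
}
{code}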

When the files are loaded one at a time, all of them are loaded, as in the HDP 2.3.2.0 environment.
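A script for the one-file-at-a-time approach can be generated as below, with one LOAD DATA statement per hourly directory, using the paths and table from this report ("yyyymmdd" stays a placeholder). This is a sketch of a workaround, not a fix for the behavior change. The class name is hypothetical, and submitting the output with hive -f or beeline -f is an assumption about how the statements would be run.

{code:java}
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: builds one LOAD DATA statement per hourly file so that
// each file is loaded separately instead of through a single glob.
final class PerHourLoadStatements {

    static List<String> statements(String day) {
        List<String> out = new ArrayList<>();
        for (int h = 0; h < 24; h++) {
            out.add(String.format(
                "load data inpath '/user/user1/logs/%s/%02d/part-r-00000.gz' "
                    + "into table sample_db.sample_tbl partition (dt='%s');",
                day, h, day));
        }
        return out;
    }

    public static void main(String[] args) {
        // Print the script; it could then be saved to a file and run with hive -f or beeline -f.
        statements("yyyymmdd").forEach(System.out::println);
    }
}
{code}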

Why is the behavior different between 2.3.2.0 and 2.6.2.0?

Thanks in advance.

 

https://community.hortonworks.com/questions/158176/load-data-into-table-behavior-is-different-between.html



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)