You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@hive.apache.org by "Takahiko Saito (JIRA)" <ji...@apache.org> on 2016/06/03 21:19:59 UTC

[jira] [Created] (HIVE-13943) Null is inserted into an existing partition after replacing a column of int with string of text formatted partitioned table

Takahiko Saito created HIVE-13943:
-------------------------------------

             Summary: Null is inserted into an existing partition after replacing a column of int with string of text formatted partitioned table 
                 Key: HIVE-13943
                 URL: https://issues.apache.org/jira/browse/HIVE-13943
             Project: Hive
          Issue Type: Bug
    Affects Versions: 1.2.1, 2.1.0
            Reporter: Takahiko Saito


Create a text formatted table with a int column partitioned by a string column.
After replacing the columns of int with string and inserting a new row with the existing partition ('horton' in this case), null is inserted as a value in the altered column:
{noformat}
0: jdbc:hive2://ts-0531-1.openstacklocal:2181> drop table if exists test;
No rows affected (0.249 seconds)
0: jdbc:hive2://ts-0531-1.openstacklocal:2181>
0: jdbc:hive2://ts-0531-1.openstacklocal:2181> create table test (i int) partitioned by (s string);
No rows affected (0.116 seconds)
0: jdbc:hive2://ts-0531-1.openstacklocal:2181>
0: jdbc:hive2://ts-0531-1.openstacklocal:2181> insert into table test partition (s) values(1, 'horton');
INFO  : Session is already open
INFO  : Dag name: insert into table test partition...'horton')(Stage-1)
INFO  :

INFO  : Status: Running (Executing on YARN cluster with App id application_1464727816747_0759)

INFO  : Map 1: 0/1
INFO  : Map 1: 0(+1)/1
INFO  : Map 1: 0(+1)/1
INFO  : Map 1: 1/1
INFO  : Loading data to table default.test partition (s=null) from hdfs://ts-0531-5.openstacklocal:8020/apps/hive/warehouse/test/.hive-staging_hive_2016-06-03_21-09-46_619_6747668726633461908-940/-ext-10000
INFO  : 	 Time taken for load dynamic partitions : 122
INFO  : 	Loading partition {s=horton}
INFO  : 	 Time taken for adding to write entity : 0
INFO  : Partition default.test{s=horton} stats: [numFiles=1, numRows=1, totalSize=2, rawDataSize=1]
No rows affected (8.301 seconds)
0: jdbc:hive2://ts-0531-1.openstacklocal:2181>
0: jdbc:hive2://ts-0531-1.openstacklocal:2181> select * from test;
+---------+---------+--+
| test.i  | test.s  |
+---------+---------+--+
| 1       | horton  |
+---------+---------+--+
1 row selected (0.184 seconds)
0: jdbc:hive2://ts-0531-1.openstacklocal:2181>
0: jdbc:hive2://ts-0531-1.openstacklocal:2181> alter table test replace columns (i string);
No rows affected (0.138 seconds)
0: jdbc:hive2://ts-0531-1.openstacklocal:2181>
0: jdbc:hive2://ts-0531-1.openstacklocal:2181> insert into table test partition (s) values ('horton', 'horton');
INFO  : Session is already open
INFO  : Dag name: insert into table test partition...'horton')(Stage-1)
INFO  :

INFO  : Map 1: 1/1
INFO  : Loading data to table default.test partition (s=null) from hdfs://ts-0531-5.openstacklocal:8020/apps/hive/warehouse/test/.hive-staging_hive_2016-06-03_21-09-55_295_8631413609124947877-940/-ext-10000
INFO  : 	 Time taken for load dynamic partitions : 190
INFO  : 	Loading partition {s=horton}
INFO  : 	 Time taken for adding to write entity : 0
INFO  : Partition default.test{s=horton} stats: [numFiles=2, numRows=2, totalSize=9, rawDataSize=7]
No rows affected (1.35 seconds)
0: jdbc:hive2://ts-0531-1.openstacklocal:2181>
0: jdbc:hive2://ts-0531-1.openstacklocal:2181> select * from test;
+---------+---------+--+
| test.i  | test.s  |
+---------+---------+--+
| 1       | horton  |
| NULL    | horton  |
+---------+---------+--+
2 rows selected (0.08 seconds)
{noformat}

The below is explain of insertion:
{noformat}
0: jdbc:hive2://ts-0531-1.openstacklocal:2181> explain insert into table test partition (s) values ('horton', 'horton');
+-------------------------------------------------------------------------------------------------------+--+
|                                                Explain                                                |
+-------------------------------------------------------------------------------------------------------+--+
| STAGE DEPENDENCIES:                                                                                   |
|   Stage-1 is a root stage                                                                             |
|   Stage-2 depends on stages: Stage-1                                                                  |
|   Stage-0 depends on stages: Stage-2                                                                  |
|   Stage-3 depends on stages: Stage-0                                                                  |
|                                                                                                       |
| STAGE PLANS:                                                                                          |
|   Stage: Stage-1                                                                                      |
|     Tez                                                                                               |
|       DagId: hive_20160603211130_4262d739-5bc1-4be0-95c3-2b666f5db7b8:1323                            |
|       Vertices:                                                                                       |
|         Map 1                                                                                         |
|             Map Operator Tree:                                                                        |
|                 TableScan                                                                             |
|                   alias: values__tmp__table__17                                                       |
|                   Statistics: Num rows: 1 Data size: 14 Basic stats: COMPLETE Column stats: NONE      |
|                   Select Operator                                                                     |
|                     expressions: tmp_values_col1 (type: string), tmp_values_col2 (type: string)       |
|                     outputColumnNames: _col0, _col1                                                   |
|                     Statistics: Num rows: 1 Data size: 14 Basic stats: COMPLETE Column stats: NONE    |
|                     File Output Operator                                                              |
|                       compressed: false                                                               |
|                       Statistics: Num rows: 1 Data size: 14 Basic stats: COMPLETE Column stats: NONE  |
|                       table:                                                                          |
|                           input format: org.apache.hadoop.mapred.TextInputFormat                      |
|                           output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat   |
|                           serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe                   |
|                           name: default.test                                                          |
|                                                                                                       |
|   Stage: Stage-2                                                                                      |
|     Dependency Collection                                                                             |
|                                                                                                       |
|   Stage: Stage-0                                                                                      |
|     Move Operator                                                                                     |
|       tables:                                                                                         |
|           partition:                                                                                  |
|             s                                                                                         |
|           replace: false                                                                              |
|           table:                                                                                      |
|               input format: org.apache.hadoop.mapred.TextInputFormat                                  |
|               output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat               |
|               serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe                               |
|               name: default.test                                                                      |
|                                                                                                       |
|   Stage: Stage-3                                                                                      |
|     Stats-Aggr Operator                                                                               |
|                                                                                                       |
+-------------------------------------------------------------------------------------------------------+--+
{noformat]

The issue is NOT seen with orc table.

Also I tried with another table with int a string column partitioned by timestamp column, but the issue was NOT seen:
{noformat}
0: jdbc:hive2://ts-0531-1.openstacklocal:2181> drop table if exists test;
No rows affected (0.284 seconds)
0: jdbc:hive2://ts-0531-1.openstacklocal:2181>
0: jdbc:hive2://ts-0531-1.openstacklocal:2181> create table test (s string) partitioned by (ts timestamp);
No rows affected (0.093 seconds)
0: jdbc:hive2://ts-0531-1.openstacklocal:2181>
0: jdbc:hive2://ts-0531-1.openstacklocal:2181> insert into table test partition (ts) values ('1', '2034-08-04 17:42:59.0');
INFO  : Session is already open
INFO  : Dag name: insert into table test partit...17:42:59.0')(Stage-1)
INFO  :

INFO  : Status: Running (Executing on YARN cluster with App id application_1464727816747_0759)

INFO  : Map 1: 0/1
INFO  : Map 1: 0(+1)/1
INFO  : Map 1: 1/1
INFO  : Loading data to table default.test partition (ts=null) from hdfs://ts-0531-5.openstacklocal:8020/apps/hive/warehouse/test/.hive-staging_hive_2016-06-03_21-17-35_097_7856579707078577394-940/-ext-10000
INFO  : 	 Time taken for load dynamic partitions : 132
INFO  : 	Loading partition {ts=2034-08-04 17:42:59.0}
INFO  : 	 Time taken for adding to write entity : 0
INFO  : Partition default.test{ts=2034-08-04 17:42:59.0} stats: [numFiles=1, numRows=1, totalSize=2, rawDataSize=1]
No rows affected (4.917 seconds)
0: jdbc:hive2://ts-0531-1.openstacklocal:2181>
0: jdbc:hive2://ts-0531-1.openstacklocal:2181> select * from test;
+---------+------------------------+--+
| test.s  |        test.ts         |
+---------+------------------------+--+
| 1       | 2034-08-04 17:42:59.0  |
+---------+------------------------+--+
1 row selected (0.104 seconds)
0: jdbc:hive2://ts-0531-1.openstacklocal:2181>
0: jdbc:hive2://ts-0531-1.openstacklocal:2181> alter table test replace columns (s int);
No rows affected (0.104 seconds)
0: jdbc:hive2://ts-0531-1.openstacklocal:2181>
0: jdbc:hive2://ts-0531-1.openstacklocal:2181> insert into table test partition (ts) values ('2', '2034-08-04 17:42:59.0');
INFO  : Session is already open
INFO  : Dag name: insert into table test partit...17:42:59.0')(Stage-1)
INFO  :

INFO  : Status: Running (Executing on YARN cluster with App id application_1464727816747_0759)

INFO  : Map 1: 0/1
INFO  : Map 1: 0(+1)/1
INFO  : Map 1: 1/1
INFO  : Loading data to table default.test partition (ts=null) from hdfs://ts-0531-5.openstacklocal:8020/apps/hive/warehouse/test/.hive-staging_hive_2016-06-03_21-17-40_287_4196012162926457792-940/-ext-10000
INFO  : 	 Time taken for load dynamic partitions : 252
INFO  : 	Loading partition {ts=2034-08-04 17:42:59.0}
INFO  : 	 Time taken for adding to write entity : 0
INFO  : Partition default.test{ts=2034-08-04 17:42:59.0} stats: [numFiles=2, numRows=2, totalSize=4, rawDataSize=2]
No rows affected (1.573 seconds)
0: jdbc:hive2://ts-0531-1.openstacklocal:2181>
0: jdbc:hive2://ts-0531-1.openstacklocal:2181> select * from test;
+---------+------------------------+--+
| test.s  |        test.ts         |
+---------+------------------------+--+
| 1       | 2034-08-04 17:42:59.0  |
| 2       | 2034-08-04 17:42:59.0  |
+---------+------------------------+--+
2 rows selected (0.116 seconds)
{noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)