Posted to dev@hive.apache.org by "Ganesha Shreedhara (JIRA)" <ji...@apache.org> on 2019/03/12 04:35:00 UTC

[jira] [Created] (HIVE-21428) field delimiter of serde set for partition is not getting respected when vectorization is enabled

Ganesha Shreedhara created HIVE-21428:
-----------------------------------------

             Summary: field delimiter of serde set for partition is not getting respected when vectorization is enabled
                 Key: HIVE-21428
                 URL: https://issues.apache.org/jira/browse/HIVE-21428
             Project: Hive
          Issue Type: Bug
    Affects Versions: 3.1.1
            Reporter: Ganesha Shreedhara


 

*Steps to reproduce:*

create external table src (c1 string, c2 string, c3 string) partitioned by (part string)

location '/tmp/src';

 

 

printf "d1\td2\n" >> data.txt;

hadoop dfs -put  data.txt /tmp/src/part=part1/;

 

MSCK REPAIR TABLE src;

 

ALTER TABLE src PARTITION (part='part1')

SET SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'

WITH SERDEPROPERTIES ('columns'='c1,c2', 'column.types' ='string,string', 'field.delim'='\t');

 

create table dest (c1 string, c2 string, c3 string, c4 string);

insert overwrite table dest select * from src;

select * from dest;

 

*Result* (wrong)*:*

d1 d2 NULL NULL part1

 

set hive.vectorized.execution.enabled=false;

insert overwrite table dest select * from src;

select * from dest;

 

*Result* (Correct)*:*

d1 d2 NULL part1

 

This happens because "d1\td2" is treated as a single column: the field delimiter used by the deserializer is *^A* instead of *\t*, which is the delimiter set at the partition level.
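
The effect of the delimiter can be illustrated outside Hive. The following is only a simplified sketch in Python, not Hive's actual deserialization code: the helper name deserialize and the padding logic are assumptions made for illustration. It splits the same row with ^A (\x01) and with \t, pads it to the three data columns of src, and appends the partition value.

```python
CTRL_A = "\x01"  # ^A, the delimiter the report says the deserializer is using


def deserialize(line, delim, num_cols):
    """Hypothetical helper: split a text row on delim, then truncate or
    pad with None so that exactly num_cols fields are returned."""
    fields = line.split(delim)[:num_cols]
    return fields + [None] * (num_cols - len(fields))


row = "d1\td2"

# src has three data columns (c1, c2, c3); the partition value comes last.
with_ctrl_a = deserialize(row, CTRL_A, 3) + ["part1"]
with_tab = deserialize(row, "\t", 3) + ["part1"]

print(with_ctrl_a)  # ['d1\td2', None, None, 'part1']
print(with_tab)     # ['d1', 'd2', None, 'part1']
```

With ^A the whole line lands in c1, which matches the wrong result above; with \t the row splits into d1 and d2, which matches the correct one.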

It works correctly if I alter the serde field delimiter for the entire table.

So it looks like the serde properties in TableDesc are taking precedence over the serde properties in PartitionDesc.

This issue does not occur in 2.x versions.

