Posted to dev@hive.apache.org by "Ganesha Shreedhara (JIRA)" <ji...@apache.org> on 2019/03/12 04:35:00 UTC
[jira] [Created] (HIVE-21428) field delimiter of serde set for
partition is not getting respected when vectorization is enabled
Ganesha Shreedhara created HIVE-21428:
-----------------------------------------
Summary: field delimiter of serde set for partition is not getting respected when vectorization is enabled
Key: HIVE-21428
URL: https://issues.apache.org/jira/browse/HIVE-21428
Project: Hive
Issue Type: Bug
Affects Versions: 3.1.1
Reporter: Ganesha Shreedhara
*Steps to reproduce:*
create external table src (c1 string, c2 string, c3 string) partitioned by (part string)
location '/tmp/src';
printf 'd1\td2\n' >> data.txt;
hadoop dfs -put data.txt /tmp/src/part=part1/;
MSCK REPAIR TABLE src;
ALTER TABLE src PARTITION (part='part1')
SET SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES ('columns'='c1,c2', 'column.types' ='string,string', 'field.delim'='\t');
create table dest (c1 string, c2 string, c3 string, c4 string);
insert overwrite table dest select * from src;
select * from dest;
*Result (wrong):*
d1 d2 NULL NULL part1
set hive.vectorized.execution.enabled=false;
insert overwrite table dest select * from src;
select * from dest;
*Result (correct):*
d1 d2 NULL part1
This happens because "d1\td2" is treated as a single column: the deserialiser uses *^A* as the field delimiter instead of the *\t* that is set at the partition level.
It works correctly if I change the field delimiter of the serde for the entire table.
So it looks like the serde properties in TableDesc take precedence over the serde properties in PartitionDesc.
This issue does not occur in 2.x versions.
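The suspected mechanism can be illustrated with a small toy model. This is only a sketch in Python, not Hive code: the helper names (field_delim, parse_row) are hypothetical, the row is padded to the three table columns for simplicity, and the assumption that the vectorized reader consults only the table-level serde properties is the hypothesis from this report, not something confirmed in the Hive source.

```python
# Toy model of the reported behaviour; none of these names exist in Hive.
CTRL_A = "\x01"  # ^A, the delimiter used when no field.delim is set


def field_delim(serde_props):
    """Return the configured field delimiter, falling back to ^A."""
    return serde_props.get("field.delim", CTRL_A)


def parse_row(line, delim, num_columns):
    """Split a text line and pad missing columns with None (NULL)."""
    fields = line.split(delim)[:num_columns]
    return fields + [None] * (num_columns - len(fields))


# Table created without a field delimiter; partition overrides it with a tab.
table_props = {}
partition_props = {"field.delim": "\t"}
line = "d1\td2"

# Assumed vectorized path: delimiter taken from the table-level properties.
vectorized_row = parse_row(line, field_delim(table_props), 3) + ["part1"]

# Non-vectorized path: delimiter taken from the partition-level properties.
expected_row = parse_row(line, field_delim(partition_props), 3) + ["part1"]

print(vectorized_row)  # ['d1\td2', None, None, 'part1']
print(expected_row)    # ['d1', 'd2', None, 'part1']
```

In the first row the whole line lands in c1 and the remaining columns are NULL, which matches the wrong result above; in the second row the tab splits the line into c1 and c2, which matches the correct result.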