Posted to dev@hive.apache.org by "Ruslan Dautkhanov (JIRA)" <ji...@apache.org> on 2016/10/17 20:45:58 UTC
[jira] [Created] (HIVE-14989) FIELDS TERMINATED BY parsing broken when delimiter is more than 1 byte
Ruslan Dautkhanov created HIVE-14989:
----------------------------------------
Summary: FIELDS TERMINATED BY parsing broken when delimiter is more than 1 byte
Key: HIVE-14989
URL: https://issues.apache.org/jira/browse/HIVE-14989
Project: Hive
Issue Type: Bug
Components: File Formats, Parser, Reader
Affects Versions: 0.13.1, 0.13.0
Reporter: Ruslan Dautkhanov
FIELDS TERMINATED BY parsing is broken when the delimiter is more than 1 byte long. The delimiter's characters from the 2nd onward become part of the returned data, so the fields are not parsed properly.
Test case:
{noformat}
CREATE external TABLE test_muldelim
( string1 STRING,
string2 STRING,
string3 STRING
)
ROW FORMAT
DELIMITED FIELDS TERMINATED BY '<>'
LINES TERMINATED BY '\n'
STORED AS TEXTFILE
location '/user/hive/test_muldelim'
{noformat}
Create a text file under /user/hive/test_muldelim with the following 2 lines:
{noformat}
data1<>data2<>data3
aa<>bb<>cc
{noformat}
Now notice that the two-character delimiter wasn't parsed properly:
{noformat}
jdbc:hive2://host.domain.com:1> select * from ruslan_test.test_muldelim ;
+------------------------+------------------------+------------------------+--+
| test_muldelim.string1 | test_muldelim.string2 | test_muldelim.string3 |
+------------------------+------------------------+------------------------+--+
| data1 | >data2 | >data3 |
| aa | >bb | >cc |
+------------------------+------------------------+------------------------+--+
2 rows selected (0.453 seconds)
{noformat}
The delimiter's second character ('>') became part of the columns to the right (`string2` and `string3`).
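The output above is consistent with the reader splitting each line on only the first character of the delimiter. The following is a minimal Python sketch of that hypothesis, not Hive's actual code; the function names `split_on_first_char` and `split_on_full_delimiter` are made up for illustration.

```python
from typing import List


def split_on_first_char(line: str, delimiter: str) -> List[str]:
    """Hypothesized buggy behavior: only delimiter[0] is used as the separator."""
    return line.split(delimiter[0])


def split_on_full_delimiter(line: str, delimiter: str) -> List[str]:
    """Expected behavior: the whole delimiter string is the separator."""
    return line.split(delimiter)


if __name__ == "__main__":
    # The two lines from the test file in this report.
    rows = ["data1<>data2<>data3", "aa<>bb<>cc"]
    for row in rows:
        print("first char only:", split_on_first_char(row, "<>"))
        print("full delimiter: ", split_on_full_delimiter(row, "<>"))
```

Splitting on '<' alone leaves the '>' at the start of the second and third fields, which matches the values shown in `string2` and `string3`.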
Table DDL:
{noformat}
0: jdbc:hive2://host.domain.com:1> show create table default.test_muldelim ;
+-----------------------------------------------------------------+--+
| createtab_stmt |
+-----------------------------------------------------------------+--+
| CREATE EXTERNAL TABLE `default.test_muldelim`( |
| `string1` string, |
| `string2` string, |
| `string3` string) |
| ROW FORMAT DELIMITED |
| FIELDS TERMINATED BY '<>' |
| LINES TERMINATED BY '\n' |
| STORED AS INPUTFORMAT |
| 'org.apache.hadoop.mapred.TextInputFormat' |
| OUTPUTFORMAT |
| 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' |
| LOCATION |
| 'hdfs://epsdatalake/user/hive/test_muldelim' |
| TBLPROPERTIES ( |
| 'transient_lastDdlTime'='1476727100') |
+-----------------------------------------------------------------+--+
15 rows selected (0.286 seconds)
{noformat}