Posted to dev@hive.apache.org by "Ruslan Dautkhanov (JIRA)" <ji...@apache.org> on 2016/10/17 20:45:58 UTC

[jira] [Created] (HIVE-14989) FIELDS TERMINATED BY parsing broken when delimiter is more than 1 byte

Ruslan Dautkhanov created HIVE-14989:
----------------------------------------

             Summary: FIELDS TERMINATED BY parsing broken when delimiter is more than 1 byte
                 Key: HIVE-14989
                 URL: https://issues.apache.org/jira/browse/HIVE-14989
             Project: Hive
          Issue Type: Bug
          Components: File Formats, Parser, Reader
    Affects Versions: 0.13.1, 0.13.0
            Reporter: Ruslan Dautkhanov


FIELDS TERMINATED BY parsing is broken when the delimiter is more than 1 byte long. Every delimiter character from the 2nd onward becomes part of the returned data instead of being consumed as part of the delimiter.

Test case:

{noformat}
CREATE external TABLE test_muldelim
(  string1 STRING,
   string2 STRING,
   string3 STRING
)
 ROW FORMAT 
       DELIMITED FIELDS TERMINATED BY '<>'
      LINES TERMINATED BY '\n'
 STORED AS TEXTFILE
  location '/user/hive/test_muldelim'
{noformat}

Create a text file under /user/hive/test_muldelim with the following two lines:
{noformat}
data1<>data2<>data3
aa<>bb<>cc
{noformat}

Notice that the two-character delimiter was not parsed properly:

{noformat}
jdbc:hive2://host.domain.com:1> select * from ruslan_test.test_muldelim ;
+------------------------+------------------------+------------------------+--+
| test_muldelim.string1  | test_muldelim.string2  | test_muldelim.string3  |
+------------------------+------------------------+------------------------+--+
| data1                  | >data2                 | >data3                 |
| aa                     | >bb                    | >cc                    |
+------------------------+------------------------+------------------------+--+
2 rows selected (0.453 seconds)
{noformat}

The delimiter's second character ('>') became part of the columns to its right (`string2` and `string3`).
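The output above is consistent with the reader splitting on only the first character of the delimiter. The following is a minimal Python sketch of that symptom, not Hive's actual parsing code; the function names `split_first_char_only` and `split_full_delimiter` are hypothetical and exist only for illustration.

```python
# Illustrative sketch only: mimics the observed behavior, not Hive internals.

def split_first_char_only(line, delimiter):
    # Assumption: the reader honors only the delimiter's first character,
    # so the remaining characters stay attached to the next field.
    return line.split(delimiter[0])


def split_full_delimiter(line, delimiter):
    # What the table definition asks for: split on the whole delimiter.
    return line.split(delimiter)


rows = ["data1<>data2<>data3", "aa<>bb<>cc"]

observed = [split_first_char_only(row, "<>") for row in rows]
expected = [split_full_delimiter(row, "<>") for row in rows]

print(observed)  # [['data1', '>data2', '>data3'], ['aa', '>bb', '>cc']]
print(expected)  # [['data1', 'data2', 'data3'], ['aa', 'bb', 'cc']]
```

The `observed` rows match the query result shown in the test case, while `expected` is what a full two-character split would return.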

Table DDL:
{noformat}
0: jdbc:hive2://host.domain.com:1> show create table default.test_muldelim ;
+-----------------------------------------------------------------+--+
|                         createtab_stmt                          |
+-----------------------------------------------------------------+--+
| CREATE EXTERNAL TABLE `default.test_muldelim`(                  |
|   `string1` string,                                             |
|   `string2` string,                                             |
|   `string3` string)                                             |
| ROW FORMAT DELIMITED                                            |
|   FIELDS TERMINATED BY '<>'                                     |
|   LINES TERMINATED BY '\n'                                      |
| STORED AS INPUTFORMAT                                           |
|   'org.apache.hadoop.mapred.TextInputFormat'                    |
| OUTPUTFORMAT                                                    |
|   'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'  |
| LOCATION                                                        |
|   'hdfs://epsdatalake/user/hive/test_muldelim'                  |
| TBLPROPERTIES (                                                 |
|   'transient_lastDdlTime'='1476727100')                         |
+-----------------------------------------------------------------+--+
15 rows selected (0.286 seconds)
{noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)