Posted to issues@hive.apache.org by "Shawn Weeks (JIRA)" <ji...@apache.org> on 2017/10/17 12:36:00 UTC

[jira] [Commented] (HIVE-14867) "serialization.last.column.takes.rest" does not work for MultiDelimitSerDe

    [ https://issues.apache.org/jira/browse/HIVE-14867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16207575#comment-16207575 ] 

Shawn Weeks commented on HIVE-14867:
------------------------------------

I ran across this issue while troubleshooting for a customer. It essentially makes this SerDe useless, since it will always put garbage in the last column. Is there a reason we can't just add multi-character field delimiters to the other text SerDes and deprecate this one? It doesn't appear to be maintained.

> "serialization.last.column.takes.rest" does not work for MultiDelimitSerDe
> --------------------------------------------------------------------------
>
>                 Key: HIVE-14867
>                 URL: https://issues.apache.org/jira/browse/HIVE-14867
>             Project: Hive
>          Issue Type: Bug
>          Components: Serializers/Deserializers
>    Affects Versions: 1.3.0
>            Reporter: Niklaus Xiao
>            Assignee: Niklaus Xiao
>
> Create a table with MultiDelimitSerDe:
> {code}
> CREATE TABLE foo (a string, b string)
> ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe'
> WITH SERDEPROPERTIES ("field.delim"="|@|","collection.delim"=":","mapkey.delim"="@")
> STORED AS TEXTFILE;
> {code}
> Load data into the table:
> {code}
> 1|@|Lily|@|HW|@|abc
> 2|@|Lucy|@|LX|@|123
> 3|@|Lilei|@|XX|@|3434
> {code}
> Select data from this table:
> {code}
> select * from foo;
> +---------+------------------+--+
> | foo.a   |      foo.b       |
> +---------+------------------+--+
> | 1       | Lily^AHW^Aabc    |
> | 2       | Lucy^ALX^A123    |
> | 3       | Lilei^AXX^A3434  |
> +---------+------------------+--+
> 3 rows selected (0.905 seconds)
> {code}
> You can see that the last column takes all the remaining data, and that the delimiter is replaced with the default ^A.
> lastColumnTakesRest should be false by default, since the property only counts as set when its value is "true":
> {code}
>     String lastColumnTakesRestString = tbl
>         .getProperty(serdeConstants.SERIALIZATION_LAST_COLUMN_TAKES_REST);
>     lastColumnTakesRest = (lastColumnTakesRestString != null && lastColumnTakesRestString
>         .equalsIgnoreCase("true"));
> {code}
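
For illustration only, here is a minimal standalone sketch of the behavior the report seems to expect. It is not Hive's implementation: the class LastColumnSketch and the helper splitRow are hypothetical names, and the assumption that extra fields are simply dropped when the flag is false is inferred from the report. Only the property name and the flag parsing come from the issue.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;
import java.util.regex.Pattern;

// Hypothetical sketch, not part of Hive.
final class LastColumnSketch {
    // Property name taken from the issue title.
    static final String LAST_COLUMN_TAKES_REST = "serialization.last.column.takes.rest";

    // Same logic as the snippet quoted in the issue: unset, or any value
    // other than "true" (case-insensitive), yields false.
    static boolean lastColumnTakesRest(Properties tbl) {
        String value = tbl.getProperty(LAST_COLUMN_TAKES_REST);
        return value != null && value.equalsIgnoreCase("true");
    }

    // Hypothetical helper: split a row on a multi-character delimiter.
    // Assumption: with takesRest == false, fields beyond numColumns are dropped;
    // with takesRest == true, the last column keeps the remainder verbatim.
    static List<String> splitRow(String row, String delim, int numColumns, boolean takesRest) {
        String quoted = Pattern.quote(delim); // treat "|@|" literally, not as a regex
        String[] parts = takesRest ? row.split(quoted, numColumns) : row.split(quoted, -1);
        List<String> fields = new ArrayList<>();
        for (int i = 0; i < numColumns; i++) {
            fields.add(i < parts.length ? parts[i] : null); // missing fields become null
        }
        return fields;
    }

    public static void main(String[] args) {
        String row = "1|@|Lily|@|HW|@|abc";
        boolean flag = lastColumnTakesRest(new Properties()); // property unset, so false
        System.out.println(splitRow(row, "|@|", 2, flag));    // prints [1, Lily]
        System.out.println(splitRow(row, "|@|", 2, true));    // prints [1, Lily|@|HW|@|abc]
    }
}
```

Under these assumptions, the first row of the example would come back as a = 1, b = Lily when the property is unset, rather than b = Lily^AHW^Aabc as shown in the query output above.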


