Posted to issues@hive.apache.org by "Shawn Weeks (JIRA)" <ji...@apache.org> on 2017/10/17 12:36:00 UTC
[jira] [Commented] (HIVE-14867)
"serialization.last.column.takes.rest" does not work for MultiDelimitSerDe
[ https://issues.apache.org/jira/browse/HIVE-14867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16207575#comment-16207575 ]
Shawn Weeks commented on HIVE-14867:
------------------------------------
Ran across this issue while troubleshooting for a customer. This essentially makes this SerDe useless, as it's always going to throw garbage in the last column whenever a row has more fields than the table has columns. Is there a reason we can't just add multi-character field delimiters to the other text SerDes and deprecate this one? It doesn't appear to be getting maintained.
> "serialization.last.column.takes.rest" does not work for MultiDelimitSerDe
> --------------------------------------------------------------------------
>
> Key: HIVE-14867
> URL: https://issues.apache.org/jira/browse/HIVE-14867
> Project: Hive
> Issue Type: Bug
> Components: Serializers/Deserializers
> Affects Versions: 1.3.0
> Reporter: Niklaus Xiao
> Assignee: Niklaus Xiao
>
> Create a table with MultiDelimitSerDe:
> {code}
> CREATE TABLE foo (a string, b string) ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe' WITH SERDEPROPERTIES ("field.delim"="|@|","collection.delim"=":","mapkey.delim"="@") stored as textfile;
> {code}
> Load data into the table:
> {code}
> 1|@|Lily|@|HW|@|abc
> 2|@|Lucy|@|LX|@|123
> 3|@|Lilei|@|XX|@|3434
> {code}
> Select data from this table:
> {code}
> select * from foo;
> +---------+-----------------+--+
> | foo.a   | foo.b           |
> +---------+-----------------+--+
> | 1       | Lily^AHW^Aabc   |
> | 2       | Lucy^ALX^A123   |
> | 3       | Lilei^AXX^A3434 |
> +---------+-----------------+--+
> 3 rows selected (0.905 seconds)
> {code}
> You can see that the last column takes all the remaining data, and the delimiter is replaced with the default ^A.
> lastColumnTakesRest should be false by default:
> {code}
> String lastColumnTakesRestString = tbl
>     .getProperty(serdeConstants.SERIALIZATION_LAST_COLUMN_TAKES_REST);
> lastColumnTakesRest = (lastColumnTakesRestString != null && lastColumnTakesRestString
>     .equalsIgnoreCase("true"));
> {code}
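
For reference, here is a minimal standalone sketch of the semantics the reporter appears to expect, using the sample rows above. It is not Hive's code: the class LastColumnSketch and the method splitRow are hypothetical names made up for illustration, and the splitting logic is an assumption about the intended behavior, not the MultiDelimitSerDe implementation. With the flag off, fields beyond the declared columns are dropped; with it on, the last column keeps the remainder with the original delimiter intact.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical illustration only; not part of Hive.
public class LastColumnSketch {

    // Splits a row on a literal (possibly multi-character) delimiter into numCols fields.
    static List<String> splitRow(String row, String delim, int numCols,
                                 boolean lastColumnTakesRest) {
        // Pattern.quote treats "|@|" literally; limit -1 keeps trailing empty fields.
        String[] parts = row.split(Pattern.quote(delim), -1);
        List<String> fields = new ArrayList<>();
        for (int i = 0; i < numCols; i++) {
            if (i >= parts.length) {
                // Fewer fields than columns: pad with null.
                fields.add(null);
            } else if (i == numCols - 1 && lastColumnTakesRest) {
                // Last column keeps the remainder, original delimiter preserved.
                fields.add(String.join(delim,
                        Arrays.asList(parts).subList(i, parts.length)));
            } else {
                // Otherwise take the field as-is; extra trailing fields are dropped.
                fields.add(parts[i]);
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        String row = "1|@|Lily|@|HW|@|abc";
        // Flag unset (default): prints [1, Lily]
        System.out.println(splitRow(row, "|@|", 2, false));
        // Flag set to true: prints [1, Lily|@|HW|@|abc]
        System.out.println(splitRow(row, "|@|", 2, true));
    }
}
```

The output in the issue matches neither case: the last column takes the rest even though the property is unset, and the delimiter shows up as ^A instead of "|@|".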
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)