Posted to dev@hive.apache.org by "Rui Li (JIRA)" <ji...@apache.org> on 2013/11/22 14:46:35 UTC

[jira] [Updated] (HIVE-5871) Use multiple-characters as field delimiter

     [ https://issues.apache.org/jira/browse/HIVE-5871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rui Li updated HIVE-5871:
-------------------------

    Attachment: HIVE-5871.patch

This implementation relies mainly on LazySimpleSerDe for serialization and deserialization. I added some methods to LazyStruct to parse a row delimited by a multiple-character string. Another difference from LazySimpleSerDe is that MultiDelimitSerDe doesn't Base64-encode binary fields during serialization, because the encoded string may interfere with the delimiter. I also modified LazyBinary so that when it deserializes a binary field and is unable to Base64-decode it, it keeps the data unchanged. A simple use case is as follows:

create table test (id string,hivearray array<binary>,hivemap map<string,int>) ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe' WITH SERDEPROPERTIES ("field.delimited"="[,]","collection.delimited"=":","mapkey.delimited"="@");

Here field.delimited is the multiple-character field delimiter, collection.delimited is the delimiter for collection items, and mapkey.delimited is the delimiter between keys and values in maps. Multiple-character values are currently not supported for the latter two delimiters (collection.delimited and mapkey.delimited).
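
To illustrate how a row would be interpreted under the delimiters above, here is a minimal standalone sketch. It is not the code in the patch: the class MultiDelimitSketch and its methods splitFields, splitItems, parseMap and decodeOrKeep are hypothetical names made up for this example, and java.util.Base64 stands in for whatever Base64 codec Hive itself uses. It only shows the idea of splitting on a literal multiple-character field delimiter, splitting collections and maps on single characters, and leaving a binary field unchanged when it cannot be Base64-decoded.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Base64;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not part of HIVE-5871.patch.
final class MultiDelimitSketch {

    // Split a row on a literal (non-regex) delimiter of any length.
    static List<String> splitFields(String row, String delim) {
        List<String> fields = new ArrayList<>();
        int start = 0;
        int hit;
        while ((hit = row.indexOf(delim, start)) >= 0) {
            fields.add(row.substring(start, hit));
            start = hit + delim.length();
        }
        fields.add(row.substring(start)); // last field, possibly empty
        return fields;
    }

    // Collection items are separated by a single character.
    static List<String> splitItems(String field, char itemDelim) {
        return splitFields(field, String.valueOf(itemDelim));
    }

    // Map entries are separated by itemDelim, keys from values by keyDelim.
    static Map<String, Integer> parseMap(String field, char itemDelim, char keyDelim) {
        Map<String, Integer> result = new LinkedHashMap<>();
        for (String entry : splitItems(field, itemDelim)) {
            int k = entry.indexOf(keyDelim);
            if (k < 0) {
                continue; // assumption: skip entries without a key delimiter
            }
            result.put(entry.substring(0, k), Integer.valueOf(entry.substring(k + 1)));
        }
        return result;
    }

    // Try to Base64-decode a binary field; if that fails, keep the bytes as they are.
    static byte[] decodeOrKeep(byte[] raw) {
        try {
            return Base64.getDecoder().decode(raw);
        } catch (IllegalArgumentException e) {
            return raw;
        }
    }

    public static void main(String[] args) {
        String row = "1[,]ab:cd[,]k1@1:k2@2";
        List<String> fields = splitFields(row, "[,]");
        System.out.println(fields);                               // [1, ab:cd, k1@1:k2@2]
        System.out.println(splitItems(fields.get(1), ':'));       // [ab, cd]
        System.out.println(parseMap(fields.get(2), ':', '@'));    // {k1=1, k2=2}

        byte[] kept = decodeOrKeep("a[b".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(kept, StandardCharsets.UTF_8)); // a[b
    }
}
```

With the table definition above, a line such as 1[,]ab:cd[,]k1@1:k2@2 would correspond to id = 1, a two-item hivearray and a two-entry hivemap. The sketch treats the field delimiter as a literal string rather than a regular expression, which matters for a delimiter like "[,]" that would otherwise be read as a character class.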

> Use multiple-characters as field delimiter
> ------------------------------------------
>
>                 Key: HIVE-5871
>                 URL: https://issues.apache.org/jira/browse/HIVE-5871
>             Project: Hive
>          Issue Type: Improvement
>          Components: Contrib
>    Affects Versions: 0.12.0
>            Reporter: Rui Li
>         Attachments: HIVE-5871.patch
>
>
> Add a new SerDe named MultiDelimitSerDe. With MultiDelimitSerDe, users can specify a multiple-character field delimiter when creating tables.



--
This message was sent by Atlassian JIRA
(v6.1#6144)