Posted to dev@hive.apache.org by "Ted Xu (JIRA)" <ji...@apache.org> on 2010/08/19 08:19:17 UTC

[jira] Updated: (HIVE-1505) Support non-UTF8 data

     [ https://issues.apache.org/jira/browse/HIVE-1505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Xu updated HIVE-1505:
-------------------------

    Attachment: trunk-encoding.patch

We implemented a per-table encoding setting.
The table encoding is set through a SerDe parameter, for example:
{code}
alter table src set serdeproperties ('serialization.encoding'='GBK');
{code}
This makes table src use the GBK encoding (a Chinese character encoding). Furthermore, when using the command line interface, the parameter 'hive.cli.encoding' must also be set. It has to be set before the Hive prompt starts, so set it in hive-site.xml or pass -hiveconf hive.cli.encoding=GBK on the command line, rather than running 'set hive.cli.encoding=GBK' in Hive QL.
Because of this, I couldn't find a way to add a unit test.
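
To illustrate why the encoding has to be known when the bytes are decoded, here is a minimal Python sketch. It is not taken from the patch: decode_field is a hypothetical helper, and the sample strings are made up. It only shows that GBK or latin1 bytes decoded as UTF-8 turn into the replacement character U+FFFD (the '\xef\xbf\xbd' from the issue description), while decoding with the right charset recovers the text.

```python
# Hypothetical helper for illustration only; it is not part of
# trunk-encoding.patch and does not use any Hive API.
def decode_field(raw: bytes, encoding: str = "utf-8") -> str:
    # Undecodable bytes become U+FFFD instead of raising an error.
    return raw.decode(encoding, errors="replace")


# Two Chinese characters, written with escapes to keep this file ASCII.
text = "\u4e2d\u6587"
gbk_bytes = text.encode("gbk")

# Wrong charset: GBK bytes read as UTF-8 are replaced with U+FFFD.
as_utf8 = decode_field(gbk_bytes)

# Right charset: the original text is recovered.
as_gbk = decode_field(gbk_bytes, "gbk")

# The latin1 case from the issue: a single high byte read as UTF-8
# becomes U+FFFD, which is EF BF BD when written back out as UTF-8.
latin1_byte = "\u00e9".encode("latin-1")
reencoded = decode_field(latin1_byte).encode("utf-8")

print(repr(as_utf8))
print(repr(as_gbk))
print(reencoded)
```

The per-table 'serialization.encoding' parameter plays the role of the encoding argument here: the bytes can only be decoded correctly if the charset is supplied from outside, because it cannot be recovered from the bytes themselves.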




> Support non-UTF8 data
> ---------------------
>
>                 Key: HIVE-1505
>                 URL: https://issues.apache.org/jira/browse/HIVE-1505
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Serializers/Deserializers
>    Affects Versions: 0.5.0
>            Reporter: bc Wong
>         Attachments: trunk-encoding.patch
>
>
> I'd like to work with non-UTF8 data easily.
> Suppose I have data in latin1. Currently, doing a "select *" returns the upper-ASCII characters as '\xef\xbf\xbd', which is the replacement character '\ufffd' encoded in UTF-8. It would be nice for Hive to understand different encodings, or to have a concept of a byte string.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.