Posted to dev@hive.apache.org by Zhang Kai <zh...@gmail.com> on 2012/03/29 05:46:57 UTC

Make hive support various charsets

Hi all

I've been working with Hive for some time.

At my company, we use Hive to query large datasets and have found it
very easy to use.

However, we also found that Hive lacks support for charsets other than
UTF-8, so we have to manually convert data files to UTF-8 before loading
them into Hive.

So I have written a patch that lets Hive set a charset when creating
a table.
The charset property is then used by the SerDe when it serializes or
deserializes data.

The modified HiveQL looks like this:

CREATE TABLE tbl1 (col1 STRING) ROW FORMAT CHARSET "GBK" DELIMITED FIELDS
TERMINATED BY '\t';
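
To illustrate what using a per-table charset during serialization and deserialization could mean, here is a minimal Java sketch. It is not the code from the patch: the class name CharsetSerDeSketch and the helpers deserialize, serialize, and toUtf8 are hypothetical, and the sketch assumes only that field bytes are decoded with the table's charset instead of UTF-8. It also assumes the JVM provides the GBK charset.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetSerDeSketch {

    // Hypothetical helper: decode raw field bytes using the charset declared on the table.
    static String deserialize(byte[] raw, String tableCharset) {
        return new String(raw, Charset.forName(tableCharset));
    }

    // Hypothetical helper: encode a value into the charset declared on the table.
    static byte[] serialize(String value, String tableCharset) {
        return value.getBytes(Charset.forName(tableCharset));
    }

    // Transcode stored bytes to UTF-8, i.e. the conversion that otherwise
    // has to be done by hand on the data files before loading them.
    static byte[] toUtf8(byte[] raw, String tableCharset) {
        return deserialize(raw, tableCharset).getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String tableCharset = "GBK";       // assumption: GBK is available in this JVM
        String original = "\u4e2d\u6587";  // two Chinese characters, written as escapes

        byte[] stored = serialize(original, tableCharset);
        String decoded = deserialize(stored, tableCharset);

        System.out.println("round trip ok: " + original.equals(decoded));
        System.out.println("GBK bytes: " + stored.length
                + ", UTF-8 bytes: " + toUtf8(stored, tableCharset).length);
    }
}
```

Decoding the same bytes with the wrong charset would typically produce garbled text rather than an error, which is why the charset has to be known per table.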

I'd be very happy to contribute this to the community and look forward to
your feedback.

Thanks,
Kai Zhang

Re: Make hive support various charsets

Posted by Zhang Kai <zh...@gmail.com>.
Hi

I have created issue HIVE-2917 and submitted a patch through Phabricator.

Is there anyone who would like to review it?

Thanks,
Kai Zhang

Re: Make hive support various charsets

Posted by Namit Jain <nj...@fb.com>.
Kai,

That would be great.

Please file a JIRA and submit a patch.
We would definitely like to get it in for the whole community.


Thanks,
-namit

