Posted to user@hive.apache.org by tom kersnick <hi...@gmail.com> on 2009/09/25 23:58:27 UTC

Language settings within Hive and HDFS

I have some files with mixed characters from all over the world: UTF-8,
latin1, latin9, and about 10 other encodings. These are international files
of raw IM logs. Is there a way to load these files as-is into Hadoop? Is it
smart enough to interpret the files correctly as-is? My data runs to
petabytes, and I want to write some Hive queries to find patterns. Please
bear with me, as I am a newbie.

I know I can set the character set at the server level, but I want to make
sure there is no other setting that I am missing. For example, in MySQL I
can set the character set at the database level.

Thanks so much!

Re: Language settings within Hive and HDFS

Posted by Zheng Shao <zs...@gmail.com>.
Hi Tom,

Currently Hive/Hadoop assumes text data is UTF-8.

If your encoding is different, you can most likely still process the data
with Hive without problems, as long as Hive/Hadoop does not have to do
UTF-8 decoding.
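
To illustrate why skipping decoding matters, here is a minimal Python sketch (not Hive's actual code) of splitting a record on a separator byte while treating it purely as bytes. It assumes a single-byte encoding such as Latin-1; the function name and sample record are hypothetical.

```python
from typing import List


def split_fields(raw_line: bytes, sep: bytes = b"\t") -> List[bytes]:
    """Split one raw record into fields at the separator byte, without decoding."""
    return raw_line.rstrip(b"\n").split(sep)


# Hypothetical Latin-1 record: the bytes for "café" and "naïve" are not valid UTF-8.
record = "user1\tcafé\tnaïve\n".encode("latin-1")

fields = split_fields(record)
print(fields)  # each field is still the original raw bytes

# The caller can decode later, once it knows the field's real encoding.
print(fields[1].decode("latin-1"))

# Forcing UTF-8 decoding on the same bytes is where things break.
try:
    fields[1].decode("utf-8")
except UnicodeDecodeError as exc:
    print("UTF-8 decoding fails:", exc.reason)
```

The split itself never needs to know the encoding; only the final decode does, and that step fails if UTF-8 is forced on Latin-1 bytes.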

What is the row format of your data? Are fields separated by TAB or
something similar? As long as the encoding never uses the separator byte for
anything else (for example, as the second or third byte of a multi-byte
character), it should be fine.
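
One rough way to check this condition for a given encoding is to encode every character and look for the separator byte inside characters that are not the separator itself. The sketch below uses only Python's built-in codecs and checks the Basic Multilingual Plane; the function name and the list of codecs are illustrative assumptions, and it assumes an ASCII separator such as TAB.

```python
from typing import List


def separator_collisions(codec: str, sep: str = "\t", limit: int = 0x10000) -> List[str]:
    """Return characters (other than sep) whose encoding in `codec` contains sep's byte.

    Only code points below `limit` are checked; by default that is the Basic
    Multilingual Plane. `sep` is assumed to be a single ASCII character.
    """
    sep_byte = sep.encode("ascii")
    hits = []
    for cp in range(limit):
        ch = chr(cp)
        if ch == sep:
            continue
        try:
            encoded = ch.encode(codec)
        except UnicodeEncodeError:
            continue  # not representable in this codec (or a lone surrogate)
        if sep_byte in encoded:
            hits.append(ch)
    return hits


for codec in ("utf-8", "latin-1", "iso8859-15", "utf-16-le"):
    print(f"{codec}: {len(separator_collisions(codec))} characters contain a TAB byte")
```

Single-byte encodings such as latin1 and latin9 (ISO-8859-15) report no collisions, and so does UTF-8, because every byte of a UTF-8 multi-byte sequence is 0x80 or above. UTF-16 reports many, so TAB-splitting UTF-16 data byte by byte would break records.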

Zheng
