Posted to common-user@hadoop.apache.org by NOMURA Yoshihide <y....@jp.fujitsu.com> on 2008/06/02 08:19:52 UTC
Text file character encoding
Hello,
I'm using Hadoop 0.17.0 to analyze a large number of CSV files.
I need to read these files in a character encoding other than UTF-8,
but I think TextInputFormat doesn't support other encodings.
I think the LineRecordReader class or the Text class should support an
encoding setting like this:
conf.set("io.file.defaultEncoding", "MS932");
Is there any plan to support different character encodings in
TextInputFormat?
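Until something like that exists, one possible workaround is to decode the raw bytes of each line myself. Below is a minimal sketch of that idea using only the JDK. The class and method names (EncodedLineDecoder, decodeLine) are made up for illustration, and "io.file.defaultEncoding" above is only my proposed key, not an existing Hadoop setting. In a mapper I would expect to pass value.getBytes() and value.getLength() from the Text value, since the backing array can be longer than the valid data. This assumes the encoding never uses the newline byte 0x0A inside a multi-byte character (true for MS932, not for UTF-16), so that splitting lines on raw bytes stays safe.

```java
import java.nio.charset.Charset;
import java.util.Arrays;

// Hypothetical helper, not part of Hadoop: decodes one line of raw bytes
// with an explicitly chosen charset instead of assuming UTF-8.
final class EncodedLineDecoder {

    // Decode only the first `length` bytes of `buf`. The explicit length
    // matters when the buffer is a reused backing array that may contain
    // stale bytes past the end of the current line.
    static String decodeLine(byte[] buf, int length, Charset charset) {
        return new String(buf, 0, length, charset);
    }

    public static void main(String[] args) {
        // Charset name from the command line, e.g. "MS932" where the JRE
        // provides it; ISO-8859-1 is used as a default because every JRE has it.
        Charset charset = Charset.forName(args.length > 0 ? args[0] : "ISO-8859-1");

        // Simulate a line stored in a buffer larger than the line itself.
        byte[] raw = "caf\u00e9,123".getBytes(charset);
        byte[] buf = Arrays.copyOf(raw, raw.length + 8);

        System.out.println(decodeLine(buf, raw.length, charset));
    }
}
```

In a real job the charset name could be read from the job configuration under whatever key is agreed on, so each input set can declare its own encoding.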
Regards,
--
NOMURA Yoshihide:
Software Innovation Laboratory, Fujitsu Labs. Ltd., Japan
Tel: 044-754-2675 (Ext: 7112-6358)
Fax: 044-754-2570 (Ext: 7112-3834)
E-Mail: [y.nomura@jp.fujitsu.com]
Re: Text file character encoding
Posted by Ted Dunning <te...@gmail.com>.
You should file a Jira, make the change and submit a patch!
--
ted