Posted to common-user@hadoop.apache.org by Sheeba George <sh...@gmail.com> on 2010/12/17 00:02:17 UTC
Question on UTF-8
This must be a simple question, but somehow I am not able to get it to
work.
I have a text file which has ISO Latin characters like "Cancún".
The mapper is taking "Text" as the input value.
public void map(LongWritable key, Text value,
    OutputCollector<Text, IntWritable> output, Reporter reporter)
    throws IOException
But the Latin characters are not recognized correctly, and it throws a
MalformedInputException when I try
Text.validateUTF8(value.getBytes());
Any idea how to resolve this?
Appreciate any help.
Thanks
Sheeba
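[A note on the likely cause: Hadoop's Text assumes UTF-8, so bytes from an
ISO-8859-1 (Latin-1) file fail validation. One workaround, sketched below
without any Hadoop dependency, is to decode the raw bytes with the Latin-1
charset yourself; the helper name decodeLatin1 is illustrative, not a Hadoop
API. In a real mapper you would pass value.getBytes() trimmed to
value.getLength().]

```java
import java.nio.charset.StandardCharsets;

public class Latin1Decode {
    // Illustrative helper: decode bytes that are really ISO-8859-1,
    // bypassing Text's UTF-8 assumption.
    static String decodeLatin1(byte[] raw, int length) {
        return new String(raw, 0, length, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // "Cancún" encoded in ISO-8859-1 (ú is the single byte 0xFA)
        byte[] raw = {0x43, 0x61, 0x6E, 0x63, (byte) 0xFA, 0x6E};
        String s = decodeLatin1(raw, raw.length);
        System.out.println(s); // Cancún
        // Re-encoded as UTF-8, ú becomes the two bytes 0xC3 0xBA
        System.out.println(s.getBytes(StandardCharsets.UTF_8).length); // 7
    }
}
```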
Re: Question on UTF-8
Posted by Mikhail Yakshin <gr...@gmail.com>.
On Fri, Dec 17, 2010 at 2:02 AM, Sheeba George wrote:
> This must be a simple question, but somehow I am not able to get it to
> work.
> I have a text file which has ISO Latin characters like "Cancún".
> The mapper is taking "Text" as the input value.
>
> public void map(LongWritable key, Text value,
>     OutputCollector<Text, IntWritable> output, Reporter reporter)
>     throws IOException
>
> But the Latin characters are not recognized correctly, and it throws a
> MalformedInputException when I try
> Text.validateUTF8(value.getBytes());
Is recoding your text file as UTF-8 an option?
--
WBR, Mikhail Yakshin
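[Recoding the file up front, as suggested above, can be done with iconv
before loading it into HDFS; a minimal sketch, where the filenames are
placeholders:]

```shell
# Create a Latin-1 sample containing "Cancún" (ú = 0xFA in ISO-8859-1)
printf 'Canc\xfan\n' > input-latin1.txt

# Recode to UTF-8 so Hadoop's Text can validate it
iconv -f ISO-8859-1 -t UTF-8 input-latin1.txt > input-utf8.txt

cat input-utf8.txt   # Cancún
```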