Posted to common-user@hadoop.apache.org by Sheeba George <sh...@gmail.com> on 2010/12/17 00:02:17 UTC

Question on UTF-8

This must be a simple question, but somehow I am not able to get it to work.
I have a text file that contains ISO Latin-1 characters such as "Cancún".
The mapper takes "Text" as the input value.

public void map(LongWritable key, Text value,
                OutputCollector<Text, IntWritable> output, Reporter reporter)
    throws IOException

But the Latin characters are not decoded correctly, and a
MalformedInputException is thrown when I try
Text.validateUTF8(value.getBytes());

Any idea how to resolve this?
Appreciate any help.

Thanks
Sheeba

Re: Question on UTF-8

Posted by Mikhail Yakshin <gr...@gmail.com>.
On Fri, Dec 17, 2010 at 2:02 AM, Sheeba George wrote:
> This must be a simple question, but somehow I am not able to get it to work.
> I have a text file that contains ISO Latin-1 characters such as "Cancún".
> The mapper takes "Text" as the input value.
>
> public void map(LongWritable key, Text value,
>                 OutputCollector<Text, IntWritable> output, Reporter reporter)
>     throws IOException
>
> But the Latin characters are not decoded correctly, and a
> MalformedInputException is thrown when I try
> Text.validateUTF8(value.getBytes());

Is recoding your text file as UTF-8 an option?
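If re-encoding the file up front is not an option, an alternative is to decode the raw bytes inside the mapper with the charset the file actually uses. Hadoop's Text stores raw bytes and assumes UTF-8, but java.lang.String can decode ISO-8859-1 directly. The sketch below is illustrative, not from the thread; the class name and sample bytes are made up for the example:

```java
import java.io.UnsupportedEncodingException;

public class Latin1Decode {
    // Hadoop's Text assumes its bytes are UTF-8. If the input file is
    // actually ISO-8859-1 (Latin-1), decode the bytes with that charset
    // instead of validating them as UTF-8.
    public static String decodeLatin1(byte[] raw) throws UnsupportedEncodingException {
        return new String(raw, "ISO-8859-1");
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // "Cancún" encoded in ISO-8859-1: 'ú' is the single byte 0xFA,
        // which is never valid on its own in UTF-8 -- hence the
        // MalformedInputException from Text.validateUTF8().
        byte[] raw = {0x43, 0x61, 0x6E, 0x63, (byte) 0xFA, 0x6E};
        System.out.println(decodeLatin1(raw)); // prints "Cancún"
    }
}
```

Converting the whole file once, e.g. with `iconv -f ISO-8859-1 -t UTF-8 in.txt > out.txt`, avoids the per-record decode entirely and lets Text work as intended.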

-- 
WBR, Mikhail Yakshin