Posted to user@hadoop.apache.org by Khaleel Khalid <kh...@suntecgroup.com> on 2014/01/24 05:24:15 UTC
Localization feature
Hi All,
Does Hadoop/MapReduce have a localization feature?
We have a scenario in which we need to process files containing Dutch and German characters.
When we process files containing a character like 'Ç', the character gets replaced by '�' in the output.
Is there a possible workaround for this?
Thanks in advance,
Khaleel
RE: Localization feature
Posted by java8964 <ja...@hotmail.com>.
You need to be clearer about how you process the files.
The important question is which InputFormat and OutputFormat you are using in your case.
If you are using the defaults, then on Linux both TextInputFormat and TextOutputFormat convert byte arrays to text using UTF-8 encoding. So if your source data is UTF-8, your output should be fine.
To help you in this case, you need to figure out the following:
1) Which InputFormat/OutputFormat are you using?
2) How do you write the output data? Using the Reducer's Context.write, or do you write to HDFS directly in your own code?
3) What encoding is your source data in?
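As a minimal illustration of point 3 (plain Java, no Hadoop dependencies, and assuming the source files are ISO-8859-1/Latin-1, which is common for Dutch and German text), this shows why a byte for 'Ç' turns into '�' when decoded as UTF-8, and how decoding with the file's actual charset recovers it:

```java
import java.nio.charset.StandardCharsets;

public class EncodingCheck {
    public static void main(String[] args) {
        // 'Ç' encoded in ISO-8859-1 (Latin-1) is the single byte 0xC7.
        byte[] latin1Bytes = "Ç".getBytes(StandardCharsets.ISO_8859_1);

        // Decoding a lone 0xC7 byte as UTF-8 is malformed (0xC7 would
        // need a continuation byte), so it becomes U+FFFD, i.e. '�'.
        String wrong = new String(latin1Bytes, StandardCharsets.UTF_8);
        System.out.println(wrong); // prints "�"

        // Decoding with the source file's actual charset recovers it.
        String right = new String(latin1Bytes, StandardCharsets.ISO_8859_1);
        System.out.println(right); // prints "Ç"
    }
}
```

If your input really is Latin-1, one option is to convert the files to UTF-8 before loading them into HDFS (e.g. with iconv); re-decoding the raw bytes yourself inside the job is another.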
Yong
Subject: Localization feature
Date: Fri, 24 Jan 2014 09:54:15 +0530
From: khaleelk@suntecgroup.com
To: user@hadoop.apache.org
Hi All,
Does Hadoop/MapReduce have a localization feature?
We have a scenario in which we need to process files containing Dutch and German characters.
When we process files containing a character like 'Ç', the character gets replaced by '�' in the output.
Is there a possible workaround for this?
Thanks in advance,
Khaleel