Posted to user@flume.apache.org by "Vishwakarma, Chhaya" <Ch...@Teradata.com> on 2015/05/12 10:20:15 UTC

Unicode data handling with flume

Hi all,
I'm trying to put a CSV file into HDFS using Flume. The file also contains some Unicode characters.
Once the file was in HDFS I tried to view its content, but the records are not displayed properly.
File content:
Name    age  sal    msg

Abc     21  1200    Lukè éxample àpple

Xyz     23  1400    er stîget ûf mit grôzer
Output in console:
I ran hdfs dfs -get /flume/events/csv/events.1234567
Below is the output:
Name,age,sal,msg

Abc,21,1200,Luk��xample��pple

Xyz,23,1400,er st�get �f mit gr�zer
Does Flume support Unicode characters? If not, how can this be handled?
The Flume version is 1.4.0.
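
The symptom above is consistent with an encoding mismatch rather than a lack of Unicode support. A minimal Python sketch, assuming (this is not confirmed in the thread) that the CSV is stored in a single-byte encoding such as ISO-8859-1 while the reader decodes it as UTF-8. The sample row is taken from the post:

```python
# Assumption: the source file is ISO-8859-1 (Latin-1), not UTF-8.
row = "Abc,21,1200,Lukè éxample àpple"

# Bytes as they would sit on disk in a Latin-1 encoded file.
latin1_bytes = row.encode("iso-8859-1")

# Decoding those bytes as UTF-8 fails on every accented character;
# errors="replace" substitutes U+FFFD, which terminals usually render as "�".
garbled = latin1_bytes.decode("utf-8", errors="replace")

# Decoding with the matching charset recovers the original text.
recovered = latin1_bytes.decode("iso-8859-1")

print(garbled)
print(recovered)
```

The exact number and position of replacement characters can differ between decoders, so this need not match the console output above character for character.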


RE: Unicode data handling with flume

Posted by "Vishwakarma, Chhaya" <Ch...@Teradata.com>.
This option is not available with the exec source.
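
If the source offers no charset setting, one possible workaround is to convert the file to UTF-8 before Flume reads it. The sketch below assumes the input is ISO-8859-1; the function name `transcode_to_utf8` and its parameters are hypothetical, chosen only for illustration.

```python
def transcode_to_utf8(src_path, dst_path, src_encoding="iso-8859-1"):
    """Copy src_path to dst_path, re-encoding the text as UTF-8.

    src_encoding is an assumption and must match the real file encoding;
    a wrong guess yields wrong characters or a decode error.
    """
    # newline="" keeps the original line endings untouched on both sides.
    with open(src_path, "r", encoding=src_encoding, newline="") as src, \
         open(dst_path, "w", encoding="utf-8", newline="") as dst:
        for line in src:
            dst.write(line)
```

The converted copy, rather than the original file, would then be the one handed to the source's command.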

From: Marina [mailto:ppine7@yahoo.com]
Sent: Tuesday, May 12, 2015 5:56 PM
To: user@flume.apache.org
Subject: Re: Unicode data handling with flume

Did you try specifying the encoding for your source? For example:
a1.sources.r1.inputCharset = ISO8859-1
(substitute whatever charset you need)

Marina




Re: Unicode data handling with flume

Posted by Marina <pp...@yahoo.com>.
Did you try specifying the encoding for your source? For example:
a1.sources.r1.inputCharset = ISO8859-1
(substitute whatever charset you need)
Marina
