Posted to user@flume.apache.org by Paul Chavez <pc...@ntent.com> on 2014/12/09 20:25:21 UTC

UTF-8 data mangled in flight

Hello,

Hoping to get some insight into where to troubleshoot this issue further. The scenario: we have a web application that accepts URL-encoded UTF-8 characters (Cyrillic text in this instance) and sends this data to a Flume agent via the HTTPSource with the JSONHandler. That agent in turn forwards the event via an Avro sink to another Flume agent, which writes it to HDFS using the HDFS sink.
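A minimal sketch of the first hop, assuming the web application is written in Java (the thread does not say): decode the URL-encoded input explicitly as UTF-8, then wrap it in the JSON array format that Flume's JSONHandler accepts (a list of objects with `headers` and `body` fields). The class and method names are hypothetical, the JSON escaping is deliberately minimal, and `URLDecoder.decode(String, Charset)` requires Java 10 or later.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Flume: builds a JSONHandler-style event array.
public class JsonEventBuilder {

    // Decode a URL-encoded parameter explicitly as UTF-8. Decoding with any other
    // charset would corrupt the Cyrillic text before it ever reaches Flume.
    static String decodeParam(String urlEncoded) {
        return URLDecoder.decode(urlEncoded, StandardCharsets.UTF_8); // Java 10+
    }

    // Minimal JSON string escaping: quotes, backslashes and control characters.
    // Non-ASCII characters are emitted as-is and must be sent as UTF-8 bytes.
    static String jsonString(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.append('"').toString();
    }

    // One event with empty headers: [{"headers":{},"body":"..."}]
    static String singleEvent(String body) {
        return "[{\"headers\":{},\"body\":" + jsonString(body) + "}]";
    }

    public static void main(String[] args) {
        // URL-encoded UTF-8 for the Cyrillic word "Privet".
        String text = decodeParam("%D0%9F%D1%80%D0%B8%D0%B2%D0%B5%D1%82");
        System.out.println(singleEvent(text));
    }
}
```

Keeping the decode step explicit makes it easy to rule out the web application itself before looking further down the pipeline.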

We initially noticed that the data was no longer valid in the HDFS file, and after investigating we have found the following:

- The initial POST is correct, verified via a network trace of the binary data on the wire.
- The Avro event sent from the Flume agent is mangled, again verified via a network trace of the binary payload.
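When comparing a network trace against correct UTF-8, it helps to know the byte pattern of one frequent kind of mangling: UTF-8 bytes decoded as ISO-8859-1 and then re-encoded as UTF-8. This is an assumed failure mode used for illustration, not a diagnosis confirmed in this thread. The sketch (class name hypothetical) prints both byte sequences as hex:

```java
import java.nio.charset.StandardCharsets;

// Sketch: reproduces one common way UTF-8 text gets mangled in Java.
// This is an assumed failure mode, not a confirmed diagnosis.
public class MojibakeDemo {

    // Render bytes as space-separated uppercase hex, for comparison with a trace.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Simulate a reader that decodes UTF-8 bytes as ISO-8859-1, followed by
    // a writer that re-encodes the resulting string as UTF-8.
    static byte[] mangle(byte[] utf8) {
        String misread = new String(utf8, StandardCharsets.ISO_8859_1);
        return misread.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // U+041F U+0440 U+0438 (Cyrillic "Pri")
        byte[] good = "\u041f\u0440\u0438".getBytes(StandardCharsets.UTF_8);
        System.out.println("expected: " + hex(good));         // D0 9F D1 80 D0 B8
        System.out.println("mangled:  " + hex(mangle(good))); // C3 90 C2 9F C3 91 C2 80 C3 90 C2 B8
    }
}
```

In the mangled form every original byte above 0x7F becomes a two-byte sequence starting with C2 or C3, so the payload roughly doubles in size. If the Avro payload in the trace shows that pattern, a single ISO-8859-1 decode somewhere in the first agent would be a reasonable suspect.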

We do not explicitly set the Content-Type header on the POST from our application, because the documentation states that UTF-8 is assumed when it is not set.
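If some component on the receiving side falls back to a charset other than UTF-8 when the Content-Type header omits one (an assumption the thread does not confirm), a cheap experiment is to declare the charset explicitly. A minimal sketch using `java.net.HttpURLConnection`; the class name is hypothetical, and the endpoint (your HTTPSource host and port) is supplied on the command line. The body follows the JSONHandler's array-of-events format with `headers` and `body` fields.

```java
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;

// Sketch: POST a JSON event array with an explicit charset in Content-Type.
// The endpoint passed on the command line is an assumption (your HTTPSource host/port).
public class FlumeHttpPost {

    static final String CONTENT_TYPE = "application/json; charset=UTF-8";

    // Encode explicitly as UTF-8 so the bytes match the declared charset,
    // independent of the JVM default charset.
    static byte[] encodeBody(String json) {
        return json.getBytes(StandardCharsets.UTF_8);
    }

    static int post(String endpoint, String json) throws Exception {
        byte[] body = encodeBody(json);
        HttpURLConnection conn = (HttpURLConnection) URI.create(endpoint).toURL().openConnection();
        try {
            conn.setRequestMethod("POST");
            conn.setDoOutput(true);
            conn.setRequestProperty("Content-Type", CONTENT_TYPE);
            conn.setFixedLengthStreamingMode(body.length);
            try (OutputStream out = conn.getOutputStream()) {
                out.write(body);
            }
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            System.out.println("usage: java FlumeHttpPost http://<host>:<port>");
            return;
        }
        // One event whose body is the Cyrillic word "Privet".
        String json = "[{\"headers\":{},\"body\":\"\u041f\u0440\u0438\u0432\u0435\u0442\"}]";
        System.out.println("HTTP " + post(args[0], json));
    }
}
```

If the Avro payload is correct with the explicit charset and mangled without it, that would point at the receiving side's fallback charset rather than at the Avro or HDFS hops.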

Can anyone elaborate on when/why this data is being corrupted?

Thanks,
Paul Chavez


RE: UTF-8 data mangled in flight

Posted by Paul Chavez <pc...@ntent.com>.
Thank you, Jeff. I tried adding that property to the Java command line used to start Flume, but unfortunately it didn't change the observed behavior.

Thanks,
Paul


From: j.guilmard@accenture.com [mailto:j.guilmard@accenture.com]
Sent: Tuesday, December 09, 2014 1:03 PM
To: user@flume.apache.org
Subject: RE: UTF-8 data mangled in flight

From: Paul Chavez [mailto:pchavez@ntent.com]
Sent: Tuesday, December 9, 2014 20:25
To: user@flume.apache.org
Subject: UTF-8 data mangled in flight

________________________________

This message is for the designated recipient only and may contain privileged, proprietary, or otherwise confidential information. If you have received it in error, please notify the sender immediately and delete the original. Any other use of the e-mail by you is prohibited. Where allowed by local law, electronic communications with Accenture and its affiliates, including e-mail and instant messaging (including content), may be scanned by our systems for the purposes of information security and assessment of internal compliance with Accenture policy.
______________________________________________________________________________________

www.accenture.com<http://www.accenture.com>

RE: UTF-8 data mangled in flight

Posted by j....@accenture.com.
Hi Paul,

I haven't used special characters in Flume, but I have had character-encoding issues in Java before, and they were solved by setting the JVM's default character encoding with:
"-Dfile.encoding=UTF-8" (here for UTF-8)

It might be worth adding that to the Flume command-line options, or perhaps to the front-end application?

Regards

Jeff
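Whether `-Dfile.encoding=UTF-8` actually reached the JVM can be checked from inside a JVM started with the same options. A minimal sketch (class name hypothetical) that prints the property and `Charset.defaultCharset()`, then compares default-charset decoding with explicit UTF-8 decoding. The flag only affects code that relies on the JVM default charset; a component that picks its own fallback charset is unaffected, which may be why the flag made no difference here. On Java 18 and later, UTF-8 is the default charset regardless (JEP 400).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Sketch: print the JVM's default charset to confirm -Dfile.encoding took effect,
// and contrast default-charset decoding with explicit UTF-8 decoding.
public class DefaultCharsetCheck {

    // Decoding that depends on the JVM default charset (what the flag changes).
    static String decodeWithDefault(byte[] bytes) {
        return new String(bytes, Charset.defaultCharset());
    }

    // Decoding that names the charset explicitly and ignores the default.
    static String decodeUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println("file.encoding   = " + System.getProperty("file.encoding"));
        System.out.println("default charset = " + Charset.defaultCharset());

        // UTF-8 bytes of the Cyrillic word "Privet".
        byte[] utf8 = "\u041f\u0440\u0438\u0432\u0435\u0442".getBytes(StandardCharsets.UTF_8);
        System.out.println("default decode matches UTF-8: "
                + decodeWithDefault(utf8).equals(decodeUtf8(utf8)));
    }
}
```

Naming the charset explicitly, as `decodeUtf8` does, removes the dependence on the default entirely, which is generally more robust than relying on a JVM flag.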
From: Paul Chavez [mailto:pchavez@ntent.com]
Sent: Tuesday, December 9, 2014 20:25
To: user@flume.apache.org
Subject: UTF-8 data mangled in flight
