Posted to httpclient-users@hc.apache.org by Mike Kyle <m_...@yahoo.co.uk> on 2008/10/23 11:36:47 UTC
UTF8 problem?
I set an EntityEnclosingMethod request entity to be a ByteArrayRequestEntity. This entity has the Java characters "\u4E2D\u6587" as the corresponding UTF8 bytes (UTF8 = 0xE4,0xB8,0xAD,0xE6,0x96,0x87). This is confirmed by logging httpclient.wire.content. There I see the UTF8 values.
However, what actually appears to get transmitted is the corresponding Java characters rather than the UTF8 values! As this is supposedly a UTF8 encoded XML document, the receiver is not best pleased. This is confirmed by performing HTTP sniffing using org.apache.axis.utils.tcpmon. My suspicion is that somehow a character handler is intervening?
Debugging HttpConnection implied that the output stream is a BufferedOutputStream wrapping a java.net.SocketOutputStream. I had assumed that the socket streams would be byte oriented. The content type is set to 'text/xml; chartset="utf-8"'.
I normally use HttpClient 3.0+, but the latest 3.1 appeared to behave exactly the same.
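For readers following along, the byte values quoted in the message can be reproduced with the JDK alone. The sketch below is not Mike's code: the class Utf8Bytes and its methods are hypothetical names, and it assumes Java 7 or later for StandardCharsets (on the JVMs of 2008 the equivalent call would be getBytes("UTF-8")). It encodes the two characters explicitly as UTF-8 and prints them as hex so they can be compared with the wire log.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of HttpClient.
final class Utf8Bytes {

    // Encode text as UTF-8 explicitly, independent of the platform default charset.
    static byte[] encode(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // Render bytes as space-separated upper-case hex, for comparison with a wire log.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] body = encode("\u4E2D\u6587");
        System.out.println(toHex(body)); // prints: E4 B8 AD E6 96 87
    }
}
```

A byte array produced this way is the kind of payload that would be handed to a ByteArrayRequestEntity; that step is left out here so the sketch runs without the HttpClient jar.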
Re: UTF8 problem?
Posted by Oleg Kalnichevski <ol...@apache.org>.
On Thu, 2008-10-23 at 09:36 +0000, Mike Kyle wrote:
> I set an EntityEnclosingMethod request entity to be a ByteArrayRequestEntity. This entity has the Java characters "\u4E2D\u6587" as the corresponding UTF8 bytes (UTF8 = 0xE4,0xB8,0xAD,0xE6,0x96,0x87). This is confirmed by logging httpclient.wire.content. There I see the UTF8 values.
Mike,
What is logged in the wire log is exactly what gets written to the
underlying socket. I do not think HttpClient is the culprit.
Oleg
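Oleg's point, and Mike's assumption that the socket streams are byte oriented, can be checked in isolation. The following is a minimal sketch, not HttpClient's implementation: StreamPassThrough is a hypothetical class, and a ByteArrayOutputStream stands in for the socket stream. It shows that a BufferedOutputStream passes bytes through unchanged, and that a tool which decodes those bytes as UTF-8 for display will show the two characters. Whether tcpmon was decoding its display in this way is an assumption; the thread does not confirm it.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical illustration; the sink stands in for java.net.SocketOutputStream.
final class StreamPassThrough {

    // Write the payload through a BufferedOutputStream and return what reached the sink.
    static byte[] writeThroughBuffer(byte[] payload) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (BufferedOutputStream out = new BufferedOutputStream(sink)) {
            out.write(payload);
        } catch (IOException e) {
            // Not expected for an in-memory sink; rethrown unchecked to keep callers simple.
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }

    // What a viewer would show if it decoded the captured bytes as UTF-8 text.
    static String decodeForDisplay(byte[] captured) {
        return new String(captured, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] payload = {
            (byte) 0xE4, (byte) 0xB8, (byte) 0xAD,
            (byte) 0xE6, (byte) 0x96, (byte) 0x87
        };
        byte[] received = writeThroughBuffer(payload);
        System.out.println(Arrays.equals(payload, received)); // prints: true
        System.out.println(decodeForDisplay(received).equals("\u4E2D\u6587")); // prints: true
    }
}
```

If a capture tool offers a hex view, comparing that view with the wire log is a more reliable check than comparing rendered text.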
---------------------------------------------------------------------
To unsubscribe, e-mail: httpclient-users-unsubscribe@hc.apache.org
For additional commands, e-mail: httpclient-users-help@hc.apache.org
Re: UTF8 problem?
Posted by Hanson Char <ha...@gmail.com>.
Have you tried changing the default JVM encoding via:
-Dfile.encoding=UTF-8
to see if that works?
Cheers,
Hanson
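Hanson's suggestion targets the JVM default charset, which only matters where code converts between characters and bytes without naming a charset (for example String.getBytes() with no argument). The sketch below uses a hypothetical class, DefaultCharsetCheck, and assumes Java 7 or later for StandardCharsets. It contrasts an explicit UTF-8 encoding with an encoding that cannot represent the characters. Whether any such implicit conversion occurs in Mike's code path is not established in the thread.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration of charset-dependent encoding; not taken from the thread.
final class DefaultCharsetCheck {

    // Encode with an explicitly named charset so the result does not depend on JVM settings.
    static byte[] encodeWith(String text, Charset charset) {
        return text.getBytes(charset);
    }

    public static void main(String[] args) {
        String text = "\u4E2D\u6587";

        // The default depends on the platform and on startup options such as -Dfile.encoding.
        System.out.println("default charset: " + Charset.defaultCharset());
        System.out.println("no-arg getBytes length: " + text.getBytes().length);

        // Explicit UTF-8 always yields the six bytes quoted in the original post.
        System.out.println(encodeWith(text, StandardCharsets.UTF_8).length); // prints: 6

        // ISO-8859-1 cannot represent these characters; each is replaced by '?' (0x3F).
        System.out.println(encodeWith(text, StandardCharsets.ISO_8859_1).length); // prints: 2
    }
}
```

Naming the charset at every conversion point avoids depending on -Dfile.encoding at all.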