Posted to users@cxf.apache.org by Jose María Zaragoza <de...@gmail.com> on 2013/04/25 00:32:14 UTC

Re: Problem with web service client encoding

Hello:

I'm looking at this example and I'd like to understand a few things:

1) Does 'Encoding: ISO-8859-1' refer to the HTTP header that defines the
content charset?
How does Apache CXF choose which HTTP header charset to return to a
client?


2) If the HTTP response charset is ISO-8859-1 but the XML encoding is different
(as in this example), which one takes priority when decoding the message?

I guess the document's own encoding comes first, but I'm not sure.


Thanks



2013/3/13 Daniel Kulp <dk...@apache.org>

>
> On Mar 13, 2013, at 7:39 AM, Angel L. Garcia <el...@gmail.com> wrote:
> > I've a problem with client encoding: when I read some element with
> special characters in response I get bad characters like ��
> >
> > The log in is:
> >
> > INFO: Inbound Message
> > ----------------------------
> > ID: 1
> > Response-Code: 200
> > Encoding: ISO-8859-1
> > Content-Type: text/xml
> > Headers: {connection=[Keep-Alive], Content-Language=[es-ES],
> content-type=[text/xml], Date=[Wed, 13 Mar 2013 08:05:05 GMT],
> transfer-encoding=[chunked], X-Backside-Transport=[OK OK]
> > Messages:
> > Message (saved to tmp file):
> > Filename:
> /tmp/tomcat6-tomcat6-tmp/cxf-tmp-966013/cos8205745368794988769tmp
> > (message truncated to -1 bytes)
> >
> > Payload: <?xml version="1.0" encoding="UTF-8"?>
> > <soapenv:Envelope xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
> ......
> >
> > I think the problem is that there are two different encodings "Encoding:
> ISO-8859-1" and <?xml version="1.0" encoding="UTF-8"?>.
> > Can I change the <?xml version="1.0" encoding="UTF-8"?> to <?xml
> version="1.0" encoding="ISO-8859-1"?>?
> >
> > Thanks and best regards.
>
> Yea… that seems very wrong to me.  Seems like a bit of an invalid message
> as I'd expect the Content-Type to set a charset of utf-8.   I would attempt
> a few things:
>
> 1) Stick an interceptor on the incoming chain that would set:
> message.put(Message.ENCODING, "UTF-8")   so that CXF would treat it as
> UTF-8.
>
> 2) You can try changing the <?xml> header via an input stream filter or
> similar.
>
> 3) Remove the InputStream from the message contents, wrap it with an
> InputStreamReader using whichever encoding works, and set that into the
> message content as a Reader.class.   CXF will then delegate to that to
> handle the charset stuff.
>
>
>
>
> --
> Daniel Kulp
> dkulp@apache.org - http://dankulp.com/blog
> Talend Community Coder - http://coders.talend.com
>
>
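
For reference, the symptom and the effect of option 3 can be reproduced with the JDK alone. This is only a minimal sketch, not CXF code: the class and method names are made up for illustration, and in a real client the InputStream would be taken from the CXF message inside an interceptor rather than from a byte array.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of CXF.
final class CharsetMismatchDemo {

    // Drain the stream through an InputStreamReader that decodes with the given charset.
    static String readAll(InputStream in, Charset charset) {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(in, charset)) {
            char[] buf = new char[1024];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "España" as UTF-8 bytes, i.e. what the payload declares it contains.
        byte[] wire = "Espa\u00f1a".getBytes(StandardCharsets.UTF_8);

        // Decoding with the charset the log reports (ISO-8859-1) splits the
        // two-byte sequence for the n-tilde into two unrelated characters.
        String wrong = readAll(new ByteArrayInputStream(wire), StandardCharsets.ISO_8859_1);

        // Wrapping the same bytes with the charset that actually works restores the text.
        String right = readAll(new ByteArrayInputStream(wire), StandardCharsets.UTF_8);

        System.out.println(wrong.length());               // 7
        System.out.println(right.equals("Espa\u00f1a"));  // true
    }
}
```

The reader built with the working charset is what option 3 would put back into the message in place of the raw stream.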

Re: Problem with web service client encoding

Posted by Daniel Kulp <dk...@apache.org>.
On Apr 25, 2013, at 2:24 PM, Jose María Zaragoza <de...@gmail.com> wrote:

> Thanks Daniel.
> 
> But there are some things that I don't understand.
> 
> This is the log of a request sent from a client using CXF 2.7.3:
> 
> Address: http://x.x.x.x:8080/services/WSHttpSoap11Endpoint/
> Encoding: UTF-8
> Http-Method: POST
> Content-Type: text/xml
> Headers: {Accept=[*/*], SOAPAction=["urn:process"]}
> Payload: <soap:Envelope xmlns:soap="
> http://schemas.xmlsoap.org/soap/envelope/"><soap:Body><ns3:process xmlns="
> http://bean.util.distribuidores.movistar/xsd" xmlns:ns2="http://bean/xsd"
> xmlns:ns3="http://ws
> "><ns3:in><ns2:data>YYYY</ns2:data></ns3:in></ns3:process></soap:Body></soap:Envelope>
> 
> 
> As you can see, Content-Type has no charset and Encoding is UTF-8.
> It should be ISO-8859-1, shouldn't it?

Well, no.  The CXF HTTP Conduit combines the internal "Content-Type" and the "Encoding" attributes from the message into the format that is needed for HTTP.  (JMS would do something different, etc…)   The "Encoding" there is what matters.   This is why I suggest grabbing Wireshark and seeing what is on the raw wire transport.
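
To picture what "combines" means here: the log shows the two attributes separately, while on the wire one would expect a single Content-Type header. A minimal sketch, assuming the encoding is simply appended as a charset parameter. The helper below is hypothetical and is not CXF's implementation; the exact header CXF emits is best confirmed with a capture.

```java
import java.util.Locale;

// Hypothetical helper; a rough picture of the idea, not CXF's implementation.
final class ContentTypeHeaderSketch {

    // Append the encoding as a charset parameter unless one is already present.
    static String toHttpContentType(String contentType, String encoding) {
        boolean hasCharset = contentType.toLowerCase(Locale.ROOT).contains("charset=");
        if (encoding == null || encoding.isEmpty() || hasCharset) {
            return contentType;
        }
        return contentType + "; charset=" + encoding;
    }

    public static void main(String[] args) {
        // Values taken from the logged outbound message quoted above.
        System.out.println(toHttpContentType("text/xml", "UTF-8")); // text/xml; charset=UTF-8
    }
}
```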


> So I'm not sure that Encoding is the HTTP content encoding, right?

On the client/conduit side, the HTTP transport would use that to set up the appropriate headers.

> Furthermore, the XML payload (SOAP message) has no <?xml ... ?> header
> with an encoding.
> I don't know if it makes any sense to have an XML encoding, because the XML is
> built by the CXF runtime and it chooses the encoding that it prefers.

Yea.  It's pretty much redundant and not needed with soap as we know it's XML and we also know the charset from the HTTP headers.  Thus, we don't bother outputting it as it's just redundant information that wastes bandwidth. (admittedly not much, but some).

Dan


-- 
Daniel Kulp
dkulp@apache.org - http://dankulp.com/blog
Talend Community Coder - http://coders.talend.com


Re: Problem with web service client encoding

Posted by Jose María Zaragoza <de...@gmail.com>.
Thanks Daniel.

But there are some things that I don't understand.

This is the log of a request sent from a client using CXF 2.7.3:

Address: http://x.x.x.x:8080/services/WSHttpSoap11Endpoint/
Encoding: UTF-8
Http-Method: POST
Content-Type: text/xml
Headers: {Accept=[*/*], SOAPAction=["urn:process"]}
Payload: <soap:Envelope xmlns:soap="
http://schemas.xmlsoap.org/soap/envelope/"><soap:Body><ns3:process xmlns="
http://bean.util.distribuidores.movistar/xsd" xmlns:ns2="http://bean/xsd"
xmlns:ns3="http://ws
"><ns3:in><ns2:data>YYYY</ns2:data></ns3:in></ns3:process></soap:Body></soap:Envelope>


As you can see, Content-Type has no charset and Encoding is UTF-8.
It should be ISO-8859-1, shouldn't it?
So I'm not sure that Encoding is the HTTP content encoding, right?


Furthermore, the XML payload (SOAP message) has no <?xml ... ?> header
with an encoding.
I don't know if it makes any sense to have an XML encoding, because the XML is
built by the CXF runtime and it chooses the encoding that it prefers.


Regards



Re: Problem with web service client encoding

Posted by Daniel Kulp <dk...@apache.org>.
On Apr 24, 2013, at 6:32 PM, Jose María Zaragoza <de...@gmail.com> wrote:

> Hello:
> 
> I'm looking at this example and I'd like to understand a few things:
> 
> 1) Does 'Encoding: ISO-8859-1' refer to the HTTP header that defines the
> content charset?

Yea.  If you use Wireshark or similar to get the raw TCP bytes, this should be the charset in the Content-Type header.   If there is no charset on the Content-Type header, the default (per HTTP spec) is ISO-8859-1, which may be where this value is coming from.
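
The fallback described above can be sketched as follows. This is an illustration only: the class and method are hypothetical, the parsing is deliberately simplistic, and it is not how CXF reads the header.

```java
import java.util.Locale;

// Hypothetical helper; shows the "no charset means ISO-8859-1" fallback only.
final class CharsetFromContentType {

    static final String HTTP_DEFAULT = "ISO-8859-1";

    // Return the charset parameter of a Content-Type value, or the HTTP default.
    static String charsetOf(String contentType) {
        if (contentType == null) {
            return HTTP_DEFAULT;
        }
        String[] parts = contentType.split(";");
        // parts[0] is the media type; the parameters follow.
        for (int i = 1; i < parts.length; i++) {
            String param = parts[i].trim();
            if (param.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String value = param.substring("charset=".length()).trim();
                // Strip optional surrounding quotes.
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                return value.isEmpty() ? HTTP_DEFAULT : value;
            }
        }
        return HTTP_DEFAULT;
    }

    public static void main(String[] args) {
        // The header from the original inbound log carries no charset.
        System.out.println(charsetOf("text/xml"));                // ISO-8859-1
        System.out.println(charsetOf("text/xml; charset=utf-8")); // utf-8
    }
}
```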

> How does Apache CXF choose which HTTP header charset to return to a
> client?

I think it will always use UTF-8 unless the user goes out of their way (and it's not easy) to change it.   I'd need to dig through some code to verify though.   At one point a long time ago, we did try to use the same charset that the client sent the request in, but that became too complicated and since pretty much everything nowadays supports UTF-8, we just decided to stick with UTF-8.


> 2) If the HTTP response charset is ISO-8859-1 but the XML encoding is different
> (as in this example), which one takes priority when decoding the message?

It would have to be the HTTP header.    We SHOULD be able to call the HttpServletRequest.getReader() method to get a reader that is set up with the appropriate charset for the input stream.  (We don't do this, but per spec we should be able to.)     The contents of the stream (which is where the XML decl would be found) would be irrelevant for this.
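
The point about the XML declaration becoming irrelevant once a reader is supplied can be seen with the JDK's own DOM parser. This is a sketch under assumptions: the class name and sample payload are invented, and CXF itself parses with StAX, so this illustrates the principle rather than CXF's code path.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class; uses the JDK parser, not CXF.
final class ReaderBeatsXmlDeclDemo {

    // A UTF-8 payload whose XML declaration also says UTF-8.
    static final byte[] WIRE =
        "<?xml version=\"1.0\" encoding=\"UTF-8\"?><data>Espa\u00f1a</data>"
            .getBytes(StandardCharsets.UTF_8);

    // Byte stream: the parser inspects the bytes and honours the XML declaration.
    static String parseFromBytes() {
        return text(new InputSource(new ByteArrayInputStream(WIRE)));
    }

    // Character stream: decoding has already happened, so the declaration is ignored.
    static String parseFromReader(Charset transportCharset) {
        return text(new InputSource(
            new InputStreamReader(new ByteArrayInputStream(WIRE), transportCharset)));
    }

    private static String text(InputSource source) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().parse(source);
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parseFromBytes().equals("Espa\u00f1a"));                               // true
        System.out.println(parseFromReader(StandardCharsets.UTF_8).equals("Espa\u00f1a"));        // true
        // A reader built with the wrong transport charset yields garbage
        // even though the declaration says UTF-8.
        System.out.println(parseFromReader(StandardCharsets.ISO_8859_1).equals("Espa\u00f1a"));   // false
    }
}
```

So when the transport layer reports ISO-8859-1 for a payload that is really UTF-8, the declaration inside the document does not rescue it.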

Dan



-- 
Daniel Kulp
dkulp@apache.org - http://dankulp.com/blog
Talend Community Coder - http://coders.talend.com