Posted to users@cxf.apache.org by melix <ce...@lingway.com> on 2007/11/13 14:43:00 UTC

Encodings messup

Hi,

I'm using CXF with JAX-WS, and I have some problems (or misunderstandings)
with character encodings. Here is my situation:

1. The server may run on different platforms (Linux, Windows, ...) whose
default charset encodings differ.
2. Clients are not necessarily written in Java (I have a Java one for tests,
but there are also Perl clients).
3. Clients run on different platforms with different default encodings.
4. The service method is: @WebMethod String getResult(String aQuestion);
5. Both the result and the question are strings that actually contain XML
(e.g. <question>how are you</question> -> <answer><robot>fine, thank
you!</robot></answer>).
6. The XML declares its own encoding (for example through
<?xml ... encoding='iso-8859-1'?>).

So now:

- At runtime, in which encoding do I get the "question"? I use XOM for XML
parsing, and it takes an InputStream (that part is not a problem), so I have
to create an InputStream from the question. I'm not really sure that:
       InputStream in = new BufferedInputStream(new ByteArrayInputStream(question.getBytes()));
does the trick, because getBytes() encodes the string using the platform
default charset, which may not be the encoding the XML declares. Furthermore,
will it clash with the fact that the string *represents* XML in another
encoding (the question may not be expressed in the encoding of the client)?
- I use a ByteArrayOutputStream and a XOM Serializer to generate the XML
response, but I have to convert it to a String in order to return it to the
client. If I do a return out.toString("iso-8859-1"), the resulting string
will be, according to my understanding of Java, a UTF-16 encoded string: I
just tell Java that my ByteArrayOutputStream contains bytes in iso-8859-1.
Then the server will send the string to the client using the platform
default charset, so either UTF-8 (Linux) or ISO-8859-1 (Windows). This could
be OK, but I need to be sure that the actual string returned to the client
will be an ISO-8859-1 string, as my result XML header says...
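
For illustration, here is a minimal Java sketch of the two conversions
described in the bullets above, with the charset named explicitly instead of
relying on the platform default. The class and method names are hypothetical,
and it uses only the JDK (no XOM or CXF), so it shows the String/byte
conversions themselves rather than anything CXF does on the wire.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
class ExplicitCharsetSketch {

    // Encode the String's characters into bytes with a named charset,
    // so the result does not depend on the platform default.
    static byte[] encode(String xml, Charset charset) {
        return xml.getBytes(charset);
    }

    // Wrap the encoded bytes in a stream, e.g. to hand to an XML parser.
    static InputStream toStream(String xml, Charset charset) {
        return new ByteArrayInputStream(encode(xml, charset));
    }

    // Decode serializer output back into a String, naming the charset the
    // bytes were written in (same effect as out.toString(charset.name())).
    static String decode(ByteArrayOutputStream out, Charset charset) {
        return new String(out.toByteArray(), charset);
    }

    public static void main(String[] args) {
        String question = "<question>caf\u00e9</question>";

        // Same characters, different byte sequences.
        byte[] latin1 = encode(question, StandardCharsets.ISO_8859_1);
        byte[] utf8 = encode(question, StandardCharsets.UTF_8);
        System.out.println("ISO-8859-1: " + latin1.length + " bytes, UTF-8: "
                + utf8.length + " bytes");

        // Decoding with the charset used for encoding restores the String.
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(latin1, 0, latin1.length);
        System.out.println(question.equals(decode(out, StandardCharsets.ISO_8859_1)));
    }
}
```

The point of the sketch is that a Java String holds characters, not bytes in
some charset: an encoding only comes into play at the moment the String is
turned into bytes or built from bytes, and at that moment it can be chosen
explicitly.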

I can feel a headache coming on, so any expert help would be really
appreciated!

-- 
View this message in context: http://www.nabble.com/Encodings-messup-tf4797821.html#a13726051
Sent from the cxf-user mailing list archive at Nabble.com.


RE: Encodings messup

Posted by Benson Margulies <bi...@basistech.com>.
I'm not quite following the question here. If you use CXF or a similar
kit for client and server, it will ship everything everywhere in UTF-8.

What are you using for a client?
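
If the toolkit does hand the service method an already-decoded String, one
way to sidestep the byte-encoding question on the server side is to parse the
characters directly through a Reader. The sketch below is an assumption-laden
illustration: it uses the JDK's DOM parser rather than XOM, and the class and
method names are hypothetical. When an InputSource carries a character
stream, the parser reads the characters as given and disregards the encoding
named in the XML declaration. XOM's Builder is believed to offer a comparable
Reader-based build method, but check its documentation before relying on it.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper class, for illustration only.
class ParseFromCharacters {

    // Parse XML held in a String without converting it to bytes first,
    // and return the text content of the root element.
    static String rootText(String xml) {
        try {
            // A character stream: no charset is involved in reading it.
            InputSource source = new InputSource(new StringReader(xml));
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(source);
            return doc.getDocumentElement().getTextContent();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML string", e);
        }
    }

    public static void main(String[] args) {
        // The declaration names iso-8859-1, but the data is already characters.
        String question = "<?xml version='1.0' encoding='iso-8859-1'?>"
                + "<question>caf\u00e9</question>";
        System.out.println(rootText(question));
    }
}
```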
 
