Posted to users@tomcat.apache.org by Andre John Mas <aj...@newtradetech.com> on 2003/04/10 16:40:43 UTC
UTF-8 end to end - what am I doing wrong?
Hi,
I am trying to create a solution which requires a SOAP message to be
sent from one party to another, in UTF-8. The set up is as follows:
- Tomcat 4.1.18 at the server end, on MS-Win2k
- Apache Commons HttpClient at the client end, on MS-Win2k
- JDK 1.3.1
At the server end I have a servlet running that extends
JAXMServlet. Since we are required to be UTF-8 compliant, any stress
tests will involve Latin, Greek, Cyrillic and Japanese characters
being sent through. On Windows I have installed all the available
language sets to have access to as many 'alphabets' as possible.
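For stress tests like these it can help to know how many bytes each script takes up in UTF-8. A byte count that does not match is an early sign that the wrong encoding was applied somewhere. This is a minimal sketch: the class name `Utf8Samples` and the sample words are illustrative, not part of the original setup. It also uses `StandardCharsets`, a modern JDK (Java 7+) convenience. Under JDK 1.3 the equivalent is `getBytes("UTF-8")`.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper: one short sample per script, written with Unicode
// escapes so the source file itself stays ASCII-only.
class Utf8Samples {
    static final String LATIN = "caf\u00e9";                         // "café"
    static final String GREEK = "\u03b1\u03b2\u03b3";                // "αβγ"
    static final String CYRILLIC = "\u0436\u0438\u0437\u043d\u044c"; // "жизнь"
    static final String JAPANESE = "\u65e5\u672c\u8a9e";             // "日本語"

    // Number of bytes the string occupies once encoded as UTF-8
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }
}
```

ASCII letters take one byte, accented Latin, Greek and Cyrillic letters take two, and the Japanese characters take three. So "café" is 4 characters but 5 bytes, and "日本語" is 3 characters but 9 bytes.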
On the client end HttpClient sends the document through as a POST
with the following Content-Type:
text/xml; charset=UTF-8
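The `charset` parameter in that header is what tells the receiver how to turn the body bytes back into characters. HTTP/1.1 (RFC 2616, section 3.7.1) treats `text/*` bodies that carry no charset as ISO-8859-1, so a receiver that ignores or loses the parameter will typically decode UTF-8 bytes as Latin-1. The sketch below shows that fallback logic. `ContentTypeCharset.charsetOf` is a hypothetical helper and not an API of Tomcat, JAXM or HttpClient.

```java
import java.util.Locale;

// Hypothetical helper: extracts the charset parameter from a Content-Type value.
class ContentTypeCharset {
    // RFC 2616 default for text/* media types without an explicit charset
    static final String DEFAULT_TEXT_CHARSET = "ISO-8859-1";

    // Returns e.g. "UTF-8" for "text/xml; charset=UTF-8", or the
    // ISO-8859-1 default when no charset parameter is present.
    static String charsetOf(String contentType) {
        if (contentType == null) {
            return DEFAULT_TEXT_CHARSET;
        }
        String[] parts = contentType.split(";");
        // parts[0] is the media type itself; parameters follow it
        for (int i = 1; i < parts.length; i++) {
            String param = parts[i].trim();
            int eq = param.indexOf('=');
            if (eq > 0 && param.substring(0, eq).trim()
                    .toLowerCase(Locale.ROOT).equals("charset")) {
                String value = param.substring(eq + 1).trim();
                // Strip optional quotes, as in charset="UTF-8"
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                return value;
            }
        }
        return DEFAULT_TEXT_CHARSET;
    }
}
```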
Now when I receive the document, wherever there should have been
accented or other non-Roman characters I just get question marks. My
first analysis suggested that JAXMServlet might be at fault, but after
overriding the doPost method I still get mangled characters. If I
write both the original text (before sending) and the received text
to file, I find that in the first case I get UTF-8 characters that
display correctly when viewed with Mozilla, and in the second case
either question marks or mangled characters, depending on whether I
specify "UTF-8" in the OutputStreamWriter.
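The two symptoms usually point to two different mistakes. Question marks appear when text is *encoded* with a charset that cannot represent the characters. Java then substitutes the charset's replacement byte, which is `?` for ISO-8859-1, and the platform default on a Western-locale Windows machine has the same limitation for Greek or Japanese. Mangled characters appear when UTF-8 bytes are *decoded* as a single-byte charset, so each multi-byte sequence turns into several Latin-1 characters. The sketch below reproduces both in memory. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo of the two failure modes and the correct round trip.
class EncodingSymptoms {
    // Encoding with ISO-8859-1: unmappable characters become '?'
    static String encodeWithLatin1(String s) {
        return new String(s.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.ISO_8859_1);
    }

    // Encoding as UTF-8 but decoding as ISO-8859-1: "mangled" text
    static String decodeUtf8AsLatin1(String s) {
        return new String(s.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
    }

    // UTF-8 on both sides preserves every character
    static String roundTripUtf8(String s) {
        return new String(s.getBytes(StandardCharsets.UTF_8), StandardCharsets.UTF_8);
    }
}
```

For example, `encodeWithLatin1("café Ω")` gives `"café ?"`, since é exists in Latin-1 but Ω does not. Meanwhile `decodeUtf8AsLatin1("café")` gives `"cafÃ©"`, because the two UTF-8 bytes of é are read as two separate characters.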
BTW, when I use tcpmon from Axis (see xml.apache.org), the accented
characters appear to be on the stream. I don't see the Japanese
characters, but that may be because the font used (Courier New) does
not include them.
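A display tool renders bytes through a font, so a missing glyph and a wrong byte can look the same on screen. A font-independent check is to print the raw bytes in hex. In UTF-8, `é` is `c3 a9` and `日` is `e6 97 a5`. A lone `e9` suggests ISO-8859-1 (or windows-1252), and `3f` means the character had already been replaced by `?` before it reached the wire. `HexDump.hex` below is a hypothetical helper for that purpose.

```java
// Hypothetical helper: formats bytes as lowercase hex pairs separated by spaces.
class HexDump {
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < bytes.length; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            // Mask with 0xff so negative Java bytes print as 80..ff
            sb.append(String.format("%02x", bytes[i] & 0xff));
        }
        return sb.toString();
    }
}
```

Calling `HexDump.hex(text.getBytes("UTF-8"))` on the outgoing message and comparing it with the bytes captured on the wire shows which side changed them.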
Does anyone have any solutions they could suggest?
---------------------------------------------------------------------
To unsubscribe, e-mail: tomcat-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: tomcat-user-help@jakarta.apache.org
Re: UTF-8 end to end - what am I doing wrong?
Posted by Andre John Mas <aj...@newtradetech.com>.
Further investigation shows that my problem is probably with the
Apache Commons HttpClient. Using an equivalent approach with
Java's URLConnection the data arrives uncorrupted. I will
continue the investigation on the HttpClient mailing list.
The following code, using Java's URLConnection, works at the
client end:
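import java.io.BufferedReader;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.net.URL;
import java.net.URLConnection;

// Sends 'message' as a UTF-8 encoded text/xml POST body and returns the
// response decoded as UTF-8. Note: 'timeout' is not used by this code.
public String send(URL destinationUrl, int timeout, String message)
        throws Exception
{
    URLConnection connection = destinationUrl.openConnection();
    // Declare the charset so the server knows how to decode the body
    connection.setRequestProperty("Content-type", "text/xml; charset=UTF-8");
    connection.setRequestProperty("user-agent", "myAgent");
    connection.setDoInput(true);
    connection.setDoOutput(true);

    // Encode the outgoing characters explicitly as UTF-8 bytes
    OutputStream out = connection.getOutputStream();
    OutputStreamWriter outw = new OutputStreamWriter(out, "UTF-8");
    outw.write(message);
    outw.flush();
    outw.close();

    // Decode the response bytes explicitly as UTF-8
    InputStream in = connection.getInputStream();
    InputStreamReader inr = new InputStreamReader(in, "UTF-8");
    BufferedReader br = new BufferedReader(inr);
    StringBuffer strBuf = new StringBuffer();
    String line = null;
    while ((line = br.readLine()) != null) {
        strBuf.append(line);
        strBuf.append('\n');
    }
    br.close();
    return strBuf.toString();
}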
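To check the client half in isolation, without Tomcat or JAXM, the same URLConnection steps can be pointed at a throwaway local echo server. This is a sketch that assumes a current JDK (Java 8 or later) shipping the `com.sun.net.httpserver` package. `Utf8EchoCheck`, `roundTrip` and the `/echo` path are hypothetical names chosen for illustration.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.URLConnection;

class Utf8EchoCheck {
    // Reads a stream to the end and closes it
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buf.write(chunk, 0, n);
        }
        in.close();
        return buf.toByteArray();
    }

    // Starts an echo server on a free loopback port, POSTs the message as
    // UTF-8 text/xml (as send() does) and returns the reply decoded as UTF-8
    static String roundTrip(String message) throws Exception {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/echo", exchange -> {
            // Server side: decode the body as UTF-8, then send it back as UTF-8
            String body = new String(readAll(exchange.getRequestBody()), "UTF-8");
            byte[] reply = body.getBytes("UTF-8");
            exchange.getResponseHeaders().set("Content-Type", "text/xml; charset=UTF-8");
            // -1 signals an empty response body to HttpServer
            exchange.sendResponseHeaders(200, reply.length == 0 ? -1 : reply.length);
            OutputStream os = exchange.getResponseBody();
            os.write(reply);
            os.close();
        });
        server.start();
        try {
            int port = server.getAddress().getPort();
            URLConnection conn = URI.create("http://127.0.0.1:" + port + "/echo")
                    .toURL().openConnection();
            conn.setRequestProperty("Content-type", "text/xml; charset=UTF-8");
            conn.setDoOutput(true);
            Writer w = new OutputStreamWriter(conn.getOutputStream(), "UTF-8");
            w.write(message);
            w.close();
            return new String(readAll(conn.getInputStream()), "UTF-8");
        } finally {
            server.stop(0);
        }
    }
}
```

If a mixed Latin/Greek/Cyrillic/Japanese string comes back unchanged here but not through the real client library, the problem lies in how that library encodes the request body, not in the server.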