Posted to java-dev@axis.apache.org by Douglas Bitting <Do...@agile.com> on 2004/02/19 21:00:28 UTC
REPOST: Character encoding problems sending UTF-8 back to client
I apologize if this has come across already, but I still haven't seen it on the mailing list after 18 hours.
All,
I can't figure out whether I'm doing something wrong here or whether there is a defect involved. Basically, I have a Japanese string that I'm attempting
to send back to the client. However, when the client receives the string, it is mangled beyond repair. I've put together a small test case and
included it (and its results) here.
Here is the method that is invoked via Axis on the server:
public String getString() {
    String str = "SDK \u30e9\u30a4\u30bb\u30f3\u30b9\u304c\u898b\u3064\u304b\u308a\u307e\u305b\u3093\u3067\u3057\u305f\u3002";
    for (int ii = 0; ii < str.length(); ii++) {
        System.out.println("char[" + ii + "]: " + ((int) str.charAt(ii)));
    }
    return str;
}
The output of this method is as follows:
char[0]: 83
char[1]: 68
char[2]: 75
char[3]: 32
char[4]: 12521
char[5]: 12452
char[6]: 12475
char[7]: 12531
char[8]: 12473
char[9]: 12364
char[10]: 35211
...
I generated client side stubs via WSDL2Java, and put together a quick client that simply does this:
String str = stub.getString();
for (int ii = 0; ii < str.length(); ii++) {
    System.out.println("char[" + ii + "]: " + ((int) str.charAt(ii)));
}
This emits the following:
char[0]: 83
char[1]: 68
char[2]: 75
char[3]: 32
char[4]: 227
char[5]: 402
char[6]: 169
char[7]: 227
char[8]: 8218
...
The first 4 chars are returned properly, but everything after that is completely munged.
As near as I can tell, during serialization Axis is manually converting my string into a UTF-8 encoded byte array. However, the inverse operation
does not appear to happen on the client side. Am I doing something wrong here, or is this a defect?
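The client-side numbers above (227, 402, 169, 227, 8218) are what you would get if the UTF-8 bytes of the string were decoded as windows-1252 somewhere along the way. The sketch below reproduces that hypothesis in isolation. It is not Axis code: the class and method names are made up for illustration, and windows-1252 as the decoding charset is an assumption inferred from the output, not something the test case confirms.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class Utf8AsCp1252Sketch {

    // Hypothetical helper: encode as UTF-8, then decode the bytes with the wrong charset.
    // Assumption: the wrong charset is windows-1252 (not confirmed by the original post).
    static String decodeUtf8AsCp1252(String original) {
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, Charset.forName("windows-1252"));
    }

    public static void main(String[] args) {
        // First six characters of the server-side string.
        String garbled = decodeUtf8AsCp1252("SDK \u30e9\u30a4");
        for (int ii = 0; ii < garbled.length(); ii++) {
            System.out.println("char[" + ii + "]: " + ((int) garbled.charAt(ii)));
        }
    }
}
```

Each Japanese character occupies three bytes in UTF-8 (U+30E9 is E3 83 A9), and each of those bytes becomes a separate char when read as windows-1252, which would explain why one character on the server turns into three on the client.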
Just for grins, I modified my client code to look like the following:
String str = stub.getString();
byte[] bytes = str.getBytes();
str = new String(bytes, "UTF-8");
for (int ii = 0; ii < str.length(); ii++) {
    System.out.println("char[" + ii + "]: " + ((int) str.charAt(ii)));
}
The additional code attempts to reverse the manual encoding done within Axis; however, it is not entirely successful:
char[0]: 83
char[1]: 68
char[2]: 75
char[3]: 32
char[4]: 12521
char[5]: 12452
char[6]: 12475
char[7]: 12531
char[8]: 12473
char[9]: 65533
char[10]: 63
The first 9 chars (indices 0 through 8) are correct, but after that it goes downhill...
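One possible reading of the 65533 and 63 values: if the string reaching the client was produced by decoding UTF-8 bytes as windows-1252, and the platform default charset used by getBytes() is also windows-1252, then the round trip is lossy for any byte that windows-1252 does not define. U+304C (char 9 on the server) is E3 81 8C in UTF-8, and 0x81 is one of those undefined bytes. The sketch below walks through that case. Both charset choices are assumptions, and the class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class LossyRoundTripSketch {

    // Assumption: both the wrong decode and the default-charset getBytes() use windows-1252.
    static final Charset CP1252 = Charset.forName("windows-1252");

    // Hypothetical helper mirroring the attempted repair:
    // UTF-8 bytes -> decoded as windows-1252 -> re-encoded as windows-1252 -> decoded as UTF-8.
    static String roundTrip(String original) {
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);
        String garbled = new String(utf8, CP1252);      // 0x81 has no mapping, becomes U+FFFD
        byte[] reencoded = garbled.getBytes(CP1252);    // U+FFFD cannot be encoded, becomes '?'
        return new String(reencoded, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // U+30B9 (E3 82 B9) survives the round trip; U+304C (E3 81 8C) does not.
        String repaired = roundTrip("\u30b9\u304c");
        for (int ii = 0; ii < repaired.length(); ii++) {
            System.out.println("char[" + ii + "]: " + ((int) repaired.charAt(ii)));
        }
    }
}
```

Under these assumptions the original byte is already gone by the time the client holds the string, so no amount of re-encoding on the client can recover it. The decode would have to be corrected at the point where the response bytes are first turned into characters.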
It's worth pointing out that the version of Axis I'm using is a few months old:
WSDL created by Apache Axis version: 1.2dev
Built on Aug 26, 2003 (12:11:48 PDT)
I'm hesitant to update at this point due to project time constraints, but will if I have to. Has this scenario been addressed in the newer builds?
Thanks,
--Doug
Re: REPOST: Character encoding problems sending UTF-8 back to client
Posted by Davanum Srinivas <di...@yahoo.com>.
Please try the latest CVS. I think there were some patches for this
(http://marc.theaimsgroup.com/?l=axis-dev&w=2&r=1&s=24896&q=b).
-- dims
=====
Davanum Srinivas - http://webservices.apache.org/~dims/