
Posted to java-dev@axis.apache.org by Douglas Bitting <Do...@agile.com> on 2004/02/19 21:00:28 UTC

REPOST: Character encoding problems sending UTF-8 back to client

I apologize if this has come across already, but I still haven't seen it on the mailing list after 18 hours.

All,

I can't really figure out whether I'm doing something wrong here or whether there is a defect involved.  Basically, I have a Japanese string that I'm attempting
to send back to the client.  However, when the client receives the string, it is mangled beyond repair.  I've put together a small test case and
included it (and its results) here.

Here is the method that is invoked via Axis on the server:

   public String getString() {
      String str = "SDK \u30e9\u30a4\u30bb\u30f3\u30b9\u304c\u898b\u3064\u304b\u308a\u307e\u305b\u3093\u3067\u3057\u305f\u3002";
      for (int ii = 0; ii < str.length(); ii++) {
         System.out.println("char[" + ii + "]: " + ((int) str.charAt(ii)));
      }
      return str;
   }

The output of this method is as follows:

char[0]: 83
char[1]: 68
char[2]: 75
char[3]: 32
char[4]: 12521
char[5]: 12452
char[6]: 12475
char[7]: 12531
char[8]: 12473
char[9]: 12364
char[10]: 35211
...

I generated client-side stubs via WSDL2Java and put together a quick client that simply does this:

      String str = stub.getString();
      for (int ii = 0; ii < str.length(); ii++) {
         System.out.println("char[" + ii + "]: " + ((int) str.charAt(ii)));
      }

This emits the following:

char[0]: 83
char[1]: 68
char[2]: 75
char[3]: 32
char[4]: 227
char[5]: 402
char[6]: 169
char[7]: 227
char[8]: 8218
...

The first 4 chars are returned properly, but everything after that is completely munged.

As near as I can tell, during serialization Axis is manually converting my string into a UTF-8 encoded byte array.  However, the inverse operation
does not appear to happen on the client side.  Am I doing something wrong here, or is this a defect?
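The numbers in the client output fit one specific pattern: the string was sent as UTF-8, but the client turned those bytes back into characters using windows-1252 (Cp1252), a common default charset on English Windows JVMs. For instance, U+30E9 (12521) encodes to the UTF-8 bytes E3 83 A9, which windows-1252 maps to 227, 402 and 169, exactly char[4] to char[6] above. The sketch below reproduces this. It is an inference from the numbers, not something confirmed about Axis or the client's setup, and the class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class (not part of Axis): reproduces the suspected
// mismatch of UTF-8 bytes being decoded with the windows-1252 charset.
public class MojibakeDemo {

    // Encodes s as UTF-8, then decodes those bytes with the given charset,
    // returning the resulting char values as ints.
    public static int[] misdecode(String s, Charset readAs) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        String decoded = new String(utf8, readAs);
        int[] out = new int[decoded.length()];
        for (int i = 0; i < out.length; i++) {
            out[i] = decoded.charAt(i);
        }
        return out;
    }

    public static void main(String[] args) {
        // "SDK " followed by the first three katakana of the original string.
        String str = "SDK \u30e9\u30a4\u30bb";

        // Assumption: the client's default charset is windows-1252.
        Charset cp1252 = Charset.forName("windows-1252");
        int[] chars = misdecode(str, cp1252);
        for (int ii = 0; ii < chars.length; ii++) {
            System.out.println("char[" + ii + "]: " + chars[ii]);
        }

        // Decoding with the charset that was actually used restores the text.
        int[] ok = misdecode(str, StandardCharsets.UTF_8);
        System.out.println("round trip ok: " + (ok[4] == 0x30e9));
    }
}
```

The first nine values printed should line up with the client output above (83, 68, 75, 32, 227, 402, 169, 227, 8218). Decoding the same bytes as UTF-8 gives back U+30E9, which would suggest the server-side encoding is fine and the mismatch lies in how the response is decoded on the client.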

Just for grins, I modified my client code to look like the following:

      String str = stub.getString();

      byte[] bytes = str.getBytes();
      str = new String(bytes, "UTF-8");

      for (int ii = 0; ii < str.length(); ii++) {
         System.out.println("char[" + ii + "]: " + ((int) str.charAt(ii)));
      }

The additional code attempts to reverse the manual encoding done within Axis; however, it is not entirely successful:

char[0]: 83
char[1]: 68
char[2]: 75
char[3]: 32
char[4]: 12521
char[5]: 12452
char[6]: 12475
char[7]: 12531
char[8]: 12473
char[9]: 65533
char[10]: 63

The first 9 chars (char[0] through char[8]) are correct, but after that it goes downhill...
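The workaround breaking at char[9] also fits the windows-1252 hypothesis (still an assumption about the client's default charset). U+304C encodes to the UTF-8 bytes E3 81 8C, and 0x81 is one of the few byte values windows-1252 leaves undefined, so the first decode already turns it into U+FFFD, and str.getBytes() then writes '?' (0x3F) in its place. Decoding E3 3F 8C as UTF-8 yields U+FFFD (65533), '?' (63) and another U+FFFD, matching char[9] and char[10]. The hypothetical class below replays both steps with an explicit charset instead of the platform default:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class (not part of Axis): replays the client-side
// workaround, assuming the client's default charset is windows-1252.
public class LossyRoundTrip {

    // Returns the char values of s as ints, like the loops in the post.
    public static int[] charsOf(String s) {
        int[] out = new int[s.length()];
        for (int i = 0; i < out.length; i++) {
            out[i] = s.charAt(i);
        }
        return out;
    }

    // Simulates: UTF-8 bytes on the wire -> String decoded as windows-1252,
    // then getBytes() with windows-1252 and new String(bytes, UTF-8).
    public static String replayWorkaround(String original) {
        Charset cp1252 = Charset.forName("windows-1252");
        byte[] wire = original.getBytes(StandardCharsets.UTF_8);
        String mangled = new String(wire, cp1252);    // byte 0x81 becomes U+FFFD
        byte[] reencoded = mangled.getBytes(cp1252);  // U+FFFD becomes '?' (0x3F)
        return new String(reencoded, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Prefix of the original string, up to and including U+898B.
        String str = "SDK \u30e9\u30a4\u30bb\u30f3\u30b9\u304c\u898b";
        int[] chars = charsOf(replayWorkaround(str));
        for (int ii = 0; ii < chars.length; ii++) {
            System.out.println("char[" + ii + "]: " + chars[ii]);
        }
    }
}
```

Because information is already lost in the first decode, converting the returned String after the fact cannot reliably repair every character; the bytes would need to be decoded as UTF-8 at the point where the response is read.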

It's worth pointing out that the version of Axis I'm using is a few months old:

WSDL created by Apache Axis version: 1.2dev
Built on Aug 26, 2003 (12:11:48 PDT)

I'm hesitant to update at this point due to project time constraints, but will if I have to.  Has this scenario been addressed in the newer builds?

Thanks,
--Doug

Re: REPOST: Character encoding problems sending UTF-8 back to client

Posted by Davanum Srinivas <di...@yahoo.com>.

Please try the latest CVS. I think there were some patches
(http://marc.theaimsgroup.com/?l=axis-dev&w=2&r=1&s=24896&q=b)

-- dims

=====
Davanum Srinivas - http://webservices.apache.org/~dims/