Posted to java-dev@axis.apache.org by Bill Keese <bi...@tech.beacon-it.co.jp> on 2005/02/16 06:07:32 UTC

Re: UTF8Encoder question...

+1

Please change UTF8Encoder() back so that it doesn't encode characters
above 0x7f. It doesn't matter much for European languages, but for
Asian-language text *every character is getting encoded*, which means
that message size has doubled or tripled, and both the server and the
client spend much more time encoding and decoding.
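
To put a rough number on the size claim, here is a minimal sketch that
compares the UTF-8 byte length of a short Japanese string with its
length after every character above 0x7f is replaced by a numeric
character reference. The class and the escapeNonAscii() helper are
hypothetical and written only for this illustration; the "&#xHHHH;"
form is my assumption about what the encoder emits, and this is not
the actual Axis code.

```java
import java.nio.charset.StandardCharsets;

class EncodedSizeDemo {
    // Hypothetical helper (not Axis code): replaces every code point above
    // 0x7f with a hexadecimal numeric character reference such as &#x65E5;
    static String escapeNonAscii(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (cp > 0x7f) {
                sb.append("&#x")
                  .append(Integer.toHexString(cp).toUpperCase())
                  .append(';');
            } else {
                sb.append((char) cp);
            }
            i += Character.charCount(cp);
        }
        return sb.toString();
    }

    // Number of bytes the string occupies on the wire as UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // Three Japanese characters, written as escapes so the source file
        // itself stays ASCII.
        String text = "\u65E5\u672C\u8A9E";
        System.out.println(utf8Length(text));                 // prints 9
        System.out.println(utf8Length(escapeNonAscii(text))); // prints 24
    }
}
```

Each of these characters takes 3 bytes as raw UTF-8 and 8 bytes as a
reference, so the text portion of this sample grows by a factor of
about 2.7.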

Backwards compatibility really doesn't seem like an issue. According to
the XML spec, a numeric character reference and the literal character
it stands for are equivalent, so this change is transparent to Axis
clients and to Axis server code. Normal code sees the unencoded
characters in either case. The only things affected are a few test
cases that look directly at the XML sent over the wire.
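
As a quick check of the equivalence, the sketch below parses the same
element twice with the JDK's standard DOM parser, once with literal
characters and once with numeric character references, and compares
the resulting text. The class name, the parseText() helper, and the
<msg> element are made up for the example; nothing here is Axis code.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

class CharRefEquivalenceDemo {
    // Hypothetical helper: parses a small XML document (sent as UTF-8 bytes)
    // and returns the text content of its root element.
    static String parseText(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            byte[] bytes = xml.getBytes(StandardCharsets.UTF_8);
            return builder.parse(new ByteArrayInputStream(bytes))
                          .getDocumentElement()
                          .getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse: " + xml, e);
        }
    }

    public static void main(String[] args) {
        // Same three characters, literal versus character references.
        String raw = "<msg>\u65E5\u672C\u8A9E</msg>";
        String escaped = "<msg>&#x65E5;&#x672C;&#x8A9E;</msg>";
        System.out.println(parseText(raw).equals(parseText(escaped))); // prints true
    }
}
```

Code that reads the parsed message gets the same string either way;
only something that inspects the raw bytes on the wire can tell the two
forms apart.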

>Human readability is one of essences in XML (and SOAP)...
>
Actually, I think the unencoded message is MORE readable. For example,
with a Japanese message:
1. If you don't speak Japanese, you can't read the message in any case,
encoded or not.
2. If you do speak Japanese, it's much easier to read characters than a
bunch of numbers.


Bill

(See message
http://marc.theaimsgroup.com/?l=axis-dev&m=110423515027651&w=2 for the
code change.)

PS: If you are really concerned about backwards compatibility, you
should rename the misnamed UTF8Encoder.java to AsciiEncoder.java (since
it currently encodes all messages into ASCII), and then make that the
default.