Posted to java-dev@axis.apache.org by Simon Fell <so...@zaks.demon.co.uk> on 2005/06/01 05:39:39 UTC

Re: [jira] Commented: (AXIS-2025) Illegal XML characters in String arguments and return values cause XML exceptions in Axis calls

My point is that the character 0x03 is not valid XML, regardless of
whether it's a raw byte or the character reference &#3;; both are
illegal XML. The only encoding you can use to transport that string
is base64.
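
For illustration, here is a minimal sketch of that check-then-base64
approach. It assumes Java 8 or later (for java.util.Base64) and uses the
Char production from the XML 1.0 spec; the class and method names are
hypothetical and are not part of Axis.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper, not an Axis class.
final class XmlSafeString {

    // XML 1.0 Char production:
    // #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    static boolean isXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // True only if every code point in s may appear in an XML 1.0 document.
    static boolean isXmlSafe(String s) {
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (!isXmlChar(cp)) {
                return false;
            }
            i += Character.charCount(cp);
        }
        return true;
    }

    // Encode the string's UTF-8 bytes as base64, which uses only XML-safe characters.
    static String toBase64(String s) {
        return Base64.getEncoder().encodeToString(s.getBytes(StandardCharsets.UTF_8));
    }

    // Reverse of toBase64.
    static String fromBase64(String b64) {
        return new String(Base64.getDecoder().decode(b64), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String bad = "bad: " + (char) 3 + ".";
        // Fall back to base64 only when the string cannot be sent as xsd:string.
        String wire = isXmlSafe(bad) ? bad : toBase64(bad);
        System.out.println(wire);
    }
}
```

The receiver has to know the value is base64 (for example by declaring
it as xsd:base64Binary rather than xsd:string), so this changes the
service contract and is not a transparent fix.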

Cheers
Simon


On Tue, 31 May 2005 23:48:52 +0200 (CEST), in soap you wrote:

>     [ http://issues.apache.org/jira/browse/AXIS-2025?page=comments#action_66676 ]
>     
>Shankar Unni commented on AXIS-2025:
>------------------------------------
>
>Regarding Simon's comment: yes, 0x3 is not a valid XML character. That's exactly the point. The original *String* argument in the *Java* code had a 0x3. 
>
>The bug is that Axis is dumping that naked 0x3 on the wire as part of the text data in XML without escaping it, as in:
>
>  <badmsgReturn xsi:type="xsd:string">bad: .</badmsgReturn>
>
>The whole point of using an RPC library is to have it take *any legal Java String* in the API if the argument type says "String", and arrange to have it delivered on the other side intact. Not just Strings that contain valid XML characters.
>
>
>> Illegal XML characters in String arguments and return values cause XML exceptions in Axis calls
>> -----------------------------------------------------------------------------------------------
>>
>>          Key: AXIS-2025
>>          URL: http://issues.apache.org/jira/browse/AXIS-2025
>>      Project: Axis
>>         Type: Bug
>>   Components: Serialization/Deserialization
>>     Versions: 1.2
>>  Environment: All (but reproduced on WinXP).
>> Axis 1.1 and 1.2
>>     Reporter: Shankar Unni
>
>>
>> Arguments and return values of Java type String are incorrectly handled if they contain non-printing illegal ASCII characters.
>> Example 1: bad return values:
>> - - - - - - - - - - - - - - -
>> E.g. the string 
>>   "bad char: " + (char)3 + "."
>> Trivial example:
>> foo.jws:
>>   public class foo {
>>     public String badmsg()
>>     {
>>       return "bad: " + (char)3 + ".";
>>     }
>>   }
>> When this method is called on a server running Axis 1.1, it returns XML with the illegal character (ASCII 3) in the text:
>>    <badmsgReturn xsi:type="xsd:string">bad: .</badmsgReturn>
>> This causes an XML parse exception on the client side ("org.xml.sax.SAXParseException: An invalid XML character (Unicode: 0x3) was found in the element content of the document.")
>> With Axis 1.2, the server doesn't even return a valid response: I get an HTTP 200 OK with an empty content, causing a different XML parse error.
>> Example 2: bad parameter values:
>> - - - - - - - - - - - - - - - -
>> A similar problem exists when passing such a string from the client side.
>> If I have a method in foo.jws:
>>   public class foo {
>>     public String echo(String s)
>>     {
>>       return s;
>>     }
>>   }
>> Then if I write an ordinary Java client to call this, and pass it a bad string as in the beginning of this post, I get an exception thrown while the call is being composed:
>> java.lang.IllegalArgumentException: The char '0x3' in 'bad char: ?.' is not a valid XML character.
>> This is somewhat absurd: shouldn't the serialization layer be encoding these illegal XML characters as entity escapes? They're entirely legal in the current locale (US), and normal Java code handles this character quite normally.  Why should it croak when passed by XML/RPC?