Posted to rpc-dev@xml.apache.org by Jim Redman <ji...@ergotech.com> on 2002/03/01 01:14:43 UTC

Re: DO NOT REPLY [Bug 6763] - XMLWriter doesn't escape enough characters

From a position of complete ignorance, and without reading the spec, can
someone clarify for me what this means?

The part of the spec quoted here says that you can't use binary chars;
however, "&#0005;" seems to meet the requirement of the spec in that it
matches "the production for Char".

Regardless of whether xmlrpc should be doing the escaping, it seems to me 
that I should be able to send a String that would be invalid XML by 
escaping each character - the resulting string seems to be valid.  This is 
philosophically the same as sending newline characters as "\\n" in
human-readable strings.

Is this the case or is it required that anything that may have a non-Char 
in it be base64 encoded?
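Escaping can only re-express a character that is already legal XML. Below is a minimal sketch of a writer-side escape routine, assuming Java 8 or later. It escapes the markup characters and writes \r as a character reference so that parser end-of-line handling does not turn it into \n. It refuses code points outside the XML 1.0 Char production, because no escape makes those legal. XmlEscaper and escapeText are hypothetical names, not the Apache XML-RPC XMLWriter API.

```java
public class XmlEscaper {

    // Hypothetical helper, NOT the Apache XML-RPC XMLWriter API.
    // Escapes a string for use as XML 1.0 element text content.
    static String escapeText(String s) {
        StringBuilder out = new StringBuilder(s.length());
        s.codePoints().forEach(cp -> {
            if (cp == '<') {
                out.append("&lt;");
            } else if (cp == '>') {
                out.append("&gt;");
            } else if (cp == '&') {
                out.append("&amp;");
            } else if (cp == '\r') {
                // A literal CR would be normalized to LF by the parser;
                // a character reference survives end-of-line handling.
                out.append("&#13;");
            } else if (isXmlChar(cp)) {
                out.appendCodePoint(cp);
            } else {
                // e.g. U+0005: illegal both literally and as &#5;, so fail loudly.
                throw new IllegalArgumentException(
                        String.format("U+%04X is not an XML 1.0 Char", cp));
            }
        });
        return out.toString();
    }

    // XML 1.0 Char production:
    // #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    static boolean isXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    public static void main(String[] args) {
        System.out.println(escapeText("a<b & c>d")); // a&lt;b &amp; c&gt;d
        System.out.println(escapeText("x\ry"));      // x&#13;y
    }
}
```

The routine throws rather than silently dropping the character, so the caller can fall back to a base64 value instead.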

Jim

On 2002.02.28 16:48:25 -0700 bugzilla@apache.org wrote:
> Comments from John Wilson <tu...@wilson.co.uk>, author of the MinML parser:
> 
> "
> This isn't a bug. You just can't legally have a Unicode character with
> the value 5 in a well-formed XML document. Escaping it as &#0005; makes
> no difference.
> 
> The relevant part of the spec is Section 4.1, Character and Entity
> References:
> "Well-Formedness Constraint: Legal Character
> Characters referred to using character references must match the
> production for Char."
> 
> MinML currently and erroneously allows this - I'm in the process of
> tightening its checking, and it will soon reject it.
> "
> 
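The constraint quoted above can be checked against a real parser. Below is a minimal sketch, assuming Java 8 or later and the JDK's default JAXP DOM parser reading XML 1.0. CharRefCheck, parses and the <value> element are illustrative names only. A reference to a tab (&#9;) is accepted, while &#0005; is reported as a well-formedness error.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class CharRefCheck {

    // Returns true if the JDK's default XML parser accepts the document as well-formed.
    static boolean parses(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return true;
        } catch (SAXException e) {
            // Well-formedness violations surface as SAXParseException (a SAXException).
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // #x9 (tab) is in the Char production, so this is well-formed.
        System.out.println(parses("<value>a&#9;b</value>"));    // true
        // U+0005 is not in Char; writing it as a reference does not help.
        System.out.println(parses("<value>a&#0005;b</value>")); // false
    }
}
```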
-- 

Jim Redman
(505) 662 5156
http://www.ergotech.com


Re: DO NOT REPLY [Bug 6763] - XMLWriter doesn't escape enough characters

Posted by John Wilson <tu...@wilson.co.uk>.
----- Original Message -----
From: "Jim Redman" <ji...@ergotech.com>
To: <rp...@xml.apache.org>
Sent: Friday, March 01, 2002 12:14 AM
Subject: Re: DO NOT REPLY [Bug 6763] - XMLWriter doesn't escape enough characters


> From a position of complete ignorance, and without reading the spec, can
> someone clarify for me what this means?
>
> The part of the spec quoted here says that you can't use binary chars;
> however, "&#0005;" seems to meet the requirement of the spec in that it
> matches "the production for Char".

No, the spec says that the result of turning the escaped character back into
a Unicode character must match the production Char. This control character
(U+0005) does not match that production.
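The production in question is Char, from Section 2.2 of XML 1.0. Below is a minimal sketch of it in Java, assuming Java 8 or later for String.codePoints(). XmlChars, isXmlChar and isXmlSafe are hypothetical names. A string that fails isXmlSafe cannot be carried as XML text at all, whether or not its characters are escaped.

```java
public class XmlChars {

    // XML 1.0 (Section 2.2) production:
    // Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    static boolean isXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // True if every code point of s matches Char, i.e. s could appear in an
    // XML 1.0 document either literally or as &#...; references.
    // Unpaired surrogates come through codePoints() as 0xD800-0xDFFF and fail.
    static boolean isXmlSafe(String s) {
        return s.codePoints().allMatch(XmlChars::isXmlChar);
    }

    public static void main(String[] args) {
        System.out.println(isXmlChar(0x5));              // false
        System.out.println(isXmlChar('\t'));             // true
        System.out.println(isXmlSafe("line1\nline2"));   // true
        System.out.println(isXmlSafe("ctrl\u0005char")); // false
    }
}
```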

There is an excellent annotated spec here
http://www.xml.com/axml/testaxml.htm

>
> Regardless of whether xmlrpc should be doing the escaping, it seems to me
> that I should be able to send a String that would be invalid XML by
> escaping each character - the resulting string seems to be valid.  This is
> philosophically the same as sending newline characters as "\\n" in
> human-readable strings.
>
> Is this the case or is it required that anything that may have a non-Char
> in it be base64 encoded?

If you want to send control characters (other than \r, \n and \t - and an XML
parser will normalize a literal \r or \r\n to \n), then you need to use Base64
and agree on a Unicode encoding between the end points.
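Below is a minimal sketch of that approach, assuming Java 8 or later (java.util.Base64) and UTF-8 as the encoding both end points have agreed on. Base64Payload, encode and decode are hypothetical names, and the Apache XML-RPC library's own Base64 support is not shown. The encoded text uses only A-Z, a-z, 0-9, +, / and =, so it can travel safely inside an XML-RPC <base64> value.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class Base64Payload {

    // Sender: turn the string into bytes with the AGREED charset (UTF-8 is an
    // assumption here; both ends must use the same one), then Base64 the bytes.
    static String encode(String text) {
        return Base64.getEncoder().encodeToString(text.getBytes(StandardCharsets.UTF_8));
    }

    // Receiver: undo the Base64, then decode the bytes with the same agreed charset.
    static String decode(String base64) {
        return new String(Base64.getDecoder().decode(base64), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "ctrl\u0005char\r\n"; // contains a non-Char and a CR
        String wire = encode(original);
        System.out.println(wire);
        System.out.println(decode(wire).equals(original)); // true
    }
}
```

Because the control characters never appear in the XML itself, both the Char restriction and the parser's line-ending normalization are sidestepped.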

John Wilson
The Wilson Partnership
http://www.wilson.co.uk

