Posted to xmlrpc-dev@ws.apache.org by bu...@apache.org on 2002/03/01 00:48:25 UTC

DO NOT REPLY [Bug 6763] - XMLWriter doesn't escape enough characters

DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG 
RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT
<http://nagoya.apache.org/bugzilla/show_bug.cgi?id=6763>.
ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND 
INSERTED IN THE BUG DATABASE.

http://nagoya.apache.org/bugzilla/show_bug.cgi?id=6763

XMLWriter doesn't escape enough characters

dlr@finemaltcoding.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |INVALID



------- Additional Comments From dlr@finemaltcoding.com  2002-02-28 23:48 -------
Comments from John Wilson <tu...@wilson.co.uk>, author of the MinML parser:

"
This isn't a bug. You just can't legally have a Unicode character with the 
value 5 in a well formed XML document. Escaping it as &#0005; makes no 
difference. 
 
The relevant part of the spec is Section 4.1 Character and Entity References 
"Well-Formedness Constraint: Legal Character 
Characters referred to using character references must match the production 
for Char. " 
 
MinML currently and erroneously allows this - I'm in the process of tightening
its checking and it will soon reject it.
"

Re: DO NOT REPLY [Bug 6763] - XMLWriter doesn't escape enough characters

Posted by John Wilson <tu...@wilson.co.uk>.
----- Original Message -----
From: "Jim Redman" <ji...@ergotech.com>
To: <rp...@xml.apache.org>
Sent: Friday, March 01, 2002 12:14 AM
Subject: Re: DO NOT REPLY [Bug 6763] - XMLWriter doesn't escape enough
characters


> From a position of complete ignorance, and without reading the spec, can
> someone clarify for me what this means?
>
> The part of the spec quoted here says that you can't use binary chars,
> however "&#0005;" seems to meet the requirement of the spec in that it
> matches "the production for Char".

No, the spec says that the result of turning the escaped character back into
a Unicode character must match the production Char. This control character
(value 5) does not match that production.
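
To make the constraint concrete, here is a minimal Java sketch of the Char
production as given in XML 1.0 (section 2.2). The class and method names
(XmlCharCheck, isXmlChar, isXmlSafe) are made up for illustration and are not
part of MinML or the xmlrpc library. The point is that the check applies to
the code point itself, so writing it as a character reference changes nothing.

```java
// Hypothetical helper, not part of MinML or xmlrpc.
class XmlCharCheck {

    /**
     * True if the code point matches the XML 1.0 Char production:
     * #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
     */
    static boolean isXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** True if every code point in the string matches Char. */
    static boolean isXmlSafe(String s) {
        return s.codePoints().allMatch(XmlCharCheck::isXmlChar);
    }

    public static void main(String[] args) {
        System.out.println(isXmlChar(0x5));            // false: the value from the bug report
        System.out.println(isXmlChar(0x9));            // true: tab is allowed
        System.out.println(isXmlSafe("ok" + (char) 5)); // false: one bad character is enough
    }
}
```

A writer could run a check like isXmlSafe before emitting a string value and
fall back to another representation when it fails.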

There is an excellent annotated spec here
http://www.xml.com/axml/testaxml.htm

>
> Regardless of whether xmlrpc should be doing the escaping, it seems to me
> that I should be able to send a String that would be invalid XML by
> escaping each character - the resulting string seems to be valid.  This is
> philosophically the same as sending new line characters as "\\n" in human
> readable strings.
>
> Is this the case or is it required that anything that may have a non-Char
> in it be base64 encoded?

If you want to send control characters (other than \r, \n and \t - and an XML
parser will mangle the \r and \n by normalizing line ends) then you need to
use Base64 and agree on a Unicode encoding between the end points.
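
As a rough sketch of that approach, the Java below encodes a string containing
a control character as Base64 and decodes it again. It assumes both ends have
agreed on UTF-8, and it uses java.util.Base64, which is available in Java 8
and later. The class name Base64Payload is made up for illustration. In
XML-RPC the encoded text would travel in a <base64> value rather than a
<string>.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper; UTF-8 is an assumed agreement between the end points.
class Base64Payload {

    /** Encode text as Base64 over its UTF-8 bytes. */
    static String encode(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        return Base64.getEncoder().encodeToString(bytes);
    }

    /** Decode Base64 back to text, using the same agreed encoding. */
    static String decode(String base64) {
        byte[] bytes = Base64.getDecoder().decode(base64);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "A" + (char) 5;   // contains the control character U+0005
        String wire = encode(original);     // only Base64 letters, safe inside XML
        System.out.println(wire);
        System.out.println(decode(wire).equals(original)); // true: round trip is lossless
    }
}
```

The receiver has to know the payload is text in the agreed encoding, since the
Base64 data itself carries no charset information.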

John Wilson
The Wilson Partnership
http://www.wilson.co.uk


Re: DO NOT REPLY [Bug 6763] - XMLWriter doesn't escape enough characters

Posted by Jim Redman <ji...@ergotech.com>.
From a position of complete ignorance, and without reading the spec, can
someone clarify for me what this means?

The part of the spec quoted here says that you can't use binary chars,
however "&#0005;" seems to meet the requirement of the spec in that it
matches "the production for Char".

Regardless of whether xmlrpc should be doing the escaping, it seems to me
that I should be able to send a String that would be invalid XML by
escaping each character - the resulting string seems to be valid.  This is
philosophically the same as sending new line characters as "\\n" in human
readable strings.

Is this the case or is it required that anything that may have a non-Char 
in it be base64 encoded?

Jim

On 2002.02.28 16:48:25 -0700 bugzilla@apache.org wrote:
> Comments from John Wilson <tu...@wilson.co.uk>, author of the MinML parser:
> 
> "
> This isn't a bug. You just can't legally have a Unicode character with the
> value 5 in a well formed XML document. Escaping it as &#0005; makes no
> difference.
>
> The relevant part of the spec is Section 4.1 Character and Entity References
> "Well-Formedness Constraint: Legal Character
> Characters referred to using character references must match the production
> for Char. "
>
> MinML currently and erroneously allows this - I'm in the process of
> tightening its checking and it will soon reject it.
> "
> 
-- 

Jim Redman
(505) 662 5156
http://www.ergotech.com

