Posted to dev@xalan.apache.org by Robert Schiele <rs...@uni-mannheim.de> on 2003/01/22 21:22:37 UTC
Bug or Feature: unescaping & in href attribute while in HTML output method
Hi.
I am not sure whether the following is correct, intended behaviour
or a bug in Xalan-C 1.4:
Take the following XSLT script:
---
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html"/>
  <xsl:template match="/">
    <a href="&amp;"/>
  </xsl:template>
</xsl:stylesheet>
---
Take any XML file and run Xalan on it.
You get:
# Xalan xalanbug.xml xalanbug.xsl
<a href="&"></a>
#
As c/src/XMLSupport/FormatterToHTML.cpp says:
// http://www.ietf.org/rfc/rfc2396.txt says:
// A URI is always in an "escaped" form, since escaping or unescaping a
// completed URI might change its semantics. Normally, the only time
// escape encodings can safely be made is when the URI is being created
// from its component parts; each component may have its own set of
// characters that are reserved, so only the mechanism responsible for
// generating or interpreting that component can determine whether or
// not escaping a character will change its semantics. Likewise, a URI
// must be separated into its components before the escaped characters
// within those components can be safely decoded.
//
// ...So we do our best to do limited escaping of the URL, without
// causing damage. If the URL is already properly escaped, in theory, this
// function should not change the string value.
I would have expected "&amp;" to stay escaped, but it does not.
So my question is: which behaviour is correct, the one Xalan-C 1.4
exhibits or one that leaves the URI untouched?
Robert
--
Robert Schiele Tel.: +49-621-181-2517
Dipl.-Wirtsch.informatiker mailto:rschiele@uni-mannheim.de
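The FormatterToHTML comment quoted above describes "limited" URI escaping that should leave an already-escaped URI unchanged. A minimal Python sketch of that property, assuming "limited" means percent-encoding only bytes that can never appear literally in a URI (space, control characters, non-ASCII); `escape_uri_limited` is a hypothetical name and this is not Xalan's actual code. Note that it deliberately leaves `&` alone: percent-encoding is a separate layer from the HTML character escaping (`&amp;`) that the question is about.

```python
def escape_uri_limited(uri: str) -> str:
    """Hypothetical 'limited' URI escaping (not Xalan's implementation).

    Percent-encodes only bytes that can never appear literally in a URI:
    space, control characters and non-ASCII bytes (as UTF-8). '%' and
    reserved characters such as '&', '?' and '=' pass through untouched,
    so an already-escaped URI comes back unchanged."""
    out = []
    for byte in uri.encode("utf-8"):
        if byte <= 0x20 or byte >= 0x7F:
            out.append("%%%02X" % byte)  # e.g. space -> %20
        else:
            out.append(chr(byte))
    return "".join(out)


once = escape_uri_limited("page one.html?x=1&y=2")
print(once)                              # page%20one.html?x=1&y=2
print(escape_uri_limited(once) == once)  # True: running it twice changes nothing
```

Because every byte the function emits lies in the printable ASCII range it never re-encodes, a second pass is a no-op, which is the "should not change the string value" property the comment asks for.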
Re: Bug or Feature: unescaping & in href attribute while in HTML output method
Posted by David N Bertoni/Cambridge/IBM <da...@us.ibm.com>.
Hi Robert,
For a long time, Xalan-C++ followed the behavior of Xalan-J, which is the
behavior you're seeing with Xalan-C++ 1.4. The latest CVS code escapes the
&, breaking with the Xalan-J behavior.
Although this is definitely a bug, I'm guessing Xalan-J does this for
compatibility with broken browsers. The next version of Xalan-C will fix
this bug.
Dave
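The attribute value in the result tree is a single `&` character; the two behaviours differ only in how the HTML output method serializes it. A minimal sketch contrasting them, using a hypothetical `write_uri_attribute` helper and `legacy_raw_amp` flag that are illustrative only and not part of either Xalan API:

```python
def write_uri_attribute(name: str, value: str, legacy_raw_amp: bool = False) -> str:
    """Serialize a URI-valued attribute for HTML output (hypothetical helper).

    legacy_raw_amp=True models the '&' handling Robert saw in Xalan-C 1.4
    (and, per this thread, Xalan-J): '&' is written literally. The default
    models the fix: '&' becomes '&amp;', as in any other attribute value.
    In this sketch '"' is always escaped so the quoted value stays delimited."""
    if not legacy_raw_amp:
        value = value.replace("&", "&amp;")  # must run before '&quot;' is added
    value = value.replace('"', "&quot;")
    return '%s="%s"' % (name, value)


print("<a %s></a>" % write_uri_attribute("href", "&", legacy_raw_amp=True))
# <a href="&"></a>
print("<a %s></a>" % write_uri_attribute("href", "&"))
# <a href="&amp;"></a>
```

With `legacy_raw_amp=True` the output matches what Robert reported from Xalan-C 1.4; the default matches what Dave says the latest CVS code produces.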
Re: Bug or Feature: unescaping & in href attribute while in HTML output method
Posted by David N Bertoni/Cambridge/IBM <da...@us.ibm.com>.
Hi Joe,
I disagree -- this is a bug, and I've stated this many times before. See:
http://www.w3.org/TR/html401/appendix/notes.html#h-B.2.2
I think, historically, HTTP servers expected an unescaped &, so tools
which generate HTML have tried to accommodate this. But it _is_ a bug.
Newer HTTP servers should do the right thing.
Dave
From: Joseph Kesselman <keshlam@us.ibm.com>
To: xalan-dev@xml.apache.org
Date: 01/22/2003 01:20 PM
Subject: Re: Bug or Feature: unescaping & in href attribute while in HTML output method
I believe that's correct operation. Xalan-J produces the same result.
When generating HTML output, attributes known to represent URIs are escaped
AS PER HTML'S CONVENTION FOR URIS, which is not the same as XML escaping.
Remember, HTML is not yet XML-based; it's SGML-based, and SGML allows some
things that XML doesn't.
______________________________________
Joe Kesselman / IBM Research
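The HTML 4.01 note Dave cites (appendix B.2.2) asks authors to write `&amp;` for each `&` in a URI attribute value, and also recommends that HTTP servers, CGI implementations in particular, accept `;` in place of `&` as a parameter separator. A minimal Python sketch of both sides: the browser decodes the attribute value back to a plain `&`, and a server-side parser splits on either separator. `parse_query` is a hypothetical helper, not a standard-library function; `html.unescape` and `urllib.parse.unquote_plus` are the Python standard-library calls used.

```python
import html
import re
from typing import List, Tuple
from urllib.parse import unquote_plus


def parse_query(query: str) -> List[Tuple[str, str]]:
    """Split a query string on '&' or ';' (hypothetical helper).

    Accepting ';' follows the HTML 4.01 B.2.2 recommendation, so links
    written with ';' need no '&amp;' escaping in the HTML at all."""
    pairs = []
    for part in re.split(r"[&;]", query):
        if not part:
            continue  # skip empty segments, e.g. from a trailing separator
        key, _, value = part.partition("=")
        pairs.append((unquote_plus(key), unquote_plus(value)))
    return pairs


# What the HTML output should contain for the URI ?x=1&y=2 ...
href_value = "?x=1&amp;y=2"
# ... and what the browser decodes it to before sending the request.
uri = html.unescape(href_value)
print(uri)                     # ?x=1&y=2
print(parse_query(uri[1:]))    # [('x', '1'), ('y', '2')]
print(parse_query("x=1;y=2"))  # [('x', '1'), ('y', '2')]
```

The escaped `&amp;` never reaches the server: it exists only in the HTML layer, which is why escaping it does not change the URI the browser requests.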