Posted to dev@xalan.apache.org by Robert Schiele <rs...@uni-mannheim.de> on 2003/01/22 21:22:37 UTC

Bug or Feature: unescaping & in href attribute while in HTML output method

Hi.

I am not sure whether the following is correct, intended behaviour or a
bug in Xalan-C 1.4:

Take the following XSLT script:

---
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html"/>
  
  <xsl:template match="/">
    <a href="&amp;"/>  
  </xsl:template>
</xsl:stylesheet>
---

Take any XML file as input and run Xalan on it with this stylesheet.

You get:

# Xalan xalanbug.xml xalanbug.xsl
<a href="&"></a>
#

As c/src/XMLSupport/FormatterToHTML.cpp says:

        // http://www.ietf.org/rfc/rfc2396.txt says:
        // A URI is always in an "escaped" form, since escaping or unescaping a
        // completed URI might change its semantics.  Normally, the only time
        // escape encodings can safely be made is when the URI is being created
        // from its component parts; each component may have its own set of
        // characters that are reserved, so only the mechanism responsible for
        // generating or interpreting that component can determine whether or
        // not escaping a character will change its semantics. Likewise, a URI
        // must be separated into its components before the escaped characters
        // within those components can be safely decoded.
        //
        // ...So we do our best to do limited escaping of the URL, without
        // causing damage. If the URL is already properly escaped, in theory, this
        // function should not change the string value.

I would have expected "&amp;" to stay escaped, but it does not.

So my question is: which behaviour is correct, the one Xalan-C 1.4
shows or one that leaves the URI untouched?

Robert

-- 
Robert Schiele			Tel.: +49-621-181-2517
Dipl.-Wirtsch.informatiker	mailto:rschiele@uni-mannheim.de

Re: Bug or Feature: unescaping & in href attribute while in HTML output method

Posted by David N Bertoni/Cambridge/IBM <da...@us.ibm.com>.



Hi Robert,

For a long time, Xalan-C++ followed the behavior of Xalan-J, which is the
behavior you're seeing with Xalan-C++ 1.4.  The latest CVS code escapes the
&, breaking with the Xalan-J behavior.

Although this is definitely a bug, I'm guessing Xalan-J does this for
compatibility with broken browsers.  The next version of Xalan-C will fix
this bug.

Dave

In reply to: Robert Schiele <rschiele@uni-mannheim.de>, 01/22/2003 12:22 PM

Re: Bug or Feature: unescaping & in href attribute while in HTML output method

Posted by David N Bertoni/Cambridge/IBM <da...@us.ibm.com>.



Hi Joe,

I disagree -- this is a bug, and I've stated this many times before.  See:

   http://www.w3.org/TR/html401/appendix/notes.html#h-B.2.2

I think that, historically, HTTP servers expected an unescaped &, so tools
that generate HTML have tried to accommodate this.  But it _is_ a bug.

Newer HTTP servers should do the right thing.

Dave

In reply to: Joseph Kesselman <keshlam@us.ibm.com>, 01/22/2003 01:20 PM

Re: Bug or Feature: unescaping & in href attribute while in HTML output method

Posted by Joseph Kesselman <ke...@us.ibm.com>.

I believe that's correct operation. Xalan-J produces the same result.

When generating HTML output, attributes known to represent URIs are 
escaped AS PER HTML'S CONVENTION FOR URIS, which is not the same as XML 
escaping. Remember, HTML is not yet XML-based; it's SGML-based, and SGML 
allows some things that XML doesn't.

______________________________________
Joe Kesselman  / IBM Research