Posted to j-users@xalan.apache.org by Angus McIntyre <an...@pobox.com> on 2003/01/28 20:19:59 UTC

HTML output method and & in URLs

I have a stylesheet processor based on Xalan and Ant which I'm using 
to generate HTML pages from XML. Within my pages, I have some URL 
strings containing arguments, separated by '&'. In the input 
document, the form is:

	arg1=foo&amp;arg2=bar&amp;arg3=baz

The final HTML output contains the string

	arg1=foo&arg2=bar&arg3=baz

which fails validation as HTML, because it uses '&' rather than '&amp;'.

My stylesheet defines the output method as:

   <xsl:output method="html"
     doctype-public="-//W3C//DTD HTML 4.01 Transitional//EN"
     doctype-system="http://www.w3.org/TR/1999/REC-html401-19991224/loose.dtd"
     xalan:omit-meta-tag="yes"/>

If I change the method to 'xml', the '&amp;' entities are not 
converted, so it's presumably the HTML conversion process that is 
doing this. Setting:

	xalan:use-url-escaping="no"

doesn't seem to fix the problem.
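The setup described above can be reproduced outside Ant with a few lines of JAXP. This is only a sketch: the `AmpersandRepro` class, the `<link url="..."/>` input and the inline stylesheet are made up for illustration, and it assumes the URL ends up in an `href` attribute. Whether the printed result contains `&` or `&amp;` depends on the XSLT processor and version that JAXP picks up, so no particular output is claimed here.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class AmpersandRepro {
    // Hypothetical input document: the separators are correctly escaped in the XML.
    static final String INPUT =
        "<link url=\"page.cgi?arg1=foo&amp;arg2=bar&amp;arg3=baz\"/>";

    // Hypothetical stylesheet: copies the URL into an href attribute, html output method.
    static final String STYLESHEET =
        "<xsl:stylesheet version=\"1.0\""
      + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
      + "<xsl:output method=\"html\"/>"
      + "<xsl:template match=\"/link\">"
      + "<a href=\"{@url}\">link</a>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Runs the transformation with whatever processor JAXP finds on the classpath.
    static String transform(String xml, String xslt) {
        try {
            TransformerFactory factory = TransformerFactory.newInstance();
            Transformer transformer =
                factory.newTransformer(new StreamSource(new StringReader(xslt)));
            StringWriter out = new StringWriter();
            transformer.transform(
                new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("transformation failed", e);
        }
    }

    public static void main(String[] args) {
        // Inspect the href value by eye: '&amp;' is what a validator expects.
        System.out.println(transform(INPUT, STYLESHEET));
    }
}
```

Changing `method="html"` to `method="xml"` in the stylesheet string gives the comparison case mentioned above.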

Is there any way around this, or am I going to have to hack my 
processor to reencode the '&' characters as entities?
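For reference, the kind of post-processing hack meant here might look roughly like the following. It is a workaround sketch, not part of Xalan; the `AmpersandFixup` class and its method name are invented for illustration. It re-encodes every '&' that does not already begin an entity or character reference, so running it twice is harmless. It cannot distinguish a query parameter that happens to look like an entity (for example `&copy;=1`), and it should only be applied to attribute values or text, not to script content.

```java
import java.util.regex.Pattern;

public class AmpersandFixup {
    // An '&' that is NOT followed by "name;", "#digits;" or "#xhex;".
    private static final Pattern BARE_AMPERSAND = Pattern.compile(
        "&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);)");

    // Replaces each bare '&' with '&amp;' and leaves existing references alone.
    static String escapeBareAmpersands(String html) {
        return BARE_AMPERSAND.matcher(html).replaceAll("&amp;");
    }

    public static void main(String[] args) {
        // prints page.cgi?arg1=foo&amp;arg2=bar&amp;arg3=baz
        System.out.println(
            escapeBareAmpersands("page.cgi?arg1=foo&arg2=bar&arg3=baz"));
    }
}
```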

Thanks

	Angus
-- 
angus@pobox.com                             http://pobox.com/~angus

Re: HTML output method and & in URLs

Posted by Herr Christian Wolfgang Hujer <Ch...@itcqis.com>.
Hi Dan,

On Tuesday, 28 January 2003 at 20:34, Dan Jacobs wrote:
> Hi Angus,
> As far as I can tell, when you extract an entity-encoded String from
> your XML document, the entities are translated, and then you have your
> String.  If you then include that String in the generated output, you
> have to re-entity-encode it yourself.

No, that's not required, regardless of whether the output method is xml or
html. The output must be well-formed SGML / HTML, and if the transformation
would in principle generate valid HTML, the output must also be valid. Not
encoding & as &amp; is a violation of both the XSLT and the HTML specs.
See XSLT section 16.2 (HTML Output Method) and HTML 4 sections B.2.1 / B.2.2.

Bye
-- 
ITCQIS GmbH
Christian Wolfgang Hujer
Managing Partner
Phone: +49  (0)89  27 37 04 37
Fax: +49  (0)89  27 37 04 39
E-Mail: Christian.Hujer@itcqis.com
WWW: http://www.itcqis.com/


RE: HTML output method and & in URLs

Posted by David N Bertoni/Cambridge/IBM <da...@us.ibm.com>.



Hi Angus,

Fascinating.  An incorrect answer which implies that using XSLT reduces a
user's sanity and flogs software all in the same message.

This is simply a bug in Xalan-J.  The processor must serialize attributes
so the result is well-formed HTML:

   http://www.w3.org/TR/xslt#section-HTML-Output-Method
   http://www.w3.org/TR/html401/appendix/notes.html#h-B.2.2

The usual reason for avoiding the entity is that older browsers and HTTP
agents mishandle it.  However, doing that generates HTML which is not
well-formed, as you've discovered.
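As a rough illustration of what the two references above ask of a serializer, here is a simplified sketch of attribute-value escaping for the html output method. It is not Xalan's serializer code; the `HtmlAttributeEscaper` class is hypothetical. It follows the rules in XSLT section 16.2 as I read them: '&' becomes `&amp;` unless it is immediately followed by '{', the attribute delimiter '"' becomes `&quot;`, and '<' is left alone inside attribute values.

```java
public class HtmlAttributeEscaper {
    // Escapes a value for output inside a double-quoted HTML attribute.
    static String escapeAttributeValue(String value) {
        StringBuilder out = new StringBuilder(value.length() + 16);
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            if (c == '&') {
                // "&{" introduces an HTML script macro and is passed through.
                boolean beforeBrace =
                    i + 1 < value.length() && value.charAt(i + 1) == '{';
                out.append(beforeBrace ? "&" : "&amp;");
            } else if (c == '"') {
                out.append("&quot;");
            } else {
                // Everything else, including '<', is copied unchanged.
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // prints page.cgi?arg1=foo&amp;arg2=bar&amp;arg3=baz
        System.out.println(
            escapeAttributeValue("page.cgi?arg1=foo&arg2=bar&arg3=baz"));
    }
}
```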

As an aside, xalan:use-url-escaping is related to this:

   http://www.w3.org/TR/REC-html40/appendix/notes.html#h-B.2.1

Xalan provides this option for the same reason it doesn't use &amp; in URI
query strings -- there are lots of older agents out there that don't
understand URIs encoded this way.
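To make the distinction concrete: the escaping described in B.2.1 concerns non-ASCII characters in URI attribute values, which are converted to UTF-8 bytes and written as %HH. The sketch below illustrates that rule only; it is not how Xalan implements xalan:use-url-escaping, and the `UriAttributeEscaper` class is a hypothetical name. Note that '&' passes through untouched, since B.2.1 has nothing to say about it.

```java
import java.nio.charset.StandardCharsets;

public class UriAttributeEscaper {
    // Percent-encodes every non-ASCII code point as its UTF-8 byte sequence.
    static String escapeNonAscii(String uri) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < uri.length()) {
            int cp = uri.codePointAt(i);
            if (cp < 0x80) {
                // ASCII, including '&', is copied as-is.
                out.append((char) cp);
            } else {
                byte[] utf8 = new String(Character.toChars(cp))
                    .getBytes(StandardCharsets.UTF_8);
                for (byte b : utf8) {
                    out.append(String.format("%%%02X", b & 0xFF));
                }
            }
            // Step over surrogate pairs as a single code point.
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // prints caf%C3%A9?x=1&y=2
        System.out.println(escapeNonAscii("caf\u00e9?x=1&y=2"));
    }
}
```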

Dave


RE: HTML output method and & in URLs

Posted by Dan Jacobs <dj...@modelobjects.com>.
Hi Angus,

(We met at the Boston ACM WebTech Group a few years ago, and your name
just came up again last week in a conversation with John Kellerman.)

As far as I can tell, when you extract an entity-encoded String from
your XML document, the entities are translated, and then you have your
String.  If you then include that String in the generated output, you
have to re-entity-encode it yourself.

If you'd rather do things with Java and keep a bit more of your sanity,
you might want to try JPlates instead (http://www.jplates.com).  I'd
love to get your opinion of it in any case.

All the best,
-- Dan Jacobs
-- Chairman, Boston ACM WebTech Group
-- President, JPlates Inc.
