Posted to j-users@xalan.apache.org by Angus McIntyre <an...@pobox.com> on 2003/01/28 20:19:59 UTC
HTML output method and & in URLs
I have a stylesheet processor based on Xalan and Ant which I'm using
to generate HTML pages from XML. Within my pages, I have some URL
strings containing arguments, separated by '&'. In the input
document, the form is:
arg1=foo&amp;arg2=bar&amp;arg3=baz
The final HTML output contains the string
arg1=foo&arg2=bar&arg3=baz
which fails validation as HTML, because it uses '&' rather than '&amp;'.
My stylesheet defines the output method as:
<xsl:output method="html"
doctype-public="-//W3C//DTD HTML 4.01 Transitional//EN"
doctype-system="http://www.w3.org/TR/1999/REC-html401-19991224/loose.dtd"
xalan:omit-meta-tag="yes"/>
If I change the method to 'xml', the '&amp;' entities are not
converted, so it's presumably the HTML conversion process that is
doing this. Setting:
xalan:use-url-escaping="no"
doesn't seem to fix the problem.
Is there any way around this, or am I going to have to hack my
processor to reencode the '&' characters as entities?
Thanks
Angus
--
angus@pobox.com http://pobox.com/~angus
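The re-encoding workaround asked about above could be sketched roughly as follows. This is only an illustration of post-processing a serialized attribute value; the class and method names (AmpersandEscaper, escapeBareAmpersands) are hypothetical and are not part of Xalan. It assumes that any '&' which does not already begin a character or entity reference should become '&amp;'.

```java
import java.util.regex.Pattern;

public class AmpersandEscaper {
    // Matches '&' that does not already start a named, decimal, or
    // hexadecimal reference terminated by ';' (e.g. &amp; &#38; &#x26;).
    private static final Pattern BARE_AMP = Pattern.compile(
        "&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);)");

    // Hypothetical helper: returns the input with every bare '&'
    // rewritten as "&amp;"; existing references are left untouched.
    public static String escapeBareAmpersands(String s) {
        return BARE_AMP.matcher(s).replaceAll("&amp;");
    }

    public static void main(String[] args) {
        System.out.println(
            escapeBareAmpersands("page.cgi?arg1=foo&arg2=bar&arg3=baz"));
    }
}
```

A query parameter whose name happens to look like a reference (for example "&lt;=1") would be left alone by this pattern, so it is a stopgap rather than a substitute for a serializer that escapes attributes correctly.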
Re: HTML output method and & in URLs
Posted by Herr Christian Wolfgang Hujer <Ch...@itcqis.com>.
Hi Dan,
On Tuesday, 28 January 2003 20:34, Dan Jacobs wrote:
> Hi Angus,
> As far as I can tell, when you extract an entity-encoded String from
> your XML document, the entities are translated, and then you have your
> String. If you then include that String in the generated output, you
> have to re-entity-encode it yourself.
No, that's not required, regardless of whether the output method is xml or
html. The output must be well-formed SGML / HTML, and if the transformation
would basically generate valid HTML, the output must also be valid. Not
encoding & as &amp; is a violation of the XSLT and the HTML specs.
See XSLT section 16.2 HTML Output Method and HTML 4 section B.2.1 / B.2.2.
Bye
--
ITCQIS GmbH
Christian Wolfgang Hujer
Managing Partner
Phone: +49 (0)89 27 37 04 37
Fax: +49 (0)89 27 37 04 39
E-Mail: Christian.Hujer@itcqis.com
WWW: http://www.itcqis.com/
RE: HTML output method and & in URLs
Posted by David N Bertoni/Cambridge/IBM <da...@us.ibm.com>.
Hi Angus,
Fascinating. An incorrect answer which implies that using XSLT reduces a
user's sanity and flogs software all in the same message.
This is simply a bug in Xalan-J. The processor must serialize attributes
so the result is well-formed HTML:
http://www.w3.org/TR/xslt#section-HTML-Output-Method
http://www.w3.org/TR/html401/appendix/notes.html#h-B.2.2
The usual reason for avoiding the entity is that older browsers and http
agents mishandle it. However, doing that generates HTML which is not
well-formed, as you've discovered.
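To check whether a given processor shows this behaviour, a small harness along the following lines might help. It is a sketch only: the class name HtmlAmpersandCheck, the one-element input document, and the page.cgi URL are made up for illustration, and which XSLT processor actually runs depends on what JAXP finds on the classpath.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class HtmlAmpersandCheck {
    // Hypothetical input: a query string with properly escaped ampersands.
    static final String XML =
        "<link query=\"arg1=foo&amp;arg2=bar&amp;arg3=baz\"/>";

    // Minimal stylesheet using the html output method; the attribute
    // value template copies the query string into an href attribute.
    static final String XSL =
        "<xsl:stylesheet version=\"1.0\""
      + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
      + "<xsl:output method=\"html\"/>"
      + "<xsl:template match=\"/link\">"
      + "<a href=\"page.cgi?{@query}\">link</a>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Runs the transformation and returns the serialized HTML.
    public static String transform() {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSL)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(XML)),
                        new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String html = transform();
        System.out.println(html);
        // Report which form the serializer produced in the href value.
        System.out.println(html.contains("&amp;arg2")
            ? "ampersands escaped in href"
            : "bare ampersands in href");
    }
}
```

Running it against different processor versions shows directly whether the href value comes out with '&amp;' or with a bare '&'.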
As an aside, xalan:use-url-escaping is related to this:
http://www.w3.org/TR/REC-html40/appendix/notes.html#h-B.2.1
Xalan provides this option for the same reason it doesn't use &amp; in URI
query strings -- there are lots of older agents out there that don't
understand URIs encoded this way.
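For reference, the convention recommended in HTML 4 section B.2.1 for non-ASCII characters in URI attribute values is to represent each character as UTF-8 bytes and write each such byte as %HH. A minimal sketch of that convention follows; the class and method names are hypothetical, and this is not Xalan's own code.

```java
import java.nio.charset.StandardCharsets;

public class UriAttributeEscaper {
    // Hypothetical helper: percent-encodes every non-ASCII character of
    // a URI attribute value as its UTF-8 bytes, leaving ASCII untouched.
    public static String escapeNonAscii(String uri) {
        StringBuilder sb = new StringBuilder();
        for (byte b : uri.getBytes(StandardCharsets.UTF_8)) {
            int v = b & 0xFF;            // treat the byte as unsigned
            if (v < 0x80) {
                sb.append((char) v);     // plain ASCII passes through
            } else {
                sb.append('%').append(String.format("%02X", v));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // U+00E9 is encoded in UTF-8 as the bytes C3 A9.
        System.out.println(escapeNonAscii("caf\u00e9.html?arg1=foo"));
        // prints: caf%C3%A9.html?arg1=foo
    }
}
```

Note that this step deals only with non-ASCII characters; it leaves '&' alone, which is why turning URL escaping off would not be expected to change how ampersands are written.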
Dave
RE: HTML output method and & in URLs
Posted by Dan Jacobs <dj...@modelobjects.com>.
Hi Angus,
(We met at the Boston ACM WebTech Group a few years ago, and your name
just came up again last week in a conversation with John Kellerman.)
As far as I can tell, when you extract an entity-encoded String from
your XML document, the entities are translated, and then you have your
String. If you then include that String in the generated output, you
have to re-entity-encode it yourself.
If you'd rather do things with Java and keep a bit more of your sanity,
you might want to try JPlates instead (http://www.jplates.com). I'd
love to get your opinion of it in any case.
All the best,
-- Dan Jacobs
-- Chairman, Boston ACM WebTech Group
-- President, JPlates Inc.