Posted to j-users@xalan.apache.org by Tony Thompson <To...@stone-ware.com> on 2007/09/04 17:23:23 UTC

HTML Serialization Issue

I am using Xalan 2.7.0 and am having an issue when serializing an HTML
document.  Below are two example source documents, each with a comment
inside the style element.  In both cases the output method is set to
"html".  In the first case, the serialized document is fine.  In the
second case, the comment delimiters inside the style element have been
replaced with entity references.  If I change the output method to
"xml", the first source document breaks in the same way.  The
transformer in use is
org.apache.xalan.transformer.TransformerIdentityImpl.  Is there anything
I can do to correct this, or might it be an issue with the Xalan
transformer?

Thanks.
Tony


Source #1:
 
<html>
<head>
 
<style>
<!--
body { background-color: #FFFF88; }
-->
</style>
 
</head>
<body>
<!-- comment -->
Test page.
</body>
</html>


Output #1:
 
<html>
<head><META http-equiv="Content-Type" content="text/html;
charset=ISO-8859-1">
 
<style>
<!--
body { background-color: #FFFF88; }
-->
</style>
 
</head>
<body>
<!-- comment -->
Test page.
</body>
</html>



Source #2:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>

<style>
<!--
body { background-color: #FFFF88; }
-->
</style>

</head>
<body>
<!-- comment -->
Test page.
</body>
</html>


Output #2:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

<html xmlns="http://www.w3.org/1999/xhtml">
<head>

<style>
&lt;!--
body { background-color: #FFFF88; }
--&gt;
</style>

</head>
<body>
<!-- comment -->
Test page.
</body>
</html>
 
This message (and any associated files) is intended only for the 
use of the individual or entity to which it is addressed and may 
contain information that is confidential, subject to copyright or
constitutes a trade secret. If you are not the intended recipient 
you are hereby notified that any dissemination, copying or 
distribution of this message, or files associated with this message, 
is strictly prohibited. If you have received this message in error, 
please notify us immediately by replying to the message and deleting 
it from your computer. Messages sent to and from Stoneware, Inc.
may be monitored.

RE: HTML Serialization Issue

Posted by Tony Thompson <To...@stone-ware.com>.
Dave,

I am not using a stylesheet.  I am just serializing a document from a
SAXSource to a StreamResult.  Even if it is supposed to switch to XML
output because of the non-null namespace, why is the comment inside of
the <style></style> tag not allowed?  That should be allowed in XHTML as
well.

Tony
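
For readers following along, the setup Tony describes (an identity
transformation from a SAXSource to a StreamResult, with no stylesheet)
can be sketched roughly as below.  This is an illustration only, not
Tony's code: the class name SaxSourceRoundTrip, the SAMPLE string
(Source #2 with the DOCTYPE dropped so that no DTD is fetched), and the
use of the JDK's default namespace-aware XML parser are all assumptions.
With a parser that reports the contents of <style> as a real comment
event, the comment should come through intact even though the XHTML
elements are serialized as XML.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

class SaxSourceRoundTrip {
    // Hypothetical sample: Source #2 from this thread, minus its DOCTYPE.
    static final String SAMPLE =
        "<html xmlns=\"http://www.w3.org/1999/xhtml\"><head><style>\n"
        + "<!--\nbody { background-color: #FFFF88; }\n-->\n"
        + "</style></head><body>Test page.</body></html>";

    static String serialize(String xml) throws Exception {
        // Assumption: the JDK's default SAX parser, made namespace-aware.
        SAXParserFactory parsers = SAXParserFactory.newInstance();
        parsers.setNamespaceAware(true);
        XMLReader reader = parsers.newSAXParser().getXMLReader();

        // newTransformer() without a stylesheet gives an identity transformer.
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        identity.setOutputProperty(OutputKeys.METHOD, "html");

        StringWriter out = new StringWriter();
        identity.transform(
            new SAXSource(reader, new InputSource(new StringReader(xml))),
            new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(serialize(SAMPLE));
    }
}
```

If the parser behind the SAXSource is swapped for one that delivers the
style content as character data, the same code is the place to look for
the escaped output.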


Re: HTML Serialization Issue

Posted by David Bertoni <db...@apache.org>.
Tony Thompson wrote:
> I am using Xalan 2.7.0 and I am having an issue when trying to serialize
> an HTML document.  Below I have 2 examples of source documents that have
> a comment in the style.  For both cases, I have the output method set to
> "html".  In the first case, the serialized document is fine.  In the
> second case, the comments in the style have been replaced with entity
> references.  If I change the output to "xml", I get the same results
> with the first source document (it breaks).  The transformer that is
> being used is org.apache.xalan.transformer.TransformerIdentityImpl.  Is
> there anything I can do to correct this or might it be an issue with the
> Xalan transformer?

This is likely an XSLT 1.0 issue with trying to generate XHTML.

> 
> Thanks.
> Tony

...

> 
> Source #2:
> 
> <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
> "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
> <html xmlns="http://www.w3.org/1999/xhtml">
> <head>

Your document now has elements with a non-null namespace URI.  According to 
the XSLT 1.0 recommendation, they should be serialized as XML:

http://www.w3.org/TR/xslt#section-HTML-Output-Method

"The html output method should not output an element differently from the 
xml output method unless the expanded-name of the element has a null 
namespace URI; an element whose expanded-name has a non-null namespace URI 
should be output as XML. If the expanded-name of the element has a null 
namespace URI, but the local part of the expanded-name is not recognized as 
the name of an HTML element, the element should output in the same way as a 
non-empty, inline element such as span."

You haven't shown us your stylesheet, so I'm not sure how you're generating 
the style element in the result tree.  Perhaps if you posted a minimal 
stylesheet that illustrates the two different behaviors, someone can 
confirm the behavior is correct.

Dave
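
The rule quoted above can be observed with a small experiment.  The
sketch below builds two tiny DOM trees that differ only in the namespace
of their elements, puts the same text (a text node, not a comment node)
inside <style>, and serializes both with the output method set to
"html".  The class name StyleNamespaceDemo and its helper method are
made up for illustration, and the exact output depends on the serializer
in use.  With a Xalan-derived serializer, the null-namespace <style> is
expected to be written raw, while the XHTML-namespace one is expected to
be escaped as in Output #2.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class StyleNamespaceDemo {
    static final String XHTML_NS = "http://www.w3.org/1999/xhtml";

    // Builds html/head/style in the given namespace (null means no
    // namespace) and serializes it with the "html" output method.
    static String serializeStyle(String namespaceUri) throws Exception {
        DocumentBuilderFactory builders = DocumentBuilderFactory.newInstance();
        builders.setNamespaceAware(true);
        Document doc = builders.newDocumentBuilder().newDocument();

        Element html = doc.createElementNS(namespaceUri, "html");
        Element head = doc.createElementNS(namespaceUri, "head");
        Element style = doc.createElementNS(namespaceUri, "style");
        // Plain character data that merely looks like a comment.
        style.appendChild(doc.createTextNode(
            "<!-- body { background-color: #FFFF88; } -->"));
        head.appendChild(style);
        html.appendChild(head);
        doc.appendChild(html);

        Transformer identity = TransformerFactory.newInstance().newTransformer();
        identity.setOutputProperty(OutputKeys.METHOD, "html");
        StringWriter out = new StringWriter();
        identity.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("null namespace:  " + serializeStyle(null));
        System.out.println("XHTML namespace: " + serializeStyle(XHTML_NS));
    }
}
```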

RE: HTML Serialization Issue

Posted by Tony Thompson <To...@stone-ware.com>.
Yes, you are correct.  The cause is apparently the SAX parser that I am
using.  In both cases it does not recognize the comment inside the
<style> element, so it passes it along as character data.  When the
transformer switches to XML output, it escapes those characters as
entity references, as it should.  With HTML output it leaves the
content of <style> unchanged, which is why the first case only appeared
to be working.
 
Sorry to bother you.  Thanks for the help.
Tony
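
One way to check what a parser actually reports for the contents of
<style> is to register a handler for both content and lexical events and
record which callback fires.  The sketch below does that with the JDK's
default SAX parser, which is an assumption: the parser Tony used is not
named in this thread, and it evidently behaves differently.  The class
name SaxEventProbe and the sample input are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

// DefaultHandler2 implements both ContentHandler and LexicalHandler.
class SaxEventProbe extends DefaultHandler2 {
    // Hypothetical sample: an XHTML document with a comment inside <style>.
    static final String SAMPLE =
        "<html xmlns=\"http://www.w3.org/1999/xhtml\"><head><style>"
        + "<!-- body { background-color: #FFFF88; } -->"
        + "</style></head><body>Test page.</body></html>";

    private final List<String> events = new ArrayList<>();

    @Override
    public void characters(char[] ch, int start, int length) {
        String text = new String(ch, start, length).trim();
        if (!text.isEmpty()) {
            events.add("characters: " + text);
        }
    }

    @Override
    public void comment(char[] ch, int start, int length) {
        events.add("comment: " + new String(ch, start, length).trim());
    }

    // Parses the given XML and returns the recorded text and comment events.
    static List<String> probe(String xml) throws Exception {
        SAXParserFactory parsers = SAXParserFactory.newInstance();
        parsers.setNamespaceAware(true);
        XMLReader reader = parsers.newSAXParser().getXMLReader();

        SaxEventProbe handler = new SaxEventProbe();
        reader.setContentHandler(handler);
        // Comments are delivered only if a LexicalHandler is registered.
        reader.setProperty(
            "http://xml.org/sax/properties/lexical-handler", handler);
        reader.parse(new InputSource(new StringReader(xml)));
        return handler.events;
    }

    public static void main(String[] args) throws Exception {
        probe(SAMPLE).forEach(System.out::println);
    }
}
```

A conforming XML parser is expected to report the style content through
comment() rather than characters().  A parser that reports it through
characters() hands the serializer plain text, which the xml output
method then escapes.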


Re: HTML Serialization Issue

Posted by Erin Harris <eh...@ca.ibm.com>.
Hi Tony,

I couldn't reproduce what you are seeing using the following:

import javax.xml.transform.*;
import javax.xml.transform.stream.*;

public class IdentityTransform
{
    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.out.println("USAGE: java IdentityTransform <input file name>");
            return;  // stop here instead of failing on args[0] below
        }
        TransformerFactory tFactory = TransformerFactory.newInstance();
        // newTransformer() with no stylesheet yields an identity transformer.
        Transformer transformer = tFactory.newTransformer();
        transformer.setOutputProperty(OutputKeys.METHOD, "html");
        transformer.setOutputProperty(OutputKeys.ENCODING, "ISO-8859-1");
        // args[0] is passed as the system ID of the input document.
        transformer.transform(new StreamSource(args[0]),
                              new StreamResult(System.out));
    }
}

Can you send me the Java code you are using to run the identity 
transformation?  Also, can you double-check the version of Xalan by 
entering 'java org.apache.xalan.Version' on the command line?

Thanks.


Erin Harris
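
Besides the Xalan version, it can help to confirm which JAXP
implementation is actually being loaded, since a JDK may supply its own
TransformerFactory.  A minimal sketch using only standard JAXP calls;
the class name WhichTransformer is hypothetical, and the class names it
prints depend entirely on the classpath:

```java
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;

class WhichTransformer {
    // Reports the concrete factory and identity-transformer classes in use.
    static String describe() throws Exception {
        TransformerFactory factory = TransformerFactory.newInstance();
        Transformer identity = factory.newTransformer();
        return "factory=" + factory.getClass().getName()
            + ", identity transformer=" + identity.getClass().getName();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(describe());
    }
}
```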