Posted to j-users@xalan.apache.org by Ijonas Kisselbach <ij...@vamosa.com> on 2004/07/16 11:50:59 UTC
WHY? external XHTML DOCTYPE included in transformed/serialized document
Hi,
I've got an XHTML document with a DOCTYPE and a valid namespace defined:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta name="generator" content="HTML Tidy for Windows (vers 1st June
2004), see www.w3.org" />
<title>PRUEBA Home Page</title>
...
....
</html>
When I parse this document into a W3C DOM and then immediately
transform/serialize it back to a String representation, I get the
following output:
<?xml version="1.0" encoding="UTF-8"?>
<!--
Extensible HTML version 1.0 Transitional DTD
This is the same as HTML 4 Transitional except for
changes due to the differences between XML and SGML.
Namespace = http://www.w3.org/1999/xhtml
For further information, see: http://www.w3.org/TR/xhtml1
Copyright (c) 1998-2002 W3C (MIT, INRIA, Keio),
All Rights Reserved.
This DTD module is identified by the PUBLIC and SYSTEM identifiers:
PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
SYSTEM "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"
$Revision: 1.2 $
$Date: 2002/08/01 18:37:55 $
--><!--================ Character mnemonic entities
=========================--><!-- Portions (C) International Organization
for Standardization 1986
.....
.....
between groups of table rows.
--><!-- Scope is simpler than headers attribute for common tables
--><!-- th is for headers, td for data and for cells acting as both -->
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<META http-equiv="Content-Type" content="text/html; charset=UTF-8">
<meta content="HTML Tidy for Windows (vers 1st June 2004), see
www.w3.org" name="generator">
....
</html>
It seems Xalan 2.6.0 is downloading the external DTD and rendering it
in place during the transformation/serialization. Is there any way to
turn this feature off? I've tried setting the DocumentBuilderFactory to
"non-validating", but that had no effect. I've tried Java 1.4.2.03 and
Java 1.4.2.05, and both behave identically.
Here are the code fragments used:
// to read original document from ByteArrayInputStream
DocumentBuilderFactory dbf =
DocumentBuilderFactory.newInstance();
dbf.setExpandEntityReferences(false);
dbf.setValidating(false);
DocumentBuilder db = dbf.newDocumentBuilder();
Document doc = db.parse(tbais);
// to transform to String which includes in-place DTD
StringWriter sw = new StringWriter();
DOMSource domSource = new DOMSource(doc);
TransformerFactory transformerFactory = getFactory();
Transformer transformer = transformerFactory.newTransformer();
transformer.transform(domSource, new StreamResult(sw));
Thanks in advance for any help or advice,
Ijonas Kisselbach.
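One possible workaround for the round trip described above, offered as a sketch rather than a confirmed fix: if the stray comments come from the parser reading the external DTD, an EntityResolver that answers every external-entity request with an empty source keeps the DTD text (and the network fetch) out of the parse. The class name DtdFreeRoundTrip and the method roundTrip are hypothetical, and forcing the "xml" output method is an assumption about the desired result.

```java
import java.io.ByteArrayInputStream;
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

// Hypothetical helper class; not part of Xalan or Xerces.
public class DtdFreeRoundTrip {

    public static String roundTrip(byte[] xhtml) throws Exception {
        // Same parser settings as in the original fragments.
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setExpandEntityReferences(false);
        dbf.setValidating(false);
        DocumentBuilder db = dbf.newDocumentBuilder();

        // Answer every external-entity request (including the external DTD
        // subset) with an empty stream, so nothing is downloaded or parsed.
        db.setEntityResolver(new EntityResolver() {
            public InputSource resolveEntity(String publicId, String systemId) {
                return new InputSource(new StringReader(""));
            }
        });
        Document doc = db.parse(new ByteArrayInputStream(xhtml));

        // Identity transform back to a String.
        StringWriter sw = new StringWriter();
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        // Assumption: XML output is wanted. Without this, the serializer may
        // choose the HTML method for a root element named "html".
        transformer.setOutputProperty(OutputKeys.METHOD, "xml");
        transformer.transform(new DOMSource(doc), new StreamResult(sw));
        return sw.toString();
    }
}
```

Calling DtdFreeRoundTrip.roundTrip(bytes) with the document's bytes returns the serialized markup. The trade-off is that entities declared only in the DTD, such as &nbsp;, are no longer defined for the parser, so this suits documents that do not rely on them.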
Re: WHY? external XHTML DOCTYPE included in transformed/serialized document
Posted by Joseph Kesselman <ke...@us.ibm.com>.
Which version of the _parser_ are you using? I believe that was a known bug
in one release of Xerces...
______________________________________
Joe Kesselman, IBM Next-Generation Web Technologies: XML, XSL and more.
"The world changed profoundly and unpredictably the day Tim Berners Lee
got bitten by a radioactive spider." -- Rafe Culpin, in r.m.filk
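To help answer the parser-version question above, the following sketch reports which JAXP implementations are actually loaded, using only standard reflection calls. The class name JaxpImplInfo and the method describe are hypothetical, and the implementation version may be reported as null when the jar's manifest does not declare one.

```java
import java.security.CodeSource;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;

// Hypothetical diagnostic helper; not part of Xalan or Xerces.
public class JaxpImplInfo {

    // Describes the concrete class behind a JAXP factory: its name, the
    // version recorded in the jar manifest (if any), and where it was loaded from.
    public static String describe(Object impl) {
        Class<?> c = impl.getClass();
        Package p = c.getPackage();
        String version = (p == null) ? null : p.getImplementationVersion();
        // The code source is null for classes bundled with the JRE itself.
        CodeSource cs = c.getProtectionDomain().getCodeSource();
        String location = (cs == null || cs.getLocation() == null)
                ? "(bundled with the JRE)"
                : cs.getLocation().toString();
        return c.getName() + " version=" + version + " from=" + location;
    }

    public static void main(String[] args) {
        System.out.println("Parser:      " + describe(DocumentBuilderFactory.newInstance()));
        System.out.println("Transformer: " + describe(TransformerFactory.newInstance()));
    }
}
```

Running the main method in the same classpath as the application shows whether the parser comes from a separate Xerces jar or from the one bundled with the JRE.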