Posted to j-users@xalan.apache.org by Ijonas Kisselbach <ij...@vamosa.com> on 2004/07/16 11:50:59 UTC

WHY? external XHTML DOCTYPE included in transformed/serialized document

Hi,

 

I've got an XHTML document with a DOCTYPE and a valid namespace defined:

 

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

<html xmlns="http://www.w3.org/1999/xhtml">

<head>

<meta name="generator" content="HTML Tidy for Windows (vers 1st June
2004), see www.w3.org" />

<title>PRUEBA Home Page</title>

...

....

</html>

 

When I parse this document into a W3C DOM and then immediately
transform/serialize it back to a String representation, I get the
following output:

 

<?xml version="1.0" encoding="UTF-8"?>

<!--

   Extensible HTML version 1.0 Transitional DTD

 

   This is the same as HTML 4 Transitional except for

   changes due to the differences between XML and SGML.

 

   Namespace = http://www.w3.org/1999/xhtml

 

   For further information, see: http://www.w3.org/TR/xhtml1

 

   Copyright (c) 1998-2002 W3C (MIT, INRIA, Keio),

   All Rights Reserved.

 

   This DTD module is identified by the PUBLIC and SYSTEM identifiers:

 

   PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"

   SYSTEM "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"

 

   $Revision: 1.2 $

   $Date: 2002/08/01 18:37:55 $

 

--><!--================ Character mnemonic entities
=========================--><!-- Portions (C) International Organization
for Standardization 1986

.....

.....

    between groups of table rows.

--><!-- Scope is simpler than headers attribute for common tables
--><!-- th is for headers, td for data and for cells acting as both -->

<html xmlns="http://www.w3.org/1999/xhtml">

<head>

<META http-equiv="Content-Type" content="text/html; charset=UTF-8">

<meta content="HTML Tidy for Windows (vers 1st June 2004), see
www.w3.org" name="generator">

....

</html>

 

It seems Xalan 2.6.0 is downloading the external DTD and rendering it
in place during the transformation/serialization. Is there any way to
turn this feature off? I've tried setting the DocumentBuilderFactory to
"non-validating", but that had no effect. I've tried Java 1.4.2.03 and
Java 1.4.2.05, and both behave identically.

 

Here are the code fragments used:

 

      // to read original document from ByteArrayInputStream
      DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
      dbf.setExpandEntityReferences(false);
      dbf.setValidating(false);
      DocumentBuilder db = dbf.newDocumentBuilder();
      Document doc = db.parse(tbais);

      // to transform to String which includes in-place DTD
      StringWriter sw = new StringWriter();
      DOMSource domSource = new DOMSource(doc);
      TransformerFactory transformerFactory = getFactory();
      Transformer transformer = transformerFactory.newTransformer();
      transformer.transform(domSource, new StreamResult(sw));
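
For readers hitting the same symptom, one commonly used way to keep a non-validating JAXP parser from fetching an external DTD at all is to install an EntityResolver that answers every request with an empty stream. The sketch below is only an illustration and is not confirmed anywhere in this thread as the fix for the behaviour described above. The class name DtdFreeRoundTrip and the method roundTrip are hypothetical, the input is taken as a String rather than the original ByteArrayInputStream, and TransformerFactory.newInstance() stands in for the getFactory() helper, whose body is not shown.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper class, named here for illustration only.
final class DtdFreeRoundTrip {

    // Parses the XML text into a DOM and serializes it straight back,
    // without ever reading the DTD referenced by the DOCTYPE.
    static String roundTrip(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setValidating(false);
            DocumentBuilder db = dbf.newDocumentBuilder();

            // Answer every external entity request (including the external
            // DTD subset) with an empty stream, so nothing is downloaded.
            db.setEntityResolver((publicId, systemId) ->
                    new InputSource(new StringReader("")));

            Document doc = db.parse(new InputSource(new StringReader(xml)));

            // Identity transform, as in the fragments above.
            // Assumption: newInstance() replaces the unseen getFactory().
            StringWriter sw = new StringWriter();
            Transformer transformer =
                    TransformerFactory.newInstance().newTransformer();
            transformer.transform(new DOMSource(doc), new StreamResult(sw));
            return sw.toString();
        } catch (ParserConfigurationException | SAXException
                | IOException | TransformerException e) {
            throw new IllegalStateException("round trip failed", e);
        }
    }

    public static void main(String[] args) {
        String xml =
            "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\" "
          + "\"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\">"
          + "<html xmlns=\"http://www.w3.org/1999/xhtml\">"
          + "<head><title>PRUEBA Home Page</title></head><body/></html>";
        System.out.println(roundTrip(xml));
    }
}
```

The trade-off is that with an empty DTD the XHTML named entities (for example &nbsp;) are no longer declared, so documents that rely on them may fail to parse. Returning a locally cached copy of the DTD from the resolver would avoid the download while keeping those declarations.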

 

Thanks in advance for any help or advice,

Ijonas Kisselbach.


Re: WHY? external XHTML DOCTYPE included in transformed/serialized document

Posted by Joseph Kesselman <ke...@us.ibm.com>.



Which version of the _parser_ are you using? I believe that was a known bug
in one release of Xerces...
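
To answer the parser-version question, it helps to know which JAXP implementations are actually being picked up at runtime. The following is a minimal sketch using only standard JAXP and reflection calls. The class name JaxpImplementationReport and the method describe are hypothetical, and the version string is only available when the implementation's jar declares Implementation-Version in its manifest.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;

// Hypothetical helper class, named here for illustration only.
final class JaxpImplementationReport {

    // Returns the concrete class name of a JAXP object plus the
    // implementation version from its package metadata, if declared.
    static String describe(Object impl) {
        Class<?> c = impl.getClass();
        Package p = c.getPackage();
        String version = (p == null) ? null : p.getImplementationVersion();
        return c.getName() + " (version: "
                + (version == null ? "not declared in manifest" : version)
                + ")";
    }

    public static void main(String[] args) {
        System.out.println("Parser factory:      "
                + describe(DocumentBuilderFactory.newInstance()));
        System.out.println("Transformer factory: "
                + describe(TransformerFactory.newInstance()));
    }
}
```

If a standalone Xerces-J jar is on the classpath, running its org.apache.xerces.impl.Version class should also print the release number directly.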

______________________________________
Joe Kesselman, IBM Next-Generation Web Technologies: XML, XSL and more.
"The world changed profoundly and unpredictably the day Tim Berners Lee
got bitten by a radioactive spider." -- Rafe Culpin, in r.m.filk