Posted to j-users@xalan.apache.org by Dmitry Beransky <db...@dembel.org> on 2002/05/12 02:37:13 UTC

strange comments after a transformation

Hi,

I'm using Xalan2 to convert an XHTML DOM into text using the following code:

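       import java.util.Properties;
       import javax.xml.transform.OutputKeys;
       import javax.xml.transform.Transformer;
       import javax.xml.transform.TransformerFactory;
       import javax.xml.transform.stream.StreamResult;

       // Serialize as HTML, indented, without an XML declaration.
       Properties props = new Properties();
       props.put( OutputKeys.OMIT_XML_DECLARATION, "yes" );
       props.put( OutputKeys.INDENT, "yes" );
       props.put( OutputKeys.METHOD, "html" );

       // No stylesheet is supplied, so this is an identity transformer.
       TransformerFactory tFactory = TransformerFactory.newInstance();
       Transformer transformer = tFactory.newTransformer();
       transformer.setOutputProperties( props );

       // source wraps the XHTML DOM; out is the destination stream or writer.
       StreamResult result = new StreamResult(out);
       transformer.transform(source, result);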


The DOM is generated from an existing XHTML document that might look like this:

<?xml version="1.0"?>
<!DOCTYPE html
      PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
      "DTD/xhtml1-transitional.dtd">
<html>
    <head>
        <title>title</title>
    </head>
    <body>
       This is a test
    </body>
</html>

The problem is that, after the transformation runs, the output starts with
800 lines of comments that appear to come from the XHTML DTD
(see below).

Here are my questions:

1. Why is this happening? Am I missing something in how I set up the
transformation?
2. How do I stop this? I've found that if I remove the DOCTYPE
declaration from the source document, the comments don't show up. But I
don't want to fix the problem by getting rid of the DOCTYPE, for various
reasons. Among them: 1) I need the DOCTYPE in order to use
getElementById and to keep the XSLT engine from choking on XHTML
entities; 2) I may not always have control over the source document.
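
One way to narrow this down without touching the DOCTYPE is to check whether the stray comments are already present in the DOM before the transformer sees it. The sketch below is only an illustration under an assumption that has not been verified here: that the DTD comments end up as comment nodes that are direct children of the Document node. The class PrologComments and its methods are hypothetical names, not part of Xalan or JAXP; only standard org.w3c.dom and javax.xml.parsers calls are used.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper: inspects and removes comments that sit directly under
// the Document node (outside the root element). Assumption: this is where the
// unwanted DTD comments live; that depends on how the DOM was built.
class PrologComments {

    /** Counts comment nodes that are direct children of the document node. */
    public static int count(Document doc) {
        int n = 0;
        for (Node c = doc.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c.getNodeType() == Node.COMMENT_NODE) {
                n++;
            }
        }
        return n;
    }

    /**
     * Removes comment nodes that are direct children of the document node and
     * returns how many were removed. The DocumentType node and everything
     * inside the root element are left untouched.
     */
    public static int strip(Document doc) {
        int removed = 0;
        Node c = doc.getFirstChild();
        while (c != null) {
            Node next = c.getNextSibling(); // remember before detaching c
            if (c.getNodeType() == Node.COMMENT_NODE) {
                doc.removeChild(c);
                removed++;
            }
            c = next;
        }
        return removed;
    }

    /** Parses a small XML string into a DOM, for demonstration only. */
    public static Document parse(String xml) throws Exception {
        DocumentBuilder builder =
            DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(
            "<!-- a --><html><!-- kept --><body>This is a test</body></html><!-- b -->");
        System.out.println(count(doc)); // comments outside the root element
        strip(doc);
        System.out.println(count(doc)); // none left at document level
    }
}
```

If count() reports a large number for the real document, the comments were put into the DOM when it was built, and calling strip() before transform() might be a workaround. Note that it would also drop any legitimate comments placed before or after the root element.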

Thanks
Dmitry

A sample of the comments that appear in the output:

<!--
    Extensible HTML version 1.0 Transitional DTD

    This is the same as HTML 4.0 Transitional except for
    changes due to the differences between XML and SGML.

    Namespace = http://www.w3.org/1999/xhtml

    For further information, see: http://www.w3.org/TR/xhtml1

    Copyright (c) 1998-2000 W3C (MIT, INRIA, Keio),
    All Rights Reserved.

    This DTD module is identified by the PUBLIC and SYSTEM identifiers:

    PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
    SYSTEM "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"

    $Revision: 1.14 $
    $Date: 2000/01/25 23:52:20 $

-->
<!--================ Character mnemonic entities =========================-->
<!-- Portions (C) International Organization for Standardization 1986
      Permission to copy in any form is granted for use with
      conforming SGML systems and applications as defined in
      ISO 8879, provided this notice is included in all copies.
-->
<!-- Character entity set. Typical invocation:
     <!ENTITY % HTMLlat1 PUBLIC
        "-//W3C//ENTITIES Latin 1 for XHTML//EN"
        "http://www.w3.org/TR/xhtml1/DTD/xhtml-lat1.ent">
     %HTMLlat1;
-->
<!-- no-break space = non-breaking space,
                                   U+00A0 ISOnum -->
<!-- inverted exclamation mark, U+00A1 ISOnum -->
<!-- cent sign, U+00A2 ISOnum -->
<!-- pound sign, U+00A3 ISOnum -->
<!-- currency sign, U+00A4 ISOnum -->
<!-- yen sign = yuan sign, U+00A5 ISOnum -->
<!-- broken bar = broken vertical bar,
                                   U+00A6 ISOnum -->
<!-- section sign, U+00A7 ISOnum -->
<!-- diaeresis = spacing diaeresis,
                                   U+00A8 ISOdia -->
<!-- copyright sign, U+00A9 ISOnum -->
<!-- feminine ordinal indicator, U+00AA ISOnum -->
<!-- left-pointing double angle quotation mark
                                   = left pointing guillemet, U+00AB ISOnum -->
<!-- not sign = discretionary hyphen,
                                   U+00AC ISOnum -->
<!-- soft hyphen = discretionary hyphen,
                                   U+00AD ISOnum -->
<!-- registered sign = registered trade mark sign,
                                   U+00AE ISOnum -->
<!-- macron = spacing macron = overline
                                   = APL overbar, U+00AF ISOdia -->
<!-- degree sign, U+00B0 ISOnum -->
<!-- plus-minus sign = plus-or-minus sign,
                                   U+00B1 ISOnum -->
<!-- superscript two = superscript digit two
                                   = squared, U+00B2 ISOnum -->
<!-- superscript three = superscript digit three
                                   = cubed, U+00B3 ISOnum -->
<!-- acute accent = spacing acute,
                                   U+00B4 ISOdia -->
<!-- micro sign, U+00B5 ISOnum -->
...