Posted to j-dev@xerces.apache.org by URAMOTO Naohiko <ur...@trl.ibm.co.jp> on 2000/10/24 09:22:00 UTC

Strange DOM Structure for XML document without DTD

Hi,

I ran into a problem when parsing an XML document that has no DOCTYPE declaration.

For example, suppose we parse the following document.

<?xml version="1.0"?>
<root><child/></root>

The resulting DOM structure looks like this:

DOCUMENT_NODE document
   +ELEMENT_NODE root
       +ELEMENT_NODE child
   +DOCTYPE_NODE root

I cannot understand why the DOCUMENT_TYPE_NODE object (shown as DOCTYPE_NODE
above) is placed as the second child of the DOCUMENT node, after the root element.

When I parse the same document with a DTD:

<?xml version="1.0"?>
<!DOCTYPE root [
<!ELEMENT root (child)>
<!ELEMENT child EMPTY>
]>
<root><child/></root>

In this case, the resulting DOM structure is what I expect:

DOCUMENT_NODE document
   +DOCTYPE_NODE root
   +ELEMENT_NODE root
       +ELEMENT_NODE child
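
For reference, trees like the ones above can be printed with a small dump program along the following lines. This is only a sketch: it assumes the standard JAXP API (javax.xml.parsers) rather than any Xerces-specific parser class, and the names DomDump, typeName, dump and parseAndDump are hypothetical ones chosen for illustration. Which parser implementation JAXP picks up depends on the classpath, so the output may differ from the trees shown above.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class DomDump {
    // Map a DOM node type constant to a readable name (only the types of interest here).
    static String typeName(Node n) {
        switch (n.getNodeType()) {
            case Node.DOCUMENT_NODE:      return "DOCUMENT_NODE";
            case Node.DOCUMENT_TYPE_NODE: return "DOCUMENT_TYPE_NODE";
            case Node.ELEMENT_NODE:       return "ELEMENT_NODE";
            default:                      return "OTHER(" + n.getNodeType() + ")";
        }
    }

    // Append one line per node, indenting children in document order.
    static void dump(Node n, String indent, StringBuilder out) {
        out.append(indent).append(typeName(n)).append(' ')
           .append(n.getNodeName()).append('\n');
        for (Node c = n.getFirstChild(); c != null; c = c.getNextSibling()) {
            dump(c, indent + "    ", out);
        }
    }

    // Parse an XML string with the default JAXP parser and return the dumped tree.
    static String parseAndDump(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            StringBuilder out = new StringBuilder();
            dump(doc, "", out);
            return out.toString();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        String withoutDtd = "<?xml version=\"1.0\"?>\n<root><child/></root>";
        String withDtd = "<?xml version=\"1.0\"?>\n"
            + "<!DOCTYPE root [\n"
            + "<!ELEMENT root (child)>\n"
            + "<!ELEMENT child EMPTY>\n"
            + "]>\n"
            + "<root><child/></root>";
        System.out.println(parseAndDump(withoutDtd));
        System.out.println(parseAndDump(withDtd));
    }
}
```

If the parser behaves as I expect, any DOCUMENT_TYPE_NODE line should be printed before the root element's line, never after it.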

This problem occurs with most Xerces versions, including 1.2.1 (XML4JV2 and
Xerces Version 1.0.4 produce the expected DOM structure). Is this the intended
behavior of Xerces?

Best regards,
Naohiko





Naohiko URAMOTO (浦本 直彦)
uramoto@jp.ibm.com
Internet & Language Technology, Tokyo Research Laboratory, IBM Research