Posted to j-dev@xerces.apache.org by Son To <so...@gateway.homeip.net> on 2001/01/27 00:14:15 UTC
Re: Node.getFirstChild() wierdness (fwd)
---------- Forwarded message ----------
Date: 26 Jan 2001 15:08:38 -0800
From: rols@rols.org
To: Son.To.wh98@wharton.upenn.edu
Subject: Re: Node.getFirstChild() wierdness
On Fri, 26 January 2001, Son To wrote:
Nope, it's right: the text is the whitespace (the line break and any indentation) between <address> and <street>.
XML requires that the parser pass through all the text in the document, whether it's 'ignorable' or not.
If you have a DTD, then the DOM parser can tell which text is ignorable, and you can ask it not to give that text to you by setting
parser.setIncludeIgnorableWhitespace(false)
If you don't have a DTD, then it can't tell what's ignorable and hence can't leave it out.
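
To make the two cases concrete, here is a minimal sketch. It assumes the standard JAXP API (DocumentBuilderFactory.setValidating and setIgnoringElementContentWhitespace) rather than the Xerces-specific setIncludeIgnorableWhitespace call mentioned above, and it uses an internal DTD subset so that no external file is needed. The class name WhitespaceDemo and the helper firstElementChild are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class WhitespaceDemo {
    // Sample document with an internal DTD subset (an assumption made so
    // the example is self-contained; the thread uses an external DTD).
    static final String XML =
        "<?xml version=\"1.0\"?>\n"
      + "<!DOCTYPE address [\n"
      + "  <!ELEMENT address (street, city)>\n"
      + "  <!ELEMENT street (#PCDATA)>\n"
      + "  <!ELEMENT city (#PCDATA)>\n"
      + "]>\n"
      + "<address>\n"
      + "  <street>3134 Broad Street</street>\n"
      + "  <city>Philadelphia</city>\n"
      + "</address>\n";

    // Parse the sample, optionally asking the parser to drop whitespace
    // that the DTD marks as ignorable (whitespace in element-only content).
    static Document parse(String xml, boolean dropIgnorable) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // The JAXP documentation says the whitespace setting relies on the
        // content model and requires validating mode, so both are set together.
        factory.setValidating(dropIgnorable);
        factory.setIgnoringElementContentWhitespace(dropIgnorable);
        return factory.newDocumentBuilder()
                      .parse(new InputSource(new StringReader(xml)));
    }

    static String firstChildName(boolean dropIgnorable) throws Exception {
        return parse(XML, dropIgnorable)
                .getDocumentElement().getFirstChild().getNodeName();
    }

    // Fallback when there is no DTD: skip text, comment, and other
    // non-element nodes by hand.
    static Element firstElementChild(Node parent) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE) {
                return (Element) n;
            }
        }
        return null;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(firstChildName(false)); // #text
        System.out.println(firstChildName(true));  // street
        Element root = parse(XML, false).getDocumentElement();
        System.out.println(firstElementChild(root).getNodeName()); // street
    }
}
```

The manual loop is the safer choice when you do not control whether the documents carry a DTD, since it gives the same answer either way.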
Roland
>
> Hi,
> I am using DOM. There seems to be an extra Node in the Document object
> that is returned
>
> for example,
>
> <address>
> <street>3134 Broad Street</street>
> <city>Philadelphia</city>
> <state>Pennsylvania</state>
> <zip>19143</zip>
> </address>
>
> document.getDocumentElement().getFirstChild().getNodeName()
> returns "#text" when I expected it to return "street"
>
> root.getDocumentElement().getFirstChild().getNextSibling().getNodeName()
> returns "street"
>
> what is that "#text" being returned? Shouldn't the firstChild of address
> be street?
>
> I tried various other xml documents, and "#text" is always the firstChild.
> Am I misunderstanding the DOM tree? If so how is this address represented
> in the DOM tree?
>
> thanks,
> son
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: xerces-j-dev-unsubscribe@xml.apache.org
> For additional commands, e-mail: xerces-j-dev-help@xml.apache.org
Re: DOM-whitespace not ignored
Posted by Sebastien Ponce <se...@cern.ch>.
I just tried your program and it works just fine. The only thing you missed is that all text contained in the XML file is
put in the tree as a text node attached to the element concerned. That's why your name node has no value: the value is contained in
a text subnode.
Just look at the structure of your tree:
[#document: null]
  Children :
    [address-book: null]
    [address-book: null]
      Children :
        [entry: null]
          Children :
            [name: null]
              Children :
                [#text: Jo Blo]
            [address: null]
              Children :
                [street: null]
                  Children :
                    [#text: 3134 Broad Street]
and so on.
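
As a small illustration of the point above, the following sketch parses a one-element fragment and prints the element's own node value next to the value of its text child. It assumes the standard JAXP parser; the class name ElementTextDemo is made up, and getTextContent() is a DOM Level 3 convenience that may not exist in older Xerces releases.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class ElementTextDemo {
    // Parse a tiny fragment with no whitespace between tags, so the first
    // child of <entry> is the <name> element itself.
    static Node nameElement() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(
                        "<entry><name>Jo Blo</name></entry>")));
        return doc.getDocumentElement().getFirstChild();
    }

    public static void main(String[] args) throws Exception {
        Node name = nameElement();
        // An element node has no value of its own.
        System.out.println(name.getNodeValue());                 // null
        // The character data lives in the text node below it.
        System.out.println(name.getFirstChild().getNodeValue()); // Jo Blo
        // DOM Level 3 shortcut that concatenates descendant text.
        System.out.println(name.getTextContent());               // Jo Blo
    }
}
```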
Here is the routine that displays a tree, if you want to use it:
private static void dump(Node node, int depth) {
    // this should be useless but it seems that some attributes are created
    // by the documents with null children... To be investigated, there should
    // be a bug in xerces
    if (node == null) {
        return;
    }
    // indent
    for (int i = 0; i < depth; i++) System.out.print(" ");
    if (node instanceof Attr) {
        System.out.println(node.toString() + " owned by " + ((Attr)node).getOwnerElement());
    } else {
        System.out.println(node.toString());
    }
    NamedNodeMap attributes = node.getAttributes();
    if (attributes != null) {
        if (attributes.getLength() > 0) {
            for (int i = 0; i < depth; i++) System.out.print(" ");
            System.out.println(" Attributes :");
        }
        for (int i = 0; i < attributes.getLength(); i++) dump(attributes.item(i), depth+1);
    }
    NodeList children = node.getChildNodes();
    if (children != null) {
        if (children.getLength() > 0) {
            for (int i = 0; i < depth; i++) System.out.print(" ");
            System.out.println(" Children :");
        }
        for (int i = 0; i < children.getLength(); i++) dump(children.item(i), depth+1);
    }
}
Sebastien
Son To wrote:
> OK I've tried what you suggested
>
> I included a DTD and did
> parser.setIncludeIgnorableWhitespace(false);
>
> but it still doesn't ignore all whitespace (maybe I'm writing the DTD
> incorrectly?)
>
> here is the xml doc:
> ----
> <?xml version="1.0"?>
> <!DOCTYPE address-book SYSTEM "address-book.dtd">
>
> <address-book>
> <entry>
> <name>Jo Blo</name>
> <address>
> <street>3134 Broad Street</street>
> <city>Philadelphia</city>
> <state>Pennsylvania</state>
> <zip>19143</zip>
> </address>
>
> <tel>215-329-3134</tel>
> <fax>215-333-1234</fax>
> <email>joblo@hotmail.com</email>
> </entry>
> </address-book>
>
> here is the DTD
> -----
> <!ELEMENT address-book (entry+)>
>
> <!ELEMENT entry (name, address, tel*, fax*, email*)>
>
> <!ELEMENT name (#PCDATA)>
>
> <!ELEMENT address (street, city, state, zip)>
> <!ELEMENT street (#PCDATA)>
> <!ELEMENT city (#PCDATA)>
> <!ELEMENT state (#PCDATA)>
> <!ELEMENT zip (#PCDATA)>
>
> <!ELEMENT tel (#PCDATA)>
>
> <!ELEMENT fax (#PCDATA)>
>
> <!ELEMENT email (#PCDATA)>
>
> Node addressBook = root.getDocumentElement();
> Node entry = addressBook.getFirstChild();
> Node name = entry.getFirstChild();
>
> name.getNodeValue() returns null
> name.getFirstChild().getNodeValue() returns "Jo Blo"
>
> if you want to reproduce my results, code, xml, and dtd are at
> http://gateway.homeip.net/~son/addressBook
>
> thanks for any feedback,
> son
DOM-whitespace not ignored
Posted by Son To <so...@gateway.homeip.net>.
OK I've tried what you suggested
I included a DTD and did
parser.setIncludeIgnorableWhitespace(false);
but it still doesn't ignore all whitespace (maybe I'm writing the DTD
incorrectly?)
here is the xml doc:
----
<?xml version="1.0"?>
<!DOCTYPE address-book SYSTEM "address-book.dtd">
<address-book>
<entry>
<name>Jo Blo</name>
<address>
<street>3134 Broad Street</street>
<city>Philadelphia</city>
<state>Pennsylvania</state>
<zip>19143</zip>
</address>
<tel>215-329-3134</tel>
<fax>215-333-1234</fax>
<email>joblo@hotmail.com</email>
</entry>
</address-book>
here is the DTD
-----
<!ELEMENT address-book (entry+)>
<!ELEMENT entry (name, address, tel*, fax*, email*)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT address (street, city, state, zip)>
<!ELEMENT street (#PCDATA)>
<!ELEMENT city (#PCDATA)>
<!ELEMENT state (#PCDATA)>
<!ELEMENT zip (#PCDATA)>
<!ELEMENT tel (#PCDATA)>
<!ELEMENT fax (#PCDATA)>
<!ELEMENT email (#PCDATA)>
Node addressBook = root.getDocumentElement();
Node entry = addressBook.getFirstChild();
Node name = entry.getFirstChild();
name.getNodeValue() returns null
name.getFirstChild().getNodeValue() returns "Jo Blo"
if you want to reproduce my results, code, xml, and dtd are at
http://gateway.homeip.net/~son/addressBook
thanks for any feedback,
son