
Posted to j-dev@xerces.apache.org by Son To <so...@gateway.homeip.net> on 2001/01/27 00:14:15 UTC

Re: Node.getFirstChild() wierdness (fwd)

---------- Forwarded message ----------
Date: 26 Jan 2001 15:08:38 -0800
From: rols@rols.org
To: Son.To.wh98@wharton.upenn.edu
Subject: Re: Node.getFirstChild() wierdness

On Fri, 26 January 2001, Son To wrote:

Nope, it's right: the #text node is the whitespace (the line break and indentation) between <address> and <street>.

XML requires that the parser pass through all the text in the document, whether it's 'ignorable' or not.

If you have a DTD, then the DOM parser can tell which whitespace is ignorable, and you can ask it not to give that to you by calling

parser.setIncludeIgnorableWhitespace(false)

If you don't have a DTD, then it can't tell what's ignorable and hence it can't leave it out.
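
When there is no DTD, one common workaround is to step over the text nodes in application code. The following is only a minimal sketch of that idea: it uses the standard JAXP and org.w3c.dom classes rather than the Xerces DOMParser class mentioned above, and the class name FirstElementChildDemo and the helpers firstElementChild and parse are made up for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class FirstElementChildDemo {

    /** Returns the first child that is an element, or null if there is none. */
    static Element firstElementChild(Node parent) {
        for (Node child = parent.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                return (Element) child;
            }
        }
        return null;
    }

    /** Parses an XML string without a DTD and without validation (hypothetical helper). */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("could not parse sample document", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse(
                "<address>\n"
              + "    <street>3134 Broad Street</street>\n"
              + "    <city>Philadelphia</city>\n"
              + "</address>");
        Element address = doc.getDocumentElement();
        // The raw first child is the whitespace between <address> and <street>.
        System.out.println(address.getFirstChild().getNodeName());
        // The helper skips that text node and lands on the element.
        System.out.println(firstElementChild(address).getNodeName());
    }
}
```

The helper skips every non-element node (comments and processing instructions too), not only whitespace, which is usually what navigation code like this wants.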

Roland
> 
> Hi,
> I am using DOM. There seems to be an extra Node in the Document object
> that is returned
> 
> for example, 
> 
> <address>
>     <street>3134 Broad Street</street>
>     <city>Philadelphia</city>
>     <state>Pennsylvania</state>
>     <zip>19143</zip>
> </address>
> 
> document.getDocumentElement().getFirstChild().getNodeName() 
> returns "#text" when I expected it to return "street"
> 
> root.getDocumentElement().getFirstChild().getNextSibling().getNodeName()
> returns "street"
> 
> what is that "#text" being returned? Shouldn't the firstChild of address 
> be street?
> 
> I tried various other xml documents, and "#text" is always the firstChild.
> Am I misunderstanding the DOM tree? If so how is this address represented
> in the DOM tree?
> 
> thanks,
> son
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: xerces-j-dev-unsubscribe@xml.apache.org
> For additional commands, e-mail: xerces-j-dev-help@xml.apache.org

Re: DOM-whitespace not ignored

Posted by Sebastien Ponce <se...@cern.ch>.

I just tried your program and it works just fine. The one thing you missed is that all text contained in the XML file is
put in the tree as a text node attached to the element concerned. That's why your name node has no value: the value is held in
a child text node.

Just look at the structure of your tree:
[#document: null]
  Children :
    [address-book: null]
    [address-book: null]
      Children :
        [entry: null]
          Children :
            [name: null]
              Children :
                [#text: Jo Blo]
            [address: null]
              Children :
                [street: null]
                  Children :
                    [#text: 3134 Broad Street]

and so on.
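
To make the tree above concrete, here is a small sketch of reading an element's text through its child text node. It assumes the standard JAXP and org.w3c.dom classes; the class name ElementTextDemo and the helpers parseRoot and directText are hypothetical and not part of the program discussed in this thread.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class ElementTextDemo {

    /** Parses an XML string and returns its root element (hypothetical helper). */
    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("could not parse sample document", e);
        }
    }

    /** Concatenates the values of the text and CDATA nodes directly under a node. */
    static String directText(Node element) {
        StringBuilder text = new StringBuilder();
        for (Node child = element.getFirstChild(); child != null; child = child.getNextSibling()) {
            short type = child.getNodeType();
            if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
                text.append(child.getNodeValue());
            }
        }
        return text.toString();
    }

    public static void main(String[] args) {
        Element name = parseRoot("<name>Jo Blo</name>");
        // An element node itself has no node value.
        System.out.println(name.getNodeValue());
        // The text lives in the child #text node.
        System.out.println(name.getFirstChild().getNodeValue());
        // Looping over the children also copes with text split across several nodes.
        System.out.println(directText(name));
    }
}
```

On parsers that implement DOM Level 3, Node.getTextContent() gives a similar result without the loop, although it also includes the text of nested elements.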

Here is the routine that displays a tree, if you want to use it:

  private static void dump(Node node, int depth) {
    // This check should be unnecessary, but some attributes seem to be
    // created with null children. To be investigated; it may be a bug
    // in Xerces.
    if (node == null) {
      return;
    }
    // indent
    for (int i = 0; i < depth; i++) System.out.print("    ");
    if (node instanceof Attr) {
      System.out.println(node.toString() + " owned by " + ((Attr)node).getOwnerElement());
    } else {
      System.out.println(node.toString());
    }
    NamedNodeMap attributes = node.getAttributes();
    if (attributes != null) {
      if (attributes.getLength() > 0) {
        for (int i = 0; i < depth; i++) System.out.print("    ");
        System.out.println("  Attributes :");
      }
      for (int i = 0; i < attributes.getLength(); i++) dump(attributes.item(i), depth+1);
    }
    NodeList children = node.getChildNodes();
    if (children != null) {
      if (children.getLength() > 0) {
        for (int i = 0; i < depth; i++) System.out.print("    ");
        System.out.println("  Children :");
      }
      for (int i = 0; i < children.getLength(); i++) dump(children.item(i), depth+1);
    }
  }

Sebastien


Son To wrote:

> OK I've tried what you suggested
>
> I included a DTD and did
>  parser.setIncludeIgnorableWhitespace(false);
>
> but it still doesn't ignore all whitespace (maybe I'm writing the DTD
> incorrectly?)
>
> here is the xml doc:
> ----
> <?xml version="1.0"?>
> <!DOCTYPE address-book SYSTEM "address-book.dtd">
>
> <address-book>
>    <entry>
>       <name>Jo Blo</name>
>       <address>
>          <street>3134 Broad Street</street>
>          <city>Philadelphia</city>
>          <state>Pennsylvania</state>
>          <zip>19143</zip>
>       </address>
>
>       <tel>215-329-3134</tel>
>       <fax>215-333-1234</fax>
>       <email>joblo@hotmail.com</email>
>    </entry>
> </address-book>
>
> here is the DTD
> -----
> <!ELEMENT address-book (entry+)>
>
> <!ELEMENT entry (name, address, tel*, fax*, email*)>
>
> <!ELEMENT name (#PCDATA)>
>
> <!ELEMENT address (street, city, state, zip)>
> <!ELEMENT street (#PCDATA)>
> <!ELEMENT city   (#PCDATA)>
> <!ELEMENT state  (#PCDATA)>
> <!ELEMENT zip    (#PCDATA)>
>
> <!ELEMENT tel (#PCDATA)>
>
> <!ELEMENT fax (#PCDATA)>
>
> <!ELEMENT email (#PCDATA)>
>
>         Node addressBook = root.getDocumentElement();
>         Node entry = addressBook.getFirstChild();
>         Node name = entry.getFirstChild();
>
> name.getNodeValue() returns null
> name.getFirstChild().getNodeValue() returns "Jo Blo"
>
> if you want to reproduce my results, code, xml, and dtd are at
> http://gateway.homeip.net/~son/addressBook
>
> thanks for any feedback,
> son

DOM-whitespace not ignored

Posted by Son To <so...@gateway.homeip.net>.

OK I've tried what you suggested

I included a DTD and did
 parser.setIncludeIgnorableWhitespace(false);

but it still doesn't ignore all whitespace (maybe I'm writing the DTD
incorrectly?)

here is the xml doc:
----
<?xml version="1.0"?> 
<!DOCTYPE address-book SYSTEM "address-book.dtd">

<address-book>
   <entry>
      <name>Jo Blo</name>
      <address>
         <street>3134 Broad Street</street>
         <city>Philadelphia</city>
         <state>Pennsylvania</state>
         <zip>19143</zip>
      </address>

      <tel>215-329-3134</tel>
      <fax>215-333-1234</fax>
      <email>joblo@hotmail.com</email>
   </entry>
</address-book>

here is the DTD
-----
<!ELEMENT address-book (entry+)>

<!ELEMENT entry (name, address, tel*, fax*, email*)>

<!ELEMENT name (#PCDATA)>

<!ELEMENT address (street, city, state, zip)>
<!ELEMENT street (#PCDATA)>
<!ELEMENT city   (#PCDATA)>
<!ELEMENT state  (#PCDATA)>
<!ELEMENT zip    (#PCDATA)>

<!ELEMENT tel (#PCDATA)>

<!ELEMENT fax (#PCDATA)>

<!ELEMENT email (#PCDATA)>


        Node addressBook = root.getDocumentElement();
        Node entry = addressBook.getFirstChild();
        Node name = entry.getFirstChild();

name.getNodeValue() returns null
name.getFirstChild().getNodeValue() returns "Jo Blo"
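
For anyone reproducing this without the original files, the same experiment can be sketched with the standard JAXP API. This is an assumption-laden sketch, not the code from the URL below: setValidating and setIgnoringElementContentWhitespace are JAXP calls rather than the Xerces DOMParser method used above, the DTD is trimmed and inlined as an internal subset so the example is self-contained, and the class name DtdWhitespaceDemo and its helper are made up.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class DtdWhitespaceDemo {

    // Trimmed-down version of the address book, for illustration only.
    static final String BODY =
            "<address-book>\n"
          + "   <entry>\n"
          + "      <name>Jo Blo</name>\n"
          + "   </entry>\n"
          + "</address-book>";

    // Internal DTD subset declaring which elements have element-only content.
    static final String DTD =
            "<!DOCTYPE address-book [\n"
          + "  <!ELEMENT address-book (entry+)>\n"
          + "  <!ELEMENT entry (name)>\n"
          + "  <!ELEMENT name (#PCDATA)>\n"
          + "]>\n";

    /** Parses the XML and returns the node name of the root element's first child. */
    static String firstChildName(String xml, boolean useDtd) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            if (useDtd) {
                // The parser needs the content model to know which whitespace is ignorable.
                factory.setValidating(true);
                factory.setIgnoringElementContentWhitespace(true);
            }
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getFirstChild().getNodeName();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("could not parse sample document", e);
        }
    }

    public static void main(String[] args) {
        // No DTD: the indentation shows up as a text node.
        System.out.println(firstChildName(BODY, false));
        // With the DTD and whitespace ignoring requested, the first child should be the element.
        System.out.println(firstChildName(DTD + BODY, true));
    }
}
```

Only whitespace inside elements declared with element-only content (address-book and entry here) can be dropped this way; the text inside name is real content and always stays in its own text node.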

if you want to reproduce my results, code, xml, and dtd are at 
http://gateway.homeip.net/~son/addressBook

thanks for any feedback,
son
