Posted to general@xml.apache.org by Cheun N Chong <cn...@ecs.soton.ac.uk> on 2000/05/12 13:49:12 UTC

HELP: How to get the value of the node

Hi all,

	I am sorry if I have asked this question several times already, but
I am quite frustrated here. I have the following code:

=============== THE XML CODE ========================
<BOUGHTSTUFF>

<STUFF>
      <TYPE>milk</TYPE>
      <EXPIRE>6 may 2000</EXPIRE>
</STUFF>

<STUFF>
      <TYPE>pork chop</TYPE>
      <EXPIRE>7 may 2001</EXPIRE>
</STUFF>

</BOUGHTSTUFF>      


=============== THE JAVA CODE =======================
      fileinput = new FileInputStream("stuff.xml");
      xmlinput = new InputSource(fileinput);

      // Use a DOMParser from Xerces so we get a complete DOM from the
      // document
      DOMParser parser = new DOMParser();
      parser.parse(xmlinput);

      // Get the document and the node list of the required tag name
      Document doc = parser.getDocument();
      NodeList nl = doc.getElementsByTagName("EXPIRE");
      out.println(nl.getLength());
      Node n = nl.item(0);
      String value = nl.item(0).getNodeValue();
      out.println(value);
======================================================

	I want to get the value of the EXPIRE node for milk. However, when I
run this code it prints "null", but when I change getNodeValue() to
getNodeName() it prints the element name. Is there something wrong with
the code? I would really appreciate your expertise.

	A thousand thanks. Wish you all the best.

Best regards,
Cheun Ngen CHONG

Re: How to get the value of the node

Posted by Sean Kelly <ke...@mail2a.jpl.nasa.gov>.

The reason is this: the DOM tree has a few more nodes in it than you're
expecting (I'm leaving out ignorable text nodes):

Document (root)
...Element "BOUGHTSTUFF"
......Element "STUFF"
.........Element "TYPE"
............Text "milk"
.........Element "EXPIRE"
............Text "6 may 2000"

So, when you call getNodeValue() on the EXPIRE element itself, you get null:
the DOM defines the node value of an Element to be null.

But if you get the value of the EXPIRE element's first child (the Text node),
you get "6 may 2000".
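
As a rough sketch of that fix, the snippet below reads the first EXPIRE
element's text through its first child. It makes a few assumptions that are
not in the original post: it uses the JDK's DocumentBuilderFactory rather than
the Xerces DOMParser, it parses the XML from a string instead of stuff.xml,
and the class and method names (ExpireValue, firstExpire) are made up for
illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ExpireValue {
    // Hypothetical helper: text of the first <EXPIRE> element, or null.
    static String firstExpire(String xml) {
        try {
            // Assumption: JDK parser instead of the Xerces DOMParser.
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nl = doc.getElementsByTagName("EXPIRE");
            Node expire = nl.item(0);
            if (expire == null) {
                return null; // no such element (tag names are case-sensitive)
            }
            // expire.getNodeValue() would be null here: it is an Element.
            // The text lives in the Element's child Text node.
            Node firstChild = expire.getFirstChild();
            return firstChild == null ? null : firstChild.getNodeValue();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<BOUGHTSTUFF>"
                + "<STUFF><TYPE>milk</TYPE><EXPIRE>6 may 2000</EXPIRE></STUFF>"
                + "<STUFF><TYPE>pork chop</TYPE><EXPIRE>7 may 2001</EXPIRE></STUFF>"
                + "</BOUGHTSTUFF>";
        System.out.println(firstExpire(xml)); // prints "6 may 2000"
    }
}
```

Reading only the first child is enough for this document, but not in general,
for the reason given next.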

But be warned: there can be multiple Text child nodes under an Element node,
particularly if there are entities in the text.  For example, this XML
document:

<?xml version="1.0"?>
<alpha>
  <beta>I like both vanilla &amp; chocolate.</beta>
</alpha>

would produce this DOM tree:

Document (root)
...Element "alpha"
......Element "beta"
.........Text "I like both vanilla "
.........Text "&"
.........Text " chocolate."

Again, I'm leaving out ignorable whitespace nodes for clarity.

You can get around these multiple child Text nodes with a routine like the
following:

        /** Get the text out of the given node.
         *
         * Algorithm taken from <cite>XML and Java</cite> by Maruyama,
         * Tamura, and Uramoto, Addison-Wesley 1999.
         *
         * @param node The node whose children contain text.
         * @return The text.
         */
        private static String text(Node node) {
                StringBuffer buffer = new StringBuffer();
                return text1(node, buffer);
        }

        /** Get the text out of a given node and into the given buffer.
         *
         * Algorithm taken from <cite>XML and Java</cite> by Maruyama,
         * Tamura, and Uramoto, Addison-Wesley 1999.
         *
         * @param node The node.
         * @param buffer The buffer.
         * @return The text.
         */
        private static String text1(Node node, StringBuffer buffer) {
                for (Node ch = node.getFirstChild(); ch != null;
                     ch = ch.getNextSibling()) {
                        short type = ch.getNodeType();
                        if (type == Node.ELEMENT_NODE
                            || type == Node.ENTITY_REFERENCE_NODE)
                                // Recurse into the same buffer instead of
                                // allocating a new one per child.
                                text1(ch, buffer);
                        else if (type == Node.TEXT_NODE)
                                buffer.append(ch.getNodeValue());
                }
                return buffer.toString();
        }

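Another thing you can do is normalize the tree by calling normalize() on the
Document (or on any Element), which merges adjacent Text nodes into one.
Note that normalize() does not remove ignorable white space nodes; for that,
your document needs a DTD that tells the parser which white space is
ignorable, and the parser must be told to drop it (in Xerces, by turning off
the http://apache.org/xml/features/dom/include-ignorable-whitespace
feature).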
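
To illustrate the merging of adjacent Text nodes, here is a small sketch.
It builds the three Text children of "beta" by hand rather than relying on how
any particular parser splits the text, and it uses the JDK's
DocumentBuilderFactory; the class and method names (NormalizeDemo, buildBeta)
are hypothetical.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class NormalizeDemo {
    // Hypothetical helper: a <beta> element with three adjacent Text
    // children, built by hand to mirror the tree shown earlier.
    static Element buildBeta() {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .newDocument();
            Element beta = doc.createElement("beta");
            doc.appendChild(beta);
            beta.appendChild(doc.createTextNode("I like both vanilla "));
            beta.appendChild(doc.createTextNode("&"));
            beta.appendChild(doc.createTextNode(" chocolate."));
            return beta;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        Element beta = buildBeta();
        System.out.println(beta.getChildNodes().getLength()); // prints 3

        // normalize() merges adjacent Text nodes in the subtree.
        beta.normalize();
        System.out.println(beta.getChildNodes().getLength()); // prints 1
        System.out.println(beta.getFirstChild().getNodeValue());
        // prints "I like both vanilla & chocolate."
    }
}
```

After normalizing, reading the first child's value is safe again for elements
that contain only text.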

It'd help a lot if you wrote a little recursive debugging utility to print
out your DOM tree.  It doesn't have to be fancy or anything, but it would
help you understand exactly what your tree looks like.
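
A debugging utility of that kind might look something like the sketch below.
It prints one line per node in the same dotted style as the trees above. The
class and method names (DomDump, dump, parse, typeName) are invented for this
example, and it again assumes the JDK's DocumentBuilderFactory instead of the
Xerces DOMParser.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class DomDump {
    // Hypothetical helper: parse XML from a string with the JDK parser.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    // Returns the subtree rooted at node, one line per node.
    static String dump(Node node) {
        StringBuilder out = new StringBuilder();
        dump(node, 0, out);
        return out.toString();
    }

    private static void dump(Node node, int depth, StringBuilder out) {
        for (int i = 0; i < depth; i++) {
            out.append("...");
        }
        // Text nodes show their value; everything else shows its name.
        String label = node.getNodeType() == Node.TEXT_NODE
                ? node.getNodeValue()
                : node.getNodeName();
        out.append(typeName(node.getNodeType()))
           .append(" \"").append(label).append("\"\n");
        for (Node ch = node.getFirstChild(); ch != null;
             ch = ch.getNextSibling()) {
            dump(ch, depth + 1, out);
        }
    }

    private static String typeName(short type) {
        switch (type) {
            case Node.DOCUMENT_NODE:               return "Document";
            case Node.ELEMENT_NODE:                return "Element";
            case Node.TEXT_NODE:                   return "Text";
            case Node.ENTITY_REFERENCE_NODE:       return "EntityReference";
            case Node.PROCESSING_INSTRUCTION_NODE: return "ProcessingInstruction";
            case Node.COMMENT_NODE:                return "Comment";
            default:                               return "Other";
        }
    }

    public static void main(String[] args) {
        String xml = "<STUFF><TYPE>milk</TYPE><EXPIRE>6 may 2000</EXPIRE></STUFF>";
        System.out.print(dump(parse(xml)));
    }
}
```

Running it on your own file shows every node the parser created, including
any whitespace Text nodes between elements.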

--Sean