Posted to general@xerces.apache.org by Nathan Wang <NW...@oni.com> on 2000/08/02 04:46:04 UTC

A problem with Xerces DOMParser

Hi,
 
I just want to report a problem and would like to know
if there's a fix.
 
I did something like the following:
    DOMParser domParser = new DOMParser();
    domParser.parse(strUrl);
    Document document = domParser.getDocument();
 
    NodeList nl = document.getElementsByTagName("attr");
    Node node;
    node = nl.item(0).getFirstChild();
    strTitle = node.getNodeValue();

The original/real value for <attr> was
    President & CEO
but I got only the "President" part back in strTitle.
The '&' was correctly encoded as "&amp;".
 
I could view the XML correctly with IE.
 
I would really appreciate it if you could give me some input.
 
Thanks,
Nathan
----------------------------------------------
Nathan Q. Wang           ONI Systems, Inc.
----------------------------------------------

Re: A problem with Xerces DOMParser

Posted by Joe Polastre <jp...@apache.org>.

The character data (#PCDATA) of an element is often broken up into multiple text
nodes that are children of that element.  By getting only the first child,
you get only the first text node, which is 'President'.  I'm willing
to bet that the second node contains '&' and the third contains 'CEO'.

When getting the text of an element, you should traverse all of its children
and concatenate the values of any nodes whose names are #text.
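
A minimal sketch of that traversal follows. It is an illustration, not the
only way to do it. The class TextCollector and its methods textOf and parseXml
are hypothetical names. It uses the standard JAXP DocumentBuilderFactory
rather than the Xerces DOMParser class from the original post, only so that
the example is self-contained. The sketch also assumes that CDATA sections
and the children of entity reference nodes should count as text.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper class; the names are illustrative, not part of Xerces.
class TextCollector {

    // Concatenates the character data held by the direct children of a node.
    // Child elements are skipped; entity reference nodes are descended into.
    static String textOf(Node element) {
        StringBuilder sb = new StringBuilder();
        for (Node child = element.getFirstChild();
             child != null;
             child = child.getNextSibling()) {
            switch (child.getNodeType()) {
                case Node.TEXT_NODE:
                case Node.CDATA_SECTION_NODE:
                    sb.append(child.getNodeValue());
                    break;
                case Node.ENTITY_REFERENCE_NODE:
                    // Some parsers keep "&amp;" as an entity reference node
                    // whose children hold the replacement text.
                    sb.append(textOf(child));
                    break;
                default:
                    break;
            }
        }
        return sb.toString();
    }

    // Parses an XML string with the default JAXP parser (an assumption made
    // to keep the example self-contained).
    static Document parseXml(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parseXml("<attr>President &amp; CEO</attr>");
        Node attr = doc.getElementsByTagName("attr").item(0);
        System.out.println(textOf(attr));
    }
}
```

In the original snippet, this would mean replacing the getFirstChild() and
getNodeValue() pair with a call such as textOf(nl.item(0)). Calling
normalize() on the element first merges adjacent text nodes, and DOM Level 3
implementations also offer Node.getTextContent(), which may be simpler where
it is available.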

-Joe Polastre  (jpolast@apache.org)
IBM Cupertino, XML Technology Group

