Posted to general@xml.apache.org by "HERRICK, CHUCK (SBCSI)" <CH...@momail.sbc.com> on 2000/03/08 19:57:50 UTC

Xerces 1.0.2 DOMParser, whitespace and #text elements

Xerces-J 1.0.2 on NT in VisualAge for Java
 
If you call setIncludeIgnorableWhitespace(false) on
DOMParser, and then parse XML that consists mostly of
ELEMENT_NODEs with a bit of whitespace between the
tags, you still get #text nodes that contain the
whitespace (newlines, tabs, spaces, etc.).
 
What's up with that?

Re: Xerces 1.0.2 DOMParser, whitespace and #text elements

Posted by Andy Clark <an...@apache.org>.

"HERRICK, CHUCK (SBCSI)" wrote:
> you get #text nodes that contain the white space
> (new line, tab, spaces, etc).
> 
> What's up with that?

This is all clearly documented on the features page. For the 
parser to know that the whitespace can be ignored, there *must* 
be a DTD associated with the document. So my number one guess 
would be that your document does not have a DOCTYPE line. Is
this true?

-- 
Andy Clark * IBM, JTC - Silicon Valley * andyc@apache.org
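
For readers who want to try the DTD dependence described above, here is a minimal sketch. It does not use the Xerces 1.0.2 DOMParser API from the original post. It substitutes the standard JAXP DocumentBuilderFactory, whose setIgnoringElementContentWhitespace(true) option is documented as requiring validating mode. The class name WhitespaceDemo, the helper countTextChildren, and both sample documents are hypothetical and exist only for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class WhitespaceDemo {

    // Hypothetical sample: the inline DTD declares <list> as element-only
    // content, so whitespace between its children is ignorable.
    public static final String WITH_DTD =
        "<?xml version=\"1.0\"?>\n"
        + "<!DOCTYPE list [\n"
        + "  <!ELEMENT list (item*)>\n"
        + "  <!ELEMENT item (#PCDATA)>\n"
        + "]>\n"
        + "<list>\n\t<item>a</item>\n\t<item>b</item>\n</list>";

    // Same markup without a DOCTYPE: the parser has no content model and
    // cannot tell which whitespace is ignorable.
    public static final String WITHOUT_DTD =
        "<list>\n\t<item>a</item>\n\t<item>b</item>\n</list>";

    // Parses the XML and counts the #text nodes that are direct children
    // of the root element.
    public static int countTextChildren(String xml, boolean validating)
            throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(validating);
        // JAXP counterpart of asking the parser to drop ignorable whitespace.
        factory.setIgnoringElementContentWhitespace(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));

        NodeList children = doc.getDocumentElement().getChildNodes();
        int count = 0;
        for (int i = 0; i < children.getLength(); i++) {
            if (children.item(i).getNodeType() == Node.TEXT_NODE) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("with DTD, validating: "
            + countTextChildren(WITH_DTD, true) + " #text children");
        System.out.println("no DTD, not validating: "
            + countTextChildren(WITHOUT_DTD, false) + " #text children");
    }
}
```

With the DTD present and validation on, the root element should come back with no whitespace-only #text children. Without a DOCTYPE the whitespace nodes remain, which matches the behaviour reported in the original question.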

Re: Xerces 1.0.2 DOMParser, whitespace and #text elements

Posted by Calvin Gaisford <ca...@calderasystems.com>.

I have seen the same thing but I just found this in the docs:

This method is used to report all the whitespace characters,
which are determined to be 'ignorable'. This distinction
between characters is only made, if validation is enabled.

If I understand that correctly, ignoring whitespace only
works if you have validation turned on. Right?

"HERRICK, CHUCK (SBCSI)" wrote:

> Xerces-J 1.0.2 on NT in VisualAge for Java
>
> If you send setIncludeIgnorableWhitespace(false) to
> DOMParser, and then parse XML that has basically
> ELEMENT_NODEs, and a bit of whitespace between
> the start tags and end tags of the ELEMENT_NODEs,
> you get #text nodes that contain the white space
> (new line, tab, spaces, etc).
>
> What's up with that?
>