Posted to general@xml.apache.org by "HERRICK, CHUCK (SBCSI)" <CH...@momail.sbc.com> on 2000/03/08 19:57:50 UTC
Xerces 1.0.2 DOMParser, whitespace and #text elements
Xerces-J 1.0.2 on NT in VisualAge for Java
If you send setIncludeIgnorableWhitespace(false) to
DOMParser, and then parse XML that has basically
ELEMENT_NODEs, and a bit of whitespace between
the start tags and end tags of the ELEMENT_NODEs,
you get #text nodes that contain the white space
(new line, tab, spaces, etc).
What's up with that?
Re: Xerces 1.0.2 DOMParser, whitespace and #text elements
Posted by Andy Clark <an...@apache.org>.
"HERRICK, CHUCK (SBCSI)" wrote:
> you get #text nodes that contain the white space
> (new line, tab, spaces, etc).
>
> What's up with that?
This is all clearly documented on the features page. For the
parser to know that the whitespace can be ignored, there *must*
be a DTD associated with the document. So my number one guess
would be that your document does not have a DOCTYPE line. Is
this true?
--
Andy Clark * IBM, JTC - Silicon Valley * andyc@apache.org
Re: Xerces 1.0.2 DOMParser, whitespace and #text elements
Posted by Calvin Gaisford <ca...@calderasystems.com>.
I have seen the same thing but I just found this in the docs:
"This method is used to report all the whitespace characters,
which are determined to be 'ignorable'. This distinction
between characters is only made, if validation is enabled."
If I understand that correctly, ignoring whitespace only
works if you have validation turned on. Right?
"HERRICK, CHUCK (SBCSI)" wrote:
> Xerces-J 1.0.2 on NT in VisualAge for Java
>
> If you send setIncludeIgnorableWhitespace(false) to
> DOMParser, and then parse XML that has basically
> ELEMENT_NODEs, and a bit of whitespace between
> the start tags and end tags of the ELEMENT_NODEs,
> you get #text nodes that contain the white space
> (new line, tab, spaces, etc).
>
> What's up with that?
>
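
For anyone who wants to check the two replies above, here is a minimal sketch. It is an assumption-laden stand-in, not the Xerces 1.0.2 API from the original question: it uses the standard JAXP DocumentBuilderFactory, whose setIgnoringElementContentWhitespace(true) plays roughly the role of setIncludeIgnorableWhitespace(false). The class name WhitespaceDemo, the helper countRootChildren, and the sample document are made up for illustration. It parses the same element content once with no DTD and once with an internal DTD plus validation, and counts the child nodes of the root element.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class WhitespaceDemo {
    // Hypothetical sample document: two child elements separated by whitespace.
    public static final String BODY =
        "<root>\n  <item>a</item>\n  <item>b</item>\n</root>";

    // Internal DTD subset declaring that root has element-only content.
    public static final String DTD =
        "<!DOCTYPE root [\n"
        + "  <!ELEMENT root (item*)>\n"
        + "  <!ELEMENT item (#PCDATA)>\n"
        + "]>\n";

    // Parses xml with ignorable whitespace excluded from the DOM and returns
    // the number of child nodes directly under the root element.
    public static int countRootChildren(String xml, boolean validating) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(validating);
            // JAXP counterpart (assumed here) of setIncludeIgnorableWhitespace(false).
            factory.setIgnoringElementContentWhitespace(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getChildNodes().getLength();
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
    }

    public static void main(String[] args) {
        // No DTD: the parser has no content model, so whitespace stays as #text nodes.
        System.out.println("without DTD: " + countRootChildren(BODY, false));
        // DTD plus validation: whitespace in element-only content can be dropped.
        System.out.println("with DTD:    " + countRootChildren(DTD + BODY, true));
    }
}
```

Without a DOCTYPE the parser cannot tell which whitespace is ignorable, so the #text nodes remain among the root's children. With the DTD and validation enabled, only the two item elements should be left.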