Posted to dev@xalan.apache.org by bu...@apache.org on 2001/11/16 15:03:44 UTC
DO NOT REPLY [Bug 4908] -
XPath's text() and node() selectors get confused by CDATA sections
DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG
RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT
<http://nagoya.apache.org/bugzilla/show_bug.cgi?id=4908>.
ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND
INSERTED IN THE BUG DATABASE.
------- Additional Comments From keshlam@us.ibm.com 2001-11-16 06:03 -------
If you can provide a sample that shows the problem you're reporting, that would
be very helpful in analysing whether this is broken or working as intended.
XPath has no concept of CDATA sections; all contiguous text, whether escaped via
<![CDATA[]]> or not, is considered a single XPath node. So what should be
happening is that XPath sees <doc> as having three XPath child nodes: a text
node, an element, and another text node.
Xalan should be attempting to apply a rule which "coalesces" adjacent text into
a single DTM node, which is considered ordinary text unless _ALL_ the adjacent
text was contained in <![CDATA[]]> sections.
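
As a rough illustration of the XPath view described above, here is a minimal
sketch using the standard javax.xml.xpath API. The sample document and the
names CdataXPathDemo, SAMPLE, childCount and firstText are made up for this
example and are not taken from the bug report; the XPath implementation is
whatever the JDK ships, not necessarily the Xalan build under discussion.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class CdataXPathDemo {
    // Hypothetical sample: plain text, a CDATA section, an element, more text.
    static final String SAMPLE = "<doc>abc<![CDATA[def]]><e/>ghi</doc>";

    // Parse with the default (non-coalescing) DOM parser settings.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Number of XPath child nodes of <doc>.
    static double childCount(String xml) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return (Double) xpath.evaluate(
                    "count(/doc/node())", parse(xml), XPathConstants.NUMBER);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // String value of the first XPath text node under <doc>.
    static String firstText(String xml) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate("string(/doc/text()[1])", parse(xml));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("children of <doc>: " + childCount(SAMPLE));
        System.out.println("first text node:   " + firstText(SAMPLE));
    }
}
```

If the coalescing rule is applied as described, the count should be three, and
the first text node's string value should cover both the plain text and the
CDATA content.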
NOTE that if you're viewing the nodes through the extension mechanism's DOM view
of the data, there's a known limitation in that DOM2DTM associates only the
first DOM text node with the DTM node. It's the extension programmer's
responsibility to understand that and to check for immediately following text
nodes -- including the possibility of crossing Entity Reference boundaries.
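
One way an extension might do that check is sketched below. This is only an
illustration under simplifying assumptions, not Xalan's own logic: the names
AdjacentText, isTextLike, isTextOnlyEntityRef and collectAdjacentText are
hypothetical, the walk only goes forward from the node it is given, and an
entity reference is treated as contiguous text only when it contains nothing
but text. It does not handle a starting node that itself sits inside an entity
reference.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

final class AdjacentText {
    // Text and CDATA section nodes both count as character data.
    static boolean isTextLike(Node n) {
        short type = n.getNodeType();
        return type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE;
    }

    // True for an entity reference whose expansion contains only text.
    static boolean isTextOnlyEntityRef(Node n) {
        if (n.getNodeType() != Node.ENTITY_REFERENCE_NODE) {
            return false;
        }
        for (Node c = n.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (!isTextLike(c) && !isTextOnlyEntityRef(c)) {
                return false;
            }
        }
        return true;
    }

    // Concatenate the given node with the text nodes immediately following it.
    static String collectAdjacentText(Node first) {
        StringBuilder sb = new StringBuilder();
        for (Node n = first; n != null; n = n.getNextSibling()) {
            if (isTextLike(n)) {
                sb.append(n.getNodeValue());
            } else if (isTextOnlyEntityRef(n)) {
                sb.append(n.getTextContent());
            } else {
                break; // an element, comment, etc. ends the run of text
            }
        }
        return sb.toString();
    }

    // Convenience for the demo: collect text starting at the root's first child.
    static String firstRun(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return collectAdjacentText(doc.getDocumentElement().getFirstChild());
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(firstRun("<doc>abc<![CDATA[def]]><e/>ghi</doc>"));
    }
}
```

An extension function handed the first DOM text node could call something like
collectAdjacentText on it rather than reading that node's value alone.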
There really isn't a better answer; DOM Level 3's proposed XPath API came to the
same conclusion, though they're adding a getWholeText convenience method -- much
like our string() -- to make that common use case more convenient.
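
For reference, that convenience method is available in Java's DOM binding as
org.w3c.dom.Text.getWholeText(). A small sketch of calling it follows; the
class name WholeTextDemo, the helper wholeTextOfFirstChild and the sample
document are invented for illustration, and the example assumes the first
child of the root element is a text node.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Text;
import org.xml.sax.InputSource;

final class WholeTextDemo {
    // Returns the text of the root's first child together with any
    // logically adjacent text nodes, as reported by the DOM itself.
    static String wholeTextOfFirstChild(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            // Assumption: the first child is a Text (or CDATASection) node.
            Text first = (Text) doc.getDocumentElement().getFirstChild();
            return first.getWholeText();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(
                wholeTextOfFirstChild("<doc>abc<![CDATA[def]]><e/>ghi</doc>"));
    }
}
```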