Posted to dev@xalan.apache.org by bu...@apache.org on 2001/11/16 15:03:44 UTC

DO NOT REPLY [Bug 4908] - XPath's text() and node() selectors get confused by CDATA sections

DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG 
RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT
<http://nagoya.apache.org/bugzilla/show_bug.cgi?id=4908>.
ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND 
INSERTED IN THE BUG DATABASE.

http://nagoya.apache.org/bugzilla/show_bug.cgi?id=4908

XPath's text() and node() selectors get confused by CDATA sections





------- Additional Comments From keshlam@us.ibm.com  2001-11-16 06:03 -------
If you can provide a sample that shows the problem you're reporting, that would 
be very helpful in analysing whether this is broken or working as intended.

XPath has no concept of CDATA sections; all contiguous text, whether escaped via 
<![CDATA[]]> or not, is considered a single XPath node. So what should be 
happening is that XPath sees <doc> as having three XPath child nodes: a text 
node, an element, and another text node.

Xalan should be attempting to apply a rule which "coalesces" adjacent text into 
a single DTM node, which is considered ordinary text unless _ALL_ the adjacent 
text was contained in <![CDATA[]]> sections. 
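
As a rough illustration of the data model described above, the sketch below 
evaluates two XPath expressions through the standard javax.xml.xpath API (which 
postdates this thread). The sample document is a made-up stand-in for the one in 
the bug report, the class and method names are hypothetical, and the results 
assume an XPath implementation that coalesces adjacent text as described here, 
such as the Xalan-derived one bundled with the JDK.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class; not part of Xalan.
final class CdataCoalesceDemo {
    // Assumed sample: plain text, a CDATA section, an element, more text.
    static final String XML = "<doc>abc<![CDATA[def]]><e/>ghi</doc>";

    // Parse with a default (non-coalescing) DOM parser.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Number of XPath child nodes of <doc>.
    static double childCount(Document doc) {
        try {
            XPath xp = XPathFactory.newInstance().newXPath();
            return (Double) xp.evaluate("count(/doc/node())", doc, XPathConstants.NUMBER);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // String value of the first XPath text node under <doc>.
    static String firstText(Document doc) {
        try {
            XPath xp = XPathFactory.newInstance().newXPath();
            return xp.evaluate("string(/doc/text()[1])", doc);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse(XML);
        System.out.println("children:   " + childCount(doc));
        System.out.println("first text: " + firstText(doc));
    }
}
```

If text is coalesced as described, the count should come out as 3 (text, 
element, text) and the first text node's string value should include the 
content of the CDATA section.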

NOTE that if you're viewing the nodes through the extension mechanism's DOM view 
of the data, there's a known limitation in that DOM2DTM associates only the 
first DOM text node with the DTM node. It's the extension programmer's 
responsibility to understand that and to check for immediately following text 
nodes -- including the possibility of crossing Entity Reference boundaries. 
There really isn't a better answer; DOM Level 3's proposed XPath API came to the 
same conclusion, though they're adding a getWholeText convenience method -- much 
like our string() -- to make that common use case more convenient.
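
The check for immediately following text nodes could be sketched roughly as 
below, using only the standard org.w3c.dom interfaces. The class and helper 
names are hypothetical, the sample document is invented, and the loop 
deliberately handles only direct siblings: it does not cross Entity Reference 
boundaries, which a complete version would have to do. The last line of main 
calls Text.getWholeText(), the DOM Level 3 convenience method, for comparison.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.Text;
import org.xml.sax.InputSource;

// Hypothetical helper class; not part of Xalan or the DOM API.
final class AdjacentTextDemo {
    // Parse with a default (non-coalescing) DOM parser, so that CDATA
    // sections stay separate CDATASection nodes in the tree.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // True for Text and CDATASection nodes, false for null or anything else.
    static boolean isTextual(Node n) {
        return n != null
                && (n.getNodeType() == Node.TEXT_NODE
                    || n.getNodeType() == Node.CDATA_SECTION_NODE);
    }

    // Concatenate `first` with the text/CDATA siblings that immediately
    // follow it. Simplification: EntityReference nodes end the run.
    static String logicalText(Node first) {
        StringBuilder sb = new StringBuilder();
        for (Node n = first; isTextual(n); n = n.getNextSibling()) {
            sb.append(n.getNodeValue());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Document doc = parse("<doc>abc<![CDATA[def]]><e/>ghi</doc>");
        Node first = doc.getDocumentElement().getFirstChild();
        System.out.println(first.getNodeValue());           // first DOM node only
        System.out.println(logicalText(first));             // whole adjacent run
        System.out.println(((Text) first).getWholeText());  // DOM Level 3 equivalent
    }
}
```

Stopping at the first non-text sibling keeps the element and the trailing text 
out of the result, which mirrors how a single XPath text node is delimited.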