Posted to j-dev@xerces.apache.org by bu...@apache.org on 2004/03/22 16:18:32 UTC
DO NOT REPLY [Bug 27844] -
XML parsing skips text
http://issues.apache.org/bugzilla/show_bug.cgi?id=27844
XML parsing skips text
mrglavas@ca.ibm.com changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |INVALID
------- Additional Comments From mrglavas@ca.ibm.com 2004-03-22 15:18 -------
The JavaDoc of ContentHandler#characters states:
"The Parser will call this method to report each chunk of character data. SAX
parsers may return all contiguous character data in a single chunk, or they may
split it into several chunks; however, all of the characters in any single
event must come from the same external entity so that the Locator provides
useful information."
SAX parsers are free to split character data into as many chunks as
they please and they can split the text at whatever boundaries they want. This
is allowed for reasons having to do with parser efficiency and input buffering.
In order to handle this properly, your handler needs to accumulate the text
returned in each call until you receive a callback that isn't characters.
Xerces will split calls to characters at the end of an internal buffer, at a
new line and also at a few other boundaries. You can never rely on contiguous
text being passed in a single characters callback.
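
The accumulation described above could look roughly like the following. This
is only a minimal sketch, not code from the bug report: the class name
TextAccumulator, its flush() helper and the static parse() method are
hypothetical names made up for illustration. It buffers every characters call
and only treats the text as complete when a startElement or endElement
callback arrives.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler (illustration only): collects each contiguous run of
// character data as one string, however the parser chose to chunk it.
class TextAccumulator extends DefaultHandler {
    private final StringBuilder buffer = new StringBuilder();
    private final List<String> texts = new ArrayList<>();

    @Override
    public void characters(char[] ch, int start, int length) {
        // May be called several times for one run of text; just append.
        buffer.append(ch, start, length);
    }

    @Override
    public void startElement(String uri, String localName, String qName,
                             Attributes atts) {
        flush(); // a callback that isn't characters: the run is complete
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        flush(); // same here
    }

    // Record the buffered text (if any) and reset the buffer.
    private void flush() {
        if (buffer.length() > 0) {
            texts.add(buffer.toString());
            buffer.setLength(0);
        }
    }

    // Convenience helper (hypothetical): parse a string with the JAXP
    // default SAX parser and return the collected text runs.
    static List<String> parse(String xml) throws Exception {
        TextAccumulator handler = new TextAccumulator();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.texts;
    }
}
```

With this handler, element content that contains a new line or an entity
reference is still collected as a single string, even if the parser reports
it through several characters calls. A real handler would probably also flush
on other callbacks (processingInstruction, endDocument) and decide what to do
with whitespace-only text; the sketch leaves that out.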