Posted to j-dev@xerces.apache.org by bu...@apache.org on 2004/03/22 16:18:32 UTC

DO NOT REPLY [Bug 27844] - XML parsing skips text

DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG 
RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT
<http://issues.apache.org/bugzilla/show_bug.cgi?id=27844>.
ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND 
INSERTED IN THE BUG DATABASE.

http://issues.apache.org/bugzilla/show_bug.cgi?id=27844

XML parsing skips text

mrglavas@ca.ibm.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |INVALID



------- Additional Comments From mrglavas@ca.ibm.com  2004-03-22 15:18 -------
The JavaDoc of ContentHandler#characters states:

"The Parser will call this method to report each chunk of character data. SAX 
parsers may return all contiguous character data in a single chunk, or they may 
split it into several chunks; however, all of the characters in any single 
event must come from the same external entity so that the Locator provides 
useful information."

SAX parsers are free to split character data into as many chunks as they
please, and they can split the text at whatever boundaries they want. This is
allowed for reasons of parser efficiency and input buffering. To handle this
properly, your handler needs to accumulate the text returned in each call until
it receives a callback that isn't characters().
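A minimal sketch of that accumulation pattern, assuming the standard JAXP
SAXParserFactory and org.xml.sax.helpers.DefaultHandler. The class name
AccumulatingHandler and its runs list are hypothetical, for illustration only.
It flushes the buffer on startElement and endElement; a real handler may also
want to flush on other non-characters events such as processingInstruction.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: collects every characters() chunk and only treats the
// text as complete when a different kind of callback arrives.
public class AccumulatingHandler extends DefaultHandler {
    // Buffer for the current run of character data.
    private final StringBuilder text = new StringBuilder();
    // Completed text runs, one entry per flush (for illustration).
    final List<String> runs = new ArrayList<>();

    @Override
    public void characters(char[] ch, int start, int length) {
        // Append only: this call may carry just part of the text.
        text.append(ch, start, length);
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        flush();
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        flush();
    }

    // Hand off the accumulated text (if any) and reset the buffer.
    private void flush() {
        if (text.length() > 0) {
            runs.add(text.toString());
            text.setLength(0);
        }
    }

    public static void main(String[] args) throws Exception {
        String xml = "<doc><p>first line\nsecond line</p></doc>";
        AccumulatingHandler handler = new AccumulatingHandler();
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(new InputSource(new StringReader(xml)), handler);
        // The whole text of <p> arrives as one run, however the parser chunked it.
        System.out.println(handler.runs);
    }
}
```

Because the text is only consumed in flush(), the result does not depend on
where the parser chose to split the characters() calls.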

Xerces will split calls to characters() at the end of an internal buffer, at a
newline, and at a few other boundaries. You can never rely on contiguous text
being passed in a single characters() callback.
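The sketch below illustrates why a handler that keeps only the latest chunk
appears to "skip text". It drives two hypothetical handlers directly, splitting
the text after each newline to mimic one of the boundaries mentioned above. The
LastChunkHandler pattern is an assumption about what such buggy code typically
looks like; the report does not show the reporter's handler, and the real split
points are up to the parser.

```java
import org.xml.sax.helpers.DefaultHandler;

public class ChunkSplitDemo {
    // Naive handler: overwrites its text on every call, so only the last chunk survives.
    static class LastChunkHandler extends DefaultHandler {
        String text = "";

        @Override
        public void characters(char[] ch, int start, int length) {
            text = new String(ch, start, length);
        }
    }

    // Correct handler: appends every chunk it receives.
    static class AppendingHandler extends DefaultHandler {
        final StringBuilder text = new StringBuilder();

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }
    }

    // Deliver 'content' as several characters() calls, ending a chunk after each '\n'
    // (and at the end of the input). This only simulates parser chunking.
    static void deliverInChunks(DefaultHandler h, String content) throws Exception {
        char[] buf = content.toCharArray();
        int start = 0;
        for (int i = 0; i < buf.length; i++) {
            if (buf[i] == '\n' || i == buf.length - 1) {
                h.characters(buf, start, i - start + 1);
                start = i + 1;
            }
        }
    }

    public static void main(String[] args) throws Exception {
        String content = "line one\nline two\nline three";
        LastChunkHandler naive = new LastChunkHandler();
        AppendingHandler appending = new AppendingHandler();
        deliverInChunks(naive, content);
        deliverInChunks(appending, content);
        System.out.println("naive:     " + naive.text);
        System.out.println("appending: " + appending.text.toString().replace("\n", "\\n"));
    }
}
```

Running main prints "line three" for the naive handler and the full text for
the appending one, which matches the "missing text" symptom described in the
bug title.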
