Posted to j-users@xerces.apache.org by Meh-Lit Kim <me...@yahoo.com> on 2003/10/03 18:41:16 UTC
QUESTION: char content chunking in ContentHandler.characters()
Hi,
Is there any guarantee that the org.xml.sax.ContentHandler.characters() callback
will not break a whitespace-separated 'word' into different chunks ?
e.g., given the following XML fragment :
<NumberList>
111 222 333
444 555 666
</NumberList>
The possible values for the string corresponding to the input param
'ch[start] ... ch[start+length-1]' in the callback method
org.xml.sax.ContentHandler.characters( char[] ch, int start, int length )
may be something like :
"111 222 333"
"444 555 666"
but will NEVER be something like :
"111 22"
"2 333"
"444 5"
"55 666"
Thanks,
/Meh
Re: QUESTION: char content chunking in ContentHandler.characters()
Posted by Andy Clark <an...@apache.org>.
Meh-Lit Kim wrote:
> Is there any guarantee that the org.xml.sax.ContentHandler.characters()
> callback
> will not break a whitespace-separated 'word' into different chunks ?
Michael is absolutely right.
In case you're interested, though, Xerces will split
the callbacks at the following:
1) a newline -- because newline chars have to be
normalized and it's more efficient to break up
the callbacks than to copy buffers; and
2) the end of an internal buffer
So, in general, you can never rely on all of the data
you need to be passed in a single characters callback.
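To see where a particular parser actually splits the data, one option is to record each characters() call separately. The following is only a minimal sketch: the class name ChunkLogger and the helper chunksOf are made up for illustration, and it uses whichever SAX parser JAXP happens to find, so the number and position of the splits will vary by parser and version. The only thing that holds everywhere is that the chunks, joined in order, give back the full text.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, not part of Xerces or SAX.
class ChunkLogger extends DefaultHandler {
    // One entry per characters() callback, in the order received.
    final List<String> chunks = new ArrayList<>();

    @Override
    public void characters(char[] ch, int start, int length) {
        // Copy the slice: the parser may reuse the ch[] buffer afterwards.
        chunks.add(new String(ch, start, length));
    }

    // Parses the XML string and returns the chunks as delivered.
    // Checked exceptions are wrapped to keep the sketch short.
    static List<String> chunksOf(String xml) {
        ChunkLogger handler = new ChunkLogger();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
        return handler.chunks;
    }

    public static void main(String[] args) {
        String xml = "<NumberList>\n111 222 333\n444 555 666\n</NumberList>";
        for (String chunk : chunksOf(xml)) {
            // Show newlines visibly so the split points are easy to spot.
            System.out.println("[" + chunk.replace("\n", "\\n") + "]");
        }
    }
}
```

No expected output is shown on purpose, since the split points are an implementation detail of the parser in use.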
--
Andy Clark * andyc@apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: xerces-j-user-unsubscribe@xml.apache.org
For additional commands, e-mail: xerces-j-user-help@xml.apache.org
Re: QUESTION: char content chunking in ContentHandler.characters()
Posted by Michael Glavassevich <mr...@apache.org>.
Hi Meh-Lit,
No. SAX parsers are free to split character data [1] into as many
chunks as they please, and they can split the text at whatever boundaries
they want. In order to handle this properly, your handler needs to
accumulate the text returned in each call until you receive a callback
that isn't characters.
[1]
http://www.saxproject.org/apidoc/org/xml/sax/ContentHandler.html#characters(char[],%20int,%20int)
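The accumulation described above might look roughly like the sketch below. The names NumberListHandler and numbersIn are invented for this example, and it makes two simplifying assumptions: the parser is not namespace-aware (so qName is the plain element name), and NumberList holds only text, no child elements. The buffer is cleared at every element start and end, and the text is only tokenized once endElement arrives, so it does not matter where the parser split the data.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler for the <NumberList> fragment from the question.
class NumberListHandler extends DefaultHandler {
    // Collects text across however many characters() calls the parser makes.
    private final StringBuilder text = new StringBuilder();
    final List<Integer> numbers = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        text.setLength(0); // a non-characters callback: start a fresh buffer
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length); // never parse here, only accumulate
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("NumberList".equals(qName)) {
            // The element's text is complete now, so splitting on whitespace is safe.
            for (String token : text.toString().trim().split("\\s+")) {
                if (!token.isEmpty()) {
                    numbers.add(Integer.parseInt(token));
                }
            }
        }
        text.setLength(0);
    }

    // Parses the XML string and returns the numbers found in <NumberList>.
    // Checked exceptions are wrapped to keep the sketch short.
    static List<Integer> numbersIn(String xml) {
        NumberListHandler handler = new NumberListHandler();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
        return handler.numbers;
    }

    public static void main(String[] args) {
        String xml = "<NumberList>\n111 222 333\n444 555 666\n</NumberList>";
        System.out.println(numbersIn(xml)); // prints [111, 222, 333, 444, 555, 666]
    }
}
```

For mixed content (text interleaved with child elements) a single buffer like this is not enough; a stack of buffers, one per open element, is one common way to extend it.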
On Fri, 3 Oct 2003, Meh-Lit Kim wrote:
> Hi,
>
> Is there any guarantee that the org.xml.sax.ContentHandler.characters() callback
> will not break a whitespace-separated 'word' into different chunks ?
>
> e.g., given the following XML fragment :
>
> <NumberList>
> 111 222 333
> 444 555 666
> </NumberList>
>
> The possible values for the string corresponding to the input param
> 'ch[start] ... ch[start+length-1]' in the callback method
>
> org.xml.sax.ContentHandler.characters( char[] ch, int start, int length )
>
> may be something like :
> "111 222 333"
> "444 555 666"
>
> but will NEVER be something like :
> "111 22"
> "2 333"
> "444 5"
> "55 666"
>
> Thanks,
> /Meh
--
--------------------
Michael Glavassevich
mrglavas@apache.org