Posted to j-users@xerces.apache.org by "Cox, Randy" <ra...@countryfinancial.com> on 2001/07/17 23:02:54 UTC

Error in Sax parser - CharDataChunk size

I have encountered a problem in Xerces 1.2.0 during a SAX parse, in the
characters(char [], int, int) method.  The contents of the element I am
processing should be "1541".  What I am seeing is:

1. characters(char [], int, int) is called once with a length of 2 resulting
in a value of "15".
2. characters(char [], int, int) is called again with a length of 2
resulting in a value of "41".

I have traced this to the fact that the element spans a CharDataChunk
boundary, with the "15" in the first chunk and the "41" in the second
chunk. I confirmed this by changing CHUNK_SHIFT to 16, which gives me a 64k
chunk instead of 16k.

I have the following questions:

1. Has this been corrected in a later version of Xerces?
2. Is this something that my handler must deal with and if so can you direct
me to some code patterns for solving the problem?
3. What problems if any will I encounter by increasing the chunk size?
4. Can this be configured without hacking the code?

Randolph M. Cox, CDP, CCP
Application Development Specialist
COUNTRY Insurance and Financial Services
1711 G.E. Road
Bloomington, Ill. 61702-2020
Ph. 309-821-3810
FAX 309-821-4009






---------------------------------------------------------------------
To unsubscribe, e-mail: xerces-j-user-unsubscribe@xml.apache.org
For additional commands, e-mail: xerces-j-user-help@xml.apache.org


Re: Error in Sax parser - CharDataChunk size

Posted by Ian Roberts <ir...@decisionsoft.com>.
On Tue, 17 Jul 2001, Cox, Randy wrote:

> I have encountered a problem in Xerces 1.2.0 during a SAX parse, in the
> characters(char [], int, int) method.  The contents of the element I am
> processing should be "1541".  What I am seeing is:
> 
> 1. characters(char [], int, int) is called once with a length of 2 resulting
> in a value of "15".
> 2. characters(char [], int, int) is called again with a length of 2
> resulting in a value of "41".
> 
> I have the following questions:
> 
> 1. Has this been corrected in a later version of Xerces?
> 2. Is this something that my handler must deal with and if so can you direct
> me to some code patterns for solving the problem?

The SAX spec says that a parser is free to report the text in a given
element in a single call to characters, or in multiple calls, if that
better suits the implementation.  The handler is responsible for
collecting the data together into a single string.  The way I normally do
this is by creating a StringBuffer in the startElement method, appending
each chunk of characters to the buffer in characters (StringBuffer has an
append(char[], int, int) method taking the array, offset and length, which
makes this very easy), and calling toString on the buffer in the endElement
method.
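
A minimal sketch of that pattern, assuming a SAX2 parser (so the handler can
extend org.xml.sax.helpers.DefaultHandler) and elements that contain only
text.  The class name TextCollector and the getLastValue accessor are made up
for illustration; they are not part of Xerces or SAX.

```java
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: collects the text of each text-only element,
// however many characters() calls the parser chooses to make.
class TextCollector extends DefaultHandler {
    private StringBuffer text;   // text of the element currently open, or null
    private String lastValue;    // complete text of the last text element closed

    @Override
    public void startElement(String uri, String localName, String qName,
                             Attributes atts) {
        // Start a fresh buffer for every element.
        text = new StringBuffer();
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // May be called several times for one element; just keep appending.
        if (text != null) {
            text.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        // Only now is the element's text known to be complete.
        if (text != null) {
            lastValue = text.toString();
            text = null;
        }
    }

    String getLastValue() {
        return lastValue;
    }
}
```

With this handler, the two calls carrying "15" and "41" end up in the same
buffer, so the value read in endElement is "1541".  Mixed content (text
interleaved with child elements) would need a stack of buffers rather than
the single one used here.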

Ian

-- 
Ian Roberts, Software Engineer        DecisionSoft Ltd.
Telephone: +44-1865-203192            http://www.decisionsoft.com

