Posted to dev@cocoon.apache.org by Sylvain Wallez <sy...@apache.org> on 2003/11/11 22:07:46 UTC

Re: DO NOT REPLY [Bug 23299] - [PATCH] UTFDataFormatException: String cannot be longer than 32k.

Torsten Curdt wrote:

>>> ...I was wondering - is this a bug of the component that produces the
>>> SAX events or the XMLByteStreamCompiler? I mean: now it's ok - but
>>> should we silently ignore the problem?
>>
>>
>> Torsten, I don't understand your concerns. Isn't the fix simply about 
>> handling text nodes longer than 32 k? Ok, they shouldn't occur that 
>> often (it's half a novel :-) ), but it's possible.
>
>
> ....we duplicate events here and thereby modify the SAX stream.
> Should be no problem.... but who knows ;)
>
> with the patch:
>
>  characters(36k)
> ->
>  event
>  string 32k
>  event
>  string 4k
>
> I guess it would be better to have it like this:
>
>  characters(36k)
> ->
>  event
>  string 32k
>  string 4k
>
> So what goes in comes out the same way.
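
The "one event, several strings" layout proposed above could be sketched roughly as follows. This is only an illustration, not the XMLByteStreamCompiler code: the class ChunkedText, its methods and the chunk-count prefix are hypothetical. It relies on java.io.DataOutputStream.writeUTF, which rejects strings whose encoded form exceeds 65535 bytes; since a char needs at most 3 bytes in that encoding, chunks of 65535 / 3 chars are always safe.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical helper: stores one long text as a chunk count plus several writeUTF strings. */
class ChunkedText {
    // writeUTF allows at most 65535 encoded bytes; one char takes up to 3 bytes.
    static final int MAX_CHARS = 65535 / 3;

    static void write(DataOutputStream out, String text) throws IOException {
        int chunks = (text.length() + MAX_CHARS - 1) / MAX_CHARS;
        out.writeInt(chunks); // assumption: the reader needs to know how many strings follow
        for (int i = 0; i < chunks; i++) {
            int from = i * MAX_CHARS;
            int to = Math.min(text.length(), from + MAX_CHARS);
            out.writeUTF(text.substring(from, to));
        }
    }

    static String read(DataInputStream in) throws IOException {
        int chunks = in.readInt();
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < chunks; i++) {
            sb.append(in.readUTF());
        }
        return sb.toString(); // the caller can now fire a single characters() event
    }

    /** Writes and re-reads a text in memory, to show that what goes in comes out. */
    static String roundTrip(String text) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            write(new DataOutputStream(bytes), text);
            return read(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With this layout a 36k text is stored as two strings but read back as one, so a single characters() event can be replayed.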


According to the SAX spec, a single text node can be split into a 
sequence of consecutive characters() events, and all SAX handling code 
should be written to take care of this.

So sending two events should really not be a problem.
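
For illustration, a handler that copes with split text might look like the minimal sketch below. The class CoalescingHandler and its textNodes list are hypothetical names, not Cocoon code; it only uses the standard org.xml.sax API. It buffers every characters() call and treats the text as complete when an element starts or ends. Comments and processing instructions are ignored here for brevity.

```java
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

import java.util.ArrayList;
import java.util.List;

/** Hypothetical handler: merges consecutive characters() calls into one text node. */
class CoalescingHandler extends DefaultHandler {
    private final StringBuilder buffer = new StringBuilder();
    /** Complete text nodes, in document order. */
    final List<String> textNodes = new ArrayList<>();

    @Override
    public void characters(char[] ch, int start, int length) {
        // May be called several times for one text node: just accumulate.
        buffer.append(ch, start, length);
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        flush();
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        flush();
    }

    @Override
    public void endDocument() {
        flush();
    }

    /** Emits the buffered text, if any, as one text node. */
    private void flush() {
        if (buffer.length() > 0) {
            textNodes.add(buffer.toString());
            buffer.setLength(0);
        }
    }
}
```

A handler written this way sees the same text node whether the producer sends one 36k event or a 32k event followed by a 4k event.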

Sylvain

-- 
Sylvain Wallez                                  Anyware Technologies
http://www.apache.org/~sylvain           http://www.anyware-tech.com
{ XML, Java, Cocoon, OpenSource }*{ Training, Consulting, Projects }
Orixo, the opensource XML business alliance  -  http://www.orixo.com



Re: DO NOT REPLY [Bug 23299] - [PATCH] UTFDataFormatException: String cannot be longer than 32k.

Posted by Torsten Curdt <tc...@vafer.org>.
J.Pietschmann wrote:
> Joerg Heinicke wrote:
> 
>> ?? Who should fix it so that it works as expected, i.e. one text node 
>> in one p element?

this depends on the XSLT processor, or
on the transformer in general

> XSLT processors generally hide the issue, whether by normalizing
> the input while or after the tree is built or by other means.

exactly

> The interesting stuff of course are XPath processors working on
> potentially unnormalized DOM structures, in particular Cocoon's
> XInclude processor and the XPath transformer. I'd say you'll get
> all kinds of interesting behaviour.

as I said: it's better to get out what goes in

cheers
--
Torsten


Re: DO NOT REPLY [Bug 23299] - [PATCH] UTFDataFormatException: String cannot be longer than 32k.

Posted by "J.Pietschmann" <j3...@yahoo.de>.
Joerg Heinicke wrote:
> ?? Who should fix it so that it works as expected, i.e. one text node in 
> one p element?

XSLT processors generally hide the issue, whether by normalizing
the input while or after the tree is built or by other means.

The interesting stuff of course are XPath processors working on
potentially unnormalized DOM structures, in particular Cocoon's
XInclude processor and the XPath transformer. I'd say you'll get
all kinds of interesting behaviour.
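
To make "unnormalized" concrete, here is a small sketch using only the standard JAXP and W3C DOM APIs. The class NormalizeDemo and its method names are hypothetical. It builds a p element holding two adjacent text nodes, as a DOM builder fed with two characters() events might, and then merges them with Node.normalize().

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

import org.w3c.dom.Document;
import org.w3c.dom.Element;

/** Hypothetical demo: adjacent DOM text nodes before and after normalize(). */
class NormalizeDemo {
    /** Builds <p> with two adjacent text children, i.e. an unnormalized DOM. */
    static Element splitParagraph() {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element p = doc.createElement("p");
            doc.appendChild(p);
            p.appendChild(doc.createTextNode("first half, "));
            p.appendChild(doc.createTextNode("second half"));
            return p;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Number of direct children of the element. */
    static int childCount(Element e) {
        return e.getChildNodes().getLength();
    }
}
```

Before normalize() the element has two text children; after p.normalize() it has one child holding the concatenated text. A component that walks child nodes directly, without normalizing first, would see the split.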

J.Pietschmann





Re: DO NOT REPLY [Bug 23299] - [PATCH] UTFDataFormatException: String cannot be longer than 32k.

Posted by Joerg Heinicke <jh...@virbus.de>.
On 11.11.2003 22:07, Sylvain Wallez wrote:

> According to the SAX spec, a single text node can be split into a 
> sequence of consecutive characters() events, and all SAX handling code 
> should be written to take care of this.
> 
> So sending two events should really not be a problem.

But what about my simple sample

<xsl:template match="text()">
   <p><xsl:value-of select="."/></p>
</xsl:template>

?? Who should fix it so that it works as expected, i.e. one text node in 
one p element?

Joerg