Posted to dev@cocoon.apache.org by Torsten Curdt <tc...@vafer.org> on 2003/11/19 02:53:06 UTC

[RT] SAX stream buffering

Hi, folks!

The numbers of the XMLByteStreamCompilerInterpreterTestCase and the
SaxBufferTestCase gave me some RT
--
If you have a look at the testcases it's quite obvious that the
SaxBuffer is *much* faster than the XMLByteStream classes.
As a rule of thumb, just to get the dimensions, we could say:

  XMLC/XMLI is about 15 times faster than Xerces
  SaxBuffer is about 100 times faster than Xerces

Of course this depends heavily on the document. But it should be
enough to grasp the magnitude. Which was a bit of a surprise for
me. I personally did not expect this *huge* difference. Especially
because the SaxBuffer creates many more objects than the XMLC.

But the huge difference between the SaxBuffer and the XMLC is that
the XMLC serializes the SAX events on the fly. The SaxBuffer
does not support serialization but keeps the events as objects.
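
To make the "events as objects" idea concrete, here is a minimal sketch of an object-based buffer. It is not the actual Cocoon SaxBuffer: the class name EventBufferSketch and its Event interface are hypothetical, and it records only three kinds of events. Each incoming SAX call becomes one small object that can later replay itself into any ContentHandler.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch, NOT the real Cocoon SaxBuffer.
class EventBufferSketch extends DefaultHandler {

    /** One recorded SAX event that knows how to replay itself. */
    interface Event {
        void send(ContentHandler target) throws SAXException;
    }

    // One object per event: cheap to record, but more work for the GC.
    private final List<Event> events = new ArrayList<>();

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts) {
        // Copy the attributes: a parser may reuse the Attributes instance.
        final Attributes copy = new AttributesImpl(atts);
        events.add(target -> target.startElement(uri, local, qName, copy));
    }

    @Override
    public void endElement(String uri, String local, String qName) {
        events.add(target -> target.endElement(uri, local, qName));
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Copy the characters: the caller owns and may overwrite the array.
        final char[] copy = Arrays.copyOfRange(ch, start, start + length);
        events.add(target -> target.characters(copy, 0, copy.length));
    }

    /** Replays all recorded events, in order, into the given handler. */
    void replay(ContentHandler target) throws SAXException {
        for (Event e : events) {
            e.send(target);
        }
    }

    /** Number of recorded events. */
    int size() {
        return events.size();
    }
}
```

Nothing is encoded to bytes while recording, which is one plausible reason for the speed difference, at the price of one allocation per event.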

IMO spending time on the serialization only makes sense if

  a) the memory consumption is too high otherwise
  b) the SAX stream is being saved to disk

Maybe we can extend the testcases to compare the memory consumption.
For the question of the destination we could let the store decide.

Anyway, both classes make sense. But maybe they would make even more
sense if they shared the same interface and became
interchangeable.

SAX stream buffering is a vital component of Cocoon. Looking
at the numbers, the impact on performance could be tremendous.

What do you think?
--
Torsten


Re: [RT] SAX stream buffering

Posted by Torsten Curdt <tc...@vafer.org>.
Ugo Cei wrote:

> Torsten Curdt wrote:
> 
>> buf.length << 1 is a shift operation which is the same
>> as buf.length*2. The Max() chooses the bigger value.
>>
>> So that method is fine ;)
> 
> 
> But a little too clever for my taste ;-). It reminds me of the old days 
> with C compilers who weren't smart enough to convert a 
> multiplication/division by a power of 2 into a left/right shift. I 
> thought those days had passed forever, but it seems there's always 
> someone thinking he can outsmart the compiler ;-).

...I thought so too until I had a look into BCEL ;)

We could use that to see if the compiler is smart enough to
optimize that on its own :) ...at least it would be interesting
to know for the future.
--
Torsten


Re: [RT] SAX stream buffering

Posted by Ugo Cei <u....@cbim.it>.
Torsten Curdt wrote:
> buf.length << 1 is a shift operation which is the same
> as buf.length*2. The Max() chooses the bigger value.
> 
> So that method is fine ;)

But a little too clever for my taste ;-). It reminds me of the old days 
with C compilers that weren't smart enough to convert a 
multiplication/division by a power of 2 into a left/right shift. I 
thought those days had passed forever, but it seems there's always 
someone thinking he can outsmart the compiler ;-).

	Ugo

-- 
Ugo Cei - Consorzio di Bioingegneria e Informatica Medica
P.le Volontari del Sangue, 2 - 27100 Pavia - Italy
Phone: +39.0382.525100 - E-mail: u.cei@cbim.it


Re: [RT] SAX stream buffering

Posted by Torsten Curdt <tc...@vafer.org>.
> No the problem may be the opposite, and the XMLC may be eating way too 
> much memory: a linear growth rate would be IMO better.

Well, on each array "resize" we need to create a new one and copy.
You wouldn't want to do this too often of course. Doubling the
buffer size is a common approach.
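
As a rough illustration of the trade-off, the sketch below counts how many bytes get copied while a buffer grows to a target capacity, once by doubling and once by a fixed increment. The class and method names are hypothetical, and it assumes the buffer is completely full at each resize, so the whole old array is copied every time.

```java
// Hypothetical sketch comparing the copy cost of two growth policies.
class GrowthCostSketch {

    /** Bytes copied while growing from initial to at least target by doubling. */
    static long copiedWhenDoubling(int initial, int target) {
        long copied = 0;
        int cap = initial;
        while (cap < target) {
            copied += cap; // the old content is copied into the new array
            cap <<= 1;     // double the capacity
        }
        return copied;
    }

    /** Bytes copied while growing from initial to at least target in fixed steps. */
    static long copiedWhenLinear(int initial, int target, int increment) {
        long copied = 0;
        int cap = initial;
        while (cap < target) {
            copied += cap;    // the old content is copied into the new array
            cap += increment; // grow by a constant amount
        }
        return copied;
    }
}
```

Under these assumptions, growing from 1 KiB to 1 MiB copies 1,047,552 bytes when doubling and 536,346,624 bytes with a fixed 1 KiB increment. On the other hand, doubling can leave almost half of the final array unused, which is the memory concern raised above.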

>>> Can't we merge both: use SAXBuffer for in-memory storage, and use 
>>> XMLC/XMLI to serialize it? This could even be done transparently by 
>>> having SAXBuffer implementing Serializable and use XMLC/XMLI to 
>>> implement readObject() and writeObject().
>>
>>
>>
>> Hm... I don't know if I like that. Although it also came to my mind.
>>
>> That way we *always* have the memory consumption. It sounds reasonable 
>> from a OOP POV but it might not be a good choice in terms of 
>> scaleability ...I assume :-/
> 
> 
> 
> Any numbers on SAXBuffer's memory consumption?

Not yet. But every SAX event is an object. Even if we recycle the
SaxBuffer object it will hand over all the event objects to the GC.

With the XMLC there is much less work for the GC.
--
Torsten


Re: [RT] SAX stream buffering

Posted by Sylvain Wallez <sy...@apache.org>.
Torsten Curdt wrote:

>> I'm not very surprised by these numbers: XMLC does a pretty heavy job 
>> to serialize Strings to bytes.
>>
>> Furthermore, I just looked at the XMLByteStreamCompiler.write() which 
>> shows that it spends most of its time resizing the byte buffer, as 
>> resizing is limited to the actual number of bytes needed for the 
>> current write, and not by a larger growth increment.
>>
>> It would be interesting to redo the test by introducing this growth 
>> increment. BTW, I don't understand the "this.buf.length << 1" in the 
>> write() method.
>
>
> Well, thats not exactly true:
>
> buf.length << 1 is a shift operation which is the same as 
> buf.length*2. The Max() chooses the bigger value.
>
> So that method is fine ;)


Yep. It's been such a long time since I last used shift operators that 
I suspected some black magic here ;-)

No, the problem may be the opposite: the XMLC may be eating way too 
much memory. A linear growth rate would IMO be better.

<snip/>

>> Can't we merge both: use SAXBuffer for in-memory storage, and use 
>> XMLC/XMLI to serialize it? This could even be done transparently by 
>> having SAXBuffer implementing Serializable and use XMLC/XMLI to 
>> implement readObject() and writeObject().
>
>
> Hm... I don't know if I like that. Although it also came to my mind.
>
> That way we *always* have the memory consumption. It sounds reasonable 
> from a OOP POV but it might not be a good choice in terms of 
> scaleability ...I assume :-/


Any numbers on SAXBuffer's memory consumption?

Sylvain

-- 
Sylvain Wallez                                  Anyware Technologies
http://www.apache.org/~sylvain           http://www.anyware-tech.com
{ XML, Java, Cocoon, OpenSource }*{ Training, Consulting, Projects }
Orixo, the opensource XML business alliance  -  http://www.orixo.com



Re: [RT] SAX stream buffering

Posted by Torsten Curdt <tc...@vafer.org>.
> I'm not very surprised by these numbers: XMLC does a pretty heavy job to 
> serialize Strings to bytes.
> 
> Furthermore, I just looked at the XMLByteStreamCompiler.write() which 
> shows that it spends most of its time resizing the byte buffer, as 
> resizing is limited to the actual number of bytes needed for the current 
> write, and not by a larger growth increment.
> 
> It would be interesting to redo the test by introducing this growth 
> increment. BTW, I don't understand the "this.buf.length << 1" in the 
> write() method.

Well, that's not exactly true:

buf.length << 1 is a shift operation which is the same
as buf.length*2. The Max() chooses the bigger value.

So that method is fine ;)
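
For reference, a minimal sketch of such a growth policy. This is not the actual XMLByteStreamCompiler code; the class and method names are hypothetical. In Java, x << 1 and x * 2 give the same result for every int value.

```java
// Hypothetical sketch of a doubling growth policy; NOT the real
// XMLByteStreamCompiler.write() implementation.
class GrowthSketch {

    /** New capacity: at least double the current length, and at least what is needed. */
    static int newCapacity(int currentLength, int needed) {
        return Math.max(currentLength << 1, needed);
    }

    /** Returns a buffer with room for needed bytes, copying the old content if it has to grow. */
    static byte[] ensureCapacity(byte[] buf, int needed) {
        if (needed <= buf.length) {
            return buf; // still fits, no resize
        }
        byte[] bigger = new byte[newCapacity(buf.length, needed)];
        System.arraycopy(buf, 0, bigger, 0, buf.length);
        return bigger;
    }
}
```

So a small write into a full 16-byte buffer grows it to 32 bytes, while a write that needs 100 bytes grows it straight to 100.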

>> But the huge difference between the SaxBuffer and the XMLC is that the 
>> XMLC serializes the SAX event on the fly. The SaxBuffer does not 
>> support serialization but keeps the events as objects.
>>
>> IMO spending time on the serialization only makes sense if
>>
>>  a) the memory consumption is too high otherwise
>>  b) the SAX stream is being saved to disk
>>
>> Maybe we can extend the testcases to compare the memory consumption. 
>> For the question of the destination we could let the store decide.
>>
>> Anyway both classes make sense. But maybe they would make even more 
>> sense if they would share the same interface and would become 
>> interchangeable.
>>
>> The SAX stream buffering is a vital component of cocoon. Looking at 
>> the numbers the impact on the performance could be tremendous.
>>
>> What do you think?
> 
> 
> 
> Can't we merge both: use SAXBuffer for in-memory storage, and use 
> XMLC/XMLI to serialize it? This could even be done transparently by 
> having SAXBuffer implementing Serializable and use XMLC/XMLI to 
> implement readObject() and writeObject().

Hm... I don't know if I like that. Although it also came to my mind.

That way we *always* have the memory consumption. It sounds reasonable
from an OOP POV but it might not be a good choice in terms of
scalability ...I assume :-/
--
Torsten


Re: [RT] SAX stream buffering

Posted by Sylvain Wallez <sy...@apache.org>.
Torsten Curdt wrote:

> Hi, folks!
>
> The numbers of the XMLByteStreamCompilerInterpreterTestCase and the 
> SaxBufferTestCase gave me some RT
> -- 
> If you have a look at the testcases it's quite obvious that the 
> SaxBuffer is *much* faster than the XMLByteStream classes. As a thumb 
> rule -just to get the dimensions- we could say:
>
>  XMLC/XMLI is about 15 times faster than Xerces
>  SaxBuffer is about 100 times faster than Xerces
>
> Of course this depends heavily on the document. But it should be 
> enough to grasp the magnitude. Which was a bit of a surprise for me. I 
> personally did not expect this *huge* difference. Especially because 
> the SaxBuffer creates much more objects than the XMLC.


I'm not very surprised by these numbers: XMLC does a pretty heavy job to 
serialize Strings to bytes.

Furthermore, I just looked at the XMLByteStreamCompiler.write() which 
shows that it spends most of its time resizing the byte buffer, as 
resizing is limited to the actual number of bytes needed for the current 
write, and not by a larger growth increment.

It would be interesting to redo the test by introducing this growth 
increment. BTW, I don't understand the "this.buf.length << 1" in the 
write() method.

> But the huge difference between the SaxBuffer and the XMLC is that the 
> XMLC serializes the SAX event on the fly. The SaxBuffer does not 
> support serialization but keeps the events as objects.
>
> IMO spending time on the serialization only makes sense if
>
>  a) the memory consumption is too high otherwise
>  b) the SAX stream is being saved to disk
>
> Maybe we can extend the testcases to compare the memory consumption. 
> For the question of the destination we could let the store decide.
>
> Anyway both classes make sense. But maybe they would make even more 
> sense if they would share the same interface and would become 
> interchangeable.
>
> The SAX stream buffering is a vital component of cocoon. Looking at 
> the numbers the impact on the performance could be tremendous.
>
> What do you think?


Can't we merge both: use SAXBuffer for in-memory storage, and use 
XMLC/XMLI to serialize it? This could even be done transparently by 
having SAXBuffer implementing Serializable and use XMLC/XMLI to 
implement readObject() and writeObject().
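
A minimal sketch of what I have in mind, under heavy assumptions: the class below is hypothetical, it stores events as plain strings, and it uses writeUTF() as a stand-in for the XMLC/XMLI encoding. The point is only the shape: objects in memory, a compact form written by writeObject() and read back by readObject().

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, NOT the real SaxBuffer or XMLC/XMLI classes.
class SerializableBufferSketch implements Serializable {
    private static final long serialVersionUID = 1L;

    // In-memory form: one object per event. Marked transient so that
    // default serialization skips it and our own encoding is used instead.
    private transient List<String> events = new ArrayList<>();

    void add(String event) {
        events.add(event);
    }

    List<String> events() {
        return events;
    }

    // Stand-in for "compile with XMLC": write a count, then each event.
    // Note: writeUTF() is limited to 65535 encoded bytes per string.
    private void writeObject(ObjectOutputStream out) throws IOException {
        out.defaultWriteObject();
        out.writeInt(events.size());
        for (String e : events) {
            out.writeUTF(e);
        }
    }

    // Stand-in for "interpret with XMLI": rebuild the event objects.
    // Field initializers do not run on deserialization, so the list
    // has to be created here.
    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        int n = in.readInt();
        events = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            events.add(in.readUTF());
        }
    }

    /** Serializes and deserializes a buffer in memory, for demonstration. */
    static SerializableBufferSketch roundTrip(SerializableBufferSketch b) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(b);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (SerializableBufferSketch) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

With this shape, callers only ever see the object form, and the byte form exists only while the buffer is being written to or read from a stream.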

Sylvain

-- 
Sylvain Wallez                                  Anyware Technologies
http://www.apache.org/~sylvain           http://www.anyware-tech.com
{ XML, Java, Cocoon, OpenSource }*{ Training, Consulting, Projects }
Orixo, the opensource XML business alliance  -  http://www.orixo.com