Posted to users@cocoon.apache.org by ad...@free.fr on 2004/03/18 15:45:28 UTC

Re: performance with transformation on BIG xml files

So nobody has ever served big XML files with Cocoon?


adrian.dimulescu@free.fr wrote:

>
> Hello,
>
> I would like to know if anybody has used Cocoon for transforming big 
> XML files, and what tuning was needed in order to make performance 
> acceptable.
>
> My case: I run a little digital library project; among the books in 
> this online library, I have a Bible in TEI format which I present to 
> the reader as nicely formatted HTML with a tree-like table of contents 
> on the left side, navigation buttons (previous/next), etc.
>
> The Bible is no little book, that is true, but the performance is 
> terrible. In order to "suck out" the static version I use "wget -m 
> http://my.local.cocoon.installation:8080/cocoon/..." (I publish a 
> static version online, not the dynamic Cocoon one, as I don't have 
> free Cocoon hosting) --- generating all the little HTML files for that 
> Bible may take a day.
>
> It does work OK for smaller works like novels and poetry books. The 
> Bible includes all its chapters as entities. Please feel free to ask 
> for more information if you need it.
>
> I'll paste at the end of this mail my transformation sitemap, in case 
> anybody has any ideas.
> Thanks in advance for your ideas.
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@cocoon.apache.org
For additional commands, e-mail: users-help@cocoon.apache.org


Re: performance with transformation on BIG xml files

Posted by Adrian Petru Dimulescu <ad...@free.fr>.
On Thursday 18 March 2004 18:08, Bruno Dumon wrote:

> This is normal behaviour. It doesn't matter whether you extract a small
> or a big part, the XSLT processor will always load the complete document
> into memory.

Thanks, that explains it. I'll be looking for a way to extract chapters of 
large books in a less expensive fashion. 

An XML database may be one way of doing this; I am a bit reluctant to use 
XIndice, though, as keeping plain XML files is quite attractive compared 
to a database, where updates become more difficult.

OK, we'll wait and see.

Best regards,
Adrian.



Re: performance with transformation on BIG xml files

Posted by Bruno Dumon <br...@outerthought.org>.
On Thu, 2004-03-18 at 17:04, adrian.dimulescu@free.fr wrote:
> Thank you all for your suggestions. I'm a little puzzled on one point, 
> though: does extracting a little part of a large file involve much 
> memory consumption by XSLT? I think the problem boils down to this, 
> and I wonder whether it's my faulty XSLT stylesheets or "normal" 
> behavior.

This is normal behaviour. It doesn't matter whether you extract a small
or a big part, the XSLT processor will always load the complete document
into memory.

Also, if the pages you generate are large, read the comment in the root
sitemap.xmap about the outputBufferSize parameter.

If you are creating these pages only once, instead of serving them live,
you might also want to disable caching (type="noncaching" on the
map:pipeline element). This can also save a lot of memory.
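For reference, the noncaching setting goes on the map:pipeline element in
the sitemap. The match pattern and stylesheet names below are hypothetical
placeholders, just to sketch where the attribute sits:

```xml
<map:pipelines>
  <!-- type="noncaching" keeps Cocoon from storing intermediate SAX
       events in the cache, saving memory for generate-once documents -->
  <map:pipeline type="noncaching">
    <!-- hypothetical match and file names, for illustration only -->
    <map:match pattern="book/**.html">
      <map:generate src="content/{1}.xml"/>
      <map:transform src="styles/book2html.xsl"/>
      <map:serialize type="html"/>
    </map:match>
  </map:pipeline>
</map:pipelines>
```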

-- 
Bruno Dumon                             http://outerthought.org/
Outerthought - Open Source, Java & XML Competence Support Center
bruno@outerthought.org                          bruno@apache.org




Re: performance with transformation on BIG xml files

Posted by ad...@free.fr.
Thank you all for your suggestions. I'm a little puzzled on one point, 
though: does extracting a little part of a large file involve much 
memory consumption by XSLT? I think the problem boils down to this, 
and I wonder whether it's my faulty XSLT stylesheets or "normal" 
behavior.

Thanks,
Adrian.

Peter Velychko wrote:

>Hello adrian,
>
>I tried the STX block, based on the Joost processor, and was impressed 
>by it. It's really faster than XSLT and allows processing large files.
>Additional info:
>http://stx.sourceforge.net/
>http://joost.sourceforge.net/



Re: performance with transformation on BIG xml files

Posted by Peter Velychko <v_...@ukr.net>.
Hello adrian,

I tried the STX block, based on the Joost processor, and was impressed
by it. It's really faster than XSLT and allows processing large files.
Additional info:
http://stx.sourceforge.net/
http://joost.sourceforge.net/
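To give a rough idea of what STX looks like (a hedged sketch; check the
STX specification for the exact element set), the stylesheet below copies
elements while streaming SAX events, without ever building a full
in-memory tree:

```xml
<?xml version="1.0"?>
<!-- STX processes the document as a stream of SAX events -->
<stx:transform xmlns:stx="http://stx.sourceforge.net/2002/ns"
               version="1.0">
  <!-- copy each element (with its attributes) and recurse into children -->
  <stx:template match="*">
    <stx:copy attributes="@*">
      <stx:process-children/>
    </stx:copy>
  </stx:template>
</stx:transform>
```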

adff> So nobody has ever served big XML files with Cocoon?



-- 
Best regards,
Peter Velychko                            
v_peter@ukr.net




Re: performance with transformation on BIG xml files

Posted by Pete <pe...@xml.grumpykitty.biz>.
Just FYI, we streamed 20 MB XML log files to the browser and saw them 
render incrementally.

Personally I believe that the decision to stream or not is made within 
the XSLT implementation, and that the default TrAX transformer in 2.1 
does stream.

Here are some snippets from our sitemaps:

    <map:transformer logger="sitemap.transformer.xslt" name="xslt"
                     pool-grow="2" pool-max="32" pool-min="8"
                     src="org.apache.cocoon.transformation.TraxTransformer">
      <use-request-parameters>false</use-request-parameters>
      <use-session-parameters>false</use-session-parameters>
      <use-cookie-parameters>false</use-cookie-parameters>
      <xslt-processor-role>xsltc</xslt-processor-role>
    </map:transformer>

    <map:match pattern="alarmlog.xhtml">   
        <map:generate type="customlog" label="data">
            <map:parameter name="log-type" value="customlogtype" />
        </map:generate>
        <map:transform src="tohtml.xsl" />
        <map:call resource="serialize-screen" />
    </map:match>  

J.Pietschmann wrote:

> Bertrand Delacretaz wrote:
>
>> Also, some XSLT constructs will force the XSLT processor to load the 
>> whole thing in memory instead of streaming it.
>
>
> I was under the impression that both XSLTC and Saxon always load
> the whole XML, and that Xalan's streaming processing was disabled
> in the default Cocoon setup and has to be explicitly enabled, so
> it is likely to load everything into memory as well.
>
> J.Pietschmann
>


little Cocoon digital library project (was Re: performance with transformation on BIG xml files)

Posted by Adrian Petru Dimulescu <ad...@free.fr>.
Hello,

I managed to put my little digital library project online as a live Cocoon 
site (yesss!)  

The purpose of the site is to publish mainly Romanian texts, so the site 
is mostly in Romanian. However, the index page is translated into English 
and French.

You can take a look at it here: http://www.scriptorium.ro (you'll have to 
click the "English" link on the right until I figure out how to switch pages 
automatically using the Accept-Language header :)

Best regards,
Adrian



Re: performance with transformation on BIG xml files

Posted by ad...@free.fr.
>The STX stuff might be more helpful to you, or possibly a custom SAX
>transformer.
I'll definitely dig in this direction.

>BTW, if you're only using XSLT and only generating your stuff one time
>(the bible doesn't change anymore, does it?), you might as well use Ant.
>This would allow you to run the split operation just once to generate a
>bunch of files, and then process each of those files individually.
The Bible itself may not change, but there are occasional typos and 
corrections to be made.

Indeed, I can split the Bible into pieces and present the individual 
books, but as I developed a little Cocoon-based framework for HTML 
presentation that is centered around a single TEI book, that would mean 
the user not having the whole Bible as one work. That is of course not 
a tragedy, but I think there must be a way to rapidly generate 
presentations of small pieces of big works. Breaking works into 
reasonably sized pieces can only be a temporary solution.

I'll let you know how this works out, thank you all for your help!
Adrian.



Re: performance with transformation on BIG xml files

Posted by Bruno Dumon <br...@outerthought.org>.
On Fri, 2004-03-19 at 12:44, adrian.dimulescu@free.fr wrote:
> > The streaming feature of Xalan won't save you any memory. The only
> > difference is that the transformation will start while the DOM (DTM) is
> > still being built, but in the end it still builds the complete DTM
> > (AFAIU).
>
> So what XSLT processor would you recommend?

None in particular. In the general case, an XSLT transformation needs 
random access to the source document, so there's not much to be done 
about it.

The STX stuff might be more helpful to you, or possibly a custom SAX
transformer.

BTW, if you're only using XSLT and only generating your stuff one time
(the bible doesn't change anymore, does it?), you might as well use Ant.
This would allow you to run the split operation just once to generate a
bunch of files, and then process each of those files individually.
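A minimal Ant build for that second step might look like the sketch 
below. The directory and stylesheet names are made up for illustration; 
the point is that Ant's <xslt> task transforms each file in basedir 
individually, so memory use is bounded by the largest chapter rather 
than the whole Bible:

```xml
<project name="publish-bible" default="transform">
  <!-- hypothetical layout: pre-split chapter files in chapters/,
       HTML output to html/, stylesheet tohtml.xsl in the project root -->
  <target name="transform">
    <xslt basedir="chapters" destdir="html"
          style="tohtml.xsl" includes="**/*.xml" extension=".html"/>
  </target>
</project>
```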

-- 
Bruno Dumon                             http://outerthought.org/
Outerthought - Open Source, Java & XML Competence Support Center
bruno@outerthought.org                          bruno@apache.org




Re: performance with transformation on BIG xml files

Posted by "J.Pietschmann" <j3...@yahoo.de>.
adrian.dimulescu@free.fr wrote:
> So what XSLT processor would you recommend?

For many problems, Saxon 6.5.x has the smallest memory footprint, with 
Saxon 7.x a close second. There are, however, quite a few important
problems where Xalan in streaming mode and XSLTC have an edge (or Saxon
is deadly slow), so you will probably have to experiment a bit. There
are also some subtle incompatibilities between the various processors.

J.Pietschmann



Re: performance with transformation on BIG xml files

Posted by ad...@free.fr.
>The streaming feature of Xalan won't save you any memory. The only
>difference is that the transformation will start while the DOM (DTM) is
>still being built, but in the end it still builds the complete DTM
>(AFAIU).

So what XSLT processor would you recommend?



Re: performance with transformation on BIG xml files

Posted by Bruno Dumon <br...@outerthought.org>.
On Fri, 2004-03-19 at 07:53, Bertrand Delacretaz wrote:
> On Thursday, 18 March 2004, at 22:31 Europe/Zurich, J.Pietschmann wrote:
> 
> > Bertrand Delacretaz wrote:
> >> Also, some XSLT constructs will force the XSLT processor to load the 
> >> whole thing in memory instead of streaming it.
> >
> > I was under the impression that both XSLTC and Saxon always load
> > the whole XML, and that Xalan's streaming processing was disabled
> > in the default Cocoon setup and has to be explicitly enabled, so
> > it is likely to load everything into memory as well.
> 
> You're probably right, I didn't check whether streaming actually takes 
> place in the default Cocoon setup. So the first step would be to enable 
> and check it, for example by processing a large XML file with a simple 
> transform and watching memory consumption (or processor debugging 
> messages if they say something meaningful).

The streaming feature of Xalan won't save you any memory. The only
difference is that the transformation will start while the DOM (DTM) is
still being built, but in the end it still builds the complete DTM
(AFAIU).

-- 
Bruno Dumon                             http://outerthought.org/
Outerthought - Open Source, Java & XML Competence Support Center
bruno@outerthought.org                          bruno@apache.org




Re: performance with transformation on BIG xml files

Posted by Bertrand Delacretaz <bd...@apache.org>.
On Thursday, 18 March 2004, at 22:31 Europe/Zurich, J.Pietschmann wrote:

> Bertrand Delacretaz wrote:
>> Also, some XSLT constructs will force the XSLT processor to load the 
>> whole thing in memory instead of streaming it.
>
> I was under the impression that both XSLTC and Saxon always load
> the whole XML, and that Xalan's streaming processing was disabled
> in the default Cocoon setup and has to be explicitly enabled, so
> it is likely to load everything into memory as well.

You're probably right, I didn't check whether streaming actually takes 
place in the default Cocoon setup. So the first step would be to enable 
and check it, for example by processing a large XML file with a simple 
transform and watching memory consumption (or processor debugging 
messages if they say something meaningful).

-Bertrand




Re: performance with transformation on BIG xml files

Posted by "J.Pietschmann" <j3...@yahoo.de>.
Bertrand Delacretaz wrote:
> Also, some XSLT constructs will force the XSLT processor to load the 
> whole thing in memory instead of streaming it.

I was under the impression that both XSLTC and Saxon always load
the whole XML, and that Xalan's streaming processing was disabled
in the default Cocoon setup and has to be explicitly enabled, so
it is likely to load everything into memory as well.

J.Pietschmann



Re: performance with transformation on BIG xml files

Posted by Bertrand Delacretaz <bd...@apache.org>.
On Thursday, 18 March 2004, at 16:03 Europe/Zurich, Geoff Howard wrote:
> ...Look for solutions people have tried involving variously saxon, stx 
> (joost), momento, and memory tuning...

Also, some XSLT constructs will force the XSLT processor to load the 
whole thing in memory instead of streaming it.

To find out which constructs cause the problem, you could start with a 
minimal XSLT (identity transform should give good performance), and 
gradually add templates from your transformation until it breaks 
performance-wise.
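The minimal starting point mentioned above is the standard XSLT identity 
transform, which simply copies input to output unchanged:

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- identity transform: copy every node and attribute as-is -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
```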

-Bertrand




Re: performance with transformation on BIG xml files

Posted by Geoff Howard <co...@leverageweb.com>.
adrian.dimulescu@free.fr wrote:

> So nobody has ever served big XML files with Cocoon?

Don't confuse no answer given with no answer existing. I have not done 
this personally (although I was involved in using Cocoon to present the 
Bible in XML format, but we split it up into 66 files, one file per book, 
instead of one big file), but many have. What have you found in the 
archives for this list and the dev list?

In the end, this is an issue with any XSL transformation of large files, 
which generally involves putting a bloated DOM representation in memory. 
If you don't have enough memory (which sounds like the case), you will 
get a lot of paging out to disk, which leads to abysmal performance. Look 
for solutions people have tried involving, variously, Saxon, STX (Joost), 
momento, and memory tuning.

Geoff



