Posted to users@cocoon.apache.org by ad...@free.fr on 2004/03/18 15:45:28 UTC
Re: performance with transformation on BIG xml files
So nobody has ever served big XML files with Cocoon?
adrian.dimulescu@free.fr wrote:
>
> Hello,
>
> I would like to know whether anybody has used Cocoon to transform big XML
> files, and what tuning was needed to make performance acceptable.
>
> My case: I run a small digital library project; among the books in this
> online library, I have a Bible in TEI format which I present to the
> reader as nicely formatted HTML with a tree-like table of contents
> on the left side, navigation buttons (previous/next), etc.
>
> The Bible is no little book, that is true, but the performance is
> terrible. To extract the static version I use "wget -m
> http://my.local.cocoon.instalation:8080/cocoon/..." (I publish a
> static version online, not the dynamic Cocoon one, as I don't have free
> Cocoon hosting). Generating all the little HTML files for that
> Bible may take a day.
>
> It does work OK for smaller works like novels and poetry books. The Bible
> includes all its chapters as entities. Please feel free to ask for
> more information if you need it.
>
> I'll paste my transformation sitemap at the end of this mail, in case
> anybody has any ideas.
> Thanks in advance for your ideas.
>
>
---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@cocoon.apache.org
For additional commands, e-mail: users-help@cocoon.apache.org
Re: performance with transformation on BIG xml files
Posted by Adrian Petru Dimulescu <ad...@free.fr>.
On Thursday 18 March 2004 18:08, Bruno Dumon wrote:
> This is normal behaviour. It doesn't matter whether you extract a small
> or a big part, the XSLT processor will always load the complete document
> into memory.
Thanks, that explains it. I'll be looking for a way to extract chapters of
large books in a less expensive fashion.
An XML database may be one way of doing this; I am a bit reluctant to use
Xindice, though, as having simple XML files is quite attractive, as opposed
to having a database in which updates become more difficult.
OK, we'll wait and see.
Best regards,
Adrian.
Re: performance with transformation on BIG xml files
Posted by Bruno Dumon <br...@outerthought.org>.
On Thu, 2004-03-18 at 17:04, adrian.dimulescu@free.fr wrote:
> Thank you all for your suggestions. I'm a little puzzled on one point,
> though: does extracting a little part of a large file involve much
> memory consumption by XSLT? Because I think the problem boils down to
> this and I wonder if it's my faulty XSLT stylesheets or it's "normal"
> behavior.
This is normal behaviour. It doesn't matter whether you extract a small
or a big part, the XSLT processor will always load the complete document
into memory.
Also, if the pages you generate are large, read the comment in the root
sitemap.xmap about the outputBufferSize parameter.
If you are creating these pages only once, instead of serving them live,
you might also want to disable caching (type="noncaching" on the
map:pipeline element). This can also save a lot of memory.
--
Bruno Dumon http://outerthought.org/
Outerthought - Open Source, Java & XML Competence Support Center
bruno@outerthought.org bruno@apache.org
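Bruno's suggestion to switch off caching for a one-off generation run could look roughly like the sitemap fragment below. This is only a sketch: the type="noncaching" attribute comes from his message, while the match pattern, the source path, the "chapter" parameter and the stylesheet name are hypothetical placeholders. It also assumes that a pipe named "noncaching" is declared in the map:pipes section of the sitemap.

```xml
<map:pipeline type="noncaching">
  <!-- Hypothetical pattern: one HTML page per chapter id -->
  <map:match pattern="bible/*.html">
    <!-- Hypothetical source file -->
    <map:generate src="books/bible.xml"/>
    <!-- Hypothetical stylesheet that renders a single chapter -->
    <map:transform src="stylesheets/chapter2html.xsl">
      <!-- {1} is the text matched by the * wildcard above -->
      <map:parameter name="chapter" value="{1}"/>
    </map:transform>
    <map:serialize type="html"/>
  </map:match>
</map:pipeline>
```

Since every page is fetched exactly once by wget, nothing is gained from keeping cached results around, which is why the non-caching pipeline is the more economical choice for this kind of run.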
Re: performance with transformation on BIG xml files
Posted by ad...@free.fr.
Thank you all for your suggestions. I'm a little puzzled on one point,
though: does extracting a little part of a large file involve much
memory consumption by XSLT? Because I think the problem boils down to
this and I wonder if it's my faulty XSLT stylesheets or it's "normal"
behavior.
Thanks,
Adrian.
Peter Velychko wrote:
>Hello Adrian,
>
>I tried the STX block, based on the Joost processor, and was impressed by
>it. It's really faster than XSLT and allows processing large files.
>Additional info:
>http://stx.sourceforge.net/
>http://joost.sourceforge.net/
Re: performance with transformation on BIG xml files
Posted by Peter Velychko <v_...@ukr.net>.
Hello Adrian,
I tried the STX block, based on the Joost processor, and was impressed by
it. It's really faster than XSLT and allows processing large files.
Additional info:
http://stx.sourceforge.net/
http://joost.sourceforge.net/
--
Best regards,
Peter Velychko
v_peter@ukr.net
Re: performance with transformation on BIG xml files
Posted by Pete <pe...@xml.grumpykitty.biz>.
Just FYI, we streamed 20 MB XML log files to the browser and saw them
render incrementally.
Personally I believe that the decision to stream or not to stream is
made within the XSLT implementation,
and that the default TrAX transformer in 2.1 does stream.
Here are some snippets from our sitemaps:
<map:transformer logger="sitemap.transformer.xslt" name="xslt"
    pool-grow="2" pool-max="32" pool-min="8"
    src="org.apache.cocoon.transformation.TraxTransformer">
  <use-request-parameters>false</use-request-parameters>
  <use-session-parameters>false</use-session-parameters>
  <use-cookie-parameters>false</use-cookie-parameters>
  <xslt-processor-role>xsltc</xslt-processor-role>
</map:transformer>

<map:match pattern="alarmlog.xhtml">
  <map:generate type="customlog" label="data">
    <map:parameter name="log-type" value="customlogtype" />
  </map:generate>
  <map:transform src="tohtml.xsl" />
  <map:call resource="serialize-screen" />
</map:match>
J.Pietschmann wrote:
> Bertrand Delacretaz wrote:
>
>> Also, some XSLT constructs will force the XSLT processor to load the
>> whole thing in memory instead of streaming it.
>
>
> I was under the impression that both XSLTC and Saxon always load
> the whole XML, and that Xalan's streaming processing was disabled
> in the default Cocoon setup and has to be explicitly enabled, so
> it is likely to load everything into memory as well.
>
> J.Pietschmann
little Cocoon digital library project (was Re: performance with transformation on BIG xml files)
Posted by Adrian Petru Dimulescu <ad...@free.fr>.
Hello,
I managed to put my little digital library project online as a live Cocoon
site (yesss!)
The purpose of the site is to publish mainly Romanian texts, so the site is
mainly in Romanian. However, the index page is translated into English and
French.
You can take a look at it here: http://www.scriptorium.ro (you'll have to
click the "English" link on the right until I figure out how to switch pages
automatically using the Accept-Language header :)
Best regards,
Adrian
Re: performance with transformation on BIG xml files
Posted by ad...@free.fr.
>The STX stuff might be more helpful to you, or possibly a custom SAX
>transformer.
>
>
I'll definitely dig in this direction.
>BTW, if you're only using XSLT and only generating your stuff one time
>(the bible doesn't change anymore, does it?), you might as well use Ant.
>This would allow you to run the split operation just once to generate a
>bunch of files, and then process each of those files individually.
>
>
The Bible itself may not change, but there are occasional typos and
corrections to be made.
Indeed, I can split the Bible into pieces and present the individual
books, but since I developed a little Cocoon-based framework for HTML
presentation which is centered around a single TEI book, that would mean
the user no longer gets the whole Bible as one work. That is of course not
a tragedy, but I think there should be a way to rapidly generate
presentations of small pieces of big works. It seems that breaking works
into reasonably sized pieces can only be a temporary solution.
I'll let you know how this works out, thank you all for your help!
Adrian.
Re: performance with transformation on BIG xml files
Posted by Bruno Dumon <br...@outerthought.org>.
On Fri, 2004-03-19 at 12:44, adrian.dimulescu@free.fr wrote:
> > The streaming feature of Xalan won't save you any memory. The only
> > difference is that the transformation will start while the DOM (DTM) is
> > still being built, but in the end it still builds the complete DTM
> > (AFAIU).
>
> So what XSLT processor would you recommend?
None in particular. In the general case, an XSLT transformation needs random
access to the source document, so there's not much to be done about it.
The STX stuff might be more helpful to you, or possibly a custom SAX
transformer.
BTW, if you're only using XSLT and only generating your stuff one time
(the bible doesn't change anymore, does it?), you might as well use Ant.
This would allow you to run the split operation just once to generate a
bunch of files, and then process each of those files individually.
--
Bruno Dumon http://outerthought.org/
Outerthought - Open Source, Java & XML Competence Support Center
bruno@outerthought.org bruno@apache.org
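The second half of the Ant approach Bruno describes (processing each already-split file individually) might be sketched as the build file below. It uses Ant's built-in xslt task; the project name, directory properties and stylesheet path are hypothetical, and the split step itself is not shown because the thread does not say how the split would be done.

```xml
<project name="bible-static" default="html" basedir=".">
  <!-- Hypothetical locations; adjust to the real layout -->
  <property name="split.dir" value="build/split"/>
  <property name="html.dir" value="build/html"/>

  <target name="html">
    <mkdir dir="${html.dir}"/>
    <!-- Runs the (hypothetical) stylesheet once per file in split.dir,
         so only one small document is in memory at a time -->
    <xslt basedir="${split.dir}" destdir="${html.dir}"
          includes="*.xml" extension=".html"
          style="stylesheets/book2html.xsl"/>
  </target>
</project>
```

As far as I know, the task by default skips inputs whose output file is already newer than both the source and the stylesheet, so correcting a typo in one book should regenerate only that book's page.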
Re: performance with transformation on BIG xml files
Posted by "J.Pietschmann" <j3...@yahoo.de>.
adrian.dimulescu@free.fr wrote:
> So what XSLT processor would you recommend?
For many problems, Saxon 6.5.x has the smallest memory footprint, with
Saxon 7.x a close second. There are, however, quite a few important
problems where Xalan in streaming mode and XSLTC have an edge (or where
Saxon is painfully slow), so you will probably have to experiment a bit.
There are also some subtle incompatibilities between the various processors.
J.Pietschmann
Re: performance with transformation on BIG xml files
Posted by ad...@free.fr.
>The streaming feature of Xalan won't save you any memory. The only
>difference is that the transformation will start while the DOM (DTM) is
>still being built, but in the end it still builds the complete DTM
>(AFAIU).
>
>
So what XSLT processor would you recommend?
Re: performance with transformation on BIG xml files
Posted by Bruno Dumon <br...@outerthought.org>.
On Fri, 2004-03-19 at 07:53, Bertrand Delacretaz wrote:
> On Thursday, 18 March 2004, at 22:31 Europe/Zurich, J.Pietschmann wrote:
>
> > Bertrand Delacretaz wrote:
> >> Also, some XSLT constructs will force the XSLT processor to load the
> >> whole thing in memory instead of streaming it.
> >
> > I was under the impression that both XSLTC and Saxon always load
> > the whole XML, and that Xalan's streaming processing was disabled
> > in the default Cocoon setup and has to be explicitly enabled, so
> > it is likely to load everything into memory as well.
>
> You're probably right, I didn't check whether streaming actually takes
> place in the default Cocoon setup. So the first step would be to enable
> and check it, for example by processing a large XML file with a simple
> transform and watching memory consumption (or processor debugging
> messages if they say something meaningful).
The streaming feature of Xalan won't save you any memory. The only
difference is that the transformation will start while the DOM (DTM) is
still being built, but in the end it still builds the complete DTM
(AFAIU).
--
Bruno Dumon http://outerthought.org/
Outerthought - Open Source, Java & XML Competence Support Center
bruno@outerthought.org bruno@apache.org
Re: performance with transformation on BIG xml files
Posted by Bertrand Delacretaz <bd...@apache.org>.
On Thursday, 18 March 2004, at 22:31 Europe/Zurich, J.Pietschmann wrote:
> Bertrand Delacretaz wrote:
>> Also, some XSLT constructs will force the XSLT processor to load the
>> whole thing in memory instead of streaming it.
>
> I was under the impression that both XSLTC and Saxon always load
> the whole XML, and that Xalan's streaming processing was disabled
> in the default Cocoon setup and has to be explicitly enabled, so
> it is likely to load everything into memory as well.
You're probably right, I didn't check whether streaming actually takes
place in the default Cocoon setup. So the first step would be to enable
and check it, for example by processing a large XML file with a simple
transform and watching memory consumption (or processor debugging
messages if they say something meaningful).
-Bertrand
Re: performance with transformation on BIG xml files
Posted by "J.Pietschmann" <j3...@yahoo.de>.
Bertrand Delacretaz wrote:
> Also, some XSLT constructs will force the XSLT processor to load the
> whole thing in memory instead of streaming it.
I was under the impression that both XSLTC and Saxon always load
the whole XML, and that Xalan's streaming processing was disabled
in the default Cocoon setup and has to be explicitly enabled, so
it is likely to load everything into memory as well.
J.Pietschmann
Re: performance with transformation on BIG xml files
Posted by Bertrand Delacretaz <bd...@apache.org>.
On Thursday, 18 March 2004, at 16:03 Europe/Zurich, Geoff Howard wrote:
> ...Look for solutions people have tried involving variously saxon, stx
> (joost), momento, and memory tuning...
Also, some XSLT constructs will force the XSLT processor to load the
whole thing in memory instead of streaming it.
To find out which constructs cause the problem, you could start with a
minimal XSLT (identity transform should give good performance), and
gradually add templates from your transformation until it breaks
performance-wise.
-Bertrand
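The minimal starting point Bertrand mentions, the identity transform, is the standard XSLT 1.0 idiom shown below. It is offered here only as a baseline for the experiment he describes, not as part of Adrian's actual stylesheets, which are not shown in the thread.

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Copy every attribute and every node through unchanged -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
```

Running the large file through this first gives a reference figure for time and memory; templates from the real transformation can then be added one at a time, watching for the point where the figures degrade. Templates that reach far outside the current node (for example via the preceding:: axis or a leading //) are plausible first suspects, though whether they matter depends on the processor in use.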
Re: performance with transformation on BIG xml files
Posted by Geoff Howard <co...@leverageweb.com>.
adrian.dimulescu@free.fr wrote:
> So nobody has ever served big XML files with Cocoon?
Don't confuse no answer given with no answer existing. I have not personally
(although I was involved with using Cocoon to present the Bible in XML format,
but we split it up into 66 files, one file per book, instead of one big file),
but many have. What have you found in the archives for this list and the dev
list?
In the end, this is an issue with any XSL transformation of large files, which
generally involves putting a bloated DOM representation in memory. If you don't
have enough memory (sounds like the case), you will get a lot of paging out to
disk, which will lead to abysmal performance. Look for solutions people have
tried involving variously Saxon, STX (Joost), momento, and memory tuning.
Geoff