Posted to users@cocoon.apache.org by Tom Bloomfield <to...@shopbloomfield.com> on 2004/11/19 02:45:32 UTC
Large XML transformations in Cocoon.
I'm planning to do xml -> text transformations (for tab-delimited
output) and xml -> FOP on large XML datasets. The XML I will be
processing will be 10-12 MB in size, and will grow from there. Based on
planning, the XSL will contain around 50 node traversals and will
iterate over my XML dataset around 46,000 times. Previous to this, my
Cocoon transformations haven't been nearly this big.
The amount of JVM memory I have to deal with is limited (<256M). This
transformation will need to run in real-time.
Does anyone have experience dealing with large datasets like this?
TIA,
Tom
---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@cocoon.apache.org
For additional commands, e-mail: users-help@cocoon.apache.org
Re: Large XML transformations in Cocoon.
Posted by Upayavira <uv...@upaya.co.uk>.
Tom Bloomfield wrote:
> I'm planning to do xml -> text transformations (for tab-delimited
> output) and xml -> FOP on large XML datasets. The XML I will be
> processing will be 10-12 MB in size, and will grow from there. Based
> on planning, the XSL will contain around 50 node traversals and will
> iterate over my XML dataset around 46,000 times. Previous to this, my
> Cocoon transformations haven't been nearly this big.
>
> The amount of JVM memory I have to deal with is limited (<256M). This
> transformation will need to run in real-time.
> Does anyone have experience dealing with large datasets like this?
That sounds like quite a challenge. XSLT isn't well suited to input of
that size. If you do stay with XSLT, avoid arbitrary wanders around your
XML tree - stay as close to the context node as you can.
Alternatively, look at STX (there is an STX block for Cocoon). See if you
can manage your transformations with that. STX stands for "Streaming
Transformations for XML": it is designed to process the input as a
stream, and thus should be able to handle large datasets.
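As a rough illustration of what streaming means here, the sketch below
uses the JDK's SAX API (not STX itself) to turn rows into tab-delimited
text while holding only the current cell in memory. The element names
rows/row/col and the class RowsToTsv are assumptions made up for the
example; the real dataset layout is not given in this thread.

```java
import java.io.IOException;
import java.io.Writer;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Streams hypothetical <row><col>..</col></row> elements to tab-delimited text. */
final class RowsToTsv extends DefaultHandler {
    private final Writer out;
    private final StringBuilder cell = new StringBuilder(); // text of the current cell only
    private boolean inCell = false;
    private boolean firstCell = true;

    RowsToTsv(Writer out) {
        this.out = out;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (qName.equals("row")) {
            firstCell = true;          // a new output line starts
        } else if (qName.equals("col")) {
            inCell = true;
            cell.setLength(0);         // forget the previous cell
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (inCell) {
            cell.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        try {
            if (qName.equals("col")) {
                if (!firstCell) {
                    out.write('\t');
                }
                out.write(cell.toString());
                firstCell = false;
                inCell = false;
            } else if (qName.equals("row")) {
                out.write('\n');       // the row is written as soon as it closes
            }
        } catch (IOException e) {
            throw new SAXException(e);
        }
    }

    /** Parses the input as a stream of SAX events; no document tree is built. */
    static void convert(InputSource in, Writer out)
            throws IOException, SAXException, ParserConfigurationException {
        SAXParserFactory.newInstance().newSAXParser().parse(in, new RowsToTsv(out));
        out.flush();
    }
}
```

Memory use in this sketch depends on the largest single cell rather than
on the size of the whole document, which is the property that makes a
streaming approach attractive for a 10-12 MB source.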
Regards, Upayavira
Re: Large XML transformations in Cocoon.
Posted by Bertrand Delacretaz <bd...@apache.org>.
On 19 Nov 2004, at 02:45, Tom Bloomfield wrote:
> ...The XML I will be processing will be 10-12 MB in size, and will
> grow from there. Based on planning, the XSL will contain around 50
> node traversals and will iterate over my XML dataset around 46,000
> times....
You'll probably have a hard time doing this on a 256-MB system.
In such a case I'd ask myself if my problem is *so* hard as to require
46'000 iterations over the XML dataset. Of course it depends on the
kind of data you're processing, but this sounds very unusual.
-Bertrand
Re: Large XML transformations in Cocoon.
Posted by Miles Elam <mi...@pcextremist.com>.
Go right ahead. Anything I write to this mailing list is fair
game/public domain.
- Miles Elam
On Nov 20, 2004, at 7:26 AM, Upayavira wrote:
> Miles Elam wrote:
>
> Very useful piece. Would you mind if I put this on the wiki?
>
> Regards, Upayavira
>
>> As someone who has used STX, I can recommend it in this situation
>> wholeheartedly. STX looks very much like XSLT but uses a different
>> namespace and doesn't have as many options for transformation.
Re: Large XML transformations in Cocoon.
Posted by Upayavira <uv...@upaya.co.uk>.
Miles Elam wrote:
Very useful piece. Would you mind if I put this on the wiki?
Regards, Upayavira
> As someone who has used STX, I can recommend it in this situation
> wholeheartedly. STX looks very much like XSLT but uses a different
> namespace and doesn't have as many options for transformation.
Re: Large XML transformations in Cocoon.
Posted by Tom Bloomfield <to...@shopbloomfield.com>.
Miles,
Thanks for the tips. I'll move forward on coding this using STX and
post some benchmarking numbers when I finish.
TB
Upayavira wrote:
> Miles Elam wrote:
>
> Very useful piece. Would you mind if I put this on the wiki?
>
> Regards, Upayavira
>
>> As someone who has used STX, I can recommend it in this situation
>> wholeheartedly. STX looks very much like XSLT but uses a different
>> namespace and doesn't have as many options for transformation.
Re: Large XML transformations in Cocoon.
Posted by Miles Elam <mi...@pcextremist.com>.
As someone who has used STX, I can recommend it in this situation
wholeheartedly. STX looks very much like XSLT but uses a different
namespace and doesn't have as many options for transformation.
Unless something drastic has changed lately in the XSLT used by Cocoon,
it uses a document table model (like a DOM but tailored toward a
read-only view and a transformation source). This is necessary because
XSLT allows several passes over the same source document and also
allows arbitrary access to any point in the tree (although this is
usually quite inefficient). So while XSLT is the preferred method for
XML transformation in general, certain circumstances like yours would
point toward alternatives.
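To make the point about arbitrary access concrete, here is a minimal
sketch using the JDK's javax.xml.transform (TrAX) API rather than a
Cocoon pipeline. The stylesheet visits each row once and only uses paths
relative to the current node. The rows/row/col layout and the class name
ContextRelativeXslt are assumptions for illustration. The processor still
has to build its in-memory model of the whole source, so this reduces
wasted traversal work, not the memory footprint.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class ContextRelativeXslt {
    // Hypothetical stylesheet for a <rows><row><col>..</col></row></rows> layout.
    static final String XSL =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match='/rows'>"
        + "<xsl:for-each select='row'>"      // one pass over the rows
        + "<xsl:for-each select='col'>"      // relative to the current row, no '//' scans
        + "<xsl:if test='position() &gt; 1'><xsl:text>&#9;</xsl:text></xsl:if>"
        + "<xsl:value-of select='.'/>"
        + "</xsl:for-each>"
        + "<xsl:text>&#10;</xsl:text>"
        + "</xsl:for-each>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    /** Runs the stylesheet above over the given XML and returns the text output. */
    static String transform(String xml) throws TransformerException {
        Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSL)));
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }
}
```

An expression such as //row[@id = current()/@ref] inside the loop would
instead search the whole tree once per row, which is the kind of access
pattern that gets expensive at 46,000 rows.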
As far as streaming XSLT results is concerned, it's possible to
configure it this way at the expense of overall processing time. But
you don't appear to have the memory for even one full transformation
let alone many at the same time. STX is your best bet in my opinion.
This always streams the output by its very nature.
Also, do NOT put this into a caching pipeline. With such a large
source, memory constraints will get worse before they get better.
Reprocess each time (or pregenerate on intervals a la cron) to shift
the weight from memory to CPU/disk in this case.
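The pregeneration idea can be sketched in plain Java as follows. This is
only an illustration under stated assumptions: the Report interface, the
Pregenerator class and the target path are hypothetical names, the code
assumes Java 8 or later, and a real deployment might just as well use
cron or Cocoon's own scheduling facilities.

```java
import java.io.IOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

final class Pregenerator {
    /** Hypothetical hook: whatever actually produces the output (STX, SAX, ...). */
    interface Report {
        void writeTo(Writer out) throws IOException;
    }

    /** Writes to a temp file first, then moves it over the target in one step. */
    static void regenerate(Report report, Path target) throws IOException {
        Path dir = target.toAbsolutePath().getParent();
        Path tmp = Files.createTempFile(dir, "report", ".tmp");
        try {
            try (Writer out = Files.newBufferedWriter(tmp, StandardCharsets.UTF_8)) {
                report.writeTo(out);   // output goes to disk, not to an in-memory cache
            }
            Files.move(tmp, target, StandardCopyOption.REPLACE_EXISTING);
        } finally {
            Files.deleteIfExists(tmp); // no-op after a successful move
        }
    }

    /** Regenerates the file immediately and then again after each delay. */
    static ScheduledExecutorService schedule(Report report, Path target, long minutes) {
        ScheduledExecutorService pool = Executors.newSingleThreadScheduledExecutor();
        pool.scheduleWithFixedDelay(() -> {
            try {
                regenerate(report, target);
            } catch (IOException e) {
                e.printStackTrace();   // the previous file stays in place on failure
            }
        }, 0, minutes, TimeUnit.MINUTES);
        return pool;
    }
}
```

Requests can then be served from the generated file, so the cost of the
large transformation is paid on a schedule instead of per request or per
cache entry.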
Of course, a final option is to write your own custom Cocoon
transformer, but I would recommend the STX route as it would likely be
almost as fast and a whole lot more flexible and maintainable in the
long run.
- Miles Elam
On Nov 19, 2004, at 7:07 AM, Tom Bloomfield wrote:
> The number of iterations corresponds to the number of rows returned
> from the database. There are roughly 46,000 rows present now, so I
> need at least that many rows in my display. The XSL design enables me
> to use SAX which should help. The easiest thing would be to limit the
> number of rows returned to something more reasonable like 10,000 (or
> up the JVM memory :P), but this is the requirement I'm stuck with.
>
> Help me understand this: If I apply a transformation using XSLT,
> streaming the xml in, does Cocoon "stream" the results out? I.e., does
> the entire transformation happen in memory and then get flushed to the
> client, or does Cocoon flush the buffer to the client as xxx bytes are
> filled? I made an assumption that Cocoon does this automatically.
Re: Large XML transformations in Cocoon.
Posted by Tom Bloomfield <to...@shopbloomfield.com>.
Upayavira, thanks for the heads up about STX. I'll check out Joost
later today.
The number of iterations corresponds to the number of rows returned from
the database. There are roughly 46,000 rows present now, so I need at
least that many rows in my display. The XSL design enables me to use
SAX which should help. The easiest thing would be to limit the number
of rows returned to something more reasonable like 10,000 (or up the JVM
memory :P), but this is the requirement I'm stuck with.
Help me understand this: If I apply a transformation using XSLT,
streaming the xml in, does Cocoon "stream" the results out? I.e., does
the entire transformation happen in memory and then get flushed to the
client, or does Cocoon flush the buffer to the client as xxx bytes are
filled? I made an assumption that Cocoon does this automatically.
If anyone else has any suggestions, please let me know.
TIA,
Tom
Bertrand Delacretaz wrote:
>
> On 19 Nov 2004, at 02:45, Tom Bloomfield wrote:
>
>> ...The XML I will be processing will be 10-12 MB in size, and will
>> grow from there. Based on planning, the XSL will contain around 50
>> node traversals and will iterate over my XML dataset around 46,000
>> times....
>
>
> You'll probably have a hard time doing this on a 256-MB system.
>
> In such a case I'd ask myself if my problem is *so* hard as to require
> 46'000 iterations over the XML dataset. Of course it depends on the
> kind of data you're processing, but this sounds very unusual.
>
> -Bertrand
>