Posted to users@cocoon.apache.org by Lars Huttar <la...@sil.org> on 2004/08/03 05:25:12 UTC

memory issues, SAX

Dear Cocoon gurus,              [Cocoon 2.1.2, Tomcat 4.1]

We have an application where we need to generate an index
from a large database. We seem to be running out of memory
even in getting the unprocessed data out of the database.

We initially did (sitemap pseudocode)
  - xsp query to get rows from "Index" table of database
  - XSLT transformation that groups together rows with certain identical fields
  - XSLT transformation that wraps "source:write" markup around
    the XML
  - the write-source transformer to put the XML into a file
  - (serialize as XML)
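
In sitemap terms, that pipeline looked roughly like this (the match pattern
and file paths are made up for illustration; from memory, the XSP generator
is declared as type "serverpages" and the SourceWritingTransformer as type
"write-source"):

  <map:match pattern="build-index">
    <map:generate type="serverpages" src="xsp/index-query.xsp"/>
    <map:transform type="xslt" src="stylesheets/group-rows.xsl"/>
    <map:transform type="xslt" src="stylesheets/wrap-source-write.xsl"/>
    <map:transform type="write-source"/>
    <map:serialize type="xml"/>
  </map:match>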

This worked for small rowsets, but when we jumped from 3700 to
9500 rows, it failed with the message
org.apache.cocoon.ProcessingException: Exception in ServerPagesGenerator.generate():
java.lang.RuntimeException: org.apache.cocoon.ProcessingException: insertFragment: fragment is
required.

which sounds like the write-source transformer is complaining that it didn't
get its "fragment" (the data to write to the file), so I supposed
there was a failure before the write-source transformer.
I wondered if the XSLT transformations were each building
a DOM for the entire input. This would account for running out
of memory.

So I tried reducing the pipeline to just obtaining the data
and writing it to a file without grouping.
First I tried

  - xsp query to get rows from "Index" table of database
  - XSLT transformation that just wraps "source:write" markup around
    the XML
  - the write-source transformer to put the XML into a file
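
That wrapping stylesheet is essentially just an identity wrapper; something
along these lines (the source:* element names and namespace are what the
SourceWritingTransformer expects, as far as I remember; the output path here
is made up):

  <xsl:stylesheet version="1.0"
      xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
      xmlns:source="http://apache.org/cocoon/source/1.0">
    <xsl:template match="/">
      <source:write>
        <source:source>context://files/index.xml</source:source>
        <source:fragment>
          <xsl:copy-of select="/"/>
        </source:fragment>
      </source:write>
    </xsl:template>
  </xsl:stylesheet>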

But this failed too, and of course it still has an XSLT transformation,
which is suspect -- is it building a DOM? So next I tried

  - file generator to get a file that contained a source:write
    wrapper around a cinclude statement
  - cinclude transformer to get the data
  - the write-source transformer to put the XML into a file

And in a separate pipeline called by the cinclude statement,

  - xsp query to get rows from "Index" table of database
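
Concretely, the static file fed to the file generator was along these lines
(namespace URIs are from memory; the cocoon: URI is illustrative and points
at that separate xsp pipeline):

  <source:write xmlns:source="http://apache.org/cocoon/source/1.0"
                xmlns:cinclude="http://apache.org/cocoon/include/1.0">
    <source:source>context://files/index.xml</source:source>
    <source:fragment>
      <cinclude:include src="cocoon:/index-rows"/>
    </source:fragment>
  </source:write>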

But this still failed!

So now I'm wondering how it's possible to process big sets
of data at all in Cocoon. We thought SAX meant that the XML
data was sent piece-by-piece down the pipeline, serially,
so you didn't run out of memory when you had a big XML data
file. Does using XSLT mess that up by building DOMs?
What about cinclude?
What *can* you use to get lots of data from a database
and process it without having to have it all in memory
at once? Does this task need to be done outside of Cocoon?

Of course, we can split the operation up into little pieces,
but we don't want to go through that hassle if it's avoidable.

Is it possible that I'm missing the point completely and
there's something other than memory that's causing the
operation to fail?
By the way, this machine has 384 MB of RAM, and another I was testing
on had 512 MB. They both failed at about the same point.

Thanks for any explanations or suggestions...
Lars



Re: memory issues, SAX

Posted by Bruno Dumon <br...@outerthought.org>.
Lars,

are you aware that the Java VM doesn't automatically use all available
memory? IIRC, on Windows it uses a maximum of 64 MB by default. You can
increase that by adding a parameter like -Xmx192m to the java startup
command. If you search the Cocoon wiki or the internet you'll find
plenty of information on that.
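
With Tomcat 4.1 the usual place to put this is the JAVA_OPTS (or
CATALINA_OPTS) environment variable that the startup script picks up, for
example (the value is just an example, size it to your machine):

  export JAVA_OPTS="-Xmx192m"
  $CATALINA_HOME/bin/catalina.sh start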

As for the XSLT transforms: yes, these build a complete model of your
document in memory before doing the transform, though it's an optimized
model that should be much smaller than a typical DOM. From your earlier
non-updateable-DOM problems, I'd assume that the
SourceWritingTransformer also builds a DOM in memory. By default the
serialized output of a pipeline is also completely buffered before
being sent to the client; see the explanation for "outputBufferSize" in
the default root sitemap on how to avoid that.
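
That parameter lives on the pipeline implementations in the <map:pipes>
section of the root sitemap.xmap, roughly like this (class name and value
are from memory; the comments next to it in the sitemap explain the exact
behaviour):

  <map:pipes default="caching">
    <map:pipe name="caching"
              src="org.apache.cocoon.components.pipeline.impl.CachingProcessingPipeline">
      <parameter name="outputBufferSize" value="8192"/>
    </map:pipe>
    <!-- other pipe declarations unchanged -->
  </map:pipes>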


-- 
Bruno Dumon                             http://outerthought.org/
Outerthought - Open Source, Java & XML Competence Support Center
bruno@outerthought.org                          bruno@apache.org

