Posted to users@cocoon.apache.org by neil <nb...@aisoftware.com.au> on 2002/02/20 06:00:31 UTC

streaming large pdf reports

I'm trying to get large PDF reports streaming to the browser so that:
1. not too much memory is used (we can support multiple users doing reports)
2. we can support arbitrarily large reports
3. the user sees data dribbling onto their screen rather than being left to
   wonder if anything is happening

(Please don't ask if anybody can really do anything useful with such big
reports).

We'd like to get:
  a) SQLTransformer to perform a query,
  b) XSLT to produce fo,
  c) fop to produce pdf and
  d) the browser
to all stream properly, but at the moment it looks like there are problems
at b), c) and d). Evidence follows.

I believe we should be able to get XSLT and fop streaming. Has anybody
managed to?
I'm using cocoon-2.0 with updates as described below. Does cocoon-2.0.1 have
some other updates that are required to make this stream?



Test conditions are described at the end, here are the results
(times in h:mm:ss, sizes in bytes):

                        a) query   b) fo      c) fop
before start of output  0:00:10    0:02:40    0:11:21
duration of output      0:02:20    0:00:40    0:01:38
size of output          43179378   22849290   33346282

The first column, a) shows the production of xml by SQLTransformer calling a
SQL Server stored procedure that returns 93000 rows. The file started growing
after 10 secs and was complete after another 2 minutes 20 secs. So
SQLTransformer by itself streams OK.

The 2nd column, b) shows the same thing as a) but with XSLT added.
The style sheet is almost as simple as possible to produce xsl-fo.
It produces a page (<fo:page-sequence><fo:flow><fo:block>) containing only
the word "details" for each <sql:row>. There are no fancy XPath expressions
that could be responsible for stopping it streaming. It takes longer to start
producing output than a) took to complete (2:40 against 2:30 in total), so
apparently it doesn't stream.
I've updated to xalan-2.2.0.jar and set "incremental-processing" to "true"
in cocoon.xconf in an effort to make it stream.
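
For reference, a style sheet of the kind described above might look roughly
like the sketch below. It is wrapped in a small Java class so it can be run
with the JDK's built-in XSLT processor. This is not the actual style sheet
from the test: the sql namespace URI, the page-master name and dimensions,
the sample input and the class name are all assumptions. It uses
master-reference as in the XSL 1.0 Recommendation; some older fop builds may
expect a different attribute name.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class RowsToFo {
    // Assumed namespace of the SQLTransformer output; adjust to the real data.
    static final String SQL_NS = "http://apache.org/cocoon/SQL/2.0";

    // One <fo:page-sequence> per <sql:row>, each holding only the word "details".
    static final String STYLESHEET =
        "<xsl:stylesheet version='1.0'"
      + "    xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
      + "    xmlns:fo='http://www.w3.org/1999/XSL/Format'"
      + "    xmlns:sql='" + SQL_NS + "'"
      + "    exclude-result-prefixes='sql'>"
      + "  <xsl:template match='/'>"
      + "    <fo:root>"
      + "      <fo:layout-master-set>"
      + "        <fo:simple-page-master master-name='page'"
      + "            page-height='29.7cm' page-width='21cm'>"
      + "          <fo:region-body/>"
      + "        </fo:simple-page-master>"
      + "      </fo:layout-master-set>"
      + "      <xsl:apply-templates/>"
      + "    </fo:root>"
      + "  </xsl:template>"
      + "  <xsl:template match='sql:row'>"
      + "    <fo:page-sequence master-reference='page'>"
      + "      <fo:flow flow-name='xsl-region-body'>"
      + "        <fo:block>details</fo:block>"
      + "      </fo:flow>"
      + "    </fo:page-sequence>"
      + "  </xsl:template>"
      // Suppress the column values, which the built-in rules would copy out.
      + "  <xsl:template match='text()'/>"
      + "</xsl:stylesheet>";

    /** Applies the style sheet to the given XML and returns the xsl-fo as text. */
    static String transform(String xml) throws TransformerException {
        Transformer t = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        // Made-up two-row sample standing in for the real 93000-row result.
        String sample =
            "<page xmlns:sql='" + SQL_NS + "'><sql:rowset>"
          + "<sql:row><sql:id>1</sql:id></sql:row>"
          + "<sql:row><sql:id>2</sql:id></sql:row>"
          + "</sql:rowset></page>";
        System.out.println(transform(sample));
    }
}
```

Run standalone like this, the result is collected in a string, so the sketch
only illustrates the shape of the style sheet and says nothing about whether
the transformation streams inside cocoon.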

The last column, c) shows the same thing as b) but with fop pdf
serialization added.
It takes over 11 minutes to start outputting then only another 1 1/2 minutes
to complete writing 32 MB of pdf. The mail archives suggest that fop can only
do output when it gets to a </fo:page-sequence>. That's why I produced a
separate <fo:page-sequence> for each row in b), but it still doesn't stream.
See: http://marc.theaimsgroup.com/?l=fop-user&m=101405590416285&w=2

As for d) displaying the data in the browser as it arrives, it seems there is
a problem here too. fop produces "non-optimized" pdf. For acrobat reader to
start rendering before all the data has arrived the pdf has to be "optimized"
(linearized, in PDF terms).
I believe this is on the fop "to do" list but not yet done.
See: http://marc.theaimsgroup.com/?l=fop-dev&m=99556319518135&w=2

Test conditions:
- 384M physical memory
- jdk 1.3.1-02 server JVM with: -server -Xmx356M

cocoon-2.0 mods:
- patched SQLTransformer.java for ResultSets returned from SQL Server stored
  procedures
  See: http://marc.theaimsgroup.com/?t=101316904300002&r=1&w=2
- xalan-2.2.0.jar, fop-0.20.3rc.jar (instead of xalan-2.2.0-D13.jar,
  fop-0.20.1-dev.jar)
- xerces config: incremental-processing="true" in cocoon.xconf (originally
  false)
- janitor store config: freememory="20000000", heapsize="240000000" in
  cocoon.xconf

I'm using a java programme instead of a browser so that I know when the data
starts arriving.
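
A client of that kind can be sketched roughly as follows. It drains the
response, notes when the first byte arrives, and reports the delay before
output starts, the duration of output and the total size, i.e. the three
rows of the results table. This is not the programme used for the
measurements above: the class and method names are made up for illustration,
the URL is whatever report address is passed on the command line, and the
code assumes a current JDK rather than 1.3.1.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;

class StreamTimer {
    /** Timings in milliseconds and size in bytes for one response. */
    static final class Result {
        final long msBeforeOutput;
        final long msOfOutput;
        final long bytes;

        Result(long msBeforeOutput, long msOfOutput, long bytes) {
            this.msBeforeOutput = msBeforeOutput;
            this.msOfOutput = msOfOutput;
            this.bytes = bytes;
        }
    }

    /**
     * Reads the stream to the end. startMillis should be taken before the
     * request is sent, so that time spent waiting for the server is counted.
     */
    static Result measure(InputStream in, long startMillis) throws IOException {
        byte[] buf = new byte[8192];
        long firstByteAt = -1;
        long bytes = 0;
        int n;
        while ((n = in.read(buf)) != -1) {
            if (n > 0 && firstByteAt < 0) {
                firstByteAt = System.currentTimeMillis();
            }
            bytes += n;
        }
        long end = System.currentTimeMillis();
        if (firstByteAt < 0) {
            firstByteAt = end; // nothing arrived at all
        }
        return new Result(firstByteAt - startMillis, end - firstByteAt, bytes);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java StreamTimer <report-url>");
            return;
        }
        long start = System.currentTimeMillis();
        try (InputStream in = URI.create(args[0]).toURL().openStream()) {
            Result r = measure(in, start);
            System.out.println("before start of o/p: " + r.msBeforeOutput + " ms");
            System.out.println("duration of o/p:     " + r.msOfOutput + " ms");
            System.out.println("size of o/p:         " + r.bytes + " bytes");
        }
    }
}
```

The clock is started before the connection is opened because a pipeline that
buffers everything may not even send response headers until it is done.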


---------------------------------------------------------------------
Please check that your question has not already been answered in the
FAQ before posting. <http://xml.apache.org/cocoon/faqs.html>

To unsubscribe, e-mail: <co...@xml.apache.org>
For additional commands, e-mail: <co...@xml.apache.org>


Re: streaming large pdf reports

Posted by Berin Loritsch <bl...@apache.org>.
neil wrote:
> I'm trying to get large PDF reports streaming to the browser so that:
> 1. not too much memory is used (we can support multiple users doing reports)
> 2. we can support arbitrarily large reports
> 3. the user sees data dribbling onto their screen rather than being left to
>    wonder if anything is happening
> 
> (Please don't ask if anybody can really do anything useful with such big
> reports).
> 
> We'd like to get:
>   a) SQLTransformer to perform a query,
>   b) XSLT to produce fo,
>   c) fop to produce pdf and
>   d) the browser
> to all stream properly, but at the moment it looks like there are problems
> at b), c) and d). Evidence follows.


Actually, you should be able to get as far as b) without issues.  Test this
theory with the XML serializer.

The problem is with the FOP Serializer (and similarly the SVG 
Serializer).  Both FOP and Batik are largely DOM bound.  Until FOP and 
Batik can build their documents in a stream, there is little that can
be done.

My advice is to generate these reports periodically instead of on demand.



