Posted to fop-users@xmlgraphics.apache.org by Ch...@scotland.gsi.gov.uk on 2002/05/10 13:32:44 UTC

FOP Scalability

*****************************************************************************
This email and any files transmitted with it are intended solely
 for the use of the individual or entity to whom they are addressed.
*****************************************************************************

I have FOP 0.20.3 running successfully on my web server and was wondering
what the best way to control its scalability is. There is a possibility
that 30+ users may be requesting a PDF page at the same time, and as you all
know, FOP is not too kind on memory usage. The PDFs being produced are only 10
pages max, so it's just the cost of the processing on the web server that is
the problem.
I have seen ideas about limiting the number of FOP processes running at one
time, but I don't want to have users waiting for minutes for their PDF page to
be displayed.
Does anyone have any other ideas?

Re: FOP Scalability

Posted by Matt Savino <ma...@synergizethis.com>.
I'm working on this problem as we speak. I'm experimenting with wrapping
the FOP engine in a stateless session bean so I can cap the number of
bean instances in the pool (max-beans-in-free-pool). (I'm using WebLogic;
I'm not sure what facilities other app servers have for clustering/pooling
stateless session beans.) Another approach is to keep the generator in a
servlet and limit the number of threads. If I can get it to work, I like
the EJB way better because it gives me more flexibility for the future if
I want to offload the whole process to another server, or possibly move it
to a message-driven bean as has been mentioned in this thread.
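
For the servlet-side approach of limiting how many renders run at once, a minimal sketch might look like the following. It is not FOP or WebLogic API: the class name RenderGate, the Supplier standing in for the FOP call, and the wait limit are all assumptions made for illustration. The bounded wait is there so a caller gets a quick "busy" answer rather than queueing for minutes.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/** Hypothetical gate that caps concurrent PDF renders inside one JVM. */
final class RenderGate {
    private final Semaphore permits;
    private final long maxWaitMillis;

    RenderGate(int maxConcurrent, long maxWaitMillis) {
        // Fair semaphore: waiting requests are served in arrival order.
        this.permits = new Semaphore(maxConcurrent, true);
        this.maxWaitMillis = maxWaitMillis;
    }

    /**
     * Runs the job if a slot frees up within the wait limit.
     * Returns null when no slot became free, so the caller can answer "busy".
     * The job is a stand-in for the real XSLT + FOP call (an assumption here).
     */
    byte[] render(Supplier<byte[]> job) {
        boolean acquired;
        try {
            acquired = permits.tryAcquire(maxWaitMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
        if (!acquired) {
            return null;
        }
        try {
            return job.get();
        } finally {
            permits.release(); // always hand the slot back, even on failure
        }
    }

    int availableSlots() {
        return permits.availablePermits();
    }
}
```

A servlet could hold one shared RenderGate and map a null result to a "server busy, try again" response. The right limit depends on the hardware and would need to be found by measurement.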

What I was hoping was that if I limited the max number of beans to 1,
then any second request to the generatePdf(...) method on the bean would
automatically jump to one of the other server instances (we have 4 in
production). Thus, distributed computing. My benchmarking tests have
shown that on our production servers, running more than one report of
any length (I used 10 pages) at once on the same instance seriously
degrades performance. Interestingly, my development box, an
NT4-PIII-933 w/500MB RAM, performs much better than our production
boxes (HP-UX 2x550 RISC, 2GB RAM). Also, the NT box doesn't really start
to degrade until you get 3 or 4 reports running concurrently. As a
future solution, we're actually considering setting up a Win2K WebLogic
server to handle nothing but PDF generation. I can supply the
benchmarking results if anyone is interested.
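
To reproduce this kind of measurement, one could time the same job at increasing levels of concurrency. The harness below is only a sketch and not the benchmark used above: the class name ConcurrencyBench is made up, and the Runnable is a placeholder for a real report generation call.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical harness: how long do N simultaneous copies of a job take? */
final class ConcurrencyBench {

    /** Runs `concurrency` copies of the job at once; returns elapsed wall-clock ms. */
    static long timeConcurrent(int concurrency, Runnable job) {
        ExecutorService pool = Executors.newFixedThreadPool(concurrency);
        CountDownLatch start = new CountDownLatch(1);
        List<Future<?>> futures = new ArrayList<>();
        for (int i = 0; i < concurrency; i++) {
            futures.add(pool.submit(() -> {
                start.await(); // hold every worker until all are ready
                job.run();
                return null;
            }));
        }
        long t0 = System.nanoTime();
        start.countDown(); // release all workers together
        try {
            for (Future<?> f : futures) {
                f.get(); // wait for each copy to finish
            }
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("benchmark run failed", e);
        } finally {
            pool.shutdown();
        }
        return (System.nanoTime() - t0) / 1_000_000;
    }
}
```

Calling timeConcurrent with 1, 2, 3 and 4 copies of the same report and comparing the results would show where a given box starts to degrade.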

Our FOP engine doesn't need to take large (byte-wise) arguments or
return a huge stream (w/o many and/or large images, I have found the PDF
byte-array is very compact - ~1-2k per page). Our PDF generator only
needs the stored proc call as an argument to get the data from the
database. It does the XSLT and FOP transformations itself. I believe the
small-in/small-out nature of this process, combined with the processing
bottleneck described above, makes this problem an ideal candidate for
distributed load-balancing. I also believe this ability is one of the
promises of EJBs, thus making this a case where using an EJB is actually
*not* overkill.

But alas, it turns out that although WebLogic touts stateless session
bean load-balancing all over the place, you actually need two clusters
to get it to work: one running the servlet/JSP stuff and another for
EJBs. Which means either I cut into the number of servers available for
JSP/web serving, or I start adding more instances on the same boxes.
Anyone have an opinion on how many system resources a mostly idle
WebLogic 6 instance takes up? Further complicating things is the fact
that we have very little real user profiling data yet, so I really don't
know how much PDF activity vs. other activity will be going on. All I
know for sure is that if we get 4 people on the same instance running a
PDF of any length, they all slow to a near stop.

Another interesting possibility is that in WebLogic you can write a class
that routes stateless bean method calls based on the arguments in the
call. Conceivably one could use this to separate the large documents
from the small ones. I believe this ability is ultimately going to be
necessary in any kind of load-balancing scenario where clients may be
forced to wait for resources. People may understand waiting 5-10 minutes
for a 200 page report, but not for 2 pages.
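
The idea of keeping large and small documents apart can be sketched without any app-server routing class, using two separate worker pools inside one JVM. This is not the WebLogic call-routing mechanism mentioned above. The class name SizeRouter, the page estimate argument, and the threshold are assumptions for illustration only.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Supplier;

/** Hypothetical router: small reports never queue behind large ones. */
final class SizeRouter {
    enum Lane { SMALL, LARGE }

    private final int largeThresholdPages;
    private final ExecutorService smallLane;
    private final ExecutorService largeLane;

    SizeRouter(int largeThresholdPages, int smallWorkers, int largeWorkers) {
        this.largeThresholdPages = largeThresholdPages;
        this.smallLane = Executors.newFixedThreadPool(smallWorkers);
        this.largeLane = Executors.newFixedThreadPool(largeWorkers);
    }

    /** Picks a lane from the caller's page estimate (an assumed input). */
    Lane laneFor(int estimatedPages) {
        return estimatedPages >= largeThresholdPages ? Lane.LARGE : Lane.SMALL;
    }

    /** Queues the job on its lane; the job stands in for the real FOP call. */
    Future<byte[]> submit(int estimatedPages, Supplier<byte[]> job) {
        Callable<byte[]> task = job::get;
        ExecutorService lane =
                laneFor(estimatedPages) == Lane.LARGE ? largeLane : smallLane;
        return lane.submit(task);
    }

    void shutdown() {
        smallLane.shutdown();
        largeLane.shutdown();
    }
}
```

With something like new SizeRouter(50, 2, 1), a 2 page request goes to the small lane and is unaffected by a 200 page report occupying the large lane. The numbers are placeholders, not recommendations.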

I'd love to hear any thoughts on these ideas or other possible/confirmed
solutions to this problem. I'll keep the board posted on my final
solution. 

-Matt

Re: FOP Scalability

Posted by Jeremias Maerki <je...@outline.ch>.
You could have additional servers (can be low-cost) and use JMS to post
jobs to a queue. These additional servers will listen on the queue
(automatic load-balancing through use of JMS), generate PDFs and send a
message back that the file is ready to be picked up by the front servlet.
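
The shape of this queue-and-workers pattern can be sketched in a single JVM. The code below does not use JMS at all: a BlockingQueue stands in for the JMS queue, threads stand in for the additional servers, and a CompletableFuture stands in for the "file is ready" reply message. All names (PdfJobQueue, post, startWorker) are made up for illustration.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Function;

/** In-process stand-in for "post jobs to a queue, workers listen on it". */
final class PdfJobQueue {

    /** A queued request plus the reply the front end waits on. */
    private static final class Job {
        final String request;
        final CompletableFuture<byte[]> reply = new CompletableFuture<>();

        Job(String request) {
            this.request = request;
        }
    }

    private final BlockingQueue<Job> queue = new LinkedBlockingQueue<>();

    /** Front-end side: post a job; the future completes when a worker is done. */
    CompletableFuture<byte[]> post(String request) {
        Job job = new Job(request);
        queue.add(job);
        return job.reply;
    }

    /** Worker side: a daemon thread that pulls jobs and renders them one by one. */
    Thread startWorker(String name, Function<String, byte[]> renderer) {
        Thread worker = new Thread(() -> {
            try {
                while (true) {
                    Job job = queue.take(); // blocks until a job is available
                    try {
                        job.reply.complete(renderer.apply(job.request));
                    } catch (RuntimeException e) {
                        job.reply.completeExceptionally(e);
                    }
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // interrupted: stop the worker
            }
        }, name);
        worker.setDaemon(true);
        worker.start();
        return worker;
    }
}
```

Because every worker takes from the same queue, an idle worker picks up the next job automatically, which is the load-balancing effect described above. A real setup would replace the queue with a JMS destination and the threads with separate machines.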

Cheers,
Jeremias Märki

mailto:jeremias.maerki@outline.ch

OUTLINE AG
Postfach 3954 - Rhynauerstr. 15 - CH-6002 Luzern
Tel. +41 41 317 2020 - Fax +41 41 317 2029
Internet http://www.outline.ch