Posted to fop-dev@xmlgraphics.apache.org by "Griffin,Sean" <SG...@CERNER.COM> on 2009/12/03 19:21:44 UTC

Large number of small blocks = OutOfMemory

FOP Developers,

I'm running into an OutOfMemoryError that is related to the redesigned FOP; 0.20.5 did not have this problem.  The issue is that the page breaking algorithm doesn't seem to scale very well with a very high number of blocks.  For example, with approximately 500MB max heap and a 300-page PDF where nearly every line on every page is its own one-line paragraph, I run out of memory.

I did some searching and found one thread in particular that seems directly related to this: http://www.mail-archive.com/fop-dev@xmlgraphics.apache.org/msg10813.html.  Of course this was in response to a patch that hacked the page-sequence element to force the page breaking algorithm to recycle some memory more often, but the part that interests me is Andreas's response directly linked above.  That response seems to indicate that if I use "break-after='page'" on some of my blocks I should get the same behavior that the patch exhibits.  This does not appear to be the case.

Recreating is pretty simple.  Inside the <fo:flow> tag I put what the following code generates:

// 200 outer blocks (one per "document"), each forcing a page break after it
for (int i = 1; i <= 200; ++i) {
    out.write("<fo:block break-after=\"page\">");
    // 250 one-line paragraphs inside each outer block
    for (int j = 1; j <= 250; ++j) {
        out.write("<fo:block>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Etiam aliquam.</fo:block>");
    }
    out.write("</fo:block>");
}
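
For anyone who wants to reproduce this without my surrounding code, a minimal standalone harness along these lines writes out an equivalent test document (the page-master name and dimensions are arbitrary placeholders; only the flow content matters):

import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.Writer;

public class GenerateTestFO {
    public static void main(String[] args) throws Exception {
        Writer out = new BufferedWriter(new FileWriter("test.fo"));
        out.write("<?xml version=\"1.0\" encoding=\"UTF-8\"?>");
        out.write("<fo:root xmlns:fo=\"http://www.w3.org/1999/XSL/Format\">");
        // Arbitrary page geometry; the exact values are irrelevant to the problem
        out.write("<fo:layout-master-set>"
                + "<fo:simple-page-master master-name=\"page\""
                + " page-height=\"29.7cm\" page-width=\"21cm\" margin=\"2cm\">"
                + "<fo:region-body/>"
                + "</fo:simple-page-master>"
                + "</fo:layout-master-set>");
        out.write("<fo:page-sequence master-reference=\"page\">");
        out.write("<fo:flow flow-name=\"xsl-region-body\">");
        // Same content as the snippet above: 200 blocks of 250 one-line paragraphs
        for (int i = 1; i <= 200; ++i) {
            out.write("<fo:block break-after=\"page\">");
            for (int j = 1; j <= 250; ++j) {
                out.write("<fo:block>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Etiam aliquam.</fo:block>");
            }
            out.write("</fo:block>");
        }
        out.write("</fo:flow>");
        out.write("</fo:page-sequence>");
        out.write("</fo:root>");
        out.close();
    }
}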

Admittedly this is a very large number of blocks, but the real-world use case that produces it is not that extraordinary: 200 documents with 250 lines each, where each line is its own paragraph.  I have noticed the same behavior whether I use the break-after attribute or not, and if break-after actually changed the algorithm's behavior I would expect this to finish without any problem at all; 250 paragraphs isn't that many.
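
For reference, the FO file can then be rendered with a plain embedding of FOP along these lines (this is essentially the stock embedding example using an identity transform; run the JVM with roughly -Xmx512m to match the heap size mentioned above):

import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.OutputStream;

import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXResult;
import javax.xml.transform.stream.StreamSource;

import org.apache.fop.apps.Fop;
import org.apache.fop.apps.FopFactory;
import org.apache.fop.apps.MimeConstants;

public class RenderTestFO {
    public static void main(String[] args) throws Exception {
        FopFactory fopFactory = FopFactory.newInstance();
        OutputStream out = new BufferedOutputStream(new FileOutputStream(new File("test.pdf")));
        try {
            Fop fop = fopFactory.newFop(MimeConstants.MIME_PDF, out);
            // Identity transform: feed the pre-built FO file straight into FOP
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.transform(new StreamSource(new File("test.fo")),
                    new SAXResult(fop.getDefaultHandler()));
        } finally {
            out.close();
        }
    }
}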

Is there something I'm missing, or does the algorithm indeed keep the Knuth sequences for the entire page sequence in memory, even when hard page breaks are used within it?  FWIW, the main objects consuming the memory are:

*	LeafPosition (12.8%)
*	Object[] (10.7%)
*	KnuthGlue (7.4%)
*	char[] (6.1%)
*	KnuthPenalty (5.3%)
*	TextLayoutManager$AreaInfo (5.1%)
*	KnuthInlineBox (5%)

Thank you ahead of time,

Sean Griffin | MSVC Architect | Cerner Corporation | 816.201.1599 | sgriffin@cerner.com | www.cerner.com

