Posted to fop-users@xmlgraphics.apache.org by Jean-François El Fouly <je...@elfouly.fr> on 2008/05/08 11:38:49 UTC

Re: Memory issue

Andreas Delmelle wrote:
>
> Which Java VM are you using? Practically every time someone tells us 
> about memory/GC issues, it appears they are using an implementation 
> other than Sun (IBM, GNU...)
> Up to now, we still have to find out why precisely non-Sun VMs have 
> difficulties with FOP...
>
Nope. I'll double check but I'm pretty sure it's a genuine Sun JVM 
1.5.0_11, or maybe the very minor build after.
> How large would the resulting FO-files be if you dump them to the 
> filesystem? The XML by itself says very little. From a 1.5MB XML, you 
> could get a FO of a few KB or one of 26MB, depending on the stylesheet.
>
5.08 MB.
> Does the stylesheet adhere to XSLT best practices? Does it generate a 
> lot of redundant fo:blocks, fo:inlines?
>
I hope not. It started out as a complicated stylesheet generated by 
StyleVision, but it has since been simplified and tweaked a lot.
>
> A nit, for the record: There is no such thing as 'forcing garbage 
> collection'. The most you can do with System.gc() is indicate to the 
> VM that it should run the GC as soon as possible. Admittedly, most 
> implementations do run the algorithm virtually immediately upon 
> execution of the statement, but the Java spec does not mandate such 
> behavior. In theory, if the VM is too busy, it could still postpone 
> the actual GC-run, until it acquires the necessary resources...
>
Indeed, but the log4j log has timestamps and they show that 20 seconds 
are spent around System.gc() so my guess is that something really 
happens at that time.
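
A minimal sketch of how the point about System.gc() could be checked, assuming a reasonably recent JVM. It times the call and compares the collection counts reported by the standard java.lang.management beans before and after. The class and method names (GcHintDemo, totalCollections, timeGcRequestMillis) are made up for this illustration and do not come from FOP or from the application discussed here.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper: checks whether a System.gc() request coincides with an actual collection.
final class GcHintDemo {

    /** Sums the collection counts reported by all collectors of this JVM. */
    static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount(); // -1 when the collector does not report a count
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    /** Requests a collection and returns the wall-clock time spent in the call, in milliseconds. */
    static long timeGcRequestMillis() {
        long start = System.nanoTime();
        System.gc(); // a request only; the JVM is free to defer or ignore it
        return (System.nanoTime() - start) / 1_000_000L;
    }

    public static void main(String[] args) {
        long before = totalCollections();
        long elapsed = timeGcRequestMillis();
        long after = totalCollections();
        System.out.println("System.gc() returned after " + elapsed + " ms; "
                + (after - before) + " collection(s) recorded in between");
    }
}
```

If the recorded count does not change, the request was probably deferred or disabled; a long elapsed time together with a higher count suggests a full collection really ran during the call.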


---------------------------------------------------------------------
To unsubscribe, e-mail: fop-users-unsubscribe@xmlgraphics.apache.org
For additional commands, e-mail: fop-users-help@xmlgraphics.apache.org


Re: Memory issue

Posted by Andreas Delmelle <an...@telenet.be>.
On May 8, 2008, at 12:03, Andreas Delmelle wrote:

Hi Jean-François,

> On May 8, 2008, at 12:57, Jean-François El Fouly wrote:
>>
>> Andreas Delmelle wrote:
>>> OK. Just curious: Any chance you could test it on another build  
>>> or maybe even Java 6?
>>>
>> Probably, if required or useful. Our sys admins are very  
>> cooperative ;-)
>>>


For the moment, that would be more of a nice-to-know. Chances are that,  
if it's not JVM-related, this won't help a thing, so no need to go  
out of your way to do that.

<snip />
> Yes. That is exactly what happened to the stylesheet we use. I've  
> reduced it drastically.
> One issue with stylesheets generated by StyleVision is that you  
> must be careful when you tweak them to avoid certain [fo:block  
> inside fo:inline] combinations that make FOP crash with a stack  
> trace and no really useful information about what's happening or  
> where. This bug is mentioned in the FOP bug tracker, though in a  
> rather raw, loose manner. I removed all such constructs and that  
> made the XSLT much simpler and cleaner.
>>

OK, so we can exclude that as well.

<snip />
>> AFAIU, this gives little opportunity for the XSLT processor to  
>> clean up anything. Java 1.5 uses Xalan XSLTC by default, which  
>> converts templates into Java objects. One giant template would  
>> then mean one very long-living object that may reference numerous  
>> others for the whole duration of the processing run. If you look  
>> at the chain, when using XML+XSLT input, FOP is always the first  
>> one to finish, then the XSLT processor, then the XML parser.
>> If the XSLT processor cannot reclaim anything, this will give FOP  
>> less room to work with, so it ultimately runs slower. As the heap  
>> increases to reach the maximum, the points where the JVM will  
>> launch the GC by itself, will also increase. Since it cannot  
>> expand the heap anymore, it will try to clean up more frequently.

> Yep, that is why I've tried to be cautious not to accuse FOP  
> publicly ;-)

... which is also why /we/ are so cooperative/responsive. ;-)

BTW: If all users had the time and motivation to be as  
thorough as yourself, the traffic on this list would probably drop  
significantly.

> The problem is in the (Xalan + FOP) subsystem and the profiling  
> could well show that the issue is Xalan-related.

Or maybe even Xerces...? Xerces is a very feature-complete parser,  
but reports in the past have shown that all those nice features come  
with a price-tag. For FOP this holds as well, of course, and to be  
honest, FOP can be a pretty memory-hungry beast if you're not careful  
(but you definitely seem to be).

A relatively easy way to find out whether it's XSLT-related, would be  
to try out Saxon instead. I don't know if you have any experience  
with plugging in a different XSLT processor, but this is pretty  
straightforward (but might require re-starting the JBoss service,  
depending on how you go about it; for testing purposes, you could  
ultimately also change the app-code to reference Saxon directly  
instead of letting the JVM choose the  
javax.xml.transform.TransformerFactory implementation, and then  
redeploy).
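
A minimal sketch of the "reference Saxon directly" option, assuming Java 6 or later and a Saxon jar on the classpath. The factory class name net.sf.saxon.TransformerFactoryImpl is Saxon's JAXP entry point; the wrapper class XsltProcessorChooser and its fallback behaviour are only an illustration, not part of any library.

```java
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.TransformerFactoryConfigurationError;

// Hypothetical helper: picks an XSLT processor explicitly instead of relying on JAXP lookup.
final class XsltProcessorChooser {

    // Saxon's JAXP factory class; only loadable when a Saxon jar is on the classpath.
    static final String SAXON_FACTORY = "net.sf.saxon.TransformerFactoryImpl";

    /** Returns the named factory, or the JVM default when that class cannot be loaded. */
    static TransformerFactory create(String factoryClassName) {
        try {
            // Two-argument newInstance (Java 6+); null means "use the context class loader".
            return TransformerFactory.newInstance(factoryClassName, null);
        } catch (TransformerFactoryConfigurationError e) {
            // Fall back to whatever the JVM would have chosen by itself.
            return TransformerFactory.newInstance();
        }
    }

    public static void main(String[] args) {
        TransformerFactory factory = create(SAXON_FACTORY);
        System.out.println("Using XSLT processor: " + factory.getClass().getName());
    }
}
```

The non-invasive alternative is to leave the code alone and start the JVM with -Djavax.xml.transform.TransformerFactory=net.sf.saxon.TransformerFactoryImpl, which is the route that would need a service restart.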



Cheers

Andreas


Re: Memory issue

Posted by Jean-François El Fouly <je...@elfouly.fr>.
Andreas Delmelle wrote:
> OK. Just curious: Any chance you could test it on another build or 
> maybe even Java 6?
>
Probably, if required or useful. Our sys admins are very cooperative ;-)
>
> In my personal experience, optimizing the stylesheet code usually does 
> not offer much improvement in terms of global memory usage, but it 
> could have a noticeable impact on the processing time. One of the 
> things I've learned about XSL-FO stylesheets generated by Altova is 
> that they add a lot of fo:inlines to specify, for example, 
> font-properties on the lowest levels in the generated FO while, when 
> comparing to the font-properties of the fo:inlines' parents nothing 
> really changes, except for the size, style or weight. From FOP's point 
> of view, that's somewhat of a waste. Much better to specify a global 
> font-size on the page-sequence, and override on the lower levels only 
> what is really necessary. After adapting the stylesheet manually, and 
> removing the redundant fo:inlines, the stylesheet and the generated FO 
> were reduced to not even half the original size.
>
Yes. That is exactly what happened to the stylesheet we use. I've 
reduced it drastically.
One issue with stylesheets generated by StyleVision is that you must be 
careful when you tweak them to avoid certain [fo:block inside fo:inline] 
combinations that make FOP crash with a stack trace and no really useful 
information about what's happening or where. This bug is mentioned in 
the FOP bug tracker, though in a rather raw, loose manner. I removed all 
such constructs and that made the XSLT much simpler and cleaner.
> Something else that bothered me, but I don't know if that was also 
> generated by Altova, is that in one of the stylesheets I saw, the 
> entire transformation was contained in one giant template...
With the latest version of our XSLT, this was no longer the case.
> AFAIU, this gives little opportunity for the XSLT processor to clean 
> up anything. Java 1.5 uses Xalan XSLTC by default, which converts 
> templates into Java objects. One giant template would then mean one 
> very long-living object that may reference numerous others for the 
> whole duration of the processing run. If you look at the chain, when 
> using XML+XSLT input, FOP is always the first one to finish, then the 
> XSLT processor, then the XML parser.
> If the XSLT processor cannot reclaim anything, this will give FOP less 
> room to work with, so it ultimately runs slower. As the heap increases 
> to reach the maximum, the points where the JVM will launch the GC by 
> itself, will also increase. Since it cannot expand the heap anymore, 
> it will try to clean up more frequently.
Yep, that is why I've tried to be cautious not to accuse FOP publicly ;-)
The problem is in the (Xalan + FOP) subsystem and the profiling could 
well show that the issue is Xalan-related.
BTW, we've made the Xalan-FOP coupling a parameter so that we can use 
tight coupling (with SAX events) or loose coupling (writing the 
intermediate FO files to disk). We usually use the second option, since 
being able to read the intermediate FO code is helpful when you 
debug. And I guess, without being really sure, that not having Xalan and 
FOP working at the same time should use less memory. This separation 
probably accounts for the long execution time, but that is not an issue 
since document generation does not occur often in the target system (you 
can generate chapters for proofreading, but you generate the whole 
document once or twice a day).
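
A minimal sketch of the two coupling modes using only JAXP, so that it runs without FOP on the classpath. The tiny inline XML and stylesheet, the class CouplingDemo and the ElementCounter handler are all invented for this illustration; in a real setup the handler passed to SAXResult would be the one returned by FOP's fop.getDefaultHandler(), and the output here is not a complete FO document.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.transform.Result;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXResult;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo of tight (SAX) versus loose (file) coupling between XSLT and a consumer.
final class CouplingDemo {

    static final String XML = "<doc><p>one</p><p>two</p></doc>";
    static final String XSLT =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
      + " xmlns:fo='http://www.w3.org/1999/XSL/Format'>"
      + "<xsl:template match='/doc'><fo:root><xsl:apply-templates/></fo:root></xsl:template>"
      + "<xsl:template match='p'><fo:block><xsl:value-of select='.'/></fo:block></xsl:template>"
      + "</xsl:stylesheet>";

    /** Stand-in for FOP's content handler: it only counts the elements it receives. */
    static final class ElementCounter extends DefaultHandler {
        int elements;
        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            elements++;
        }
    }

    /** Runs the stylesheet over the sample XML and sends the output to the given result. */
    static void transform(Result result) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(XSLT)));
            t.transform(new StreamSource(new StringReader(XML)), result);
        } catch (TransformerException e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }

    /** Tight coupling: the consumer receives SAX events while the XSLT processor is running. */
    static int tightCoupling() {
        ElementCounter handler = new ElementCounter();
        transform(new SAXResult(handler));
        return handler.elements;
    }

    /** Loose coupling: the intermediate FO is written to disk and can be inspected later. */
    static File looseCoupling() {
        try {
            File fo = File.createTempFile("intermediate", ".fo");
            try (OutputStream out = new FileOutputStream(fo)) {
                transform(new StreamResult(out));
            }
            return fo;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Elements seen via SAX: " + tightCoupling());
        System.out.println("Intermediate FO written to: " + looseCoupling());
    }
}
```

In the loose variant the transformer has finished before anything reads the file, so the XSLT processor and the formatter do not have to be alive at the same time; the price is the extra write and re-parse of the intermediate file.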




Re: Memory issue

Posted by Andreas Delmelle <an...@telenet.be>.
On May 8, 2008, at 11:38, Jean-François El Fouly wrote:
> Andreas Delmelle wrote:
>>
>> Which Java VM are you using? Practically every time someone tells  
>> us about memory/GC issues, it appears they are using an  
>> implementation other than Sun (IBM, GNU...)
>> Up to now, we still have to find out why precisely non-Sun VMs  
>> have difficulties with FOP...
>>
> Nope. I'll double check but I'm pretty sure it's a genuine Sun JVM  
> 1.5.0_11, or maybe the very minor build after.

OK. Just curious: Any chance you could test it on another build or  
maybe even Java 6?

>> How large would the resulting FO-files be if you dump them to the  
>> filesystem? The XML by itself says very little. From a 1.5MB XML,  
>> you could get a FO of a few KB or one of 26MB, depending on the  
>> stylesheet.
>>
> 5.08 MB.

That's not what I would call a large FO, so this should be no problem.

>> Does the stylesheet adhere to XSLT best practices? Does it  
>> generate a lot of redundant fo:blocks, fo:inlines?
>>
> I hope not. It started out as a complicated stylesheet generated  
> by StyleVision, but it has since been simplified and tweaked a  
> lot.

In my personal experience, optimizing the stylesheet code usually  
does not offer much improvement in terms of global memory usage, but  
it could have a noticeable impact on the processing time. One of the  
things I've learned about XSL-FO stylesheets generated by Altova is  
that they add a lot of fo:inlines to specify, for example,  
font-properties on the lowest levels in the generated FO while, when  
comparing to the font-properties of the fo:inlines' parents nothing  
really changes, except for the size, style or weight. From FOP's  
point of view, that's somewhat of a waste. Much better to specify a  
global font-size on the page-sequence, and override on the lower  
levels only what is really necessary. After adapting the stylesheet  
manually, and removing the redundant fo:inlines, the stylesheet and  
the generated FO were reduced to not even half the original size.

Something else that bothered me, but I don't know if that was also  
generated by Altova, is that in one of the stylesheets I saw, the  
entire transformation was contained in one giant template... AFAIU,  
this gives little opportunity for the XSLT processor to clean up  
anything. Java 1.5 uses Xalan XSLTC by default, which converts  
templates into Java objects. One giant template would then mean one  
very long-living object that may reference numerous others for the  
whole duration of the processing run. If you look at the chain, when  
using XML+XSLT input, FOP is always the first one to finish, then the  
XSLT processor, then the XML parser.
If the XSLT processor cannot reclaim anything, this will give FOP  
less room to work with, so it ultimately runs slower. As the heap  
increases to reach the maximum, the points where the JVM will launch  
the GC by itself, will also increase. Since it cannot expand the heap  
anymore, it will try to clean up more frequently.
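
A minimal sketch of how the heap pressure described above could be watched from inside the application, using only java.lang.Runtime. The class HeapSnapshot and the stage labels are invented for this illustration; the idea is simply to log one line before and after the XSLT step and the FOP step.

```java
// Hypothetical helper: reports how close the JVM heap is to its configured maximum.
final class HeapSnapshot {

    /** Bytes currently in use on the heap (allocated by the JVM minus free space). */
    static long usedBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    /** Formats one log line for the given processing stage. */
    static String describe(String stage) {
        Runtime rt = Runtime.getRuntime();
        long mb = 1024L * 1024L;
        return stage + ": used=" + (usedBytes() / mb) + " MB"
                + ", committed=" + (rt.totalMemory() / mb) + " MB"
                + ", max=" + (rt.maxMemory() / mb) + " MB"; // max is Long.MAX_VALUE if unlimited
    }

    public static void main(String[] args) {
        System.out.println(describe("before transformation"));
        byte[] ballast = new byte[16 * 1024 * 1024]; // stand-in for the real workload
        System.out.println(describe("after allocating " + ballast.length + " bytes"));
    }
}
```

When the committed figure has reached the maximum and the used figure stays close to it between stages, the JVM can no longer grow the heap and has to collect more often, which matches the slowdown described above.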


Cheers

Andreas