Posted to fop-dev@xmlgraphics.apache.org by bu...@apache.org on 2011/05/18 14:45:24 UTC

DO NOT REPLY [Bug 51218] New: FOP is unable to create PDF if there is an unusually large paragraph.

https://issues.apache.org/bugzilla/show_bug.cgi?id=51218

             Bug #: 51218
           Summary: FOP is unable to create PDF if there is an unusually
                    large paragraph.
           Product: Fop
           Version: 0.20.1
          Platform: PC
        OS/Version: All
            Status: NEW
          Severity: major
          Priority: P2
         Component: general
        AssignedTo: fop-dev@xmlgraphics.apache.org
        ReportedBy: abhijeet.iitr@gmail.com
    Classification: Unclassified


Hi,

We have three problems:

1. FOP performs unusually slowly if there is a large paragraph.
We have noticed that when there is an unusually large paragraph, FOP
performance is incredibly slow. FOP takes more than 15 minutes in the method
findBreakingPoints, which is defined in BreakingAlgorithm.java. The paragraph
is around 50 thousand characters long. This method seems to find the best
possible break point. Can we not make this method return a default break point
that works for the English language?

2. FOP uses an unusually large amount of memory when running in the
findBreakingPoints method defined in BreakingAlgorithm.java. This method starts
to consume around 500 MB of memory, creating thousands of objects of type
KnuthNode. Such memory consumption is unacceptable just for finding a line
break :-(.

3. FOP gives a SAX exception for a long paragraph on systems which don't
have 1.5 GB of RAM, for a simple paragraph of 90K characters. Below is the
exception:
javax.xml.transform.TransformerConfigurationException: javax.xml.transform.TransformerException: org.xml.sax.SAXParseException: The element type "xsl:template" must be terminated by the matching end-tag "</xsl:template>".
                at org.apache.xalan.processor.TransformerFactoryImpl.newTemplates(TransformerFactoryImpl.java:995)
                at com.ca.calm.reporter.pdf.PDFGenerator.buildPdf(PDFGenerator.java:1271)
Caused by: javax.xml.transform.TransformerException: org.xml.sax.SAXParseException: The element type "xsl:template" must be terminated by the matching end-tag "</xsl:template>".
                at org.apache.xalan.processor.TransformerFactoryImpl.newTemplates(TransformerFactoryImpl.java:991)
                ... 6 more
Caused by: org.xml.sax.SAXParseException: The element type "xsl:template" must be terminated by the matching end-tag "</xsl:template>".
                at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
                at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
                at


Is there a way I can prevent this extensive memory usage and slow performance
by using a default break? I am ready to build the JAR myself. Is this a bug
which has already been fixed?

thanks,
Jeet.

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

DO NOT REPLY [Bug 51218] FOP is unable to create PDF if there is an unusually large paragraph.


--- Comment #8 from Glenn Adams <gl...@skynav.com> 2012-04-07 01:42:57 UTC ---
resetting P2 open bugs to P3 pending further review

DO NOT REPLY [Bug 51218] FOP is unable to create PDF if there is an unusually large paragraph.

Glenn Adams <gl...@skynav.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |NEEDINFO

--- Comment #9 from Glenn Adams <gl...@skynav.com> 2012-04-08 05:17:27 UTC ---
please provide minimal input FO test file, output PDF file(s), and full console
output that demonstrates problem

DO NOT REPLY [Bug 51218] FOP is unable to create PDF if there is an unusually large paragraph.

--- Comment #4 from Abhijeet <ab...@gmail.com> 2011-05-19 13:50:47 UTC ---
Hi Andreas, 

This data comes as part of a feed, so I have no direct control over it.
Most of the time we get only small paragraphs in it.

1. Your suggestion of adding linefeed-treatment="preserve" somehow didn't
have any visible effect. I am putting the XSL below. However, I did see that
this reduced the memory usage, which doesn't grow exponentially anymore. I added
the tag to the block and other places too :-(

2. Interestingly, it looks like a default line break comes in all the feeds
that have large content, e.g.:

Interactive&lt;br&gt;Group connections222 :&lt;br&gt;

Can I use the custom BR tag to improve performance and reduce the memory
footprint?

A writeup is mentioned at
http://www.stylusstudio.com/xsllist/200312/post00590.html

3. Is there any other way to optimize the performance of findBreakingPoints by
compromising on formatting? As you mentioned, a paragraph of more than 50
thousand characters is difficult to read anyway. All I want is for the
paragraph to get printed, even if it is poorly formatted.

Thanks in advance.

DO NOT REPLY [Bug 51218] FOP is unable to create PDF if there is an unusually large paragraph.

--- Comment #5 from Andreas L. Delmelle <ad...@apache.org> 2011-05-19 18:07:16 UTC ---
(In reply to comment #4)
> 
> 1. Your suggestion of adding (linefeed-treatment="preserve") somehow didn't
> have any visible affect. I am putting the XSL below. However, I did see that
> this reduced the memory usage which doesn't grow exponentially anymore. I added
> tag to the block and other places too :-(

By 'no visible effect', do you mean in the output?

> 2. Interestingly it looks that a default line break is coming in all the feeds
> that have large content. eg. 
> 
> Interactive&lt;br&gt;Group connections222 :&lt;br&gt;
> 
> Can I use the custom BR tag to improve performance and reduce memory foot print
> ?

Definitely. That gives the line-layout algorithm hints about where a break MUST
occur, and that allows at least some optimization, however... (see below)

> A writeup is mentioned at
> http://www.stylusstudio.com/xsllist/200312/post00590.html

... In cases where you are sure that the <br/> just appears in between plain
text, it is more optimal to transform it into a literal linefeed character
(U+000A), and set linefeed-treatment on the parent block.
That will reduce memory usage even further. Empty blocks generate enough
overhead to justify avoiding too many of those.
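
As an illustration of the conversion described above, here is a minimal Java
sketch. It assumes the feed text is available as a plain string before it is
written into the FO/XML input, and that the break markers appear as literal
<br>, <br/> or <br /> in any letter case. The class BrToLinefeed and its
convert method are hypothetical names, not part of FOP.

```java
import java.util.regex.Pattern;

/** Hypothetical helper (not part of FOP): turns literal <br> markers into linefeeds. */
final class BrToLinefeed {
    // Matches <br>, <br/> and <br /> in any letter case.
    private static final Pattern BR = Pattern.compile("(?i)<br\\s*/?>");

    /** Replaces every <br> marker in the text with a linefeed character (U+000A). */
    static String convert(String text) {
        return BR.matcher(text).replaceAll("\n");
    }

    public static void main(String[] args) {
        // Sample taken from the feed excerpt quoted in this thread, after un-escaping.
        String feed = "Interactive<br>Group connections222 :<br>";
        System.out.println(convert(feed));
    }
}
```

The converted text would then go into a block that has
linefeed-treatment="preserve" set, as suggested above.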

> 3. Is there any other way to optimize performance findBreakingPoints by
> compromising formatting ? Like you mentioned a paragraph of larger than 50
> thousand characters in difficult to read anyways. All I want is that the
> paragraph gets printed even if ill formatted.

See above: apart from injecting forced breaks in the input, I do not
immediately see a way to optimize further.

DO NOT REPLY [Bug 51218] FOP is unable to create PDF if there is an unusually large paragraph.

--- Comment #11 from Luis Bernardo <lm...@gmail.com> 2012-04-24 08:40:44 UTC ---
This is an out-of-memory issue, so what does not work on one machine may work
on some other. You only see the issue if your machine runs out of memory, in
which case you see the usual out-of-memory error/exception. When that happens
there is no output PDF.

Any very long paragraph of non-self-repeating content should work as an
example. Repeating a short sentence to create a long paragraph is not a
good example, since the line-breaking algorithm may always break lines in the
same place.
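
A test input of this kind could be generated along the following lines. This
is only a sketch: the class LongParagraphFo, the A4 page dimensions and the
seed are arbitrary assumptions, and the words are random lowercase letters
rather than real text.

```java
import java.util.Random;

/** Hypothetical generator for an FO test file with one long, non-repeating paragraph. */
final class LongParagraphFo {

    /** Builds at least minChars characters of random space-separated lowercase words. */
    static String randomParagraph(int minChars, long seed) {
        Random rnd = new Random(seed); // fixed seed keeps the test file reproducible
        StringBuilder sb = new StringBuilder(minChars + 16);
        while (sb.length() < minChars) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            int wordLen = 2 + rnd.nextInt(9); // word length between 2 and 10
            for (int i = 0; i < wordLen; i++) {
                sb.append((char) ('a' + rnd.nextInt(26)));
            }
        }
        return sb.toString();
    }

    /** Wraps the paragraph in a minimal single-block XSL-FO document. */
    static String toFoDocument(String paragraph) {
        // The paragraph holds only letters and spaces, so no XML escaping is needed here.
        return "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
            + "<fo:root xmlns:fo=\"http://www.w3.org/1999/XSL/Format\">\n"
            + "  <fo:layout-master-set>\n"
            + "    <fo:simple-page-master master-name=\"page\""
            + " page-height=\"29.7cm\" page-width=\"21cm\">\n"
            + "      <fo:region-body margin=\"2cm\"/>\n"
            + "    </fo:simple-page-master>\n"
            + "  </fo:layout-master-set>\n"
            + "  <fo:page-sequence master-reference=\"page\">\n"
            + "    <fo:flow flow-name=\"xsl-region-body\">\n"
            + "      <fo:block>" + paragraph + "</fo:block>\n"
            + "    </fo:flow>\n"
            + "  </fo:page-sequence>\n"
            + "</fo:root>\n";
    }

    public static void main(String[] args) {
        // 50000 characters matches the paragraph size mentioned in the original report.
        System.out.print(toFoDocument(randomParagraph(50000, 42L)));
    }
}
```

Redirecting the output to a file gives an input FO that can be enlarged by
raising the character count until the memory problem shows up.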

[Bug 51218] FOP is unable to create PDF if there is an unusually large paragraph.

Glenn Adams <ga...@apache.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |CLOSED

--- Comment #15 from Glenn Adams <ga...@apache.org> ---
batch transition resolved+wontfix to closed+wontfix; if you believe this
remains a bug and can demonstrate it with appropriate input FO file and output
PDF file (as applicable), then you may reopen

DO NOT REPLY [Bug 51218] FOP is unable to create PDF if there is an unusually large paragraph.

Abhijeet <ab...@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Version|0.20.1                      |1.0

DO NOT REPLY [Bug 51218] FOP is unable to create PDF if there is an unusually large paragraph.

--- Comment #3 from Andreas L. Delmelle <ad...@apache.org> 2011-05-18 18:45:18 UTC ---
(In reply to comment #0)
> 
> 1. FOP Performs unusually SLOW if there is a large paragraph
> We have noticed that when there is an unusually large paragraph than FOP
> performance is incredibly slow. FOP takes more than 15 minutes in the method
> findBreakingPoints which is defined in BreakingAlgorithm.java. The paragraph
> size is of around 50 thousand characters. This method seems to find the best
> possible Break point. Can we not make this method return a default break point
> that works for the English language ?

Sorry, I do not understand the point you are trying to make here. 
What is 'unusual' is having a paragraph with 50K chars in the first place, and
then expecting an advanced layout algorithm to just process this as fast as one
with only 500 chars.

Admittedly, FOP has a scalability problem here, but 50K chars? Really? Who is
supposed to read that?

Could it be that the paragraph is pre-formatted, perhaps? That is: Does it
contain linefeeds that may be preserved? In that case, specify
linefeed-treatment="preserve" on the surrounding block and it will go
significantly faster.
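
A minimal sketch of what that could look like when the FO is assembled in
Java, assuming the fo prefix is bound to the XSL-FO namespace in the enclosing
document. PreservedBlock, escapeXml and toFoBlock are hypothetical names used
for illustration; only the linefeed-treatment property comes from XSL-FO.

```java
/** Hypothetical helper (not part of FOP): emits a block that keeps its linefeeds. */
final class PreservedBlock {

    /** Escapes the characters that are not allowed as-is in XML text content. */
    static String escapeXml(String s) {
        // '&' must be replaced first so the entities added afterwards stay intact.
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /** Wraps pre-formatted text in an fo:block whose linefeeds are preserved. */
    static String toFoBlock(String text) {
        return "<fo:block linefeed-treatment=\"preserve\">"
            + escapeXml(text)
            + "</fo:block>";
    }

    public static void main(String[] args) {
        System.out.println(toFoBlock("first line\nsecond line"));
    }
}
```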

> 
> 2. FOP uses unusually large memory when running in findBreakingPoints method
> defined in BreakingAlgorithm.java. This method starts to consume around 500 MB
> memory creating thousands of Objects of KnuthNode type. Such memory consumption
> is unacceptable just for finding a line break :-(.

It is not 'finding a line-break'. It determines the most optimal line-breakS
(plural).

And again: we know of this issue, but fixing it is non-trivial, unfortunately.

> Is there a way, I can prevent this extensive memory usage and slow performance
> by using a default break ?

You can, as suggested above, make sure the linefeeds are preserved, if that is
what you mean by 'default break'. Each line will then become a sub-paragraph,
and the complete paragraph will take only a fraction of the time and memory to
process.

The only other option is to split that monster-paragraph into smaller ones.
Divide and conquer.
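
One way to sketch the splitting, assuming the text is available as a Java
string and that sentence boundaries are acceptable split points.
ParagraphSplitter and its maxChars parameter are hypothetical;
java.text.BreakIterator is the standard JDK class. A single sentence longer
than maxChars is kept whole rather than cut mid-sentence.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical helper (not part of FOP): splits a long paragraph into smaller ones. */
final class ParagraphSplitter {

    /** Groups whole sentences into chunks of roughly at most maxChars characters. */
    static List<String> split(String text, int maxChars) {
        List<String> chunks = new ArrayList<>();
        BreakIterator sentences = BreakIterator.getSentenceInstance(Locale.ENGLISH);
        sentences.setText(text);

        StringBuilder current = new StringBuilder();
        int start = sentences.first();
        for (int end = sentences.next(); end != BreakIterator.DONE;
                start = end, end = sentences.next()) {
            String sentence = text.substring(start, end);
            // Close the current chunk if adding this sentence would exceed the limit.
            if (current.length() > 0 && current.length() + sentence.length() > maxChars) {
                chunks.add(current.toString().trim());
                current.setLength(0);
            }
            current.append(sentence);
        }
        if (current.length() > 0) {
            chunks.add(current.toString().trim());
        }
        return chunks;
    }

    public static void main(String[] args) {
        String text = "Alpha beta gamma. Delta epsilon zeta. Eta theta iota.";
        for (String chunk : split(text, 20)) {
            System.out.println(chunk);
        }
    }
}
```

Each returned chunk would then be emitted as its own block, so the line
breaker only ever sees one small paragraph at a time.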

DO NOT REPLY [Bug 51218] FOP is unable to create PDF if there is an unusually large paragraph.

Glenn Adams <gl...@skynav.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P2                          |P3

DO NOT REPLY [Bug 51218] FOP is unable to create PDF if there is an unusually large paragraph.

Glenn Adams <gl...@skynav.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|major                       |normal

--- Comment #7 from Glenn Adams <gl...@skynav.com> 2012-04-07 01:37:07 UTC ---
resetting severity from major to normal pending further review

[Bug 51218] FOP is unable to create PDF if there is an unusually large paragraph.

--- Comment #14 from Glenn Adams <ga...@apache.org> ---
batch transition resolved+wontfix to closed+wontfix

DO NOT REPLY [Bug 51218] FOP is unable to create PDF if there is an unusually large paragraph.

--- Comment #1 from Mehdi Houshmand <me...@gmail.com> 2011-05-18 13:02:45 UTC ---
Have you tried using FOP 1.0? 0.20 is no longer supported.

DO NOT REPLY [Bug 51218] FOP is unable to create PDF if there is an unusually large paragraph.

taffy-tyler6464@hotmail.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |taffy-tyler6464@hotmail.co.uk

DO NOT REPLY [Bug 51218] FOP is unable to create PDF if there is an unusually large paragraph.

Glenn Adams <ga...@apache.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEEDINFO                    |RESOLVED
         Resolution|                            |WONTFIX

--- Comment #13 from Glenn Adams <ga...@apache.org> 2012-04-24 13:19:50 UTC ---
this is a resource (memory) issue, not a bug; if someone wishes to post a patch
that redesigns the line breaker to operate more efficiently, then it will be
given serious consideration.

DO NOT REPLY [Bug 51218] FOP is unable to create PDF if there is an unusually large paragraph.

--- Comment #6 from Deepthi Bakkavemana <de...@gmail.com> 2011-06-07 12:06:22 UTC ---
Hi,

I have tried with "." as the delimiter and I don't find any change or any
improvement while exporting to PDF. Please comment if you have any suggestions.

Thanks,
Deepthi

DO NOT REPLY [Bug 51218] FOP is unable to create PDF if there is an unusually large paragraph.

--- Comment #2 from Abhijeet <ab...@gmail.com> 2011-05-18 13:06:27 UTC ---
Corrected the version. It is 1.0.

DO NOT REPLY [Bug 51218] FOP is unable to create PDF if there is an unusually large paragraph.

--- Comment #10 from Glenn Adams <ga...@apache.org> 2012-04-24 05:42:25 UTC ---
(In reply to comment #9)
> please provide minimal input FO test file, output PDF file(s), and full console
> output that demonstrates problem

Abhijeet, I am still awaiting your input as requested above. If I see no
further input by April 30, I will close this bug due to lack of requested
information. Regards, Glenn

DO NOT REPLY [Bug 51218] FOP is unable to create PDF if there is an unusually large paragraph.

--- Comment #12 from Luis Bernardo <lm...@gmail.com> 2012-04-24 08:45:26 UTC ---
Created attachment 28665
  --> https://issues.apache.org/bugzilla/attachment.cgi?id=28665
example

repeat the content inside the block (it is long enough to not trigger a simple
line breaking solution) enough times until you see the problem.
