Posted to dev@pdfbox.apache.org by "John Hewson (JIRA)" <ji...@apache.org> on 2016/07/30 20:46:20 UTC

[jira] [Comment Edited] (PDFBOX-3442) OOM for single page pdf file

    [ https://issues.apache.org/jira/browse/PDFBOX-3442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15400820#comment-15400820 ] 

John Hewson edited comment on PDFBOX-3442 at 7/30/16 8:45 PM:
--------------------------------------------------------------

The font cache only handles indirect objects, because only an indirect object's (id, gen) pair is consistent across pages. Direct objects are indeed unusual and can't be cached across pages, but what we should probably do here is cache the direct object for the given page. Currently, each call to getFont, even within the same page, results in a new PDFont.
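
To make the current behaviour concrete, here is a minimal sketch, assuming a simplified model in which a font dictionary is identified only by its (object number, generation) pair when it is indirect and has no stable identity at all when it is direct. `IndirectOnlyFontCache`, its `getFont` signature and the `loads` counter are hypothetical illustrations, not PDFBox's actual `ResourceCache` API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical model of an indirect-only font cache (NOT PDFBox's ResourceCache API).
class IndirectOnlyFontCache<F> {
    // Keyed by "objNum gen": the only identity that is stable across pages.
    private final Map<String, F> byObjectKey = new HashMap<>();

    // Counts how many times a font had to be built by the loader.
    int loads = 0;

    // objNum and gen are null when the font dictionary is a direct object.
    F getFont(Long objNum, Integer gen, Supplier<F> loader) {
        if (objNum == null || gen == null) {
            // Direct object: nothing stable to key on, so every call builds a new font.
            loads++;
            return loader.get();
        }
        String key = objNum + " " + gen;
        F font = byObjectKey.get(key);
        if (font == null) {
            // First request for this indirect object: build it once and remember it.
            loads++;
            font = loader.get();
            byObjectKey.put(key, font);
        }
        return font;
    }
}
```

In this model, repeated lookups of the same indirect font return one shared instance, while every lookup of a direct font allocates a fresh one, which is the per-call growth described above.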

We could extend ResourceCache with methods to cache direct objects keyed by (type, name, page) and then have PDResources use that cache if the object is direct.
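
The proposed extension can be sketched as follows, assuming pages are identified by a zero-based index and resource types by a string such as "Font". `PageScopedResourceCache`, `PageResourcesSketch` and all of their methods are hypothetical names chosen for illustration; in the proposal, the equivalent logic would live in `ResourceCache` and `PDResources`.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical cache with two scopes (illustrative names, NOT PDFBox API).
class PageScopedResourceCache<R> {
    // Indirect objects: keyed by "objNum gen", valid across all pages.
    private final Map<String, R> indirect = new HashMap<>();
    // Direct objects: keyed by (type, name, page), valid only within that page.
    private final Map<List<Object>, R> direct = new HashMap<>();

    private static List<Object> directKey(String type, String name, int pageIndex) {
        return List.of(type, name, pageIndex);
    }

    R getIndirect(long objNum, int gen) {
        return indirect.get(objNum + " " + gen);
    }

    void putIndirect(long objNum, int gen, R resource) {
        indirect.put(objNum + " " + gen, resource);
    }

    R getDirect(String type, String name, int pageIndex) {
        return direct.get(directKey(type, name, pageIndex));
    }

    void putDirect(String type, String name, int pageIndex, R resource) {
        direct.put(directKey(type, name, pageIndex), resource);
    }
}

// Hypothetical per-page resources lookup that picks the cache scope by object kind.
class PageResourcesSketch<R> {
    private final PageScopedResourceCache<R> cache;
    private final int pageIndex;
    int loads = 0; // how many times the loader actually ran

    PageResourcesSketch(PageScopedResourceCache<R> cache, int pageIndex) {
        this.cache = cache;
        this.pageIndex = pageIndex;
    }

    // objNum and gen are null when the font dictionary is a direct object.
    R getFont(String name, Long objNum, Integer gen, Supplier<R> loader) {
        boolean isIndirect = objNum != null && gen != null;
        R font = isIndirect
                ? cache.getIndirect(objNum, gen)
                : cache.getDirect("Font", name, pageIndex);
        if (font == null) {
            loads++;
            font = loader.get();
            if (isIndirect) {
                cache.putIndirect(objNum, gen, font);
            } else {
                cache.putDirect("Font", name, pageIndex, font);
            }
        }
        return font;
    }
}
```

Keying direct objects on the page keeps them from leaking across pages, where the same resource name may refer to a different dictionary, while still avoiding a new font on every getFont call within one page.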


> OOM for single page pdf file
> ----------------------------
>
>                 Key: PDFBOX-3442
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-3442
>             Project: PDFBox
>          Issue Type: Improvement
>            Reporter: Tim Allison
>            Priority: Minor
>
> On TIKA-2045, a user posted a single page document that leads to OOM with -Xmx1g.  I confirmed this with PDFBox's ExtractText.
> Might be a memory leak with the fonts?  See [this|https://issues.apache.org/jira/browse/TIKA-2045?focusedCommentId=15399583&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15399583] for some diagnostics I did.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
