Posted to dev@pdfbox.apache.org by "Andreas Lehmkühler (JIRA)" <ji...@apache.org> on 2011/01/27 15:26:45 UTC

[jira] Commented: (PDFBOX-899) OutOfMemoryError with PDFTextStripper

    [ https://issues.apache.org/jira/browse/PDFBOX-899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12987581#action_12987581 ] 

Andreas Lehmkühler commented on PDFBOX-899:
-------------------------------------------

The extraction works fine with the current trunk version (rev. 1063402) without applying the patch. Probably my recent changes to the font handling accidentally eliminated a memory leak and/or reduced the memory consumption. Can you confirm that behaviour?

P.S.: According to the document properties, extracting the text isn't allowed ... ;-)

> OutOfMemoryError with PDFTextStripper
> -------------------------------------
>
>                 Key: PDFBOX-899
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-899
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>    Affects Versions: 1.3.1
>         Environment: java version "1.6.0_22"
> Java(TM) SE Runtime Environment (build 1.6.0_22-b04)
> Java HotSpot(TM) Client VM (build 17.1-b03, mixed mode)
>            Reporter: Alexander Veit
>            Priority: Critical
>         Attachments: PDFBOX-899.patch
>
>
> PDFBox 1.3.1 has high memory demands when stripping text from PDF files.
> http://www.unicode.org/Public/5.1.0/charts/CodeCharts.pdf even crashes an application server by requiring an estimated additional 300MB+ of heap memory. The heap dump suggests that PDFStreamEngine#documentFontCache might be the root of the leaking objects.
> PDFBox 1.0.0 did not show this behaviour. 
