
Posted to dev@pdfbox.apache.org by "Tilman Hausherr (JIRA)" <ji...@apache.org> on 2016/06/11 19:30:21 UTC

[jira] [Commented] (PDFBOX-3380) Small change to PDFSplit loop reduces memory consumption

    [ https://issues.apache.org/jira/browse/PDFBOX-3380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15326018#comment-15326018 ] 

Tilman Hausherr commented on PDFBOX-3380:
-----------------------------------------

It's true re: clone... the splitter uses importPage(), which I recently "improved" so that it makes a cloned copy (see PDFBOX-3280 and PDFBOX-3328). One problem with the new behavior is that splitting would now multiply identical resources; I hadn't thought about that. Sigh...
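To picture the resource-multiplication problem, here is a minimal Java sketch that does not use PDFBox. `Resource`, `Page`, `importPages` and `distinctResources` are hypothetical stand-ins, and the sketch assumes that a cloning import deep-copies each page's resources independently, so pages that shared one resource in the source each end up with their own copy in the output.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

public class CloneVsShare {
    // Hypothetical stand-in for a shared resource such as a font (not a PDFBox class).
    static final class Resource {
        final String name;
        Resource(String name) { this.name = name; }
        Resource deepCopy() { return new Resource(name); }
    }

    // Hypothetical page that references a resource (not a PDFBox class).
    static final class Page {
        final Resource font;
        Page(Resource font) { this.font = font; }
    }

    // Imports pages into one output "document", either by reference or by deep-cloning each page.
    static List<Page> importPages(List<Page> source, boolean clone) {
        List<Page> out = new ArrayList<>();
        for (Page p : source) {
            out.add(clone ? new Page(p.font.deepCopy()) : p);
        }
        return out;
    }

    // Counts distinct resource objects (by identity) that the output would contain.
    static int distinctResources(List<Page> pages) {
        Map<Resource, Boolean> seen = new IdentityHashMap<>();
        for (Page p : pages) seen.put(p.font, Boolean.TRUE);
        return seen.size();
    }

    public static void main(String[] args) {
        Resource sharedFont = new Resource("F1");
        List<Page> source = new ArrayList<>();
        for (int i = 0; i < 10; i++) source.add(new Page(sharedFont));

        System.out.println("shared: " + distinctResources(importPages(source, false))); // shared: 1
        System.out.println("cloned: " + distinctResources(importPages(source, true)));  // cloned: 10
    }
}
```

With ten source pages sharing one font, importing by reference leaves a single resource object, while per-page cloning leaves ten identical copies in the same output.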

> Small change to PDFSplit loop reduces memory consumption
> --------------------------------------------------------
>
>                 Key: PDFBOX-3380
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-3380
>             Project: PDFBox
>          Issue Type: Improvement
>          Components: Utilities
>    Affects Versions: 2.0.2
>            Reporter: Justin Lee
>            Priority: Minor
>              Labels: patch, performance
>         Attachments: splitter.patch
>
>
> I was trying to use PDFSplit to split a large scanned document into single pages.  It very quickly ran out of memory.  I poked around in the code, and it looks to me like the issue is that the splitter code tries to create an in-memory model of every single cloned page before writing them to disk.  I created a patch based on 2.0.2 that fixes my immediate problem, in case it is helpful to anybody.  All it really does is move the outer processing loop to PDFSplit so it can write to disk after each page.  This probably isn't an ideal fix, but I'm not familiar enough with the internals of PDFBox to do much more.
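The difference between the two loop shapes can be sketched in plain Java. `OutDoc`, `splitAll` and `splitStreaming` are hypothetical names, not PDFBox API; in real code each iteration would build a single-page document via importPage() and then save and close it. The sketch only counts how many output documents are alive at the same time.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class SplitLoopSketch {
    // Hypothetical in-memory single-page output document; tracks how many are alive at once.
    static final class OutDoc implements AutoCloseable {
        static int live = 0;
        static int peak = 0;
        final int pageIndex;
        OutDoc(int pageIndex) {
            this.pageIndex = pageIndex;
            live++;
            peak = Math.max(peak, live);
        }
        @Override public void close() { live--; }
    }

    // Eager shape: build every output document first, then hand them all over for writing.
    static List<OutDoc> splitAll(int pageCount) {
        List<OutDoc> docs = new ArrayList<>();
        for (int i = 0; i < pageCount; i++) docs.add(new OutDoc(i));
        return docs;
    }

    // Streaming shape: create, write, and close one output document per iteration.
    static void splitStreaming(int pageCount, Consumer<OutDoc> writer) {
        for (int i = 0; i < pageCount; i++) {
            try (OutDoc doc = new OutDoc(i)) {
                writer.accept(doc); // stand-in for saving the page to disk
            }
        }
    }

    static void resetCounters() { OutDoc.live = 0; OutDoc.peak = 0; }

    public static void main(String[] args) {
        List<Integer> written = new ArrayList<>();

        resetCounters();
        for (OutDoc d : splitAll(500)) {
            written.add(d.pageIndex); // stand-in for saving the page to disk
            d.close();
        }
        System.out.println("eager peak: " + OutDoc.peak);     // eager peak: 500

        resetCounters();
        splitStreaming(500, d -> written.add(d.pageIndex));
        System.out.println("streaming peak: " + OutDoc.peak); // streaming peak: 1
    }
}
```

Collecting all outputs first keeps every page's document alive until the caller writes them, so peak memory grows with the page count; writing and closing inside the loop keeps only one output document alive at a time.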



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
