Posted to users@pdfbox.apache.org by "Hamann, Daniel" <D....@aurenz.de> on 2016/09/19 15:11:23 UTC

PDFMergerUtility causes an OutOfMemory exception when merging a large number of single page PDF documents

Hi,

Apache PDFBox 1.8.1 PDFMergerUtility throws an OutOfMemoryError
when merging a large number of single-page PDF documents.

Here is the stack trace:

Caused by: java.lang.OutOfMemoryError: Java heap space
       at java.util.Arrays.copyOf(Arrays.java:3210)
       at java.util.Arrays.copyOf(Arrays.java:3181)
       at java.util.ArrayList.grow(ArrayList.java:261)
       at java.util.ArrayList.ensureExplicitCapacity(ArrayList.java:235)
       at java.util.ArrayList.ensureCapacityInternal(ArrayList.java:227)
       at java.util.ArrayList.add(ArrayList.java:458)
       at org.apache.pdfbox.pdmodel.PDPageNode.getAllKids(PDPageNode.java:217)
       at org.apache.pdfbox.pdmodel.PDPageNode.getKids(PDPageNode.java:174)
       at org.apache.pdfbox.pdmodel.PDDocument.addPage(PDDocument.java:278)
       at org.apache.pdfbox.util.PDFMergerUtility.appendDocument(PDFMergerUtility.java:528)
       at org.apache.pdfbox.util.PDFMergerUtility.mergeDocuments(PDFMergerUtility.java:242)
       at org.apache.pdfbox.util.PDFMergerUtility.mergeDocumentsNonSeq(PDFMergerUtility.java:211)

Even when I use mergeDocumentsNonSeq(), which temporarily stores the
data read from the source PDF documents in an external file, memory
is still consumed while the source documents are appended to the
merged PDF.

My questions are:

1. Is memory consumed because appendDocument() reads PDF document
   information from the temporary file back into memory...

2. ...or is memory consumed because data structures are built up in
   memory just to hold references to the PDF document information in
   the temporary file (which in turn is only read when the merged
   document is streamed to file)?

3. Can I expect version 2.0.3 to handle merging of PDFs differently?

I checked the code of PDFMergerUtility in version 2.0.3 and I am aware
of the new "MemoryUsageSetting" method parameter. As far as I
understand PDFMergerUtility.appendDocument(), there is no significant
difference between version 1.8.10 and version 2.0.3.

Judging from the code of PDFMergerUtility, merging PDF documents seems
to be an extremely "expensive" process. I wonder whether there really
is no way to do this with less memory...

Any answer would be greatly appreciated!

Thanks a lot,

Daniel



Re: PDFMergerUtility causes an OutOfMemory exception when merging a large number of single page PDF documents

Posted by Tilman Hausherr <TH...@t-online.de>.
On 19.09.2016 at 17:11, Hamann, Daniel wrote:
> Hi,
>
> Apache PDFBox 1.8.1 PDFMergerUtility causes an OutOfMemory exception
> when merging a large number of single page PDF documents:

That version is several years old...

>
> 3.            Can I expect version 2.0.3 to handle merging of PDFs
> differently?
>
> I checked the code of PDFMergerUtility in version 2.0.3 and I am aware
> of the new "MemoryUsageSetting" method parameter. As far as I understand
> method PDFMergerUtility.appendDocument() there is no significant
> difference between version 1.8.10 and version 2.0.3.

The difference is under the hood: the memory management was changed
between 1.8 and 2.0. So I'd suggest you just try it.

Tilman
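For anyone trying 2.0.x as suggested, here is a minimal sketch of a merge that asks PDFBox to buffer through temporary files instead of the heap. It assumes PDFBox 2.0.x is on the classpath, where PDFMergerUtility lives in org.apache.pdfbox.multipdf (the 1.8 trace above shows org.apache.pdfbox.util) and mergeDocuments() accepts a MemoryUsageSetting. The class name MergeManyPdfs and the merge() helper are hypothetical. Whether this alone avoids the OutOfMemoryError for a given number of inputs is something to measure, not a guarantee.

```java
import java.io.File;
import java.io.IOException;

import org.apache.pdfbox.io.MemoryUsageSetting;
import org.apache.pdfbox.multipdf.PDFMergerUtility;

// Hypothetical helper class: a sketch assuming PDFBox 2.0.x, not an official example.
public class MergeManyPdfs {

    // Merges all input files, in order, into the destination file.
    public static void merge(File[] inputs, String destination) throws IOException {
        PDFMergerUtility merger = new PDFMergerUtility();
        merger.setDestinationFileName(destination);
        for (File input : inputs) {
            merger.addSource(input); // register each source PDF
        }
        // Ask PDFBox to keep its buffers in temporary files rather than main memory.
        merger.mergeDocuments(MemoryUsageSetting.setupTempFileOnly());
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.err.println("usage: MergeManyPdfs <destination.pdf> <input.pdf>...");
            System.exit(1);
        }
        File[] inputs = new File[args.length - 1];
        for (int i = 1; i < args.length; i++) {
            inputs[i - 1] = new File(args[i]);
        }
        merge(inputs, args[0]);
    }
}
```

If some heap buffering is acceptable, MemoryUsageSetting.setupMixed(long maxMainMemoryBytes) is the in-between option. Which setting works best for thousands of single-page inputs is worth testing against real data.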




---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@pdfbox.apache.org
For additional commands, e-mail: users-help@pdfbox.apache.org