Posted to users@pdfbox.apache.org by mihaela olteanu <mi...@yahoo.com> on 2013/06/03 14:50:10 UTC

Merging a lot of small pdf documents (1/2 pages) into one pdf document

Hello,

I have a use case where I need to merge a large number of small PDF documents (hundreds of thousands) into one PDF document.
Currently I call org.apache.pdfbox.util.PDFMergerUtility.appendDocument(destination, source) for each source document, rather than the mergeDocuments() method of the same class, because I also need to add some bookmarks. Finally I save the document.

Is there a better way of doing this with a lower memory footprint? I tried importing each page from the source documents with PDDocument.importPage(), but it still throws errors in version 1.8.2.

When I call PDDocument.load(File), is the whole document loaded into memory? If so, saving the generated PDF after merging a subset of documents and then reloading it would not decrease the memory use anyway ...

Could somebody point me to the right way of doing this?

Thanks,
Mihaela

Re: Merging a lot of small pdf documents (1/2 pages) into one pdf document

Posted by Gilad Denneboom <gi...@gmail.com>.
Try loading the file using a scratch file:
http://pdfbox.apache.org/apidocs/org/apache/pdfbox/pdmodel/PDDocument.html#load(java.lang.String,%20org.apache.pdfbox.io.RandomAccess)

This will help lessen the memory load.
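
A minimal sketch of that suggestion, assuming the PDFBox 1.8.x API: each source is parsed through PDDocument.load(String, RandomAccess) with a temporary scratch file, appended with PDFMergerUtility.appendDocument(), and closed straight away. The class name ScratchFileMerge and its two methods are hypothetical, made up for illustration. It also assumes that a source can safely be closed once it has been appended and that an empty PDDocument works as the destination; both are worth checking against the exact version in use. Bookmarks are left out, and the destination document itself still grows in memory here.

```java
import java.io.File;
import java.io.IOException;
import java.util.List;

import org.apache.pdfbox.exceptions.COSVisitorException;
import org.apache.pdfbox.io.RandomAccessFile;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.util.PDFMergerUtility;

// Hypothetical helper class, not part of PDFBox.
final class ScratchFileMerge {

    /** Appends every source PDF to a new document, saves it, and returns its page count. */
    static int merge(List<String> sourcePaths, String outputPath)
            throws IOException, COSVisitorException {
        PDFMergerUtility merger = new PDFMergerUtility();
        PDDocument destination = new PDDocument();
        try {
            for (String sourcePath : sourcePaths) {
                appendOne(merger, destination, sourcePath);
            }
            destination.save(outputPath);
            return destination.getNumberOfPages();
        } finally {
            destination.close();
        }
    }

    /** Loads one source through its own scratch file, appends it, then releases it. */
    private static void appendOne(PDFMergerUtility merger, PDDocument destination,
                                  String sourcePath) throws IOException {
        File scratch = File.createTempFile("pdfbox-scratch", ".tmp");
        RandomAccessFile scratchFile = new RandomAccessFile(scratch, "rw");
        try {
            // Stream data of the source is buffered in the scratch file, not on the heap.
            PDDocument source = PDDocument.load(sourcePath, scratchFile);
            try {
                merger.appendDocument(destination, source);
            } finally {
                // Assumption: the appended pages no longer depend on the open source.
                source.close();
            }
        } finally {
            // Closing again is harmless if closing the document already did it.
            scratchFile.close();
            scratch.delete();
        }
    }
}
```

Creating one temporary file per source keeps the sketch simple; with hundreds of thousands of inputs it may be worth measuring whether that overhead matters.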

