Posted to users@pdfbox.apache.org by "Morris, Mark" <Ma...@experian.com> on 2015/06/03 18:12:01 UTC

Merge documents with streaming?

Hello! I’m generating multiple PDF documents using a commercial tool, then merging them into one document to deliver to the user. This all works, but I’m trying to reduce the memory footprint, and the current approach builds everything in memory.

So I wondered if this is possible: create a document that streams to a file, then keep appending more PDF documents to the end, without needing to have the whole thing in memory at any time. I’ve looked through the API and didn’t see a way, but it’s new to me, so I may well have missed something.

Thanks for any assistance!

Regards,
Mark


Re: Merge documents with streaming?

Posted by "Morris, Mark" <Ma...@experian.com>.
Thanks, that makes sense. So there is information in the earlier parts of the resulting file that would need to be updated as new pages are added to the tail, which is the architectural problem I was worried about. If I were clever enough, I could write the pieces out separately, update each one to account for the final combined document, then stream them one after another as a single result.

Not the answer I was hoping for, but thanks a bunch for everyone’s input. :-)

Regards,
Mark


> On Jun 5, 2015, at 10:33 AM, Brzrk One <br...@gmail.com> wrote:
> 
> Impossible?
> Nothing (:*mostly* nothing:) is impossible, it's just a matter of how much
> programming it will require!
> :)
> 
> I've used perl scripts to check/regenerate the xref table after I used a
> text editor to twiddle objects... on one level, the PDF file is just text...
> 
> In the simple case, one could write a script that iterated over the
> files, renumbering all the object ids, replacing the
> references, concatenating all the pieces, and building a combined xref
> table at the end. I think the output could be streamed, and the files would
> not need to be kept in memory.
> 
> Of course, there are those *non*-simple cases which seem to comprise the
> bulk of PDF examples...


---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@pdfbox.apache.org
For additional commands, e-mail: users-help@pdfbox.apache.org


Re: Merge documents with streaming?

Posted by Brzrk One <br...@gmail.com>.
Impossible?
Nothing (:*mostly* nothing:) is impossible, it's just a matter of how much
programming it will require!
:)

I've used perl scripts to check/regenerate the xref table after I used a
text editor to twiddle objects... on one level, the PDF file is just text...
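
For readers who have not met the xref table: it is a list of fixed-width, 20-byte entries giving the byte offset of each object in the file. Below is a minimal Java sketch of regenerating one. It assumes a classic (non-stream) cross-reference table, consecutive object numbers starting at 1, and generation 0 for every object. The names `XrefSketch` and `buildXref` are hypothetical, and collecting the offsets by scanning for "N 0 obj" is left out.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class XrefSketch {
    /**
     * Builds a classic cross-reference section for objects 1..n,
     * where offsets.get(i) is the byte offset of object i + 1.
     * Assumption: generation 0 everywhere, no free objects except entry 0.
     */
    static String buildXref(List<Long> offsets) {
        StringBuilder sb = new StringBuilder();
        sb.append("xref\n");
        // Subsection header: first object number, then the entry count.
        sb.append("0 ").append(offsets.size() + 1).append('\n');
        // Entry 0 is the head of the free list.
        sb.append("0000000000 65535 f \n");
        for (long offset : offsets) {
            // 10-digit offset, 5-digit generation, 'n' = in use, 2-byte line end.
            sb.append(String.format(Locale.ROOT, "%010d %05d n \n", offset, 0));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Offsets here are made up for illustration.
        System.out.print(buildXref(Arrays.asList(9L, 74L, 120L)));
    }
}
```

The trailer and `startxref` line that must follow the table are not produced by this sketch.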

In the simple case, one could write a script that iterated over the
files, renumbering all the object ids, replacing the
references, concatenating all the pieces, and building a combined xref
table at the end. I think the output could be streamed, and the files would
not need to be kept in memory.

Of course, there are those *non*-simple cases which seem to comprise the
bulk of PDF examples...
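
The renumbering step in the simple case could look roughly like the Java sketch below. It is a naive, text-level illustration only: it assumes uncompressed objects, tokens separated by single spaces, and no binary stream data or string contents that happen to look like "12 0 R". Object streams and cross-reference streams (PDF 1.5 and later) are not handled, and the separate catalogs and page trees of the input files would still have to be combined. `RenumberSketch` and `renumber` are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RenumberSketch {
    // Matches "12 0 obj" (object definition) or "12 0 R" (indirect reference).
    private static final Pattern OBJ_OR_REF =
            Pattern.compile("\\b(\\d+) (\\d+) (obj|R)\\b");

    /** Shifts every object number in the given PDF text by offset. */
    static String renumber(String pdfText, int offset) {
        Matcher m = OBJ_OR_REF.matcher(pdfText);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            int shifted = Integer.parseInt(m.group(1)) + offset;
            // Keep the generation number and the keyword unchanged.
            m.appendReplacement(out, shifted + " " + m.group(2) + " " + m.group(3));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String piece = "1 0 obj\n<< /Type /Pages /Kids [2 0 R] /Count 1 >>\nendobj\n";
        // Pretend the files written so far already used object numbers 1..10.
        System.out.print(renumber(piece, 10));
    }
}
```

Each input file would be passed through with an offset equal to the number of objects already written, and the byte offset of every rewritten object recorded for the combined xref table at the end.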

On Fri, Jun 5, 2015 at 11:14 AM, Morris, Mark <Ma...@experian.com>
wrote:

> Since I’ve gotten no response, is it safe to assume this is something
> that’s architecturally impossible?
>
> Thanks,
> Mark

Re: Merge documents with streaming?

Posted by Tilman Hausherr <TH...@t-online.de>.
One could reduce the memory footprint by using scratch files when 
loading a PDF, but the PDFMergerUtility class doesn't support it.

Tilman

On 05.06.2015 at 17:14, Morris, Mark wrote:
> Since I’ve gotten no response, is it safe to assume this is something that’s architecturally impossible?
>
> Thanks,
> Mark


Re: Merge documents with streaming?

Posted by Jesse Long <je...@gmail.com>.
Hi Mark,

In the latest 2.0.0 snapshot you should be able to create a new 
PDDocument using scratch files.

new PDDocument(true);

Then append to it as per normal. Streams for the new document will be 
stored in scratch files on the file system, not in memory.

Cheers,
Jesse

On 05/06/2015 17:14, Morris, Mark wrote:
> Since I’ve gotten no response, is it safe to assume this is something that’s architecturally impossible?
>
> Thanks,
> Mark


Re: Merge documents with streaming?

Posted by "Morris, Mark" <Ma...@experian.com>.
Since I’ve gotten no response, is it safe to assume this is something that’s architecturally impossible?

Thanks,
Mark
