Posted to users@pdfbox.apache.org by Chris Bamford <cb...@mimecast.com> on 2015/11/04 13:37:57 UTC

A question about extracting files embedded in a PDF

Hi there,

Last time I tried this, I got OutOfMemoryErrors with PDFs containing many embedded files (or a few large images), because I believe PDFBox builds an object model in memory that holds all the data at once.
I'm hoping there is an alternative approach in which the files can be streamed out, so that the memory impact stays low.

Is there a way to do this?

Thanks

- Chris


Chris Bamford
Lead Software Engineer

Re: A question about extracting files embedded in a PDF

Posted by Tilman Hausherr <TH...@t-online.de>.
Try again with the 2.0 version; its scratch file mechanism has been 
improved (see the docs of the PDDocument.load() methods).

Tilman
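
A minimal sketch of the streaming side of this, using only the JDK so it runs without PDFBox on the classpath. It assumes that, in PDFBox 2.0, the document is opened with one of the PDDocument.load() overloads that accepts a MemoryUsageSetting (for example MemoryUsageSetting.setupTempFileOnly()) and that each embedded file's data is read through PDEmbeddedFile.createInputStream(); check both against the 2.0 Javadoc. The class name StreamingExtract and the method copyToFile are hypothetical names chosen for illustration, and the byte array in main is only a stand-in for a real embedded file stream.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

class StreamingExtract {

    // Copies 'in' to 'target' through a fixed-size buffer, so the heap cost of
    // the copy is about bufferSize bytes no matter how large the source is.
    static long copyToFile(InputStream in, Path target, int bufferSize) throws IOException {
        if (bufferSize <= 0) {
            throw new IllegalArgumentException("bufferSize must be positive");
        }
        byte[] buffer = new byte[bufferSize];
        long total = 0;
        try (OutputStream out = Files.newOutputStream(target)) {
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
                total += n;
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in data (assumption): with PDFBox 2.0 the InputStream would
        // instead come from an embedded file of the loaded document.
        byte[] standIn = new byte[200_000];
        Path target = Files.createTempFile("embedded-", ".bin");
        try (InputStream in = new ByteArrayInputStream(standIn)) {
            long copied = copyToFile(in, target, 8192);
            System.out.println("copied " + copied + " bytes to " + target);
        } finally {
            Files.deleteIfExists(target);
        }
    }
}
```

The copy loop never holds more than one buffer of data at a time, so whether the overall extraction stays within memory limits depends on how the PDF library itself buffers the document, which is what the scratch file setting is meant to address.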

On 04.11.2015 at 13:37, Chris Bamford wrote:
>
> Hi there,
>
> Last time I tried this, I would get OOMs with PDFs containing lots of 
> embedded files (or a few large images) because I believe it builds a 
> sort of object model in memory holding all the data at once.
> I'm hoping there might be an alternative approach available whereby 
> files can be streamed out so the memory impact is low.
>
> Is there a way to do this?
>
> Thanks
>
> - Chris