Posted to users@pdfbox.apache.org by Swapnil Raverkar <sw...@gmail.com> on 2015/03/27 08:49:58 UTC

PDFBox loading larger PDF

Hi,

While loading large PDF files (more than 200 MB) for 10 concurrent users
with a 3 GB heap configured in the JVM, we get the following exception:

com.apple.ist.acss.pdf.encoding.exception.EncoderException: org.apache.pdfbox.exceptions.WrappedIOException
    at com.apple.ist.acss.core.encoder.impl.WatermarksPDFEncoderImpl.getWatermarksEncodedPDF(SourceFile:76)
    at com.apple.plm.pdfwatermarking.util.PDFBinaryCodeEncoder.getEncodedContent(PDFBinaryCodeEncoder.java:89)
    at com.apple.plm.pdfwatermarking.util.PDFWatermarkingCommon.callWatermarkForFM(PDFWatermarkingCommon.java:471)
    at com.apple.plm.pdfwatermarking.PDFWatermarking.doFilter(PDFWatermarking.java:128)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:224)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:169)
    at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:472)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:168)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:98)
    at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:927)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:407)
    at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:987)
    at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:579)
    at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:309)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
    at java.lang.Thread.run(Thread.java:722)
Caused by: org.apache.pdfbox.exceptions.WrappedIOException
    at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:278)
    at com.apple.ist.acss.core.encoder.impl.WatermarksPDFEncoderImpl.getWatermarksEncodedPDF(SourceFile:51)
    ... 19 more
Caused by: java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
    at java.util.ArrayList.rangeCheck(ArrayList.java:604)
    at java.util.ArrayList.get(ArrayList.java:382)
    at org.apache.pdfbox.io.RandomAccessBuffer.seek(RandomAccessBuffer.java:110)
    at org.apache.pdfbox.io.RandomAccessFileOutputStream.write(RandomAccessFileOutputStream.java:106)
    at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
    at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
    at java.io.FilterOutputStream.close(FilterOutputStream.java:157)
    at org.apache.pdfbox.pdfparser.BaseParser.parseCOSStream(BaseParser.java:616)
    at org.apache.pdfbox.pdfparser.PDFParser.parseObject(PDFParser.java:650)
    at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:203)
    ... 20 more



Is there any way to address this issue, i.e. to load the PDF in chunks
instead of loading the entire file into memory?

Any pointers on this would be really helpful.



Thanks,

Swapnil

Re: PDFBox loading larger PDF

Posted by Tilman Hausherr <TH...@t-online.de>.
Hi,

Sorry, no, we need the whole file.

You could try to use a scratch file in load() or (better) loadNonSeq.
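
To illustrate the scratch-file idea in general terms: the stack trace in the
original report ends in org.apache.pdfbox.io.RandomAccessBuffer, which keeps
stream data on the heap, and a scratch file moves that data to disk instead.
The sketch below is not PDFBox code. It only shows the principle with the
standard library, and the names ScratchSpool and spoolToScratch are made up
for the illustration. In PDFBox 1.8 itself the scratch file would be handed
to load() or loadNonSeq(); the exact overloads should be checked against the
1.8 Javadoc.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of PDFBox: spools a stream to a temp file.
final class ScratchSpool {
    private ScratchSpool() {}

    /**
     * Copies the stream to a temporary file through a fixed 8 KB buffer,
     * so heap use stays bounded regardless of how large the stream is.
     */
    public static Path spoolToScratch(InputStream in) {
        try {
            Path scratch = Files.createTempFile("scratch-", ".tmp");
            byte[] buffer = new byte[8192];
            try (OutputStream out = Files.newOutputStream(scratch)) {
                int n;
                while ((n = in.read(buffer)) != -1) {
                    out.write(buffer, 0, n);
                }
            }
            return scratch;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Stand-in for a large embedded stream; a real one would come from the PDF.
        byte[] data = new byte[100_000];
        Path scratch = spoolToScratch(new ByteArrayInputStream(data));
        System.out.println("spooled " + scratch.toFile().length() + " bytes to " + scratch);
        scratch.toFile().delete(); // the caller owns the scratch file and removes it
    }
}
```

The trade-off is disk I/O in exchange for heap: only the 8 KB buffer is held
in memory at any time, while the data itself lives in the temporary file.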

I also see that my last mail was a bit unclear:
- With "merging a file with itself" I meant doing this repeatedly until you
get a huge non-confidential file. Try it e.g. with the PDF specification file.
- With "ab.exe" I meant "Apache Benchmark". It isn't part of the Tomcat
distribution, but one can extract it from some Apache web server
distributions.

I assume you use the 1.8 version. In the unreleased 2.0 version there 
would be a trick (with a few lines of code change) to save memory by not 
keeping the images unless needed.

Tilman

On 28.03.2015 at 08:14, Swapnil Raverkar wrote:
> Hi Tilman,
>
> For PDFBox is there any way to load the file in chunks instead of loading
> entire file in memory for populating the document object? It is consuming
> around 1GB heap memory at start-up for a single file (200 MB) for a single
> user and drops down to 600-700 MB during the processing time.
>
> Thanks,
> Swapnil
>
> On 27 March 2015 at 23:44, Tilman Hausherr <TH...@t-online.de> wrote:
>
>> Errors that happen in concurrent situations are notoriously hard to find.
>> The best would be to prepare
>> - a large file of the kind you mention
>> - code of a servlet to be run in tomcat (smallest possible code that
>> brings the error)
>> then use ab.exe to stress-run the servlet in the tomcat 8 and fine tune
>> the parameters until it happens for sure, then send us all that.
>>
>> If the file is confidential, try if the error happens by merging a non
>> confidential file with itself.
>>
>> Tilman
>>
>> On 27.03.2015 at 18:42, Swapnil Raverkar wrote:
>>
>>> No this issue is not happening for a large file with a single active user.
>>>
>>> Thanks,
>>> Swapnil
>>>
>>> On 27 March 2015 at 01:34, Tilman Hausherr <TH...@t-online.de> wrote:
>>>
>>>> On 27.03.2015 at 08:49, Swapnil Raverkar wrote:
>>>>
>>>>> While loading larger PDF files more than 200 MB for 10 concurrent users
>>>>> with 3GB heap Space configured in JVM, we are getting following
>>>>> exception:
>>>>>
>>>> Does this also happen with a large file when only 1 user is active?
>>


---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@pdfbox.apache.org
For additional commands, e-mail: users-help@pdfbox.apache.org


Re: PDFBox loading larger PDF

Posted by Swapnil Raverkar <sw...@gmail.com>.
Hi Tilman,

Is there any way in PDFBox to load the file in chunks instead of loading the
entire file into memory to populate the document object? It consumes around
1 GB of heap at start-up for a single file (200 MB) and a single user, and
drops to 600-700 MB during processing.
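
As a rough sanity check of these numbers against the 3 GB heap and the 10
concurrent users from the original report, the arithmetic can be sketched as
follows. This is only a back-of-the-envelope estimate: it assumes every
concurrent load peaks at about 1 GB at the same moment and ignores whatever
else Tomcat keeps on the heap. HeapBudget and its method names are
hypothetical.

```java
// Hypothetical helper for the estimate; all figures come from this thread.
final class HeapBudget {
    private HeapBudget() {}

    /** Total heap demand in MB if every user loads a document at the same moment. */
    public static long peakDemandMb(int users, long perDocMb) {
        return users * perDocMb;
    }

    /** How many documents fit in the heap at once if each needs perDocMb at its peak. */
    public static long maxConcurrentLoads(long heapMb, long perDocMb) {
        if (perDocMb <= 0) {
            throw new IllegalArgumentException("perDocMb must be positive");
        }
        return heapMb / perDocMb;
    }

    public static void main(String[] args) {
        long heapMb = 3 * 1024; // 3 GB heap configured in the JVM
        long perDocMb = 1024;   // about 1 GB observed at start-up for one 200 MB file
        int users = 10;         // concurrent users in the failing scenario

        System.out.println("peak demand (MB): " + peakDemandMb(users, perDocMb));
        System.out.println("loads that fit:   " + maxConcurrentLoads(heapMb, perDocMb));
    }
}
```

With these figures the peak demand is about 10 GB, while only about three
such loads fit into the heap at the same time.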

Thanks,
Swapnil

On 27 March 2015 at 23:44, Tilman Hausherr <TH...@t-online.de> wrote:

> Errors that happen in concurrent situations are notoriously hard to find.
> The best would be to prepare
> - a large file of the kind you mention
> - code of a servlet to be run in tomcat (smallest possible code that
> brings the error)
> then use ab.exe to stress-run the servlet in the tomcat 8 and fine tune
> the parameters until it happens for sure, then send us all that.
>
> If the file is confidential, try if the error happens by merging a non
> confidential file with itself.
>
> Tilman
>
> On 27.03.2015 at 18:42, Swapnil Raverkar wrote:
>
>> No this issue is not happening for a large file with a single active user.
>>
>> Thanks,
>> Swapnil
>>
>> On 27 March 2015 at 01:34, Tilman Hausherr <TH...@t-online.de> wrote:
>>
>>> On 27.03.2015 at 08:49, Swapnil Raverkar wrote:
>>>
>>>> While loading larger PDF files more than 200 MB for 10 concurrent users
>>>> with 3GB heap Space configured in JVM, we are getting following
>>>> exception:
>>>>
>>> Does this also happen with a large file when only 1 user is active?
>

Re: PDFBox loading larger PDF

Posted by Tilman Hausherr <TH...@t-online.de>.
Errors that happen in concurrent situations are notoriously hard to
find. The best approach would be to prepare:
- a large file of the kind you mention
- the code of a servlet to be run in Tomcat (the smallest possible code that
triggers the error)
Then use ab.exe to stress-run the servlet in Tomcat 8 and fine-tune the
parameters until the error happens reliably, then send us all of that.
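
Before setting up ab against a real servlet, the same kind of concurrent load
can be approximated in-process. The following is a minimal sketch using only
java.util.concurrent. It is not a replacement for ab, and StressRun,
countFailures and the placeholder task are hypothetical names. The task
passed in would be the code that loads and processes the large PDF.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical in-process stand-in for a concurrent stress run.
final class StressRun {
    private StressRun() {}

    /** Runs the task `total` times on `concurrency` threads and returns how many runs threw. */
    public static int countFailures(Callable<?> task, int total, int concurrency) {
        ExecutorService pool = Executors.newFixedThreadPool(concurrency);
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (int i = 0; i < total; i++) {
                futures.add(pool.submit(task));
            }
            int failures = 0;
            for (Future<?> future : futures) {
                try {
                    future.get(); // rethrows the task's exception wrapped in ExecutionException
                } catch (ExecutionException e) {
                    failures++;
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while waiting", e);
                }
            }
            return failures;
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        // Placeholder task: replace with the code that loads the large PDF.
        Callable<String> task = () -> "ok";
        int failures = countFailures(task, 100, 10);
        System.out.println("failed runs: " + failures);
    }
}
```

Raising the total and the concurrency step by step plays the same role as
tuning the ab parameters until the error shows up reliably.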

If the file is confidential, check whether the error also happens when
merging a non-confidential file with itself.

Tilman

On 27.03.2015 at 18:42, Swapnil Raverkar wrote:
> No this issue is not happening for a large file with a single active user.
>
>
> Thanks,
> Swapnil
>
> On 27 March 2015 at 01:34, Tilman Hausherr <TH...@t-online.de> wrote:
>
>> On 27.03.2015 at 08:49, Swapnil Raverkar wrote:
>>
>>> While loading larger PDF files more than 200 MB for 10 concurrent users
>>> with 3GB heap Space configured in JVM, we are getting following exception
>>> :
>>>
>> Does this also happen with a large file when only 1 user is active?
>>
>>




Re: PDFBox loading larger PDF

Posted by Swapnil Raverkar <sw...@gmail.com>.
No, this issue does not happen for a large file with a single active user.


Thanks,
Swapnil

On 27 March 2015 at 01:34, Tilman Hausherr <TH...@t-online.de> wrote:

> On 27.03.2015 at 08:49, Swapnil Raverkar wrote:
>
>> While loading larger PDF files more than 200 MB for 10 concurrent users
>> with 3GB heap Space configured in JVM, we are getting following exception
>> :
>>
>
> Does this also happen with a large file when only 1 user is active?
>
>
>

Re: PDFBox loading larger PDF

Posted by Tilman Hausherr <TH...@t-online.de>.
On 27.03.2015 at 08:49, Swapnil Raverkar wrote:
> While loading larger PDF files more than 200 MB for 10 concurrent users
> with 3GB heap Space configured in JVM, we are getting following exception :

Does this also happen with a large file when only 1 user is active?


