Posted to users@pdfbox.apache.org by Swapnil Raverkar <sw...@gmail.com> on 2015/03/27 08:49:58 UTC
PDFBox loading larger PDF
Hi,
When loading large PDF files (more than 200 MB) for 10 concurrent users,
with 3 GB of heap space configured in the JVM, we get the following
exception:
com.apple.ist.acss.pdf.encoding.exception.EncoderException: org.apache.pdfbox.exceptions.WrappedIOException
at com.apple.ist.acss.core.encoder.impl.WatermarksPDFEncoderImpl.getWatermarksEncodedPDF(SourceFile:76)
at com.apple.plm.pdfwatermarking.util.PDFBinaryCodeEncoder.getEncodedContent(PDFBinaryCodeEncoder.java:89)
at com.apple.plm.pdfwatermarking.util.PDFWatermarkingCommon.callWatermarkForFM(PDFWatermarkingCommon.java:471)
at com.apple.plm.pdfwatermarking.PDFWatermarking.doFilter(PDFWatermarking.java:128)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:224)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:169)
at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:472)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:168)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:98)
at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:927)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:407)
at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:987)
at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:579)
at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:309)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:722)
Caused by: org.apache.pdfbox.exceptions.WrappedIOException
at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:278)
at com.apple.ist.acss.core.encoder.impl.WatermarksPDFEncoderImpl.getWatermarksEncodedPDF(SourceFile:51)
... 19 more
Caused by: java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
at java.util.ArrayList.rangeCheck(ArrayList.java:604)
at java.util.ArrayList.get(ArrayList.java:382)
at org.apache.pdfbox.io.RandomAccessBuffer.seek(RandomAccessBuffer.java:110)
at org.apache.pdfbox.io.RandomAccessFileOutputStream.write(RandomAccessFileOutputStream.java:106)
at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
at java.io.FilterOutputStream.close(FilterOutputStream.java:157)
at org.apache.pdfbox.pdfparser.BaseParser.parseCOSStream(BaseParser.java:616)
at org.apache.pdfbox.pdfparser.PDFParser.parseObject(PDFParser.java:650)
at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:203)
... 20 more
Is there any way to address this issue, for example by loading the PDF in
chunks instead of loading the entire file into memory?
Any pointers on this would be really helpful.
Thanks,
Swapnil
Re: PDFBox loading larger PDF
Posted by Tilman Hausherr <TH...@t-online.de>.
Hi,
Sorry, no, we need the whole file.
You could try using a scratch file with load() or (better) loadNonSeq().
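To make the scratch-file idea concrete: a scratch file keeps bulky stream data on disk rather than on the heap and reads it back by offset when needed. The sketch below shows that idea using only java.io.RandomAccessFile from the JDK. The class name ScratchFileDemo and its methods are hypothetical and this is not PDFBox code. PDFBox 1.8 appears to expose the same idea through overloads of PDDocument.load and PDDocument.loadNonSeq that accept a scratch file; check the Javadoc of your version for the exact signatures.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

/** Hypothetical illustration of a scratch file; not PDFBox code. */
public class ScratchFileDemo implements AutoCloseable {
    private final File file;
    private final RandomAccessFile raf;

    public ScratchFileDemo() throws IOException {
        // Back the buffer with a temporary file instead of a heap array.
        file = File.createTempFile("scratch", ".tmp");
        file.deleteOnExit();
        raf = new RandomAccessFile(file, "rw");
    }

    /** Appends a chunk to the scratch file and returns the offset where it starts. */
    public long append(byte[] chunk) throws IOException {
        long offset = raf.length();
        raf.seek(offset);
        raf.write(chunk);
        return offset;
    }

    /** Reads length bytes starting at offset back into a fresh array. */
    public byte[] read(long offset, int length) throws IOException {
        byte[] out = new byte[length];
        raf.seek(offset);
        raf.readFully(out);
        return out;
    }

    @Override
    public void close() throws IOException {
        raf.close();
        file.delete();
    }

    public static void main(String[] args) throws IOException {
        try (ScratchFileDemo scratch = new ScratchFileDemo()) {
            long first = scratch.append("stream one".getBytes("UTF-8"));
            long second = scratch.append("stream two".getBytes("UTF-8"));
            // Chunks can be read back in any order by seeking to their offsets.
            System.out.println(new String(scratch.read(second, 10), "UTF-8"));
            System.out.println(new String(scratch.read(first, 10), "UTF-8"));
        }
    }
}
```

Only the chunk currently being read occupies heap; everything else stays in the temporary file, at the cost of disk I/O.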
I also see that my last mail was a bit unclear:
- By "merging a file with itself" I meant doing this repeatedly until you
  get a huge non-confidential file. Try it with, e.g., the PDF
  specification file.
- By "ab.exe" I meant Apache Benchmark. It isn't part of the Tomcat
  distribution, but it can be extracted from some Apache web server
  distributions.
I assume you are using version 1.8. In the unreleased 2.0 version there
would be a trick (requiring a few lines of code changes) to save memory by
not keeping the images unless needed.
Tilman
On 28.03.2015 at 08:14, Swapnil Raverkar wrote:
> Hi Tilman,
>
> Is there any way for PDFBox to load the file in chunks instead of loading
> the entire file into memory to populate the document object? It consumes
> around 1 GB of heap at start-up for a single 200 MB file and a single
> user, dropping to 600-700 MB during processing.
>
> Thanks,
> Swapnil
---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@pdfbox.apache.org
For additional commands, e-mail: users-help@pdfbox.apache.org
Re: PDFBox loading larger PDF
Posted by Swapnil Raverkar <sw...@gmail.com>.
Hi Tilman,
Is there any way for PDFBox to load the file in chunks instead of loading
the entire file into memory to populate the document object? It consumes
around 1 GB of heap at start-up for a single 200 MB file and a single
user, dropping to 600-700 MB during processing.
Thanks,
Swapnil
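The figures reported in this thread can be set side by side with a little arithmetic. The sketch below is only a back-of-the-envelope estimate: it assumes every concurrent user loads a 200 MB file at the same moment and ignores all other heap use. The class and method names are hypothetical. It does not establish the cause of the exception above, which is an IndexOutOfBoundsException rather than an OutOfMemoryError.

```java
/** Hypothetical back-of-the-envelope heap estimate using the thread's figures. */
public class HeapBudget {

    /** Peak heap demand in MB if every user loads a document at the same moment. */
    static long peakDemandMb(int concurrentUsers, long perDocumentMb) {
        return concurrentUsers * perDocumentMb;
    }

    /** Largest number of simultaneous loads that fits in the heap (ignores other heap use). */
    static long maxSimultaneousLoads(long heapMb, long perDocumentMb) {
        return heapMb / perDocumentMb;
    }

    public static void main(String[] args) {
        long heapMb = 3 * 1024;     // 3 GB heap, as reported in the thread
        long perDocumentMb = 1024;  // about 1 GB at start-up for one 200 MB file
        int users = 10;             // concurrent users, as reported in the thread

        System.out.println("Peak demand (MB): " + peakDemandMb(users, perDocumentMb));
        System.out.println("Heap (MB): " + heapMb);
        System.out.println("Loads that fit: " + maxSimultaneousLoads(heapMb, perDocumentMb));
    }
}
```

With these numbers, ten simultaneous loads would ask for about 10,240 MB against a 3,072 MB heap, so roughly three loads fit at start-up. Even at the lower 600 MB figure, ten users would need about 6,000 MB.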
On 27 March 2015 at 23:44, Tilman Hausherr <TH...@t-online.de> wrote:
> Errors that happen in concurrent situations are notoriously hard to find.
> The best approach would be to prepare
> - a large file of the kind you mention
> - the code of a servlet to run in Tomcat (the smallest possible code that
>   triggers the error)
> then use ab.exe to stress-run the servlet in Tomcat 8 and fine-tune the
> parameters until the error happens reliably, then send us all of that.
>
> If the file is confidential, check whether the error also happens when
> merging a non-confidential file with itself.
>
> Tilman
Re: PDFBox loading larger PDF
Posted by Tilman Hausherr <TH...@t-online.de>.
Errors that happen in concurrent situations are notoriously hard to find.
The best approach would be to prepare
- a large file of the kind you mention
- the code of a servlet to run in Tomcat (the smallest possible code that
  triggers the error)
then use ab.exe to stress-run the servlet in Tomcat 8 and fine-tune the
parameters until the error happens reliably, then send us all of that.
If the file is confidential, check whether the error also happens when
merging a non-confidential file with itself.
Tilman
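Where ab is not at hand, the stress run described above can be approximated with a small JDK-only driver. The sketch below is a stand-in for a benchmark tool, not a replacement for ab and not part of PDFBox or Tomcat. The class name ConcurrentLoadDriver is hypothetical, and the servlet URL is whatever your own test deployment uses. It sends a fixed number of GET requests with a fixed number in flight and counts the HTTP 200 responses.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical stand-in for a benchmark tool; not part of PDFBox or Tomcat. */
public class ConcurrentLoadDriver {

    /** Sends one GET request, drains the response body, and returns the HTTP status. */
    static int fetchStatus(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        try {
            int status = conn.getResponseCode();
            InputStream body = status < 400 ? conn.getInputStream() : conn.getErrorStream();
            if (body != null) {
                try (InputStream in = body) {
                    byte[] buffer = new byte[8192];
                    while (in.read(buffer) != -1) {
                        // discard the body: only the status matters here
                    }
                }
            }
            return status;
        } finally {
            conn.disconnect();
        }
    }

    /** Issues totalRequests GETs with at most concurrency in flight; returns how many got HTTP 200. */
    public static int run(final String url, int totalRequests, int concurrency)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(concurrency);
        try {
            List<Future<Integer>> pending = new ArrayList<Future<Integer>>();
            for (int i = 0; i < totalRequests; i++) {
                Callable<Integer> task = () -> fetchStatus(url);
                pending.add(pool.submit(task));
            }
            int ok = 0;
            for (Future<Integer> result : pending) {
                try {
                    if (result.get() == 200) {
                        ok++;
                    }
                } catch (ExecutionException e) {
                    // a failed request (for example a connection error) counts as not OK
                }
            }
            return ok;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 3) {
            System.err.println("usage: ConcurrentLoadDriver <url> <totalRequests> <concurrency>");
            return;
        }
        int total = Integer.parseInt(args[1]);
        int ok = run(args[0], total, Integer.parseInt(args[2]));
        System.out.println(ok + " of " + total + " requests returned HTTP 200");
    }
}
```

Run it with the servlet URL, the total number of requests, and the concurrency level (10 would match the scenario reported in this thread), then raise the numbers until the error shows up reliably.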
On 27.03.2015 at 18:42, Swapnil Raverkar wrote:
> No, this issue does not happen for a large file with a single active user.
>
> Thanks,
> Swapnil
Re: PDFBox loading larger PDF
Posted by Swapnil Raverkar <sw...@gmail.com>.
No, this issue does not happen for a large file with a single active user.
Thanks,
Swapnil
On 27 March 2015 at 01:34, Tilman Hausherr <TH...@t-online.de> wrote:
> On 27.03.2015 at 08:49, Swapnil Raverkar wrote:
>
>> When loading large PDF files (more than 200 MB) for 10 concurrent users,
>> with 3 GB of heap space configured in the JVM, we get the following
>> exception:
>
> Does this also happen with a large file when only 1 user is active?
Re: PDFBox loading larger PDF
Posted by Tilman Hausherr <TH...@t-online.de>.
On 27.03.2015 at 08:49, Swapnil Raverkar wrote:
> When loading large PDF files (more than 200 MB) for 10 concurrent users,
> with 3 GB of heap space configured in the JVM, we get the following
> exception:
Does this also happen with a large file when only 1 user is active?