Posted to users@pdfbox.apache.org by Jonathan Carr <jo...@york.ac.uk> on 2014/08/01 19:20:53 UTC

Problems running PDFBox within a servlet container

Hello all

I've written code to take an existing PDF containing a form, populate
values in some of the fields, and then output the results as a new PDF. As
a standalone test class this works great; however, as soon as I put the code
into a servlet (I wish to serve the resulting PDF to users as a download) I
get an error. The PDFBox-related code and PDF document being used are
exactly the same in both cases.

Clearly there's a difference between the servlet and non-servlet
environments, but so far I've failed to find it. Does anyone have any
suggestions as to where I should be looking?

I'm running OpenJDK 7 and pdfbox-1.8.6, with Jetty as my servlet container.

Example code:

InputStream in = MyServlet.class.getResourceAsStream("Form.pdf");
PDDocument doc = PDDocument.load(in);
PDDocumentCatalog cat = doc.getDocumentCatalog();
PDAcroForm acroForm = cat.getAcroForm();

PDField field = acroForm.getField("myField");
field.setValue("someValue");


The following error is thrown when setValue() is called:

java.io.IOException: null
at org.apache.pdfbox.filter.FlateFilter.decode(FlateFilter.java:110) ~[pdfbox-app-1.8.6.jar:na]
at org.apache.pdfbox.cos.COSStream.doDecode(COSStream.java:342) ~[pdfbox-app-1.8.6.jar:na]
at org.apache.pdfbox.cos.COSStream.doDecode(COSStream.java:254) ~[pdfbox-app-1.8.6.jar:na]
at org.apache.pdfbox.cos.COSStream.getUnfilteredStream(COSStream.java:188) ~[pdfbox-app-1.8.6.jar:na]
at org.apache.pdfbox.pdfparser.PDFStreamParser.<init>(PDFStreamParser.java:122) ~[pdfbox-app-1.8.6.jar:na]
at org.apache.pdfbox.pdmodel.interactive.form.PDAppearance.getStreamTokens(PDAppearance.java:186) ~[pdfbox-app-1.8.6.jar:na]
at org.apache.pdfbox.pdmodel.interactive.form.PDAppearance.getStreamTokens(PDAppearance.java:159) ~[pdfbox-app-1.8.6.jar:na]
at org.apache.pdfbox.pdmodel.interactive.form.PDAppearance.setAppearanceValue(PDAppearance.java:266) ~[pdfbox-app-1.8.6.jar:na]
at org.apache.pdfbox.pdmodel.interactive.form.PDVariableText.setValue(PDVariableText.java:131) ~[pdfbox-app-1.8.6.jar:na]
...

Caused by:

java.util.zip.DataFormatException: incorrect header check
at java.util.zip.Inflater.inflateBytes(Native Method) ~[na:1.7.0_55]
at java.util.zip.Inflater.inflate(Inflater.java:259) ~[na:1.7.0_55]
at java.util.zip.Inflater.inflate(Inflater.java:280) ~[na:1.7.0_55]
at org.apache.pdfbox.filter.FlateFilter.decompress(FlateFilter.java:129) ~[pdfbox-app-1.8.6.jar:na]
at org.apache.pdfbox.filter.FlateFilter.decode(FlateFilter.java:102) ~[pdfbox-app-1.8.6.jar:na]
...


Many thanks
-Jonathan

Re: Problems running PDFBox within a servlet container

Posted by Jonathan Carr <jo...@york.ac.uk>.
Many thanks Tilman, Brzrk. I wasn't aware of potential multi-threading
problems so will certainly check my code for that.

It turns out that the PDF was being corrupted at build time, so I was
looking in the wrong place. I've now added a build time test to ensure
PDFBox can parse the template documents!

We use Maven resource filtering (intended for plain text resources), which
corrupted the PDFs and so broke PDFBox parsing at runtime. For anyone
else with this issue, this is the SO post that alerted us to the problem:

https://stackoverflow.com/questions/10797831/jar-file-gets-corrupted-while-building-with-maven
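
For illustration only, here is a minimal JDK-only sketch of why treating a
binary file as text can produce the "incorrect header check" error above. It
assumes the filter decodes the file as UTF-8 text and writes it back, which may
not match every build configuration; the class and method names are
hypothetical and are not part of PDFBox or Maven.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical demo class; not part of PDFBox or Maven.
final class FilterCorruptionDemo {

    // Compress bytes in zlib format, the format used by Flate-encoded PDF streams.
    static byte[] deflate(byte[] data) {
        Deflater deflater = new Deflater();
        deflater.setInput(data);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[256];
        while (!deflater.finished()) {
            out.write(buffer, 0, deflater.deflate(buffer));
        }
        deflater.end();
        return out.toByteArray();
    }

    // Assumption: simulates a text-oriented filter by decoding the bytes as
    // UTF-8 and encoding them again. Byte sequences that are not valid UTF-8
    // are replaced, so binary data does not survive the round trip.
    static byte[] roundTripAsText(byte[] binary) {
        return new String(binary, StandardCharsets.UTF_8).getBytes(StandardCharsets.UTF_8);
    }

    // Returns true if the bytes can be fully decompressed as a zlib stream.
    static boolean canInflate(byte[] compressed) {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(compressed);
            byte[] buffer = new byte[256];
            while (!inflater.finished()) {
                int n = inflater.inflate(buffer);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    return false; // truncated or otherwise unusable stream
                }
            }
            return true;
        } catch (DataFormatException e) {
            return false; // the kind of failure seen in the stack trace
        } finally {
            inflater.end();
        }
    }

    public static void main(String[] args) {
        byte[] original = deflate("BT /F1 12 Tf (hello) Tj ET".getBytes(StandardCharsets.US_ASCII));
        byte[] filtered = roundTripAsText(original);
        System.out.println("original inflates: " + canInflate(original));
        System.out.println("filtered inflates: " + canInflate(filtered));
    }
}
```

The usual remedy is to exclude binary resources such as PDFs from filtering in
the build configuration, so they are copied byte for byte.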

Cheers
-Jonathan




-- 
------------------------------------------
Jonathan Carr
Skills Forge Project

T: +44 1904 322 833
E: jonathan.carr@york.ac.uk

http://www.york.ac.uk/about/legal-statements/email-disclaimer/

Re: Problems running PDFBox within a servlet container

Posted by Tilman Hausherr <TH...@t-online.de>.
Yes, that is what I meant; my text was unclear. One PDDocument may be
accessed by only one thread.
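
A minimal sketch of one way to respect that rule in a servlet, assuming the
template is small enough to hold in memory: share only the immutable template
bytes between threads and give every request its own stream, from which it
loads its own PDDocument. The TemplateBytes class and its methods are
hypothetical names used for illustration, and only JDK classes are used here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper; the class and method names are illustrative only.
final class TemplateBytes {

    // Never modified after construction, so it can be shared between threads.
    private final byte[] bytes;

    // Reads the whole template once, for example at servlet initialisation.
    TemplateBytes(InputStream source) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        int n;
        while ((n = source.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        this.bytes = out.toByteArray();
    }

    // Each caller gets its own stream with an independent read position.
    InputStream open() {
        return new ByteArrayInputStream(bytes);
    }

    int size() {
        return bytes.length;
    }
}
```

Each request would then call PDDocument.load(template.open()), fill in and
save its own document, and close it, so that no PDDocument is ever stored in
a field shared between requests.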

Tilman



Re: Problems running PDFBox within a servlet container

Posted by Brzrk One <br...@gmail.com>.
In my experience, parsing the PDF fails badly when the
PDDocument is accessed by more than one thread,
usually in decompression and decryption, but not exclusively.
Certainly the thread model is different between a standalone app and a servlet.



Re: Problems running PDFBox within a servlet container

Posted by Tilman Hausherr <TH...@t-online.de>.
I once got similar exceptions when trying to access the same PDF from 
several threads. See also

https://pdfbox.apache.org/userguide/faq.html#threadsafe

Tilman
