Posted to users@pdfbox.apache.org by Tim Costermans <ti...@unifiedpost.com> on 2014/03/26 14:30:37 UTC
PDFBox 1.8.4 and pdf's generated by MS Word

Hello,

It seems that PDFs generated by MS Word 2010 or 2013 are a recipe for trouble in combination with PDFBox version 1.8.0 or 1.8.4.
I upgraded to PDFBox 1.8.4 and one issue remains:

Caused by: **thirdparty.pdf.exceptions.PDFParsingException: [offset=91308]Expected numeric object for object number
                        at **thirdparty.pdf.exceptions.PDFParsingException.newInstance(PDFParsingException.java:58)
                        at **thirdparty.pdf.io.PDFParser.throwEx(PDFParser.java:1215)
                        at **thirdparty.pdf.io.PDFParser.readCompressedCrossRefTable(PDFParser.java:805)
                        at **thirdparty.pdf.io.PDFParser.readCrossRefTable(PDFParser.java:1175)
                        at **thirdparty.pdf.PDFDocument.open(PDFDocument.java:154)
                        at **thirdparty.PDFDocument.open(PDFDocument.java:124)
                        at com.*****.sign.pdf.PDFPresigner.presign(PDFPresigner.java:24)
                        ... 26 more

How to reproduce:
1) Fire up MS Word 2010, type some text, save as PDF.
2) Open this PDF file with Notepad++; you will notice the following at the bottom of the file:
...
trailer
<</Size 18/Root 1 0 R/Info 7 0 R/ID[<7AE435CBC968B94F8B050F40F6D5CE5F><7AE435CBC968B94F8B050F40F6D5CE5F>] >>
startxref
82089
%%EOF
xref
0 0
trailer
<</Size 18/Root 1 0 R/Info 7 0 R/ID[<7AE435CBC968B94F8B050F40F6D5CE5F><7AE435CBC968B94F8B050F40F6D5CE5F>] /Prev 82089/XRefStm 81819>>
startxref
82605
%%EOF
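
For anyone who wants to check incoming files for this layout without opening an editor, here is a minimal sketch. A trailer that carries /XRefStm is what the PDF specification calls a hybrid-reference file (a classic xref table plus a cross-reference stream). The class name TrailerProbe and its methods are hypothetical and are not part of PDFBox; the code uses only the Java standard library and a plain regex scan, so it can be fooled by the same keywords appearing inside binary stream data.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.OptionalLong;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, NOT a PDFBox class: inspects trailer keys with regexes only.
public class TrailerProbe {

    // Returns the number that follows the LAST occurrence of the keyword, if any.
    static OptionalLong lastNumberAfter(String keyword, String text) {
        Matcher m = Pattern.compile(Pattern.quote(keyword) + "\\s+(\\d+)").matcher(text);
        OptionalLong result = OptionalLong.empty();
        while (m.find()) {
            result = OptionalLong.of(Long.parseLong(m.group(1)));
        }
        return result;
    }

    // Assumption: any /XRefStm key followed by an offset marks a hybrid-reference file.
    static boolean isHybridReference(String text) {
        return lastNumberAfter("/XRefStm", text).isPresent();
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java TrailerProbe <file.pdf>");
            return;
        }
        byte[] bytes = Files.readAllBytes(Paths.get(args[0]));
        // ISO-8859-1 maps every byte to exactly one char, so binary data cannot break decoding.
        String text = new String(bytes, StandardCharsets.ISO_8859_1);
        System.out.println("last startxref : " + lastNumberAfter("startxref", text));
        System.out.println("last /Prev     : " + lastNumberAfter("/Prev", text));
        System.out.println("last /XRefStm  : " + lastNumberAfter("/XRefStm", text));
        System.out.println("hybrid-reference file: " + isHybridReference(text));
    }
}
```

Running it on the Word output before the save and on the merged file afterwards should make it easy to see whether the /XRefStm offset still points at a real cross-reference stream.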

Our application is trying to add an image to this PDF using PDFBox. When calling PDFDocument.save(), the "revisions" are merged and a new PDF is created.
The newly created PDF is passed to a third party that tries to open it, but it fails because XRefStm is not correctly updated during save.
Probably related to https://issues.apache.org/jira/browse/PDFBOX-1822

I also tried PDFDocument.incrementalSave(), but then I get a NullPointerException caused by PDFXRefStream: List<Integer> indexEntry = getIndexEntry(); returns a list containing two null objects (first and last are still null when they are added to the list).
How do I solve this issue?
What's the real issue here?
I'm not in control of the PDFs that the application can receive.

I also ran into the following bug but worked around it: https://issues.apache.org/jira/browse/PDFBOX-1838