Posted to users@pdfbox.apache.org by Kodjo Afriyie - iSite Eng <ko...@bbc.co.uk> on 2019/07/22 14:02:05 UTC

Problems parsing PDF document.

Hi,

I am getting the following exception when trying to parse a PDF:

Problems IOEXCEPTION:d37.pdf:java.io.IOException: Unknown dir object c='>' cInt=62 peek='>' peekInt=62 at offset 9997
at org.apache.pdfbox.pdfparser.BaseParser.parseDirObject(BaseParser.java:965)
at org.apache.pdfbox.pdfparser.BaseParser.parseCOSDictionaryValue(BaseParser.java:152)
at org.apache.pdfbox.pdfparser.BaseParser.parseCOSDictionaryNameValuePair(BaseParser.java:281)
at org.apache.pdfbox.pdfparser.BaseParser.parseCOSDictionary(BaseParser.java:214)
at org.apache.pdfbox.pdfparser.BaseParser.parseDirObject(BaseParser.java:866)
at org.apache.pdfbox.pdfparser.BaseParser.parseCOSDictionaryValue(BaseParser.java:152)
at org.apache.pdfbox.pdfparser.BaseParser.parseCOSDictionaryNameValuePair(BaseParser.java:281)
at org.apache.pdfbox.pdfparser.BaseParser.parseCOSDictionary(BaseParser.java:214)
at org.apache.pdfbox.pdfparser.BaseParser.parseDirObject(BaseParser.java:866)
at org.apache.pdfbox.pdfparser.COSParser.parseFileObject(COSParser.java:912)
at org.apache.pdfbox.pdfparser.COSParser.parseObjectDynamically(COSParser.java:881)
at org.apache.pdfbox.pdfparser.COSParser.parseObjectDynamically(COSParser.java:801)
at org.apache.pdfbox.pdfparser.COSParser.parseDictObjects(COSParser.java:761)
at org.apache.pdfbox.pdfparser.PDFParser.initialParse(PDFParser.java:187)
at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:226)
at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1070)
at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1026)
at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:974)
at uk.bbci.ugcuploader.SanitizePDF.lambda$main$0(SanitizePDF.java:27)
at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
at java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:580)
at uk.bbci.ugcuploader.SanitizePDF.main(SanitizePDF.java:26)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.springframework.boot.loader.MainMethodRunner.run(MainMethodRunner.java:48)
at org.springframework.boot.loader.Launcher.launch(Launcher.java:87)
at org.springframework.boot.loader.Launcher.launch(Launcher.java:50)
at org.springframework.boot.loader.JarLauncher.main(JarLauncher.java:51)
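
The exception above reports that the parser met '>' (byte 62) at offset 9997 where it expected a direct object. A minimal sketch for looking at the raw bytes around that offset, using only the JDK, might look like the following. The class name OffsetDump, the method dumpAround and the 80-byte radius are illustrative assumptions, not part of PDFBox.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Path;
import java.nio.file.Paths;

public class OffsetDump {

    /**
     * Returns the bytes in [offset - radius, offset + radius] as text,
     * clipped to the file bounds, with non-printable bytes shown as '.'.
     */
    static String dumpAround(Path file, long offset, int radius) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            long start = Math.max(0, offset - radius);
            long end = Math.min(raf.length(), offset + radius + 1);
            if (start >= end) {
                return ""; // requested window lies outside the file
            }
            byte[] buf = new byte[(int) (end - start)];
            raf.seek(start);
            raf.readFully(buf);
            StringBuilder sb = new StringBuilder(buf.length);
            for (byte b : buf) {
                int c = b & 0xFF;
                // Keep printable ASCII, mask everything else
                sb.append(c >= 32 && c < 127 ? (char) c : '.');
            }
            return sb.toString();
        }
    }

    public static void main(String[] args) throws IOException {
        // Defaults are taken from the exception message; override via arguments.
        Path file = Paths.get(args.length > 0 ? args[0] : "d37.pdf");
        long offset = args.length > 1 ? Long.parseLong(args[1]) : 9997L;
        System.out.println(dumpAround(file, offset, 80));
    }
}
```

Run against d37.pdf, this should show the dictionary text surrounding the unexpected '>', which may help tell whether the file is malformed at that point.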

Jul 22, 2019 1:50:05 PM org.apache.pdfbox.pdmodel.interactive.form.PDAcroForm flatten
WARNING: Flatten for a dynamix XFA form is not supported

Two files have been placed at this location, in a tar file:

https://1drv.ms/u/s!AmNEMt7g6KbuhhvqZrS5YsKxXsdy?e=ebCdr6

With regard to the last message above (the flatten warning), is there another way to remove these elements, i.e. dynamic XFA forms, from the form?

Thanks
Kodjo

Re: Problems parsing PDF document.

Posted by Kodjo Afriyie - iSite Eng <ko...@bbc.co.uk>.
There is also a file in the tarball, named d37.pdf, that I cannot remove
the virus from.

The link below is a screenshot from the virus detector:

https://1drv.ms/u/s!AmNEMt7g6KbuhiGJLIrcbSnOpCW1?e=cVUeO4


