Posted to dev@pdfbox.apache.org by "Tilman Hausherr (JIRA)" <ji...@apache.org> on 2019/03/29 18:36:00 UTC

[jira] [Closed] (PDFBOX-4501) References numbers in embedded PDF become floats

     [ https://issues.apache.org/jira/browse/PDFBOX-4501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tilman Hausherr closed PDFBOX-4501.
-----------------------------------
    Resolution: Duplicate

Duplicate of PDFBOX-4495, which was fixed a few days ago. Your file now displays.

> References numbers in embedded PDF become floats
> ------------------------------------------------
>
>                 Key: PDFBOX-4501
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-4501
>             Project: PDFBox
>          Issue Type: Bug
>            Reporter: Daniel Persson
>            Priority: Major
>         Attachments: float_pointer.patch
>
>
> Hi everyone.
> We found an issue that sometimes occurs with PDF files from smaller producers that embed advertisements or other articles.
> For some reason, this embedded content makes the library throw an exception and fail to read the file. In many cases, we can read most of the pages and only the embedded data will be missing.
> I wrote a small patch that handles the issue, but I don't know how to decode the embedded data, so I have not debugged the issue further. I will add a link to the file because it is 124 MB, which is too large to upload with the issue.
> [https://drive.google.com/file/d/1hQslqtrbIoo5bTmMXgH1NDSYXuvIUOAQ/view?usp=sharing]
> If we could find a solution where the PDF is read correctly, that would be great, but the current behavior of not reading it at all is not great.
>  
> ```
> java.io.IOException: expected number, actual=COSFloat{18446744073221199360} at offset 127766191
>  org.apache.pdfbox.pdfparser.BaseParser.parseCOSDictionaryValue(BaseParser.java:166)
>  org.apache.pdfbox.pdfparser.BaseParser.parseCOSDictionaryNameValuePair(BaseParser.java:279)
>  org.apache.pdfbox.pdfparser.BaseParser.parseCOSDictionary(BaseParser.java:212)
>  org.apache.pdfbox.pdfparser.BaseParser.parseDirObject(BaseParser.java:864)
>  org.apache.pdfbox.pdfparser.COSParser.parseFileObject(COSParser.java:912)
>  org.apache.pdfbox.pdfparser.COSParser.parseObjectDynamically(COSParser.java:881)
>  org.apache.pdfbox.pdfparser.COSParser.parseObjectDynamically(COSParser.java:801)
>  org.apache.pdfbox.pdfparser.COSParser.parseDictObjects(COSParser.java:761)
>  org.apache.pdfbox.pdfparser.PDFParser.initialParse(PDFParser.java:187)
>  org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:226)
>  org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1069)
>  org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1007)
>  org.apache.pdfbox.debugger.PDFDebugger$12.open(PDFDebugger.java:1272)
>  org.apache.pdfbox.debugger.PDFDebugger$DocumentOpener.parse(PDFDebugger.java:1383)
>  org.apache.pdfbox.debugger.PDFDebugger.readPDFFile(PDFDebugger.java:1275)
>  org.apache.pdfbox.debugger.PDFDebugger.readPDFFile(PDFDebugger.java:1252)
>  org.apache.pdfbox.debugger.PDFDebugger.main(PDFDebugger.java:1243)
> ```
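
The value in the exception message, 18446744073221199360, is larger than Long.MAX_VALUE (9223372036854775807). The following is a minimal sketch, in plain Java without PDFBox, of how such a token could end up as a float: it assumes a parser that tries an integer parse first and falls back to a floating-point parse when the integer parse overflows. The class HugeNumberSketch and its helpers classify and fitsInLong are hypothetical names used only for illustration. This is not the PDFBox implementation, and the issue itself does not confirm the mechanism.

```java
import java.math.BigInteger;

public class HugeNumberSketch {

    /**
     * Hypothetical helper: reports how a numeric token would be typed by a
     * parser that tries a 64-bit integer first and falls back to a float.
     * (Assumption for illustration; not taken from PDFBox.)
     */
    static String classify(String token) {
        try {
            Long.parseLong(token);          // throws if the value overflows a long
            return "integer";
        } catch (NumberFormatException e) {
            Double.parseDouble(token);      // still throws for non-numeric input
            return "float";
        }
    }

    /** Hypothetical helper: true if the decimal token fits in a signed 64-bit long. */
    static boolean fitsInLong(String token) {
        // bitLength() excludes the sign bit, so any long needs at most 63 bits.
        return new BigInteger(token).bitLength() <= 63;
    }

    public static void main(String[] args) {
        String offset = "127766191";             // offset from the exception message
        String huge = "18446744073221199360";    // value from the exception message

        System.out.println(offset + " -> " + classify(offset));
        System.out.println(huge + " -> " + classify(huge));
        System.out.println("fits in long: " + fitsInLong(huge));
    }
}
```

A parser that expects an integer object number in a reference would then reject the float, which matches the "expected number, actual=COSFloat" message above.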



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
