Posted to dev@pdfbox.apache.org by "Michael Doswald (JIRA)" <ji...@apache.org> on 2016/07/28 10:57:20 UTC

[jira] [Commented] (PDFBOX-3441) NumberFormatException when loading large PDF file

    [ https://issues.apache.org/jira/browse/PDFBOX-3441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15397389#comment-15397389 ] 

Michael Doswald commented on PDFBOX-3441:
-----------------------------------------

The PDF Reference, sixth edition, chapter 3.4.3 'Cross-Reference Table' specifies a cross-reference entry as

{{nnnnnnnnnn ggggg n eol}}

where

{{nnnnnnnnnn}} is a 10-digit byte offset

The offset is currently parsed as an 'int', whose maximum value is 2147483647 (just under 2 GiB), but the reference allows offsets up to 9999999999.

Looking at the source code, the XrefTrailerResolver#setXRef method already accepts a 'long' value, so I guess the fix should only be a matter of parsing currOffset as a long instead of an int. Other areas of PDFBox may still have problems with offsets this large, though.
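
To illustrate the idea, here is a minimal, self-contained sketch (not the actual PDFBox code) of parsing one cross-reference entry with a long offset. The class XrefEntryDemo, the XrefEntry holder and the parseEntry method are hypothetical names made up for this example; only Integer.parseInt and Long.parseLong are real JDK calls. It assumes the entry has already been read as a single line and is split on whitespace.

```java
public class XrefEntryDemo {

    /** Hypothetical holder for one parsed cross-reference entry. */
    public static final class XrefEntry {
        public final long offset;      // 10-digit byte offset, may exceed Integer.MAX_VALUE
        public final int generation;   // 5-digit generation number
        public final char type;        // 'n' (in use) or 'f' (free)

        XrefEntry(long offset, int generation, char type) {
            this.offset = offset;
            this.generation = generation;
            this.type = type;
        }
    }

    /** Parses a line of the form "nnnnnnnnnn ggggg n" (hypothetical helper). */
    public static XrefEntry parseEntry(String line) {
        String[] parts = line.trim().split("\\s+");
        if (parts.length < 3) {
            throw new IllegalArgumentException("Malformed xref entry: " + line);
        }
        // Up to 9999999999 is allowed, so the offset needs a long.
        long offset = Long.parseLong(parts[0]);
        // The generation number has at most 5 digits and fits in an int.
        int generation = Integer.parseInt(parts[1]);
        char type = parts[2].charAt(0);
        return new XrefEntry(offset, generation, type);
    }

    public static void main(String[] args) {
        String entry = "2313730984 00000 n";

        // Parsing the offset as an int fails for offsets above 2147483647.
        try {
            Integer.parseInt(entry.split("\\s+")[0]);
        } catch (NumberFormatException e) {
            System.out.println("int parsing fails: " + e.getMessage());
        }

        // Parsing the same offset as a long works.
        XrefEntry parsed = parseEntry(entry);
        System.out.println("long parsing yields offset " + parsed.offset);
    }
}
```

The sketch only covers the parsing step. Whether code that later consumes the offset (seeking, caching, arithmetic on positions) also handles long values would need to be checked separately.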

> NumberFormatException when loading large PDF file
> -------------------------------------------------
>
>                 Key: PDFBOX-3441
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-3441
>             Project: PDFBox
>          Issue Type: Bug
>          Components: PDModel
>    Affects Versions: 2.0.2
>         Environment: Win 10 Pro, 16GB RAM
>            Reporter: Pavel Fol
>         Attachments: exception.PNG
>
>
> If you try to load a very large PDF file (over 2 GB), you get java.io.IOException: java.lang.NumberFormatException: For input string: "2313730984".
> It fails in COSParser.java in parseXrefTable(long startByteOffset), on line 2006, when Integer.parseInt(splitString[1]) reads a number that is bigger than the maximum int.
> //////
> java.io.IOException: java.lang.NumberFormatException: For input string: "2313730984"
> 	at org.apache.pdfbox.pdfparser.COSParser.parseXrefTable(COSParser.java:2012)
> 	at org.apache.pdfbox.pdfparser.COSParser.parseXref(COSParser.java:223)
> 	at org.apache.pdfbox.pdfparser.PDFParser.initialParse(PDFParser.java:192)
> 	at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:249)
> 	at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:840)
> 	at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:765)
> 	at Test.main(Test.java:17)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:498)
> 	at com.intellij.rt.execution.application.AppMain.main(AppMain.java:147)
> Caused by: java.lang.NumberFormatException: For input string: "2313730984"
> 	at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
> 	at java.lang.Integer.parseInt(Integer.java:583)
> 	at java.lang.Integer.parseInt(Integer.java:615)
> 	at org.apache.pdfbox.pdfparser.COSParser.parseXrefTable(COSParser.java:2005)
> 	... 11 more


