Posted to dev@pdfbox.apache.org by "Michael Doswald (JIRA)" <ji...@apache.org> on 2016/07/28 10:57:20 UTC
[jira] [Commented] (PDFBOX-3441) NumberFormatException when loading large PDF file
[ https://issues.apache.org/jira/browse/PDFBOX-3441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15397389#comment-15397389 ]
Michael Doswald commented on PDFBOX-3441:
-----------------------------------------
The PDF Reference, sixth edition, chapter 3.4.3 'Cross-Reference Table' specifies a cross-reference entry as
{{nnnnnnnnnn ggggg n eol}}
where {{nnnnnnnnnn}} is a 10-digit byte offset.
The offset is currently parsed as an int, which has a maximum value of 2147483647 (just under 2 GiB), but the reference allows a maximum value of 9999999999.
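To make the range problem concrete, here is a minimal standalone sketch (not PDFBox code; the class and method names are made up for illustration) that feeds the offset from the reported stack trace to both parsers:

```java
public class OffsetRangeDemo {
    /** Returns true if the decimal text can be parsed as a Java int. */
    static boolean fitsInInt(String digits) {
        try {
            Integer.parseInt(digits);
            return true;
        } catch (NumberFormatException e) {
            // Thrown when the value lies outside the int range
            return false;
        }
    }

    public static void main(String[] args) {
        String offset = "2313730984"; // value taken from the reported exception

        System.out.println(fitsInInt(offset));       // prints "false"
        System.out.println(Long.parseLong(offset));  // prints "2313730984"

        // The largest 10-digit offset the entry format can express also fits in a long
        System.out.println(Long.parseLong("9999999999"));
    }
}
```
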
Looking at the source code, the XrefTrailerResolver#setXRef method already accepts a 'long' value, so I guess it should only be a matter of parsing currOffset as long instead of int. Other areas of PDFBox may still have problems with such large offsets, though.
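A rough sketch of what parsing a single entry with a long offset could look like. This is not the actual COSParser code; XrefEntrySketch and its fields are hypothetical names used only for illustration, and it assumes the entry has already been read as one line in the {{nnnnnnnnnn ggggg n}} form. For free entries (keyword 'f') the first field is the next free object number rather than a byte offset, which the sketch does not treat specially.

```java
public class XrefEntrySketch {
    final long offset;      // 10-digit field, may exceed Integer.MAX_VALUE
    final int generation;   // 5-digit generation number
    final boolean inUse;    // true for keyword 'n', false for 'f'

    XrefEntrySketch(long offset, int generation, boolean inUse) {
        this.offset = offset;
        this.generation = generation;
        this.inUse = inUse;
    }

    /** Parses one cross-reference entry line such as "2313730984 00000 n". */
    static XrefEntrySketch parse(String line) {
        String[] parts = line.trim().split("\\s+");
        if (parts.length < 3) {
            throw new IllegalArgumentException("Malformed xref entry: " + line);
        }
        long offset = Long.parseLong(parts[0]);      // long instead of int
        int generation = Integer.parseInt(parts[1]); // at most 5 digits, fits in int
        boolean inUse = "n".equals(parts[2]);
        return new XrefEntrySketch(offset, generation, inUse);
    }

    public static void main(String[] args) {
        XrefEntrySketch entry = parse("2313730984 00000 n");
        System.out.println(entry.offset + " " + entry.generation + " " + entry.inUse);
    }
}
```

Anything that later consumes the offset (seeking in the file, storing it in a map) would also have to carry it as a long, which is where the "other areas" concern comes in.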
> NumberFormatException when loading large PDF file
> -------------------------------------------------
>
> Key: PDFBOX-3441
> URL: https://issues.apache.org/jira/browse/PDFBOX-3441
> Project: PDFBox
> Issue Type: Bug
> Components: PDModel
> Affects Versions: 2.0.2
> Environment: Win 10 Pro, 16GB RAM
> Reporter: Pavel Fol
> Attachments: exception.PNG
>
>
> If you try to load a very large PDF file (over 2GB), you get java.io.IOException: java.lang.NumberFormatException: For input string: "2313730984".
> It fails in COSParser.java in parseXrefTable(long startByteOffset), on line 2006, when Integer.parseInt(splitString[1]) reads a number that is bigger than the maximum int.
> //////
> java.io.IOException: java.lang.NumberFormatException: For input string: "2313730984"
> at org.apache.pdfbox.pdfparser.COSParser.parseXrefTable(COSParser.java:2012)
> at org.apache.pdfbox.pdfparser.COSParser.parseXref(COSParser.java:223)
> at org.apache.pdfbox.pdfparser.PDFParser.initialParse(PDFParser.java:192)
> at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:249)
> at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:840)
> at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:765)
> at Test.main(Test.java:17)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at com.intellij.rt.execution.application.AppMain.main(AppMain.java:147)
> Caused by: java.lang.NumberFormatException: For input string: "2313730984"
> at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
> at java.lang.Integer.parseInt(Integer.java:583)
> at java.lang.Integer.parseInt(Integer.java:615)
> at org.apache.pdfbox.pdfparser.COSParser.parseXrefTable(COSParser.java:2005)
> ... 11 more