Posted to dev@pdfbox.apache.org by "Andreas Lehmkühler (JIRA)" <ji...@apache.org> on 2013/05/23 18:42:20 UTC
[jira] [Commented] (PDFBOX-1607) StringIndexOutOfBoundsException in PDFParser
[ https://issues.apache.org/jira/browse/PDFBOX-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13665337#comment-13665337 ]
Andreas Lehmkühler commented on PDFBOX-1607:
--------------------------------------------
There is some HTML garbage at the end of the PDF, and PDFBox tries to read <b> as a COSString in hex format, which leads to the exception.
The non-sequential parser works like a charm. Use PDDocument#loadNonSeq instead of PDDocument#load.
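
To illustrate why trailing HTML trips the parser: in PDF syntax a single '<' opens a hex string, so a stray tag is read as hex digits up to the next '>'. The sketch below is not PDFBox code. HexStringSketch and decode are hypothetical names, and the "<br>" input is a made-up example. It only shows one way a hex-string reader could reject non-hex characters with a clear IOException instead of failing on an index.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper for illustration only; not part of PDFBox. */
public class HexStringSketch {

    private static boolean isHexDigit(char c) {
        return (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f') || (c >= 'A' && c <= 'F');
    }

    private static boolean isPdfWhitespace(char c) {
        return c == ' ' || c == '\t' || c == '\n' || c == '\r' || c == '\f' || c == 0;
    }

    /**
     * Decodes a complete hex string token such as "<48656C6C6F>".
     * Whitespace between digits is skipped and an odd trailing digit is
     * padded with 0. Any other character causes an IOException.
     */
    public static byte[] decode(String token) throws IOException {
        if (token.length() < 2 || token.charAt(0) != '<'
                || token.charAt(token.length() - 1) != '>') {
            throw new IOException("Not a hex string: " + token);
        }
        StringBuilder digits = new StringBuilder();
        for (int i = 1; i < token.length() - 1; i++) {
            char c = token.charAt(i);
            if (isHexDigit(c)) {
                digits.append(c);
            } else if (!isPdfWhitespace(c)) {
                // Assumption of this sketch: fail fast rather than try to recover.
                throw new IOException("Invalid character '" + c + "' in hex string at offset " + i);
            }
        }
        if (digits.length() % 2 != 0) {
            digits.append('0'); // odd number of digits: last digit is padded with 0
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < digits.length(); i += 2) {
            out.write(Integer.parseInt(digits.substring(i, i + 2), 16));
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(new String(decode("<48656C6C6F>"), StandardCharsets.US_ASCII));
        try {
            decode("<br>"); // made-up HTML remnant: 'r' is not a hex digit
        } catch (IOException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

A real parser works on a stream and may prefer to skip to the closing '>' rather than throw; the fail-fast choice here is only to keep the sketch short.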
> StringIndexOutOfBoundsException in PDFParser
> --------------------------------------------
>
> Key: PDFBOX-1607
> URL: https://issues.apache.org/jira/browse/PDFBOX-1607
> Project: PDFBox
> Issue Type: Bug
> Components: Parsing
> Affects Versions: 1.8.1
> Environment: Windows 7, JRE 1.7.0_15-b03
> Reporter: Alex Alishevskikh
> Attachments: pdf-govdocs-036902.pdf, pdf-govdocs-107566.pdf
>
>
> I have a few test files that parse fine in PDFBox 1.7.1 but not in 1.8.1:
> java.lang.StringIndexOutOfBoundsException: String index out of range: 2047
> at java.lang.AbstractStringBuilder.deleteCharAt(AbstractStringBuilder.java:762)
> at java.lang.StringBuilder.deleteCharAt(StringBuilder.java:258)
> at org.apache.pdfbox.pdfparser.BaseParser.parseCOSHexString(BaseParser.java:1000)
> at org.apache.pdfbox.pdfparser.BaseParser.parseCOSString(BaseParser.java:808)
> at org.apache.pdfbox.pdfparser.BaseParser.parseDirObject(BaseParser.java:1241)
> at org.apache.pdfbox.pdfparser.PDFParser.parseObject(PDFParser.java:558)
> at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:188)
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira