Posted to dev@pdfbox.apache.org by "Andreas Lehmkühler (JIRA)" <ji...@apache.org> on 2013/05/23 18:42:20 UTC

[jira] [Commented] (PDFBOX-1607) StringIndexOutOfBoundsException in PDFParser

    [ https://issues.apache.org/jira/browse/PDFBOX-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13665337#comment-13665337 ] 

Andreas Lehmkühler commented on PDFBOX-1607:
--------------------------------------------

There is some HTML garbage at the end of the PDF, and PDFBox tries to read <b> as a COSString in hex format, which leads to the exception.
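
To illustrate why stray markup is a problem here: in PDF syntax a hex string is written between '<' and '>', so a parser that meets an HTML tag at that position will try to decode it as hex digits. The following is only a minimal standalone sketch of hex string decoding under the rules of the PDF specification (whitespace ignored, odd digit count padded with 0). It is not PDFBox's BaseParser code, and the class and method names (HexStringSketch, decodeHexString) are made up for this example.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration class, not part of PDFBox.
class HexStringSketch {

    // True for 0-9, a-f and A-F only.
    static boolean isHexDigit(char c) {
        return (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f') || (c >= 'A' && c <= 'F');
    }

    // Decodes a PDF hex string such as "<48656C6C6F>".
    // Whitespace between digits is skipped; an odd number of digits is
    // padded with a trailing '0'. Any other character is rejected.
    static byte[] decodeHexString(String source) {
        if (source.isEmpty() || source.charAt(0) != '<') {
            throw new IllegalArgumentException("not a hex string: " + source);
        }
        StringBuilder digits = new StringBuilder();
        int i = 1;
        for (; i < source.length(); i++) {
            char c = source.charAt(i);
            if (c == '>') {
                break; // end of the hex string
            }
            if (Character.isWhitespace(c)) {
                continue;
            }
            if (!isHexDigit(c)) {
                throw new IllegalArgumentException(
                        "unexpected character '" + c + "' at offset " + i);
            }
            digits.append(c);
        }
        if (i == source.length()) {
            throw new IllegalArgumentException("missing closing '>'");
        }
        if (digits.length() % 2 != 0) {
            digits.append('0'); // odd digit count: assume a trailing 0
        }
        byte[] out = new byte[digits.length() / 2];
        for (int k = 0; k < out.length; k++) {
            out[k] = (byte) Integer.parseInt(digits.substring(2 * k, 2 * k + 2), 16);
        }
        return out;
    }

    public static void main(String[] args) {
        // A regular hex string decodes to the bytes of "Hello".
        byte[] hello = decodeHexString("<48656C6C6F>");
        System.out.println(new String(hello, StandardCharsets.US_ASCII));

        // "<b>" looks like HTML, but 'b' is a valid hex digit, so it is
        // accepted as a one-byte hex string (0xB0).
        System.out.println(decodeHexString("<b>").length);

        // Other tags contain characters that are not hex digits.
        try {
            decodeHexString("<html>");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

This sketch rejects invalid characters outright. A real parser has to decide how to recover from them, and that recovery path is where the stack trace below ends up (parseCOSHexString calling deleteCharAt).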

The non-sequential parser works like a charm. Use PDDocument#loadNonSeq instead of PDDocument#load.
                
> StringIndexOutOfBoundsException in PDFParser
> --------------------------------------------
>
>                 Key: PDFBOX-1607
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-1607
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Parsing
>    Affects Versions: 1.8.1
>         Environment: Windows 7, JRE 1.7.0_15-b03
>            Reporter: Alex Alishevskikh
>         Attachments: pdf-govdocs-036902.pdf, pdf-govdocs-107566.pdf
>
>
> I have a few test files that parsed fine in PDFBox 1.7.1 but not in 1.8.1:
> java.lang.StringIndexOutOfBoundsException: String index out of range: 2047
>      at java.lang.AbstractStringBuilder.deleteCharAt(AbstractStringBuilder.java:762)
>      at java.lang.StringBuilder.deleteCharAt(StringBuilder.java:258)
>      at org.apache.pdfbox.pdfparser.BaseParser.parseCOSHexString(BaseParser.java:1000)
>      at org.apache.pdfbox.pdfparser.BaseParser.parseCOSString(BaseParser.java:808)
>      at org.apache.pdfbox.pdfparser.BaseParser.parseDirObject(BaseParser.java:1241)
>      at org.apache.pdfbox.pdfparser.PDFParser.parseObject(PDFParser.java:558)
>      at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:188)
