Posted to dev@pdfbox.apache.org by "Sean Bridges (JIRA)" <ji...@apache.org> on 2009/05/13 00:01:46 UTC

[jira] Created: (PDFBOX-468) index out of bounds exception

index out of bounds exception
-----------------------------

                 Key: PDFBOX-468
                 URL: https://issues.apache.org/jira/browse/PDFBOX-468
             Project: PDFBox
          Issue Type: Bug
            Reporter: Sean Bridges
             Fix For: 0.8.0-incubator


This is with svn revision 773978.

I get an index out of bounds exception when parsing PDF files. I can't share the file, but the exception is:

Caused by: org.apache.pdfbox.exceptions.WrappedIOException
	at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:228)
	at message_analyzer.extractor.PDFExtractor.getContent(PDFExtractor.java:32)
	... 19 more
Caused by: java.lang.ArrayIndexOutOfBoundsException: -1
	at org.apache.pdfbox.pdfparser.BaseParser.cmpCircularBuffer(BaseParser.java:398)
	at org.apache.pdfbox.pdfparser.BaseParser.readUntilEndStream(BaseParser.java:355)
	at org.apache.pdfbox.pdfparser.BaseParser.parseCOSStream(BaseParser.java:322)
	at org.apache.pdfbox.pdfparser.PDFParser.parseObject(PDFParser.java:490)
	at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:169)
	... 20 more



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (PDFBOX-468) index out of bounds exception

Posted by "Sean Bridges (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PDFBOX-468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Bridges updated PDFBOX-468:
--------------------------------

    Attachment: patch

This patch fixes the issue. The problem is that the first read,

int nextIdx = pdfSource.read(buffer) % buffer.length;

may return no content, in which case read() returns -1 and nextIdx is -1. This causes an index out of bounds exception on the first call to cmpCircularBuffer.

To make the parser more robust against invalid input, it might be good to always do

pdfSource.unread( ENDSTREAM ); or pdfSource.unread( ENDOBJ );

in this method.
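
For anyone following along without the attachment, here is a minimal sketch of the failure mode described above, assuming pdfSource behaves like a java.io.PushbackInputStream. The class and method names are made up for illustration, and guardedIndex is only one possible guard; it is not necessarily what the attached patch does.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.PushbackInputStream;

// Hypothetical stand-alone sketch; not PDFBox code.
final class EmptyReadSketch {

    // Mirrors the failing expression: read() returns -1 at end of stream,
    // and in Java the remainder keeps the sign of the dividend, so
    // -1 % buffer.length is still -1.
    static int unguardedIndex(PushbackInputStream source, byte[] buffer) throws IOException {
        return source.read(buffer) % buffer.length;
    }

    // Assumed guard: treat end of stream as an error instead of
    // turning it into an array index.
    static int guardedIndex(PushbackInputStream source, byte[] buffer) throws IOException {
        int bytesRead = source.read(buffer);
        if (bytesRead == -1) {
            throw new IOException("Unexpected end of stream while looking for endstream/endobj");
        }
        return bytesRead % buffer.length;
    }

    public static void main(String[] args) throws IOException {
        byte[] buffer = new byte[9];
        PushbackInputStream empty =
                new PushbackInputStream(new ByteArrayInputStream(new byte[0]), buffer.length);
        int nextIdx = unguardedIndex(empty, buffer);
        System.out.println("nextIdx = " + nextIdx);
        // Using nextIdx as an index, e.g. buffer[nextIdx], would throw
        // ArrayIndexOutOfBoundsException here.
    }
}
```

Whether the parser should throw, as in this sketch, or just stop scanning on an empty read is a design choice for the real code.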





[jira] Resolved: (PDFBOX-468) index out of bounds exception

Posted by "Brian Carrier (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PDFBOX-468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brian Carrier resolved PDFBOX-468.
----------------------------------

    Resolution: Fixed

Checked into trunk. The extra unread() is not correct, though: read() returns -1 when no bytes were read, so there is nothing to unread().
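
To illustrate the point about unread(), here is a small sketch, again assuming pdfSource behaves like a java.io.PushbackInputStream. The ENDSTREAM constant below is a local stand-in and the class name is hypothetical. It shows that an unread() after a read that returned -1 pushes bytes into the stream that the source never contained.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.PushbackInputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-alone sketch; not PDFBox code.
final class UnreadAfterEofSketch {

    // Local stand-in for the parser's end-of-stream keyword.
    static final byte[] ENDSTREAM = "endstream".getBytes(StandardCharsets.US_ASCII);

    // Reads from an empty source, then unreads ENDSTREAM anyway and
    // returns whatever the next read produces.
    static String readAfterSpuriousUnread() throws IOException {
        PushbackInputStream source =
                new PushbackInputStream(new ByteArrayInputStream(new byte[0]), ENDSTREAM.length);
        byte[] buffer = new byte[ENDSTREAM.length];

        int bytesRead = source.read(buffer);
        if (bytesRead != -1) {
            throw new IllegalStateException("expected end of stream");
        }

        // Nothing was consumed, so this does not restore anything;
        // it injects the keyword into the stream.
        source.unread(ENDSTREAM);

        int n = source.read(buffer);
        return new String(buffer, 0, n, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) throws IOException {
        System.out.println(readAfterSpuriousUnread());
    }
}
```

In this sketch the second read yields the keyword even though the underlying source was empty, which is why an unconditional unread() would mask a truncated stream.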

Sending        trunk/src/main/java/org/apache/pdfbox/pdfparser/BaseParser.java
Transmitting file data .
Committed revision 778883.
