Posted to dev@pdfbox.apache.org by "Adam Nichols (JIRA)" <ji...@apache.org> on 2010/08/24 21:03:22 UTC

[jira] Created: (PDFBOX-802) Better handle corrupt/missing %%EOF flags at the end of a file

Better handle corrupt/missing %%EOF flags at the end of a file
--------------------------------------------------------------

                 Key: PDFBOX-802
                 URL: https://issues.apache.org/jira/browse/PDFBOX-802
             Project: PDFBox
          Issue Type: Improvement
            Reporter: Adam Nichols
            Assignee: Adam Nichols
             Fix For: 1.3.0


Currently, when the %%EOF flag at the end of the file is missing or corrupt, an IOException is thrown with a stack trace like this:
java.io.IOException: Error: Expected to read '%%EOF' instead started reading '%%E^@'
        at org.apache.pdfbox.pdfparser.BaseParser.readExpectedString(BaseParser.java:1090)
        at org.apache.pdfbox.pdfparser.PDFParser.parseObject(PDFParser.java:463)
        at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:179)
        at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:859)
        at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:826)
        at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:751)

While these PDFs are non-conforming, it'd be an improvement to allow them to be read and processed since we're only a few bytes from the end of file anyway.

There's existing code which checks whether what was read was %%EOF and throws an exception if it wasn't and we're not at the end of the file.  However, that check is never reached because readExpectedString() throws an exception first.  To fix this, I changed readExpectedString() to readString() and left the manual check for the proper %%EOF flag in place.  If the flag isn't found, it'll output a warning.  If we're also not at the end of the file, we'll still throw an exception.  I've seen corrupted and missing %%EOF flags at the end of a file, but never in the middle.  Since that case doesn't seem to occur in practice, would mean the PDF is clearly out of spec, and would be much harder to deal with, throwing an exception there still seems like a reasonable thing to do.
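
To illustrate the behaviour described above, here is a minimal standalone sketch.  It is not the committed patch and does not use the PDFBox parser classes: the class LenientEofCheck and its methods readToken() and checkEofMarker() are hypothetical names made up for this example, and it assumes the stream is already positioned where the %%EOF flag should start.

```java
import java.io.IOException;
import java.io.InputStream;

// Hypothetical illustration only; not the PDFBox implementation.
public class LenientEofCheck {
    public static final String EOF_MARKER = "%%EOF";

    /** Reads bytes up to the next whitespace byte or the end of the stream. */
    public static String readToken(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = in.read()) != -1 && !Character.isWhitespace(c)) {
            sb.append((char) c);
        }
        return sb.toString();
    }

    /**
     * Returns true if the marker is intact, and false if it is damaged or
     * missing but nothing except whitespace follows it (a warning is printed).
     * Throws if the marker is wrong and more data follows.
     */
    public static boolean checkEofMarker(InputStream in) throws IOException {
        // Read whatever is there instead of failing on the first mismatch.
        String token = readToken(in);
        if (EOF_MARKER.equals(token)) {
            return true;
        }
        // The marker is wrong: tolerate it only if we are at the end of the file.
        int c;
        while ((c = in.read()) != -1) {
            if (!Character.isWhitespace(c)) {
                throw new IOException("Expected '" + EOF_MARKER
                        + "' but read '" + token + "' before the end of the file");
            }
        }
        System.err.println("Warning: expected '" + EOF_MARKER
                + "' at the end of the file but read '" + token + "'");
        return false;
    }
}
```

With this sketch, a stream that ends in a truncated flag such as '%%E' followed by a NUL byte produces a warning and returns false, while the same bytes followed by further data cause an IOException.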

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Resolved: (PDFBOX-802) Better handle corrupt/missing %%EOF flags at the end of a file

Posted by "Adam Nichols (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PDFBOX-802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Adam Nichols resolved PDFBOX-802.
---------------------------------

    Resolution: Fixed

Patch committed in revision 988694
