Posted to dev@pdfbox.apache.org by "Tilman Hausherr (JIRA)" <ji...@apache.org> on 2013/12/16 18:01:10 UTC
[jira] [Updated] (PDFBOX-1811) java.io.IOException: Object at offset does not end with 'endobj'

     [ https://issues.apache.org/jira/browse/PDFBOX-1811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tilman Hausherr updated PDFBOX-1811:
------------------------------------

    Description: 
I get this exception with the file amyuni2_05d__pdf1_3_acro4x.pdf (it was once part of the project but has since been removed; it can still be found on the web):
java.io.IOException: Object (48:0) at offset 161333 does not end with 'endobj'.

This is true: the "endobj" keyword is indeed missing in that file. However, the content of endObjectKey is "49 0 obj", i.e. the start of a new object.

  was:
I get this exception with the file amyuni2_05d__pdf1_3_acro4x.pdf (it was once part of the project but has since been removed; it can still be found on the web):
java.io.IOException: Object (48:0) at offset 161333 does not end with 'endobj'.
    at org.apache.pdfbox.pdfparser.NonSequentialPDFParser.parseObjectDynamically(NonSequentialPDFParser.java:1312)
    at org.apache.pdfbox.pdfparser.NonSequentialPDFParser.parseObjectDynamically(NonSequentialPDFParser.java:1159)
    at org.apache.pdfbox.pdfparser.NonSequentialPDFParser.parseDictObjects(NonSequentialPDFParser.java:1133)
    at org.apache.pdfbox.pdfparser.NonSequentialPDFParser.initialParse(NonSequentialPDFParser.java:470)
    at org.apache.pdfbox.pdfparser.NonSequentialPDFParser.parse(NonSequentialPDFParser.java:731)
    at org.apache.pdfbox.pdmodel.PDDocument.loadNonSeq(PDDocument.java:1139)
    at org.apache.pdfbox.pdmodel.PDDocument.loadNonSeq(PDDocument.java:1122)
    at pdfboxpageimageextraction.ExtractImages.doPdf(ExtractImages.java:134)
    at pdfboxpageimageextraction.ExtractImages.main(ExtractImages.java:78)

This is true: the "endobj" keyword is indeed missing in that file. However, the content of endObjectKey is "49 0 obj", i.e. the start of a new object.

So my suggestion is to change the following segment in NonSequentialPDFParser.java from

{code}
if (!endObjectKey.startsWith("endobj"))
{
      throw new IOException("Object (" + readObjNr + ":" + readObjGen + ") at offset "
                    + offsetOrObjstmObNr + " does not end with 'endobj'.");
}
{code}

to
{code}
if (!endObjectKey.startsWith("endobj"))
{
    if (endObjectKey.endsWith(" obj"))
    {
        // the next object starts here, so 'endobj' is merely missing: warn and continue
        LOG.warn("Object (" + readObjNr + ":" + readObjGen + ") at offset "
                + offsetOrObjstmObNr + " does not end with 'endobj' but with '" + endObjectKey + "'");
    }
    else
    {
        throw new IOException("Object (" + readObjNr + ":" + readObjGen + ") at offset "
                + offsetOrObjstmObNr + " does not end with 'endobj' but with '" + endObjectKey + "'");
    }
}
{code}
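
For anyone who wants to try the idea outside the parser, here is a minimal standalone sketch of the lenient check suggested above. It is only an illustration, not the actual NonSequentialPDFParser code: the class name EndObjCheck and the method checkEndOfObject are hypothetical, and java.util.logging stands in for the parser's LOG so that the example needs no extra libraries.

```java
import java.io.IOException;
import java.util.logging.Logger;

// Hypothetical helper class, not part of PDFBox.
public class EndObjCheck
{
    // Assumption: java.util.logging replaces the parser's own LOG field.
    private static final Logger LOG = Logger.getLogger(EndObjCheck.class.getName());

    /**
     * Checks the token read after an object body.
     *
     * @return true if the token starts with "endobj", false if the token looks
     *         like the header of the next object ("N G obj") and was tolerated
     * @throws IOException if the token is neither of the two
     */
    public static boolean checkEndOfObject(String endObjectKey, long objNr, int objGen, long offset)
            throws IOException
    {
        if (endObjectKey.startsWith("endobj"))
        {
            return true;
        }
        String message = "Object (" + objNr + ":" + objGen + ") at offset " + offset
                + " does not end with 'endobj' but with '" + endObjectKey + "'";
        if (endObjectKey.endsWith(" obj"))
        {
            // 'endobj' is missing but the next object starts here: warn and go on
            LOG.warning(message);
            return false;
        }
        throw new IOException(message);
    }

    public static void main(String[] args) throws IOException
    {
        // well-formed end of object
        System.out.println(checkEndOfObject("endobj", 48, 0, 161333L));
        // the case from this issue: the next object header is read instead
        System.out.println(checkEndOfObject("49 0 obj", 48, 0, 161333L));
    }
}
```

Any other token (for example a truncated stream) would still raise the IOException, so only the "missing endobj followed directly by the next object" case is relaxed.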



> java.io.IOException: Object at offset does not end with 'endobj'
> ----------------------------------------------------------------
>
>                 Key: PDFBOX-1811
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-1811
>             Project: PDFBox
>          Issue Type: Bug
>    Affects Versions: 2.0.0
>         Environment: XP, W7
>            Reporter: Tilman Hausherr
>
> I get this exception with the file amyuni2_05d__pdf1_3_acro4x.pdf (it was once part of the project but has since been removed; it can still be found on the web):
> java.io.IOException: Object (48:0) at offset 161333 does not end with 'endobj'.
> This is true: the "endobj" keyword is indeed missing in that file. However, the content of endObjectKey is "49 0 obj", i.e. the start of a new object.


