
Posted to dev@pdfbox.apache.org by "Andreas Lehmkühler (JIRA)" <ji...@apache.org> on 2015/03/18 22:02:39 UTC

[jira] [Closed] (PDFBOX-2715) Pages in a PDF being dropped with just an error-log message

     [ https://issues.apache.org/jira/browse/PDFBOX-2715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andreas Lehmkühler closed PDFBOX-2715.
--------------------------------------
       Resolution: Fixed
    Fix Version/s: 1.8.8
                   2.0.0
         Assignee: Andreas Lehmkühler

Both PDFs are malformed. I had a deeper look at IT-11557_pdf_broken_pages_F150317DYCELZZ.pdf. It contains two duplicated objects (16 0 and 17 0), and two objects are missing (36 0 and 53 0). Comparing the offsets that the xref table lists for the missing objects with the offsets of the duplicated objects makes it obvious that the duplicates sit exactly where the missing objects should be. As neither of the missing objects is needed, it's safe to skip them. The non-sequential parser does the same, as Maruan already pointed out.
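
To illustrate the offset comparison, here is a minimal sketch in plain Java. It is not PDFBox code: the class XrefOffsetCheck, its methods and the toy byte buffer are hypothetical, and the xref table is assumed to have been parsed already into a map from "object generation" keys to byte offsets. The check simply reads the object header found at each offset and reports entries where it differs from the key.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of PDFBox.
class XrefOffsetCheck {

    // Matches an indirect-object header such as "16 0 obj".
    private static final Pattern OBJ_HEADER =
            Pattern.compile("(\\d+)\\s+(\\d+)\\s+obj");

    /**
     * Returns the "number generation" key of the object header that starts
     * at the given byte offset, or null if no header starts there.
     */
    static String headerAt(byte[] pdf, int offset) {
        if (offset < 0 || offset >= pdf.length) {
            return null;
        }
        int end = Math.min(pdf.length, offset + 32);
        String window = new String(pdf, offset, end - offset,
                StandardCharsets.ISO_8859_1);
        Matcher m = OBJ_HEADER.matcher(window);
        return m.lookingAt() ? m.group(1) + " " + m.group(2) : null;
    }

    /**
     * Maps every xref key whose offset does not point at its own object
     * to the header actually found there ("(none)" if there is none).
     */
    static Map<String, String> findMismatches(byte[] pdf,
                                              Map<String, Integer> xref) {
        Map<String, String> mismatches = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> entry : xref.entrySet()) {
            String found = headerAt(pdf, entry.getValue());
            if (!entry.getKey().equals(found)) {
                mismatches.put(entry.getKey(), found == null ? "(none)" : found);
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        // Toy data (assumption): object 16 0 is written twice, and the
        // xref entry for 36 0 points at the second copy.
        String body = "%PDF-1.4\n16 0 obj\n<< >>\nendobj\n16 0 obj\n<< >>\nendobj\n";
        byte[] pdf = body.getBytes(StandardCharsets.ISO_8859_1);
        int first = body.indexOf("16 0 obj");
        int second = body.indexOf("16 0 obj", first + 1);

        Map<String, Integer> xref = new LinkedHashMap<>();
        xref.put("16 0", first);
        xref.put("36 0", second);

        findMismatches(pdf, xref).forEach((key, found) ->
                System.out.println("xref entry " + key
                        + " points at object " + found));
    }
}
```

In the toy data only the entry for 36 0 is reported, because its offset lands on a second copy of 16 0. This mirrors the situation described above, not the actual contents of the attached file.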

Closed as fixed


> Pages in a PDF being dropped with just an error-log message
> -----------------------------------------------------------
>
>                 Key: PDFBOX-2715
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-2715
>             Project: PDFBox
>          Issue Type: Bug
>          Components: PDModel
>    Affects Versions: 1.8.8
>         Environment: Linux, Java 7. 
>            Reporter: Cecilie Fritzvold
>            Assignee: Andreas Lehmkühler
>             Fix For: 2.0.0, 1.8.8
>
>         Attachments: IT-11557_pdf_broken_pages.pdf, IT-11557_pdf_broken_pages_F150317DYCELZZ.pdf
>
>
> Trying to extract pages from PDF documents like this:
> {code}
> PDDocument doc = PDDocument.load(new ByteArrayInputStream(pdf));
> List allPages = doc.getDocumentCatalog().getAllPages();
> {code}
> Not all pages are read, and the only indication that something is wrong is this error log message:
> {noformat}
> ERROR org.apache.pdfbox.pdmodel.PDPageNode.getAllKids()#202: No Kids found in getAllKids(). Probably a malformed pdf.
> {noformat}
> I get one of these error lines for each page that isn't read. I'm attaching two different files with this problem. One gives me 4 out of 6 pages, and the other gives me none of its 4 pages. Both documents open fine in Acrobat Reader and in Okular, where all the pages are shown.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
