Posted to dev@pdfbox.apache.org by "Tilman Hausherr (JIRA)" <ji...@apache.org> on 2016/04/06 14:09:25 UTC

[jira] [Updated] (PDFBOX-3238) Page resources are not inherited from an ancestor node in the page tree

     [ https://issues.apache.org/jira/browse/PDFBOX-3238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tilman Hausherr updated PDFBOX-3238:
------------------------------------
    Affects Version/s: 2.1.0

> Page resources are not inherited from an ancestor node in the page tree
> -----------------------------------------------------------------------
>
>                 Key: PDFBOX-3238
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-3238
>             Project: PDFBox
>          Issue Type: Bug
>          Components: PDModel
>    Affects Versions: 1.8.11, 2.0.0, 2.1.0
>         Environment: Found on Windows 7 x64, JRE from 1.5 to 8
>            Reporter: Evgeny Chesnokov
>         Attachments: Welding Fixture Model.dwg.pdf
>
>
> Attached is a sample file whose 1st page contains a single image. When I add the 1st page of the loaded document to a new document, the image is missing from the new document (the page is displayed blank, and Acrobat Reader says the file is broken).
> Steps to reproduce:
> 1. load the attached PDF file using PdfBox (checked versions 1.8.11 and 2.0.0-RC2, tried both {{#load()}} and {{#loadNonSeq()}})
> 2. create a new document
> 3. add the 1st page of the loaded document to the new document
> 4. save the new document to a new file.
> Expected: a new PDF file is created; when opened, its 1st page contains the image.
> Actual behaviour: a new PDF file is created, but when opened, the 1st page is empty and Acrobat Reader reports an error ("An error exists on this page. Acrobat may not display the page correctly.").
> Code to reproduce the issue for version 1.8.11:
> {code}
>         // Load the sample file and take its 1st page (PDFBox 1.8.x API).
>         PDDocument source = PDDocument.load(new File("Welding Fixture Model.dwg.pdf"));
>         PDPage page = (PDPage) source.getDocumentCatalog().getAllPages().get(0);
>
>         // Add that page to a fresh document and save it.
>         PDDocument destination = new PDDocument();
>         destination.addPage(page);
>         destination.save("Welding Fixture Model.dwg.page0.pdf");
>         destination.close();
>         source.close();
> {code}
> ==========
> Research summary: I decoded the attached PDF using the {{qpdf}} utility and investigated its structure. Basically, there is no {{/Resources}} entry in the {{/Page}} object, so it should be inherited from the {{/Pages}} object. Instead, it is replaced with an empty resources object, so the saved file does not have the image in it.
> Research details:
> Below are pieces of the decoded structure of the attached PDF.
> *Pages list declaration:*
> {noformat}
> 3 0 obj
> <<
>   /Count 1
>   /Kids [
>     4 0 R
>   ]
>   /Resources 5 0 R
>   /Type /Pages
> >>
> endobj
> {noformat}
> Explanation:
>  - {{/Type /Pages}} says this object is a list of pages;
>  - {{/Kids}} is an array of references to the individual page objects. In this case, object #4 is the only page in the document;
>  - {{/Resources 5 0 R}} stores a reference to a single resource that is used by the {{/Pages}} object. This is object #5, an image.
> *1st page declaration:*
> {noformat}
> 4 0 obj
> <<
>   /Contents 6 0 R
>   /MediaBox [
>     0
>     0
>     1984
>     2551
>   ]
>   /Parent 3 0 R
>   /Type /Page
> >>
> endobj
> {noformat}
> Explanation:
>  - {{/Type /Page}} says it's a page (duh);
>  - {{/Contents 6 0 R}} references object #6, which is used to render the content of the page (I won't provide it, but it uses the image object #5 mentioned above);
>  - {{/Parent 3 0 R}} is a reference to a {{/Pages}} object described above.
> The important thing here is that this object does not have a {{/Resources}} entry of its own. For this case, the PDF spec says:
> bq. (Required; inheritable) A dictionary containing any resources required by the page (see 7.8.3, "Resource Dictionaries"). If the page requires no resources, the value of this entry shall be an empty dictionary. *Omitting the entry entirely indicates that the resources shall be inherited from an ancestor node in the page tree*.
> This last sentence means that Page 1 has the same resources as its parent {{/Pages}} object, and this is where PdfBox misbehaves. When exporting a page with no {{/Resources}} entry, it uses an *EMPTY* resources object instead of the inherited one.
> To verify this, I added a {{/Resources 5 0 R}} line to the 1st page declaration of the sample PDF:
> {noformat}
> 4 0 obj
> <<
>   /Contents 6 0 R
>   /MediaBox [
>     0
>     0
>     1984
>     2551
>   ]
>   /Parent 3 0 R
>   /Resources 5 0 R
>   /Type /Page
> >>
> endobj
> {noformat}
> After I did this, PdfBox successfully extracted the 1st page of this document and the image was displayed correctly.
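
The inheritance rule quoted from the spec, and the manual fix of writing {{/Resources}} into the page before it leaves its tree, can be sketched roughly as follows. This is only an illustration under stated assumptions, not PDFBox code: plain Java maps stand in for PDF dictionaries, and the class InheritedResources and the methods findInheritable and detachPage are hypothetical names invented for this sketch.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative model only: plain maps stand in for PDF dictionaries.
// None of these names come from PDFBox.
public class InheritedResources {

    /**
     * Looks up an inheritable entry by starting at the given node and
     * following /Parent links until a node defines the key.
     * Returns null if no node on the chain defines it.
     */
    static Object findInheritable(Map<String, Object> node, String key) {
        Map<String, Object> current = node;
        while (current != null) {
            if (current.containsKey(key)) {
                return current.get(key);
            }
            Object parent = current.get("Parent");
            current = (parent instanceof Map) ? asDict(parent) : null;
        }
        return null;
    }

    @SuppressWarnings("unchecked")
    private static Map<String, Object> asDict(Object value) {
        return (Map<String, Object>) value;
    }

    /**
     * Returns a copy of the page that no longer refers to its old parent.
     * The inherited /Resources value is written into the copy first, so
     * it is not lost when the page is placed in a different page tree.
     */
    static Map<String, Object> detachPage(Map<String, Object> page) {
        Map<String, Object> copy = new HashMap<>(page);
        Object resources = findInheritable(page, "Resources");
        if (resources != null) {
            copy.put("Resources", resources);
        }
        copy.remove("Parent");
        return copy;
    }

    public static void main(String[] args) {
        // Mirrors objects 3 and 4 from the decoded sample file.
        Map<String, Object> pages = new HashMap<>();
        pages.put("Type", "Pages");
        pages.put("Resources", "5 0 R");

        Map<String, Object> page = new HashMap<>();
        page.put("Type", "Page");
        page.put("Contents", "6 0 R");
        page.put("Parent", pages);

        Map<String, Object> detached = detachPage(page);
        System.out.println(detached.get("Resources"));
    }
}
```

In this model, copying the page without the lookup would leave the copy with no {{/Resources}} entry at all once {{/Parent}} is gone, which corresponds to the behaviour described in the report. The loop assumes a well-formed tree; a real implementation would also need to guard against a cyclic {{/Parent}} chain.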



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
