Posted to dev@pdfbox.apache.org by "Tilman Hausherr (Jira)" <ji...@apache.org> on 2020/07/05 11:18:00 UTC

[jira] [Comment Edited] (PDFBOX-4908) PDFMergerUtility.mergeInto() does not deep copy metadata

    [ https://issues.apache.org/jira/browse/PDFBOX-4908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17151546#comment-17151546 ] 

Tilman Hausherr edited comment on PDFBOX-4908 at 7/5/20, 11:17 AM:
-------------------------------------------------------------------

I tried a change (skipping arrays and dictionaries) and it works, but then I checked the PDF specification: one of these dictionaries (ViewerPreferences) can legally contain an array. So maybe just skip dictionaries. But then I wonder why people are putting stuff there in the first place. Should this weird extra data be kept, or just dumped?
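
To make the "just skip dictionaries" option concrete, here is a toy sketch. It is not the PDFBox patch and uses no PDFBox classes: a plain java.util.Map stands in for a dictionary, a List for an array, and the class name, method names and the "WeirdExtra" key are all hypothetical.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model only: Map = dictionary, List = array, anything else = simple value.
// None of this is PDFBox code.
final class SkipDictionariesDemo {

    /** Copies entries that dest lacks, but never copies a dictionary-valued entry. */
    static Map<String, Object> mergeSkippingDictionaries(Map<String, Object> src,
                                                         Map<String, Object> dest) {
        for (Map.Entry<String, Object> e : src.entrySet()) {
            Object value = e.getValue();
            if (value instanceof Map) {
                continue; // nested dictionary: skipped instead of shared by reference
            }
            if (!dest.containsKey(e.getKey())) {
                dest.put(e.getKey(), value); // simple values and arrays are kept
            }
        }
        return dest;
    }

    /** Builds a small source and destination and returns the merged destination. */
    static Map<String, Object> example() {
        Map<String, Object> src = new LinkedHashMap<>();
        src.put("HideToolbar", Boolean.TRUE);
        src.put("PrintPageRange", Arrays.asList(1, 3)); // an array value that should survive
        src.put("WeirdExtra", Collections.singletonMap("Key", "value")); // hypothetical extra dictionary

        Map<String, Object> dest = new LinkedHashMap<>();
        dest.put("HideToolbar", Boolean.FALSE); // existing destination entry wins
        return mergeSkippingDictionaries(src, dest);
    }

    public static void main(String[] args) {
        System.out.println(example());
    }
}
```

In this model the destination keeps its own HideToolbar value, gains the PrintPageRange array, and never sees WeirdExtra. The array is still shared by reference, and an array that itself holds a dictionary is not handled, so this only illustrates the trade-off and does not settle it.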


was (Author: tilman):
I tried a change (skip arrays and dictionaries) and it works, but then I looked in the PDF specification and it could be that one of these dictionaries contains an array. So maybe just skip dictionaries. But then I wonder, why are these people putting stuff there? Should this weird extra data be kept, or just dumped?

> PDFMergerUtility.mergeInto() does not deep copy metadata
> --------------------------------------------------------
>
>                 Key: PDFBOX-4908
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-4908
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Utilities
>    Affects Versions: 2.0.18, 2.0.20
>         Environment: Windows, JDK12
>            Reporter: Tim Shaffer
>            Priority: Minor
>         Attachments: bad1.pdf, bad2.pdf, blank.pdf
>
>
> After merging two documents, closing the source document prevents the destination document from being saved.
> {code:java}
> // mainDoc can be any existing PDF
> PDDocument mainDoc = PDDocument.load(new File("blank.pdf"));
> PDDocument appendDoc = PDDocument.load(new File("bad1.pdf"));
> //PDDocument appendDoc = PDDocument.load(new File("bad2.pdf"));
> PDFMergerUtility pdfMerger = new PDFMergerUtility();
> pdfMerger.appendDocument(mainDoc, appendDoc);
> appendDoc.close();
> // Exception thrown during save()
> mainDoc.save("temp.pdf");
> mainDoc.close();
> {code}
> Exception:
> {noformat}
> java.io.IOException: COSStream has been closed and cannot be read. Perhaps its enclosing PDDocument has been closed?
> 	at org.apache.pdfbox.cos.COSStream.checkClosed(COSStream.java:83)
> 	at org.apache.pdfbox.cos.COSStream.createRawInputStream(COSStream.java:133)
> 	at org.apache.pdfbox.pdfwriter.COSWriter.visitFromStream(COSWriter.java:1219)
> 	at org.apache.pdfbox.cos.COSStream.accept(COSStream.java:404)
> 	at org.apache.pdfbox.cos.COSObject.accept(COSObject.java:158)
> 	at org.apache.pdfbox.pdfwriter.COSWriter.doWriteObject(COSWriter.java:526)
> 	at org.apache.pdfbox.pdfwriter.COSWriter.doWriteObjects(COSWriter.java:464)
> 	at org.apache.pdfbox.pdfwriter.COSWriter.doWriteBody(COSWriter.java:448)
> 	at org.apache.pdfbox.pdfwriter.COSWriter.visitFromDocument(COSWriter.java:1113)
> 	at org.apache.pdfbox.cos.COSDocument.accept(COSDocument.java:449)
> 	at org.apache.pdfbox.pdfwriter.COSWriter.write(COSWriter.java:1386)
> 	at org.apache.pdfbox.pdfwriter.COSWriter.write(COSWriter.java:1273)
> 	at org.apache.pdfbox.pdmodel.PDDocument.save(PDDocument.java:1357)
> 	at org.apache.pdfbox.pdmodel.PDDocument.save(PDDocument.java:1328)
> 	at org.apache.pdfbox.pdmodel.PDDocument.save(PDDocument.java:1316)
> 	at Main.main(Main.java:60)
> {noformat}
> Attached are two different PDFs, from different sources, that both cause the bug. All sensitive data has been removed, so the PDFs contain only blank pages, but the structure that causes the above exception is still present. Also attached is blank.pdf (another blank document), which I've been testing with as the destination.
> The cause seems to be these lines in PDFMergerUtility:
> {code:java}
>  PDDocumentInformation destInfo = destination.getDocumentInformation();
>  PDDocumentInformation srcInfo = source.getDocumentInformation();
>  mergeInto(srcInfo.getCOSObject(), destInfo.getCOSObject(), Collections.<COSName>emptySet());
> {code}
> I've tried altering the code to use PDFCloneUtility to clone the srcInfo.getCOSObject() before passing it to mergeInto().  That seems to fix the issue, but I'm not familiar enough with the code to say if that is the correct way to fix this.
>  
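
The failure described in the report can be modelled without PDFBox. The sketch below is a simplified illustration under stated assumptions, not the library's code: FakeStream, merge() and saveSucceedsAfterSourceClosed() are hypothetical names, and reading every destination entry stands in for save(). It shows why a shallow merge leaves the destination dependent on the source's lifetime, and why cloning values first (the idea behind the PDFCloneUtility suggestion) removes that dependency.

```java
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-ins for illustration only; none of these are PDFBox classes.
final class ShallowMergeDemo {

    /** Models a stream that becomes unreadable once its owning document is closed. */
    static final class FakeStream {
        private final String data;
        private boolean closed;

        FakeStream(String data) {
            this.data = data;
        }

        void close() {
            closed = true;
        }

        String read() throws IOException {
            if (closed) {
                throw new IOException("stream has been closed and cannot be read");
            }
            return data;
        }

        /** Deep copy: the clone's lifetime is independent of the original's. */
        FakeStream copy() {
            return new FakeStream(data);
        }
    }

    /** Copies entries dest lacks; with deepCopy=false both maps share the same stream objects. */
    static void merge(Map<String, FakeStream> src, Map<String, FakeStream> dest, boolean deepCopy) {
        for (Map.Entry<String, FakeStream> e : src.entrySet()) {
            if (!dest.containsKey(e.getKey())) {
                dest.put(e.getKey(), deepCopy ? e.getValue().copy() : e.getValue());
            }
        }
    }

    /** Merges, closes the source, then reads every destination entry (a stand-in for save()). */
    static boolean saveSucceedsAfterSourceClosed(boolean deepCopy) {
        Map<String, FakeStream> src = new LinkedHashMap<>();
        src.put("Extra", new FakeStream("payload"));
        Map<String, FakeStream> dest = new LinkedHashMap<>();

        merge(src, dest, deepCopy);
        for (FakeStream s : src.values()) {
            s.close(); // stands in for appendDoc.close()
        }
        try {
            for (FakeStream s : dest.values()) {
                s.read();
            }
            return true;
        } catch (IOException ex) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("shallow merge, save ok: " + saveSucceedsAfterSourceClosed(false));
        System.out.println("deep merge, save ok:    " + saveSucceedsAfterSourceClosed(true));
    }
}
```

With the shallow merge the simulated save fails once the source is closed; with the deep copy it succeeds. Whether cloning or skipping such entries is the right fix for PDFMergerUtility is a separate question that this model does not answer.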


