Posted to dev@pdfbox.apache.org by "Michael Klink (JIRA)" <ji...@apache.org> on 2018/01/17 10:54:00 UTC

[jira] [Comment Edited] (PDFBOX-4066) Merging documents with nested fields duplicates child fields

    [ https://issues.apache.org/jira/browse/PDFBOX-4066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16328611#comment-16328611 ] 

Michael Klink edited comment on PDFBOX-4066 at 1/17/18 10:53 AM:
-----------------------------------------------------------------

To make things a bit more complicated... ;)

Shouldn't a complete solution, in the case of a duplicate non-terminal root field, check whether the entries other than the child fields are identical and, if so, consider these top-level fields merged and continue by inspecting the child fields? If the child fields have distinct names, then all child fields can simply be merged.

In case of duplicate non-terminal child fields the same consideration can take place as for the duplicate non-terminal root field.

Even in the case of duplicate terminal fields whose entries other than the widgets are identical, one can consider merging them as multiple widgets of the same field. This should be optional, though, as it might not be wanted.

Only in the case of duplicate non-terminal or terminal fields with incompatible entries does one of the duplicates need to be renamed...
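
To make the cases above concrete, here is a rough sketch of the decision logic on a simplified field tree. It is not PDFBox code: FieldNode, FieldMergeSketch, the "_renamed" suffix and the mergeWidgets flag are all hypothetical names made up for illustration, and the "entries" map merely stands in for whatever dictionary entries would have to be compared.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a form field; NOT a PDFBox class.
class FieldNode {
    String name;                                                // partial field name
    final Map<String, String> entries = new LinkedHashMap<>();  // entries other than kids/widgets
    final List<FieldNode> kids = new ArrayList<>();             // child fields (non-terminal field)
    final List<String> widgets = new ArrayList<>();             // widget ids (terminal field)

    FieldNode(String name) { this.name = name; }

    // Simplification: a field without child fields counts as terminal.
    boolean isTerminal() { return kids.isEmpty(); }
}

class FieldMergeSketch {
    private static int renameCounter = 0;

    // Helper: terminal field with a field type entry and one widget.
    static FieldNode terminal(String name, String fieldType, String widget) {
        FieldNode f = new FieldNode(name);
        f.entries.put("FT", fieldType);
        f.widgets.add(widget);
        return f;
    }

    // Helper: non-terminal field with the given child fields.
    static FieldNode parent(String name, FieldNode... kids) {
        FieldNode f = new FieldNode(name);
        for (FieldNode k : kids) f.kids.add(k);
        return f;
    }

    static FieldNode find(List<FieldNode> fields, String name) {
        for (FieldNode f : fields) {
            if (f.name.equals(name)) return f;
        }
        return null;
    }

    // Merges the fields in src into dest, level by level.
    static void merge(List<FieldNode> dest, List<FieldNode> src, boolean mergeWidgets) {
        for (FieldNode s : src) {
            FieldNode d = find(dest, s.name);
            if (d == null) {
                // No name clash: simply take over the field.
                dest.add(s);
            } else if (!d.isTerminal() && !s.isTerminal() && d.entries.equals(s.entries)) {
                // Duplicate non-terminal fields with identical entries:
                // consider them merged and continue with the child fields.
                merge(d.kids, s.kids, mergeWidgets);
            } else if (mergeWidgets && d.isTerminal() && s.isTerminal()
                    && d.entries.equals(s.entries)) {
                // Optional: duplicate terminal fields with identical entries
                // become multiple widgets of the same field.
                d.widgets.addAll(s.widgets);
            } else {
                // Incompatible duplicates: rename the incoming field.
                String base = s.name;
                do {
                    s.name = base + "_renamed" + (++renameCounter);
                } while (find(dest, s.name) != null);
                dest.add(s);
            }
        }
    }

    public static void main(String[] args) {
        List<FieldNode> dest = new ArrayList<>();
        dest.add(parent("form", terminal("a", "Tx", "w1")));
        List<FieldNode> src = new ArrayList<>();
        src.add(parent("form", terminal("a", "Tx", "w2"), terminal("b", "Btn", "w3")));

        merge(dest, src, true);
        for (FieldNode kid : dest.get(0).kids) {
            System.out.println(kid.name + " " + kid.widgets);
        }
    }
}
```

With mergeWidgets set to false, the second "a" would be renamed instead of being folded into the first one. A real implementation would additionally have to deal with inherited entries, terminal fields that are merged with their single widget, and field values, none of which is modelled here.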

Ok, not trivial... ;)

 



> Merging documents with nested fields duplicates child fields
> ------------------------------------------------------------
>
>                 Key: PDFBOX-4066
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-4066
>             Project: PDFBox
>          Issue Type: Bug
>          Components: AcroForm, Utilities
>    Affects Versions: 2.0.8
>            Reporter: Al Phaba
>            Assignee: Maruan Sahyoun
>            Priority: Major
>             Fix For: 2.0.9, 3.0.0 PDFBox
>
>         Attachments: TestForm-flattened.pdf, TestForm-merged.pdf, TestForm.pdf, flattenAndMerge.pdf
>
>
> I have a PDF with a lot of AcroForm fields. I do some manipulation on it, which results in a new PDF. So I have PDF-1 (the original one) and PDF-2 (just a duplicate of PDF-1), and now I want to merge them. Both PDFs have some AcroForm fields, for example: field_a, field_2...
> Before I merge them I flatten PDF-1, because I only want to have the AcroForm fields from PDF-2. When I then check my new merged PDF, I can see that there are no visible fields on the pages from PDF-1 and there are fields on the pages from PDF-2. At first glance this seems OK, but when I inspect the fields I can see that the merger has renamed all the fields from PDF-2, e.g. field_a_dummy123, field_b_dummy232 ...
> It seems to me that flattening does not remove the fields, and that is why the PDFMergerUtility from PDFBox renames the fields from PDF-2, because AcroForm field names need to be unique. Another guess was that there is a bug in mergeAcroForm().
>  
> {code:java}
> @Test
> public void flattenAndMerge() throws IOException {
>     // Assumption: the test resource is resolved via this class's class loader.
>     ClassLoader classLoader = getClass().getClassLoader();
>     File testForm = new File(classLoader.getResource("./TestForm.pdf").getFile());
>     // Read the same form twice: one copy to flatten, one to keep its fields.
>     byte[] testFormAsByte = Files.readAllBytes(testForm.toPath());
>     byte[] testFormAsByte2 = Files.readAllBytes(testForm.toPath());
>     Path flattenedPdf = Files.createTempFile("flatten", ".pdf");
>     // Flatten PDF-1 and save it; try-with-resources closes the document.
>     try (PDDocument pdf1 = PDDocument.load(testFormAsByte)) {
>         PDAcroForm acroform = pdf1.getDocumentCatalog().getAcroForm();
>         acroform.flatten();
>         pdf1.save(flattenedPdf.toFile());
>     }
>     // Merge the flattened copy (PDF-1) with the untouched copy (PDF-2).
>     PDFMergerUtility merger = new PDFMergerUtility();
>     merger.addSource(new ByteArrayInputStream(Files.readAllBytes(flattenedPdf)));
>     merger.addSource(new ByteArrayInputStream(testFormAsByte2));
>     merger.setDestinationFileName("./build/flattenAndMerge.pdf");
>     merger.mergeDocuments(MemoryUsageSetting.setupMainMemoryOnly());
> }
> {code}
> Here is my SO Article
> [https://stackoverflow.com/questions/48271924/pdfbox-flatten-pdf-does-not-remove-acroform-elements?noredirect=1#comment83544858_48271924]
>  
>  


