Posted to users@pdfbox.apache.org by "Dagnon, William" <Wi...@wpsic.com> on 2018/07/24 22:29:54 UTC

PDFBox AcroForm: timeline? or how to make multiple copies of a page into one file?

Hello all, newbie here.

I have a source PDF file, and I need to build a new file from it by selectively copying, skipping, or duplicating pages, and then filling out the annotated fields in the new file. E.g., sometimes the new file will have 6 pages, sometimes 10.

So far, my searches haven't turned up anyone asking about quite this situation:

1. One or more of the pages are forms, i.e. they have annotated fields that I need to fill in.
2. One of the form pages needs to be repeated in certain circumstances. Mine is a table-of-inputs page with a header, and I may need zero, one, or several filled-out copies of it.

I don't have a deep understanding of how PDFBox operates, so I tried a simplistic version based on the Javadocs and some Google-fu:
pddoc.importPage(originalDoc.getPage(0));            // pddoc is the new, initially empty PDDocument
PDDocumentCatalog cat = pddoc.getDocumentCatalog();
PDAcroForm acroForm = cat.getAcroForm();             // returns null
PDField field = acroForm.getField("name");           // throws a NullPointerException

This means there is no PDAcroForm in the new PDDocument after I copy the first page, which has form fields. At least there is no cached version, according to my debugger. How do I get it to generate an appropriate AcroForm? Or do I need to wait until all pages, or at least all field-containing pages, are in my pddoc before asking for the AcroForm?

My secondary question addresses #2 above: how can I have two copies of the same form page, with its annotated elements, in one PDDocument? Does the second copy of the PDPage need all of its fields renamed so that all annotation names remain unique? Or can I operate on a copy of the PDPage and then import it into the final pddoc (assuming that once the page has values, its names won't interfere with existing names in the same PDDocument)?
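
On the renaming question: the PDF specification identifies a form field by its fully qualified name, and field dictionaries that share that name are treated as one field with a single value. Independently filled copies of a form page would therefore need distinct field names. Below is a minimal sketch of one possible naming scheme, in plain Java with no PDFBox calls. The class `FieldNames`, the method `forCopy`, and the `_copyN` suffix are hypothetical choices for illustration, not anything PDFBox prescribes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: derives a distinct field name for each copy of a form page. */
public class FieldNames {

    /**
     * Returns the names to use on the given copy of a form page.
     * Copy 0 keeps the original names; later copies get a "_copyN" suffix.
     */
    public static List<String> forCopy(List<String> originalNames, int copyIndex) {
        if (copyIndex < 0) {
            throw new IllegalArgumentException("copyIndex must be >= 0");
        }
        List<String> renamed = new ArrayList<>();
        for (String name : originalNames) {
            renamed.add(copyIndex == 0 ? name : name + "_copy" + copyIndex);
        }
        return renamed;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("name", "amount");
        System.out.println(forCopy(names, 0)); // prints [name, amount]
        System.out.println(forCopy(names, 2)); // prints [name_copy2, amount_copy2]
    }
}
```

With PDFBox 2.x, each new name could then be applied to the corresponding field of a copy through `PDField.setPartialName(String)` before that copy is merged into the final document.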

Websites, example code, deeper knowledge all appreciated!


P.S. and thanks to daedtech.com for getting me started!

CONFIDENTIALITY NOTICE: This e-mail, including any attachments, may contain confidential, privileged and/or proprietary information which is solely for the use of the intended recipient(s). Any review, use, disclosure, or retention by others is strictly prohibited. If you are not an intended recipient, please contact the sender and delete this e-mail, any attachments, and all copies.

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@pdfbox.apache.org
For additional commands, e-mail: users-help@pdfbox.apache.org


RE: PDFBox AcroForm: timeline? or how to make multiple copies of a page into one file?

Posted by Gary Grosso <ga...@oberontech.com>.
IMO the PDFBox forum archives are hard to search.

Have you already seen the examples at https://svn.apache.org/viewvc/pdfbox/trunk/examples/src/main/java/org/apache/pdfbox/examples/ ?

I'll admit I didn't study your question in detail. It's 8pm here and I'm not yet done for the day. But I saw you ask for examples...

Gary





Re: PDFBox AcroForm: timeline? or how to make multiple copies of a page into one file?

Posted by Tilman Hausherr <TH...@t-online.de>.
On 25.07.2018 at 14:40, Dagnon, William wrote:
> Thanks for your answer, Tilman,
>
> That was what I was afraid of: that the PDFBox workflow (or maybe more so the PDF internal structures) is not documented, especially in the Javadocs ):

That is in the PDF specification. We also have quite a few examples
in the source code. There are so many things that can be done with
PDFBox that we don't have an example for everything.

If you have an idea for how to improve the Javadocs based on your
frustrations, don't hesitate to tell us. Even small things. It may be too
late for you, but it might help the next person.

Tilman



RE: PDFBox AcroForm: timeline? or how to make multiple copies of a page into one file?

Posted by "Dagnon, William" <Wi...@wpsic.com>.
Thanks for your answer, Tilman,

That was what I was afraid of: that the PDFBox workflow (or maybe more so the PDF internal structures) is not documented, especially in the Javadocs ):

However, you did put me on the correct path and I was able to find something which may actually help:

https://stackoverflow.com/questions/29371129/java-pdfbox-fill-out-pdf-form-append-it-to-pddocument-and-repeat

I'll attempt to combine that strategy with PDDocument.removePage(0,2..7) as you suggested, though that seems like a lot of possible file pointers.

Thanks!



Re: PDFBox AcroForm: timeline? or how to make multiple copies of a page into one file?

Posted by Tilman Hausherr <TH...@t-online.de>.
Hi,

The pages have only the widget annotations. The form fields belong
to the document catalog, and each field has one or more widgets. So it's
trickier than just copying pages. The best approach would be to create a
copy of the document and then delete the pages you don't need.

Tilman
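
The copy-then-delete approach described above can be sketched as follows. This is plain Java that only plans which pages to delete; `PagePlan` and `indicesToRemove` are hypothetical names, not PDFBox API. The indices are zero-based and returned in descending order, so that deleting one page does not shift the indices of the pages still waiting to be deleted.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper: plans page deletions for a copied document. */
public class PagePlan {

    /**
     * Lists the zero-based page indices to delete, highest first,
     * given the total page count and the indices that should stay.
     */
    public static List<Integer> indicesToRemove(int pageCount, Set<Integer> keep) {
        List<Integer> remove = new ArrayList<>();
        for (int i = pageCount - 1; i >= 0; i--) {
            if (!keep.contains(i)) {
                remove.add(i);
            }
        }
        return remove;
    }

    public static void main(String[] args) {
        // Keep pages 0, 2 and 5 of an 8-page source document.
        Set<Integer> keep = new HashSet<>(Arrays.asList(0, 2, 5));
        System.out.println(indicesToRemove(8, keep)); // prints [7, 6, 4, 3, 1]
    }
}
```

With PDFBox 2.x, the copy could be a `PDDocument` loaded from the source file, with each returned index passed to `PDDocument.removePage(int)` before the fields are filled through `getDocumentCatalog().getAcroForm()`. Fields whose widgets sat on deleted pages would still be listed in the AcroForm, so they may need to be removed separately.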
