Posted to users@pdfbox.apache.org by Ankit Inkollu <ai...@gmail.com> on 2018/05/01 16:23:11 UTC

Query in accessing an AcroForm after splitting a PDF

Hi All,

I am having trouble accessing the AcroForm after splitting a PDF.


*Scenario:*
My PDF (which contains sensitive data and hence cannot be shared) has
fillable fields on multiple pages. I need to split the pages based on a
certain value in a certain field. When I get the AcroForm object from the
entire document, it works. But when I split the document into pages and
try to access the AcroForm of a split page, getAcroForm() returns null.


*Test Steps:*
1. Load a fillable PDF (contains a single page with an input field).
2. Split the PDF and get the first page as a PDDocument object.
3. Using the PDDocument object, get the document catalog object.
4. Using the document catalog object, try to get the AcroForm.
   *This returns null.*


*Code-Snippet:*

import java.io.File;
import org.apache.pdfbox.multipdf.Splitter;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDDocumentCatalog;
import org.apache.pdfbox.pdmodel.interactive.form.PDAcroForm;

// get the PDF file
File file = new File(pdfFilePath);

// load the PDF file into memory
PDDocument pdfDocument = PDDocument.load(file);

// initialize the splitter, split the document, and get the 1st page
// as a PDDocument object
Splitter split = new Splitter();
PDDocument documentPage1 = split.split(pdfDocument).get(0);

// get the document catalog object of the split page
PDDocumentCatalog docCatalog = documentPage1.getDocumentCatalog();

// get the AcroForm of the split page (this is the call that returns null)
PDAcroForm a = docCatalog.getAcroForm();


*PDFBox Version:* 2.0.9


Is it possible to get the AcroForm for a single page of a PDF document?
Please let me know if you have any insight on this.

Thanks
Ankit

Re: Query in accessing an AcroForm after splitting a PDF

Posted by Tilman Hausherr <TH...@t-online.de>.
I suspect that the acroform isn't copied.

Alternative solution: just delete the pages you don't need and save that.

document.getPages().remove(...)
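
A minimal sketch of that idea, assuming you want to keep exactly one page. The class name KeepSinglePage and the helper keepOnlyPage are made up for illustration; the PDFBox calls are from the 2.0.x API. Pages are removed from the end so the remaining indices don't shift:

```java
import java.io.File;
import java.io.IOException;
import org.apache.pdfbox.pdmodel.PDDocument;

public class KeepSinglePage {

    // Hypothetical helper: removes every page except the one at keepIndex.
    // Iterates backwards so that removing a page does not shift the
    // indices of pages that still have to be visited.
    public static void keepOnlyPage(PDDocument document, int keepIndex) {
        for (int i = document.getNumberOfPages() - 1; i >= 0; i--) {
            if (i != keepIndex) {
                document.getPages().remove(i);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.err.println("usage: KeepSinglePage <input.pdf> <output.pdf>");
            return;
        }
        try (PDDocument document = PDDocument.load(new File(args[0]))) {
            keepOnlyPage(document, 0);          // keep the first page only
            document.save(new File(args[1]));   // catalog and AcroForm stay in place
        }
    }
}
```

Because the original document is modified rather than copied, the catalog (and with it the AcroForm) is untouched.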

The only problem left is that the document still contains all fields, so
you should remove the ones you don't need. You can find out which they
are by looking at each field and its widgets, and checking for each
widget which page it is on (getPage()). The removeFields method isn't
public, and I don't think it is part of 2.0.9, so here's the source code
(you'll also find it in the repository, and I think there's an answer on
Stack Overflow as well):

     private void removeFields(List<PDField> fields)
     {
         for (PDField field : fields) {
             if (field.getParent() == null)
             {
                 COSArray cosFields = (COSArray) dictionary.getDictionaryObject(COSName.FIELDS);
                 for (int i=0; i<cosFields.size(); i++)
                 {
                     COSDictionary element = (COSDictionary) cosFields.getObject(i);
                     if (field.getCOSObject().equals(element)) {
                         cosFields.remove(i);
                     }
                 }
             }
             else
             {
                 COSArray kids = (COSArray) field.getParent().getCOSObject().getDictionaryObject(COSName.KIDS);
                 for (int i=0; i<kids.size(); i++)
                 {
                     COSDictionary element = (COSDictionary) kids.getObject(i);
                     if (field.getCOSObject().equals(element)) {
                         kids.remove(i);
                     }
                 }
             }
         }
     }


"dictionary" is what you get when you call acroform.getCOSObject().
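
To build the list that is passed to removeFields, something along these lines might work. It is only a sketch: OrphanedFieldFinder and findFieldsOffPage are hypothetical names, and it assumes every widget carries a /P (page) entry. That entry is optional in PDF, so a widget without it is treated here as not being on the kept page; for such files you would have to look at the page's annotations instead.

```java
import java.util.ArrayList;
import java.util.List;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.interactive.annotation.PDAnnotationWidget;
import org.apache.pdfbox.pdmodel.interactive.form.PDAcroForm;
import org.apache.pdfbox.pdmodel.interactive.form.PDField;
import org.apache.pdfbox.pdmodel.interactive.form.PDTerminalField;

public class OrphanedFieldFinder {

    // Hypothetical helper: returns the terminal fields that have no widget
    // on keptPage. Assumption: widgets without a /P entry count as "off page".
    public static List<PDField> findFieldsOffPage(PDAcroForm acroForm, PDPage keptPage) {
        List<PDField> orphaned = new ArrayList<>();
        for (PDField field : acroForm.getFieldTree()) {
            if (!(field instanceof PDTerminalField)) {
                continue; // non-terminal fields have no widgets of their own
            }
            boolean onKeptPage = false;
            for (PDAnnotationWidget widget : field.getWidgets()) {
                PDPage widgetPage = widget.getPage(); // null if /P is missing
                if (widgetPage != null
                        && widgetPage.getCOSObject() == keptPage.getCOSObject()) {
                    onKeptPage = true;
                    break;
                }
            }
            if (!onKeptPage) {
                orphaned.add(field);
            }
        }
        return orphaned;
    }
}
```

The returned list can then be handed to removeFields. Non-terminal parents that end up with no kids are left in place by this sketch.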

Tilman
