Posted to users@pdfbox.apache.org by Roberto Nibali <rn...@gmail.com> on 2015/07/06 17:57:39 UTC

Re: Migrate form field entries from one pdf to another

Hi

Sorry for the late reply, I was travelling for business.

> I did a quick test with a newly created form using Adobe Acrobat and
> setting the checkbox also with Acrobat. There the value is not null when
> the checkbox has been checked.
> >
> > I have now attached PDFs for which my tool reports a null value.
>
>
> Unfortunately, the attachments didn't make it through the mailing list.
> Could you upload them to a public location?
>

It took me a while before I realized that Google offers the technology I
need:

https://drive.google.com/file/d/0B7Bzk_1dcyc5SmRpQUJPR3JGUkk/view?usp=sharing
https://drive.google.com/file/d/0B7Bzk_1dcyc5Tk1qcVo2Yk02dTA/view?usp=sharing

Those are the files I tried to send to this mailing list earlier.


> >
> > > How could I deal with this? Because this is exactly what seems to fail and
> > > also causes this dreaded exception message when trying to fill out the forms
> > > with anything other than PDTextbox.
> >
> > Without looking at the form:
> >
> > a) test if getValue returns null; if not, take that value
> > b) if it returns null, test if the box has been checked; if yes, take that value.
> >
> > Which value?
> >
> > Use the value retrieved from a) or b) to set the field's value in the PDF template.
> >
> > I'm not sure which value you mean.
> >
> > What would be helpful is either a screenshot of the form field entries
> > using the PDFDebugger [http://pdfbox.apache.org/1.8/commandline.html#pdfDebugger]
> > or the printout of the field's getDictionary() method, so there is some more
> > information about what the field definition looks like. Best would be to have
> > the form, of course.
>
>
I haven't found the PDFDebugger tool yet; I reckon I need to compile it
myself. Nevertheless, when I parse through the structure myself, I do not
get any dictionary entries:

DEBUG: Opening ./Test.pdf
No XFA data in stream
DEBUG: Checkbox [01.20.Entry1]:  On=1 Off=Off Checked=true Value=1
DEBUG: Checkbox [01.20.Entry2]:  On=1 Off=Off Checked=true Value=1
DEBUG: Checkbox [01.20.Entry3]:  On=1 Off=Off Checked=false Value=_n/a_
DEBUG: Checkbox [01.20.Entry4]:  On=2 Off=Off Checked=false Value=_n/a_
DEBUG: TextButton [01.011.Name]: Value=sddsds
DEBUG: TextButton [01.011.Prename]: Value=sdsdsd
DEBUG: Checkbox [01.011.Language]:  On=0 Off=Off Checked=false Value=_n/a_
DEBUG: Checkbox [01.011.Boxes]:  On=Mrs Off=Off Checked=true Value=_n/a_
DEBUG: Opening ./TestTemplate.pdf
No XFA data in stream
Setting CheckBox field: 01.011.Boxes to value: null
Dumping Checkbox field dictionary [01.011.Boxes] ----------
COSDictionary{}
----------------------------------------------------------------------------
Setting CheckBox field: 01.20.Entry4 to value: null
Dumping Checkbox field dictionary [01.20.Entry4] ----------
COSDictionary{}
----------------------------------------------------------------------------
Setting CheckBox field: 01.20.Entry3 to value: null
Dumping Checkbox field dictionary [01.20.Entry3] ----------
COSDictionary{}
----------------------------------------------------------------------------
Setting CheckBox field: 01.20.Entry2 to value: null
Dumping Checkbox field dictionary [01.20.Entry2] ----------
COSDictionary{}
----------------------------------------------------------------------------
Setting CheckBox field: 01.011.Language to value: null
Dumping Checkbox field dictionary [01.011.Language] ----------
COSDictionary{}
----------------------------------------------------------------------------
Setting CheckBox field: 01.20.Entry1 to value: null
Dumping Checkbox field dictionary [01.20.Entry1] ----------
COSDictionary{}

The relevant DEBUG code:

private void analyseAndPrintFields(PDField field) throws IOException {
    String fqName = field.getFullyQualifiedName();
    String value = (field.getValue() != null ? field.getValue() : "_n/a_");

    if (field instanceof PDCheckbox) {
        PDCheckbox checkbox = (PDCheckbox) field;
        logerr("DEBUG: Checkbox [" + fqName + "]:  On=" + checkbox.getOnValue() +
                " Off=" + checkbox.getOffValue() +
                " Checked=" + (checkbox.isChecked() ? "true" : "false") +
                " Value=" + value);
        //TODO: Check if widgets handling is necessary: checkbox.getWidget();
    } else if (field instanceof PDRadioCollection) {
        PDRadioCollection collection = (PDRadioCollection) field;
        logerr("DEBUG: RadioButtons [" + fqName + "]: " +
                "CollectionValue=" + collection.getValue() +
                " Value=" + value);
    } else if (field instanceof PDPushButton) {
        PDPushButton button = (PDPushButton) field;
        logerr("DEBUG: Pushbuttons [" + fqName + "]: " +
                        "Export/Readonly/Required=" +
                        button.isNoExport() + "/" +
                        button.isReadonly() + "/" +
                        button.isRequired() +
                        " Value=" + value
        );
    } else if (field instanceof PDTextbox) {
        logerr("DEBUG: TextButton [" + fqName + "]: " +
                "Value=" + value);
    } else {
        logerr("DEBUG: Unhandled [" + fqName + "]: " +
                "Type=" + field.getClass().toString());
    }
}



And the dumping code:

    private void setFieldDC(PDDocument pdfDocument, String keyEntry, PDField oldField) throws Exception {
        PDDocumentCatalog docCatalog = pdfDocument.getDocumentCatalog();
        PDAcroForm pdAcroForm = docCatalog.getAcroForm();
        //TODO: Check if this makes sense: pdAcroForm.setCacheFields(true);
        PDField field = pdAcroForm.getField(keyEntry);

        if (field == null) {
            logerr("No field found with name: " + keyEntry);
            return;
        }

        String fieldValue;
        if (oldField instanceof PDTextbox) {
            fieldValue = oldField.getValue();
            if (fieldValue != null) {
                logmsg("Setting field: " + keyEntry + " to value: " + fieldValue);
                field.setValue(fieldValue);
                if (setFieldFlags) {
                    field.setFieldFlags(oldField.getFieldFlags());
                }
            }
        } else if (oldField instanceof PDCheckbox) {
            fieldValue = oldField.getValue();
            logmsg("Setting CheckBox field: " + keyEntry + " to value: " + fieldValue);
            if (fieldValue != null) {
                logmsg("Setting field: " + keyEntry + " to value: " + fieldValue);
                field.setValue(fieldValue);
                if (setFieldFlags) {
                    field.setFieldFlags(oldField.getFieldFlags());
                }
            } else {
                logerr("Dumping Checkbox field dictionary [" + keyEntry + "] ----------");
                logerr(oldField.getDictionary().toString());
                logerr("----------------------------------------------------------------------------");
            }

/*            PDCheckbox oldCheckBox = (PDCheckbox) oldField;
            PDCheckbox newCheckBox = (PDCheckbox) field;

            if (oldCheckBox == null) {
                logerr("oldCheckBox is NULL");
            } else if (newCheckBox == null) {
                logerr("newCheckBox is NULL");
            }

            if (oldCheckBox.isChecked()) {
                logerr("DEBUG: >>>>> PDCheckBox [" + keyEntry + "] wasChecked = YES");
                newCheckBox.check();
            } else {
                logerr("DEBUG: >>>>> PDCheckBox [" + keyEntry + "] wasChecked = NO");
                newCheckBox.unCheck();
            }*/
        } else if (oldField instanceof PDChoiceField) {
            fieldValue = oldField.getValue();
            if (fieldValue != null) {
                field.setValue(fieldValue);
                if (setFieldFlags) {
                    field.setFieldFlags(oldField.getFieldFlags());
                }
            } else {
                logerr("Dumping PDChoiceField field dictionary [" + keyEntry + "] ----------");
                logerr(oldField.getDictionary().toString());
                logerr("----------------------------------------------------------------------------");
            }
        } else if (oldField instanceof PDRadioCollection) {
            fieldValue = oldField.getValue();
            if (fieldValue != null) {
                field.setValue(fieldValue);
                if (setFieldFlags) {
                    field.setFieldFlags(oldField.getFieldFlags());
                }
            } else {
                logerr("Dumping PDRadioCollection field dictionary [" + keyEntry + "] ----------");
                logerr(oldField.getDictionary().toString());
                logerr("----------------------------------------------------------------------------");
            }
        } else if (oldField instanceof PDPushButton) {
            fieldValue = oldField.getValue();
            if (fieldValue != null) {
                field.setValue(fieldValue);
                if (setFieldFlags) {
                    field.setFieldFlags(oldField.getFieldFlags());
                }
            } else {
                logerr("Dumping PDPushButton field dictionary [" + keyEntry + "] ----------");
                logerr(oldField.getDictionary().toString());
                logerr("----------------------------------------------------------------------------");
            }
        } else {
            logerr("Fields of type [" + oldField.getClass().toString() + "] are unsupported");
        }
    }


This is highly confusing. Why can Acrobat deal with those checkboxes when
their value is null and why can't PDFBox set Checkbox values?

How can I simply clone all static PDF form entries of a PDF into a new PDF?
Is PDF really that complex that such a simple thing is not possible? Right
now, only text form entries are copied, the rest shows null for getValue().

Cheers
Roberto

Re: Migrate form field entries from one pdf to another

Posted by Roberto Nibali <rn...@gmail.com>.
Hi Maruan


>
> > This is highly confusing. Why can Acrobat deal with those checkboxes when
> > their value is null and why can't PDFBox set Checkbox values?
> >
> > How can I simply clone all static PDF form entries of a PDF into a new PDF?
> > Is PDF really that complex that such a simple thing is not possible? Right
> > now, only text form entries are copied, the rest shows null for getValue().
>
> The reason that getValue() returns null is that there is no value entry
> set for the filled-out form field (this is held in the field dictionary's
> /V entry). But isChecked() returns true because the checkbox has been
> checked. This is based on the appearance state of the checkbox.
>

I see; slowly I'm getting the gist. PDF truly is a tricky format, and the
Acrobat tools hide that so well from everyday users.


> To give you a quick explanation: when a form field is filled out, the
> value of the form field has to be set. But that won't give you any
> visual information. To add the visual information, the form field has an
> annotation assigned to it, which has what's called an appearance. The
> appearance is what's visible on screen or when the PDF is being printed.
>

Understood, although at first glance this seems an overly complex
architecture. I'm sure there are reasons for it. Thanks to your
explanation, I'm finally starting to see the bigger picture.


> Normally an application sets the value AND the appearance when the form
> field is filled. In your case the form-filling application hasn't set the
> field value (that's why getValue() returns null) and ONLY updated the
> appearance.
>

One of the applications used to create the form fields is InDesign, a
notoriously bad choice. I created the test PDFs using Adobe Acrobat Pro for
Mac, which I downloaded for a one-month evaluation period. I took the
original PDFs, stripped out everything that would otherwise have identified
their origin, and removed all but a few test form fields. Then I replaced
the partial fonts used by the fields with some available ones (I believe it
was Garamond). Reading through Tilman's replies, I learned that this also
led to font-handling issues.


> So to transfer the value from your original form to the new template you
> have to
>
> a) see if getValue() returns anything but null. If that is the case, use
> setValue() with the value provided by getValue() to fill out the
> corresponding field in your template
> b) if getValue() is null, check using isChecked() whether the checkbox has
> been checked. If this is the case, use check() to check the checkbox
>

I thought that was what I did after your last suggestion (where you wrote
exactly those two lines as well); however, I have the distinct feeling
that I did something else wrong. Tilman Hausherr kindly provided me with
some test code that seems to work for the test PDFs I provided. After
quickly glancing over his code, I have already spotted one basic mistake in
mine. Cloning fields from one PDF to another seems to involve instantiating
a new PDField object in the template PDF. I had assumed that assigning the
values of the fields read from the originating PDF to the template PDF
would be enough. It would never have occurred to me that one needs to
instantiate a new PDField object.

Anyway, I'll rewrite my code to incorporate all this new knowledge and
update to a SNAPSHOT version of PDFBox. Unfortunately, I have no idea how
to set up Maven so that the newest SVN trunk is checked out automatically
and a JAR is built to use as a dependency. If the project used git, one
could use the https://jitpack.io/ add-on for Maven.

In fact, the canonical source repository is SVN at apache.org:
https://svn.apache.org/repos/asf/pdfbox/
There is a mirror on GitHub, https://github.com/apache/pdfbox; however, it
only syncs the old 1.8.9 tree, not the current 2.0.0 snapshot tree.

I suppose that using the latest SNAPSHOT of PDFBox and all dependencies
should suffice for my test case.

> We have made some changes to how checkboxes and radio buttons are handled
> in PDFBox 2.0 within the last days (to make it easier to work with them),
> so please use the latest snapshot version of PDFBox.
>
> There will be an issue with the test template when you set the Name and
> Prename fields, as the field definition is incomplete (the font resource is
> missing), which will lead to an exception
>
> java.io.IOException: Could not find font: /Courier
>

That's probably because I didn't know what I was doing when I stripped
down the original PDF to provide you guys with a test case. I will
certainly try the new code with the real PDFs and report back as soon as
things have progressed.


> The easiest would be to correct the template. If that's not possible, we
> could help you build a short workaround. But as the template you
> provided was only a quick mock-up and not the real one, the final template
> might not have the issue.
>
> If you need further assistance please let us know.
>

Thanks so much!!!

Best regards

Roberto

Re: Migrate form field entries from one pdf to another

Posted by Tilman Hausherr <TH...@t-online.de>.
On 07.07.2015 at 13:20, Roberto Nibali wrote:
> Hi
>
> On Mon, Jul 6, 2015 at 10:14 PM, Tilman Hausherr <TH...@t-online.de> wrote:
>
>> On 06.07.2015 at 18:15, Maruan Sahyoun wrote:
>>
>>> There will be an issue with the test template when you set the Name and
>>> Prename fields, as the field definition is incomplete (the font resource is
>>> missing), which will lead to an exception
>>>
>>> java.io.IOException: Could not find font: /Courier
>>>
>>> The easiest would be to correct the template. If that's not possible, we
>>> could help you build a short workaround. But as the template you
>>> provided was only a quick mock-up and not the real one, the final template
>>> might not have the issue.
>>>
> I have managed to write a TestNG class using the newest
> PDFBox-2.0.0-SNAPSHOT jar and included your source code. Now I'm getting
> the error above as well. So far so good!
>
>
>>   I just tried a quick and dirty solution, I changed
>> PDAppearanceString.getFont(), the "if (font == null)" segment is new:
>>
>> <snip>
>> ….
>> </snip>
>>
> Where would I need to do this? In my code it won't work, since
> getFontResourceName() and defaultResources.getFont() are not known there.
> @Override also does not seem to work. With this I'm stuck at the moment.

This was meant to be a change in PDFBox itself; I assumed you were
building from source.

But in the meantime you wrote to me that you don't get the font problem
with your production PDF, so the issue is moot.

Tilman



---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@pdfbox.apache.org
For additional commands, e-mail: users-help@pdfbox.apache.org


Re: Migrate form field entries from one pdf to another

Posted by Roberto Nibali <rn...@gmail.com>.
Hi

On Mon, Jul 6, 2015 at 10:14 PM, Tilman Hausherr <TH...@t-online.de> wrote:

> On 06.07.2015 at 18:15, Maruan Sahyoun wrote:
>
>> There will be an issue with the test template when you set the Name and
>> Prename fields, as the field definition is incomplete (the font resource is
>> missing), which will lead to an exception
>>
>> java.io.IOException: Could not find font: /Courier
>>
>> The easiest would be to correct the template. If that's not possible, we
>> could help you build a short workaround. But as the template you
>> provided was only a quick mock-up and not the real one, the final template
>> might not have the issue.
>>
>

I have managed to write a TestNG class using the newest
PDFBox-2.0.0-SNAPSHOT jar and included your source code. Now I'm getting
the error above as well. So far so good!


>  I just tried a quick and dirty solution, I changed
> PDAppearanceString.getFont(), the "if (font == null)" segment is new:
>
> <snip>
> ….
> </snip>
>

Where would I need to do this? In my code it won't work, since
getFontResourceName() and defaultResources.getFont() are not known there.
@Override also does not seem to work. With this I'm stuck at the moment.


> Now I was able to set the text fields
> (Roberto: in the file I sent to you earlier, uncomment
> "newTextField.setValue(textField.getValue());")
>

I did; however, this gives me the above error, since I can't find a
place to put the getFont() code. With that line commented out, I can
reproduce your results, and the resulting PDF does have the checkboxes set!!!! :)

Will keep trying ...

Thanks for the tremendous help.

Re: Migrate form field entries from one pdf to another

Posted by Tilman Hausherr <TH...@t-online.de>.
On 06.07.2015 at 18:15, Maruan Sahyoun wrote:
> There will be an issue with the test template when you set the Name and Prename fields, as the field definition is incomplete (the font resource is missing), which will lead to an exception
>
> java.io.IOException: Could not find font: /Courier
>
> The easiest would be to correct the template. If that's not possible, we could help you build a short workaround. But as the template you provided was only a quick mock-up and not the real one, the final template might not have the issue.

I just tried a quick and dirty solution: I changed
PDAppearanceString.getFont(); the "if (font == null)" segment is new:

     public PDFont getFont() throws IOException
     {
         COSName name = getFontResourceName();
         PDFont font = defaultResources.getFont(name);

         if (font == null)
         {
             if ("Courier".equals(name.getName()))
             {
                 COSDictionary dict = new COSDictionary();
                 dict.setName(COSName.BASE_FONT, "Courier");
                 dict.setName(COSName.NAME, "Courier");
                 dict.setName(COSName.SUBTYPE, "Type1");
                 dict.setName(COSName.TYPE, "Font");

                 font = PDFontFactory.createFont(dict);
             }
         }

         // todo: handle cases where font == null with special mapping logic (see PDFBOX-2661)
         if (font == null)
         {
             throw new IOException("Could not find font: /" + name.getName());
         }

         return font;
     }


Now I was able to set the text fields
(Roberto: in the file I sent to you earlier, uncomment 
"newTextField.setValue(textField.getValue());")

However, the resources now contain two identical fonts, /F2 and /Courier.
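The patch above special-cases /Courier and, as noted, leaves two identical fonts in the resources. The following pure-Java sketch illustrates one possible lookup order under stated assumptions; it is not a change to PDFBox and not a drop-in replacement for PDAppearanceString.getFont(). The default resources are modelled as plain maps (hypothetical, not PDFBox's PDResources). A missing name is first resolved to any existing resource whose BaseFont matches it (the remark about /F2 suggests /F2 is such a Courier font), and only when nothing matches is a minimal Type 1 dictionary synthesized, and only for one of the 14 standard fonts.

```java
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical model, not PDFBox: a font "dictionary" is a map of entry name -> name value,
// and the default resources map resource names (e.g. "F2") to such dictionaries.
final class FontFallback {
    // The 14 standard Type 1 font names defined by the PDF 1.x specification.
    static final Set<String> STANDARD_14 = Set.of(
            "Times-Roman", "Times-Bold", "Times-Italic", "Times-BoldItalic",
            "Helvetica", "Helvetica-Bold", "Helvetica-Oblique", "Helvetica-BoldOblique",
            "Courier", "Courier-Bold", "Courier-Oblique", "Courier-BoldOblique",
            "Symbol", "ZapfDingbats");

    static Map<String, String> resolve(Map<String, Map<String, String>> resources, String name)
            throws IOException {
        // 1. Direct lookup by resource name, as the original getFont() does.
        Map<String, String> font = resources.get(name);
        if (font != null) {
            return font;
        }
        if (STANDARD_14.contains(name)) {
            // 2. Reuse an existing resource with the same BaseFont instead of
            //    creating a second, identical font.
            for (Map<String, String> candidate : resources.values()) {
                if (name.equals(candidate.get("BaseFont"))) {
                    return candidate;
                }
            }
            // 3. Otherwise synthesize a minimal Type 1 dictionary, like the Courier patch.
            //    The resources map itself is left unchanged.
            Map<String, String> dict = new LinkedHashMap<>();
            dict.put("Type", "Font");
            dict.put("Subtype", "Type1");
            dict.put("BaseFont", name);
            return dict;
        }
        // 4. Same failure as before for anything else.
        throw new IOException("Could not find font: /" + name);
    }
}
```

Whether a synthesized font should then be added to the form's default resources is a separate decision; this sketch deliberately leaves that to the caller.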

Tilman



Re: Migrate form field entries from one pdf to another

Posted by Maruan Sahyoun <sa...@fileaffairs.de>.
Hi,

<snip>
….
</snip>

> 
> This is highly confusing. Why can Acrobat deal with those checkboxes when
> their value is null and why can't PDFBox set Checkbox values?
> 
> How can I simply clone all static PDF form entries of a PDF into a new PDF?
> Is PDF really that complex that such a simple thing is not possible? Right
> now, only text form entries are copied, the rest shows null for getValue().

The reason that getValue() returns null is that there is no value entry set for the filled-out form field (this is held in the field dictionary's /V entry). But isChecked() returns true because the checkbox has been checked. This is based on the appearance state of the checkbox.

To give you a quick explanation: when a form field is filled out, the value of the form field has to be set. But that won't give you any visual information. To add the visual information, the form field has an annotation assigned to it, which has what's called an appearance. The appearance is what's visible on screen or when the PDF is being printed.

Normally an application sets the value AND the appearance when the form field is filled. In your case the form-filling application hasn't set the field value (that's why getValue() returns null) and ONLY updated the appearance.
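To make the value/appearance split concrete, here is a minimal, self-contained Java model of a checkbox. It is a sketch, not PDFBox code: the class CheckboxModel and its fields are hypothetical stand-ins for the field dictionary's /V entry, the widget's /AS (appearance state) entry, and the state names in the widget's /AP /N (normal appearance) dictionary. It mirrors the behaviour described above for the PDFBox version discussed in this thread: the value can be absent while the appearance state still says "checked". The on state is simply whichever appearance name is not Off, which is why the debug log earlier in this thread shows on values such as 1, 2, 0 and Mrs.

```java
import java.util.Set;

// Hypothetical, simplified model of a checkbox field; this is NOT PDFBox code.
//   value                ~ the field dictionary's /V entry (may be missing)
//   appearanceState      ~ the widget annotation's /AS entry
//   normalAppearanceKeys ~ the state names in the widget's /AP /N dictionary
class CheckboxModel {
    private final String value;
    private final String appearanceState;
    private final Set<String> normalAppearanceKeys;

    CheckboxModel(String value, String appearanceState, Set<String> normalAppearanceKeys) {
        this.value = value;
        this.appearanceState = appearanceState;
        this.normalAppearanceKeys = normalAppearanceKeys;
    }

    // The "on" state is whichever appearance state name is not "Off".
    String getOnValue() {
        for (String key : normalAppearanceKeys) {
            if (!"Off".equals(key)) {
                return key;
            }
        }
        return null; // no on-appearance defined
    }

    // The stored value; null when the filling application never wrote /V.
    String getValue() {
        return value;
    }

    // Checked state taken from the appearance state, independent of /V.
    boolean isChecked() {
        String on = getOnValue();
        return on != null && on.equals(appearanceState);
    }

    public static void main(String[] args) {
        // Shaped like 01.011.Boxes in the debug log: no value, on state "Mrs", checked.
        CheckboxModel boxes = new CheckboxModel(null, "Mrs", Set.of("Mrs", "Off"));
        System.out.println("value=" + boxes.getValue() + " checked=" + boxes.isChecked());
        // prints value=null checked=true
    }
}
```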

So to transfer the value from your original form to the new template you have to

a) see if getValue() returns anything but null. If that is the case, use setValue() with the value provided by getValue() to fill out the corresponding field in your template
b) if getValue() is null, check using isChecked() whether the checkbox has been checked. If this is the case, use check() to check the checkbox
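Steps a) and b) can be sketched as a small, self-contained Java routine that walks every source field, matches it to the template by fully qualified name, and applies the two rules. It deliberately avoids PDFBox so it runs on its own: SourceField, TargetField, RecordingTarget and FormTransfer are hypothetical types, and the method names setValue/check/unCheck merely echo the PDFBox calls discussed in this thread. Calling unCheck() for unchecked boxes is an assumption borrowed from the commented-out block in Roberto's setFieldDC(); step b) itself only mentions check(). The sketch needs Java 9+ for List.of.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a field read from the filled-out source PDF (not a PDFBox type).
final class SourceField {
    final String name;       // fully qualified field name, e.g. "01.20.Entry1"
    final boolean checkbox;  // true for checkbox fields
    final String value;      // the stored /V value, may be null
    final boolean checked;   // checked state, only meaningful for checkboxes

    SourceField(String name, boolean checkbox, String value, boolean checked) {
        this.name = name;
        this.checkbox = checkbox;
        this.value = value;
        this.checked = checked;
    }
}

// Hypothetical stand-in for the matching template field.
interface TargetField {
    void setValue(String value);
    void check();
    void unCheck();
}

// Remembers the last action applied, so the outcome can be inspected.
final class RecordingTarget implements TargetField {
    String lastAction = "none";
    public void setValue(String value) { lastAction = "setValue(" + value + ")"; }
    public void check() { lastAction = "check()"; }
    public void unCheck() { lastAction = "unCheck()"; }
}

final class FormTransfer {
    // Step a): use the value if there is one. Step b): otherwise fall back to the
    // checked state for checkboxes. Returns names with no counterpart in the template.
    static List<String> transfer(List<SourceField> source, Map<String, ? extends TargetField> template) {
        List<String> missing = new ArrayList<>();
        for (SourceField src : source) {
            TargetField target = template.get(src.name);
            if (target == null) {
                missing.add(src.name);
            } else if (src.value != null) {
                target.setValue(src.value);      // step a)
            } else if (src.checkbox) {
                if (src.checked) {
                    target.check();              // step b)
                } else {
                    target.unCheck();            // assumption, see lead-in
                }
            }
            // Non-checkbox fields without a value are left untouched.
        }
        return missing;
    }

    public static void main(String[] args) {
        Map<String, RecordingTarget> template = new LinkedHashMap<>();
        template.put("01.011.Name", new RecordingTarget());
        template.put("01.011.Boxes", new RecordingTarget());
        template.put("01.20.Entry3", new RecordingTarget());

        List<SourceField> source = List.of(
                new SourceField("01.011.Name", false, "sddsds", false),
                new SourceField("01.011.Boxes", true, null, true),
                new SourceField("01.20.Entry3", true, null, false),
                new SourceField("01.99.Unknown", false, "x", false));

        List<String> missing = transfer(source, template);
        template.forEach((name, t) -> System.out.println(name + " -> " + t.lastAction));
        System.out.println("missing: " + missing);
    }
}
```

In real code each TargetField would wrap the template field found with PDAcroForm.getField(name), as setFieldDC() already does; how that wrapping is done is left open here.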

We have made some changes to how checkboxes and radio buttons are handled in PDFBox 2.0 within the last days (to make it easier to work with them), so please use the latest snapshot version of PDFBox.

There will be an issue with the test template when you set the Name and Prename fields, as the field definition is incomplete (the font resource is missing), which will lead to an exception

java.io.IOException: Could not find font: /Courier

The easiest would be to correct the template. If that's not possible, we could help you build a short workaround. But as the template you provided was only a quick mock-up and not the real one, the final template might not have the issue.

If you need further assistance please let us know.

BR
Maruan



> 
> Cheers
> Roberto

