
Posted to users@pdfbox.apache.org by "ropo@ropo.de" <ro...@ropo.de> on 2017/03/22 11:52:50 UTC

Writing non-ansi characters into form fields

Hello,
 
I am using PDFBox 2.0.5 to fill out the form fields of a PDF document with this code:
        PDDocument doc = PDDocument.load(inputStream);
        PDDocumentCatalog catalog = doc.getDocumentCatalog();
        PDAcroForm form = catalog.getAcroForm();
        // Set the same Cyrillic value on every field in the form
        for (PDField field : form.getFieldTree()){
            field.setValue("должен");
        }
        
I get this error: U+0434 ('afii10069') is not available in this font Times-Roman (generic: TimesNewRomanPSMT) encoding: StandardEncoding with differences

I can create the PDF documents any way I like. I have tried exporting from MS Office as Adobe PDF and creating them directly with Acrobat Pro DC. When creating the fields in Acrobat, I can select a font. I tried all kinds of fonts; for "Arial Unicode MS", Acrobat wants to download a 50 MB "Adobe Acrobat Reader DC Font Pack". The final PDF file with the filled-out form fields should be viewable and printable by anyone without first installing a font pack.

The PDF document itself contains Cyrillic text, which is displayed just fine. Filling out the form in Acrobat Reader works flawlessly; the only problem is in PDFBox.

According to https://issues.apache.org/jira/browse/PDFBOX-3138: "The embedded font used by the field does indeed contain Hebrew glyphs, and a valid 'cmap' table which can be used to look up those glyphs. The mentioned character, U+05D7, is indeed present in the font. The embedded font file is in OpenType format; however, the PDF Font dictionary is Type1 and specifies WinAnsiEncoding, which does not include Hebrew characters. So, strictly speaking, the field cannot be filled using any non-ANSI characters, and so PDFBox's behaviour is correct."
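The WinAnsiEncoding point can be illustrated without PDFBox. PDF's WinAnsiEncoding is essentially Windows code page 1252, so a rough check is to ask the JDK whether a string is encodable in windows-1252. This is only an approximation of what PDFBox does (inside PDFBox the corresponding failure is the IllegalArgumentException thrown by PDFont.encode, which produces the error message quoted above); the class name WinAnsiCheck is hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

public class WinAnsiCheck {
    // windows-1252 is the JDK charset closest to PDF's WinAnsiEncoding
    private static final Charset WIN_ANSI = Charset.forName("windows-1252");

    /** Returns true if every character of text has a code in windows-1252. */
    public static boolean fitsWinAnsi(String text) {
        // CharsetEncoder is not thread-safe, so create a fresh one per call
        CharsetEncoder encoder = WIN_ANSI.newEncoder();
        return encoder.canEncode(text);
    }

    public static void main(String[] args) {
        // "Gruesse" with u-umlaut and sharp s: both exist in code page 1252
        System.out.println(fitsWinAnsi("Gr\u00fc\u00dfe"));               // true
        // The Cyrillic value from the question
        System.out.println(fitsWinAnsi("\u0434\u043e\u043b\u0436\u0435\u043d")); // false
        // The Hebrew character from PDFBOX-3138
        System.out.println(fitsWinAnsi("\u05D7"));                         // false
    }
}
```

Unicode escapes are used instead of literal Cyrillic so the result does not depend on the compiler's source-file encoding.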

I tried another approach: instead of setValue(), I called ((PDTextField) field).setDefaultValue(...). It does not throw an exception, but unfortunately the resulting PDF still shows the previous default value. The new default value only appears in the field's properties.

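Using this code, I can see that the font is a PDTrueTypeFont:
// Read the field's raw default appearance (/DA) string, e.g. "/F1 12 Tf 0 g"
String  da      = field.getCOSObject().getString(COSName.DA.getName());
// Capture the font resource name that precedes the "<size> Tf" operator
Matcher m       = Pattern.compile("/?(.*) [\\d]+ Tf.*", Pattern.CASE_INSENSITIVE).matcher(da);
String  name    = m.find() ? m.group(1) : null;
// Look that name up in the AcroForm's default resources (/DR)
PDFont  font    = field.getAcroForm().getDefaultResources().getFont(COSName.getPDFName(name));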
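The regular expression above only accepts integer font sizes, so a DA string such as "/Helv 9.5 Tf 0 g" would not match and name would be null. A slightly stricter, PDFBox-independent sketch of the same idea captures a single token after "/" before "<size> Tf", allows fractional sizes, and returns null when there is no Tf operator. The class and method names are hypothetical, and this is not PDFBox's own DA parser; for text fields, PDVariableText.getDefaultAppearance() may also be a more convenient source of the DA string than the raw /DA entry, since /DA can be inherited.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DaFontName {
    // "/Name size Tf": the name is one non-whitespace token, the size may be fractional
    private static final Pattern TF = Pattern.compile("/(\\S+)\\s+[\\d.]+\\s+Tf");

    /** Returns the font resource name from a DA string, or null if there is no Tf operator. */
    public static String fontName(String da) {
        if (da == null) {
            return null;
        }
        Matcher m = TF.matcher(da);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(fontName("/Helv 0 Tf 0 g"));   // Helv
        System.out.println(fontName("/TiRo 9.5 Tf 0 g")); // TiRo
        System.out.println(fontName("0 g"));              // null
    }
}
```

Only when the result is non-null would it then be passed to getDefaultResources().getFont(COSName.getPDFName(name)).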

How can I create the PDF document and use PDFBox to fill out the form with non-ANSI characters?

Thanks,
Roland

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@pdfbox.apache.org
For additional commands, e-mail: users-help@pdfbox.apache.org

Re: Writing non-ansi characters into form fields

Posted by Tilman Hausherr <TH...@t-online.de>.

Is there a font that contains only Cyrillic glyphs? If you embed it, does
it work with PDFBox?

Tilman
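
One way to try Tilman's suggestion is to make the field's /DA refer to an embedded font that can actually encode Cyrillic. A minimal sketch against the PDFBox 2.0.x API, assuming a TrueType font file with Cyrillic glyphs whose license permits full embedding (the class name and the command-line file paths are hypothetical, and this has not been checked against 2.0.5 specifically): load the font as a PDType0Font, add it to the AcroForm's default resources, and point each text field's default appearance at it before calling setValue().

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDResources;
import org.apache.pdfbox.pdmodel.font.PDType0Font;
import org.apache.pdfbox.pdmodel.interactive.form.PDAcroForm;
import org.apache.pdfbox.pdmodel.interactive.form.PDField;
import org.apache.pdfbox.pdmodel.interactive.form.PDTextField;

public class FillWithEmbeddedFont {

    /** Builds a DA string "/<resourceName> <size> Tf 0 g"; size 0 means auto-size. */
    public static String defaultAppearance(String resourceName, int size) {
        return "/" + resourceName + " " + size + " Tf 0 g";
    }

    // Hypothetical usage: java FillWithEmbeddedFont in.pdf cyrillic-font.ttf out.pdf
    public static void main(String[] args) throws IOException {
        try (PDDocument doc = PDDocument.load(new File(args[0]));
             InputStream ttf = new FileInputStream(args[1])) {
            PDAcroForm form = doc.getDocumentCatalog().getAcroForm();

            // Embed the full font (embedSubset = false) rather than a subset, so glyphs
            // for values entered later, e.g. in a viewer, are also available
            PDType0Font font = PDType0Font.load(doc, ttf, false);

            PDResources dr = form.getDefaultResources();
            if (dr == null) {
                dr = new PDResources();
                form.setDefaultResources(dr);
            }
            // Register the font in /DR; add() returns the resource name it was given
            COSName fontName = dr.add(font);

            for (PDField field : form.getFieldTree()) {
                if (field instanceof PDTextField) {
                    PDTextField text = (PDTextField) field;
                    // Point the field's DA at the embedded font, then set the value
                    text.setDefaultAppearance(defaultAppearance(fontName.getName(), 0));
                    text.setValue("\u0434\u043e\u043b\u0436\u0435\u043d");
                }
            }
            doc.save(new File(args[2]));
        }
    }
}
```

Embedding the full font keeps the file viewable without a font pack, at the cost of file size; a font covering only Latin and Cyrillic, as Tilman suggests, keeps that cost far below the 50 MB Arial Unicode MS pack.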


