Posted to users@pdfbox.apache.org by Scott Tomer <sc...@tomer.cc> on 2018/03/08 19:42:06 UTC

International characters only show correctly when form field is selected

All,

I’m new to the list. I searched pdfbox-users.markmail.org <http://pdfbox-users.markmail.org/> before asking, with no luck.

We are using PDFBox to fill in form fields in an Adobe-generated template, but we get odd results when certain international characters are used (some, not all). When the PDF is first opened, the characters shown are basically garbage. Here is an example: þÿB D Aóz

However, when you click into the field (or, in certain readers like Okular on Linux, choose “Show Forms”), the correct characters are shown. Here is what is inserted into the field and shown when the field is selected: ł ń Ł ó ź

It is almost as if the PDF has one font selected for the read-only view and the correct font for the view when editing a field.

This happens with Polish, Russian, Chinese and other languages.

This is how I am populating the fields:

PDDocument pdfDoc = LoadPDF.load(cs, document);
PDDocumentCatalog docCatalog = pdfDoc.getDocumentCatalog();
PDAcroForm acroForm = docCatalog.getAcroForm();

if (acroForm != null) {
	for (PDField field : acroForm.getFieldTree()) {
		for (PdfField pdfField : pdfFields) {
			if (field.getPartialName() != null && field.getPartialName().equalsIgnoreCase(pdfField.getName())) {
				field.setValue(pdfField.getValue());
			}
		}
	}
}
pdfDoc.save(tempPdf);
pdfDoc.close();


Thanks for any help,
Scott

Re: International characters only show correctly when form field is selected

Posted by Alessandro Bellini <a....@gmail.com>.
Oh0u8e


Re: International characters only show correctly when form field is selected

Posted by Scott Tomer <sc...@tomer.cc>.
Also, as an FYI, when I open this in the debugger, I get the following on my command line:
Mar 09, 2018 10:40:45 AM org.apache.pdfbox.pdmodel.font.PDSimpleFont toUnicode
WARNING: No Unicode mapping for .notdef (1) in font Times-Roman
Mar 09, 2018 10:40:45 AM org.apache.pdfbox.rendering.Type1Glyph2D getPathForCharacterCode
WARNING: No glyph for 1 (.notdef) in font Times-Roman
Mar 09, 2018 10:40:45 AM org.apache.pdfbox.pdmodel.font.PDSimpleFont toUnicode
WARNING: No Unicode mapping for .notdef (0) in font Times-Roman
Mar 09, 2018 10:40:45 AM org.apache.pdfbox.rendering.Type1Glyph2D getPathForCharacterCode
WARNING: No glyph for 0 (.notdef) in font Times-Roman
Mar 09, 2018 10:40:45 AM org.apache.pdfbox.pdmodel.font.PDSimpleFont toUnicode
WARNING: No Unicode mapping for .notdef (1) in font Times-Roman
Mar 09, 2018 10:40:45 AM org.apache.pdfbox.pdmodel.font.PDSimpleFont toUnicode
WARNING: No Unicode mapping for .notdef (0) in font Times-Roman

And, just like with other PDF readers, the text shown in the debugger's right-hand pane is garbled, while the tree view of the field shows the correct text as its value.

Thanks



Re: International characters only show correctly when form field is selected

Posted by Tilman Hausherr <TH...@t-online.de>.
Am 09.03.2018 um 19:20 schrieb Tilman Hausherr:
>
>
> What I could do is to search (with code) for a file that is similar to 
> yours, i.e. acroform, a field with type 1 font and DictionaryEncoding. 
> This may take some time, maybe next week. 


Issue created:

https://issues.apache.org/jira/browse/PDFBOX-4152



---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@pdfbox.apache.org
For additional commands, e-mail: users-help@pdfbox.apache.org


Re: International characters only show correctly when form field is selected

Posted by Tilman Hausherr <TH...@t-online.de>.
Am 09.03.2018 um 19:40 schrieb Scott Tomer:
> Thanks, again, I really appreciate your help.
>
> Also, just an FYI, I found where my code is giving the following exception when trying to use those characters:
> Exception: U+0144 ('nacute') is not available in this font Times-Roman (generic: NimbusRomNo9L-Regu) encoding: StandardEncoding with differences

That is weird - so you do get an exception, but your file is generated 
anyway?

Another idea besides the SO solution - can you install a Times Roman / 
Times New Roman font on your system?

Tilman




Re: International characters only show correctly when form field is selected

Posted by Scott Tomer <sc...@tomer.cc>.
Thanks, again, I really appreciate your help.

Also, just an FYI, I found where my code is giving the following exception when trying to use those characters:
Exception: U+0144 ('nacute') is not available in this font Times-Roman (generic: NimbusRomNo9L-Regu) encoding: StandardEncoding with differences
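
The exception names a single character, but it can help to see in advance which characters of a value would be rejected. The sketch below is plain Java and does not call PDFBox. It uses the JDK's windows-1252 charset as a rough stand-in for a single-byte font encoding such as WinAnsiEncoding. That is an assumption for illustration only: the font in this thread reports "StandardEncoding with differences", which covers a different set of characters. The class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.util.ArrayList;
import java.util.List;

final class EncodableCheck {

    // Returns the code points of `text` that `charset` cannot encode,
    // formatted the way the PDFBox message shows them (for example "U+0144").
    static List<String> unencodable(String text, Charset charset) {
        CharsetEncoder encoder = charset.newEncoder();
        List<String> missing = new ArrayList<>();
        text.codePoints().forEach(cp -> {
            String s = new String(Character.toChars(cp));
            if (!encoder.canEncode(s)) {
                missing.add(String.format("U+%04X", cp));
            }
        });
        return missing;
    }

    public static void main(String[] args) {
        // The sample value from the first message, written with escapes so the
        // source file encoding does not matter: l-stroke, n-acute, L-stroke,
        // o-acute, z-acute.
        String value = "\u0142\u0144\u0141\u00F3\u017A";
        // windows-1252 is only an assumed stand-in for the font's encoding.
        System.out.println(unencodable(value, Charset.forName("windows-1252")));
    }
}
```

With this stand-in, four of the five sample characters are reported and only ó (U+00F3) passes, which fits the "some, not all" observation from the first message.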

Thanks



Re: International characters only show correctly when form field is selected

Posted by Tilman Hausherr <TH...@t-online.de>.
Ok, now I don't have further questions, but I am still wondering what is 
going on :-(

What I could do is to search (with code) for a file that is similar to 
yours, i.e. an AcroForm with a field that uses a Type 1 font and 
DictionaryEncoding. This may take some time, maybe next week.

In the meantime, you could replace the font, as done in 
this SO answer:
https://stackoverflow.com/questions/47995062/pdfbox-api-how-to-handle-cyrillic-values/47997118#47997118

Instead of Arial, just find a Times Roman font on your system.
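
Replacing the font for a form field comes down to changing the font name in the field's default appearance (/DA) string so that it points at a font that covers the needed characters. The string manipulation can be sketched in plain Java as below. This is not the code from the linked answer: the class and method names are hypothetical, "/TiRo 12 Tf 0 g" is an assumed example of a DA value, and "F1" is an assumed resource name. In PDFBox 2.0.x the new name would typically come from adding the replacement font to the AcroForm default resources (`PDResources.add(PDFont)` returns it), and the string would be read and written back with `getDefaultAppearance()` and `setDefaultAppearance(String)` on the text field.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class DefaultAppearanceFont {

    // A DA string selects its font with "/<name> <size> Tf".
    private static final Pattern FONT =
            Pattern.compile("/(\\S+)(\\s+[0-9.]+\\s+Tf)");

    // Returns `da` with the font resource name replaced by `newName`.
    // Returns `da` unchanged when no "/name size Tf" operator is found.
    static String withFont(String da, String newName) {
        Matcher m = FONT.matcher(da);
        if (!m.find()) {
            return da;
        }
        return da.substring(0, m.start(1)) + newName + da.substring(m.end(1));
    }

    public static void main(String[] args) {
        // Assumed example values, not taken from the customer's PDF.
        System.out.println(withFont("/TiRo 12 Tf 0 g", "F1")); // prints "/F1 12 Tf 0 g"
    }
}
```

Only the font name is touched, so the size and colour operators in the DA string are preserved.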

Tilman



Re: International characters only show correctly when form field is selected

Posted by Scott Tomer <sc...@tomer.cc>.
Certainly, I appreciate the help.  It appears to be DictionaryEncoding:





Thanks,
Scott




Re: International characters only show correctly when form field is selected

Posted by Tilman Hausherr <TH...@t-online.de>.
Hi,

Could you open the /DR part completely, both the one here and the one in 
Acroform/DR? And the "N" part too, if there are any resources with 
fonts. This is to see how "TiRo" is configured. Normally it should throw 
an exception, like this one when trying to assign "Stanisław" in the 
CreateSimpleForm example:

Exception in thread "main" java.lang.IllegalArgumentException: U+0142 ('lslash') is not available in this font Helvetica encoding: WinAnsiEncoding

What I can see from your image is that your font uses 2 bytes per 
character, so likely it isn't WinAnsiEncoding, or something else went wrong.
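
The garbled sample from the first message is consistent with a two-byte reading. As a sketch, assuming the value was written as UTF-16 with a byte order mark and then displayed one byte per glyph, plain Java (no PDFBox) reproduces the same visible letters. The class and method names are hypothetical. Under that assumption the high bytes are 0x00 and 0x01, which would also fit the ".notdef (0)" and ".notdef (1)" warnings reported in this thread.

```java
import java.nio.charset.StandardCharsets;

final class TwoByteDemo {

    // Encodes text as UTF-16 (Java writes big-endian, preceded by the
    // FE FF byte order mark) and decodes every byte as one ISO-8859-1 character.
    static String misread(String text) {
        byte[] utf16 = text.getBytes(StandardCharsets.UTF_16);
        return new String(utf16, StandardCharsets.ISO_8859_1);
    }

    // Drops characters below U+0020, which have no visible glyph.
    static String visibleOnly(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (c >= 0x20) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // l-stroke, n-acute, L-stroke, o-acute, z-acute (no spaces), written
        // with escapes so the source file encoding does not matter.
        String value = "\u0142\u0144\u0141\u00F3\u017A";
        System.out.println(visibleOnly(misread(value)));
    }
}
```

The byte order mark FE FF becomes þÿ, and the low bytes of U+0142, U+0144, U+0141, U+00F3 and U+017A become B, D, A, ó and z.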




Tilman



Re: International characters only show correctly when form field is selected

Posted by Scott Tomer <sc...@tomer.cc>.
It is generated at a customer.

Here is an image from the debugger:
<PastedGraphic-1.png>

Thanks



Re: International characters only show correctly when form field is selected

Posted by Tilman Hausherr <TH...@t-online.de>.
Is the form generated in your company? If yes, can you generate an empty 
form with just one field?

If not, please open the file with PDFDebugger, click "show internal 
structure", and then show the appearance stream (..../AP/N) and the 
default appearance (..../DA) of the field. Also show the details of 
Acroform/DR as seen here. The image should either be inline in the mail 
or uploaded to a file-sharing service (e.g. Dropbox, Google Drive, etc.). 
Redact anything that is confidential.

<ckdfggmbpgkfoemj.png>

Tilman




Re: International characters only show correctly when form field is selected

Posted by Scott Tomer <sc...@tomer.cc>.
I’m using 2.0.8. I do not have permission to share the PDF, nor a public site to upload it to, sorry.

Thanks


Re: International characters only show correctly when form field is selected

Posted by Maruan Sahyoun <sa...@fileaffairs.de>.
Hi Scott,

This looks like an issue with the appearance generation. Which PDFBox version are you using? Could you upload a sample PDF to a public location for further investigation?

BR
Maruan 
