Posted to dev@pdfbox.apache.org by "Maruan Sahyoun (JIRA)" <ji...@apache.org> on 2014/10/16 20:38:35 UTC

[jira] [Commented] (PDFBOX-283) Character encoding/appearance issues when filling forms

    [ https://issues.apache.org/jira/browse/PDFBOX-283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14174068#comment-14174068 ] 

Maruan Sahyoun commented on PDFBOX-283:
---------------------------------------

There are still issues with the appearance generation, but these are now being dealt with in PDFBOX-2333. I am linking that issue and keeping this one open to ensure we use the information provided here as a test bed.

This issue also relates to PDFBOX-922, which is meant to resolve the encoding issue, so I am linking that too.

> Character encoding/appearance issues when filling forms
> -------------------------------------------------------
>
>                 Key: PDFBOX-283
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-283
>             Project: PDFBox
>          Issue Type: Bug
>          Components: AcroForm
>    Affects Versions: 2.0.0
>            Priority: Critical
>             Fix For: 2.0.0
>
>         Attachments: PDAppearance.diff, PDAppearance.patch, PDAppearance_bis.diff, acroform.pdf
>
>
> [imported from SourceForge]
> http://sourceforge.net/tracker/index.php?group_id=78314&atid=552832&aid=1735902
> Originally submitted by scop on 2007-06-12 10:23.
> When filling a text field with non-ASCII characters such as in my surname "Skyttä" and saving the document in a UTF-8 environment, something goes wrong with the appearance of the text.
> The value itself seems to be stored correctly, but when the document is opened, "ä" is displayed as two garbage characters, which is what happens when UTF-8 is mistakenly treated as ISO-8859-1.
> PDAppearance uses the platform default encoding in quite a few places, which apparently has the potential to mess things up.  In particular, insertGeneratedAppearance() creates a PrintWriter from an OutputStream without specifying the encoding.  In fact, if I hack that to use ISO-8859-1, the appearance of my "ä" case is correct, but that obviously won't work for characters that are not valid ISO-8859-1.
> In which character encoding should the value be written to the appearance stream (at the end of insertGeneratedAppearance())?
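
The symptom described in the report can be reproduced outside PDFBox. The following is a minimal sketch in plain Java, not the PDAppearance code itself: the class name AppearanceEncodingDemo and its two helper methods are hypothetical and exist only for illustration. It shows that a PrintWriter wrapped around an OutputStream with an explicit charset produces a predictable byte sequence, and that UTF-8 bytes decoded as ISO-8859-1 turn "ä" into two characters. It does not answer which encoding the appearance stream should actually use.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of PDFBox.
public class AppearanceEncodingDemo {

    // Writes text through a PrintWriter whose charset is named explicitly,
    // rather than relying on the platform default encoding.
    static byte[] write(String text, Charset charset) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        PrintWriter writer = new PrintWriter(new OutputStreamWriter(out, charset));
        writer.print(text);
        writer.flush();
        return out.toByteArray();
    }

    // Encodes text as UTF-8 and decodes the bytes as ISO-8859-1,
    // which is the mismatch suspected in the report.
    static String misread(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String name = "Skytt\u00e4"; // "Skyttä", escaped so the source file encoding does not matter

        // "ä" takes two bytes in UTF-8 but one byte in ISO-8859-1.
        System.out.println(write(name, StandardCharsets.UTF_8).length);
        System.out.println(write(name, StandardCharsets.ISO_8859_1).length);

        // The misread string is one character longer than the original.
        System.out.println(misread(name).length());
    }
}
```

Naming the charset makes the output independent of the environment the JVM runs in, which is the part of the problem the ISO-8859-1 hack addresses. As the report notes, it does not help with characters that the chosen charset cannot represent.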



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)