Posted to dev@pdfbox.apache.org by "Pär Wenåker (JIRA)" <ji...@apache.org> on 2011/01/04 11:28:49 UTC

[jira] Created: (PDFBOX-932) Swedish characters are garbled in form

Swedish characters are garbled in form
--------------------------------------

                 Key: PDFBOX-932
                 URL: https://issues.apache.org/jira/browse/PDFBOX-932
             Project: PDFBox
          Issue Type: Bug
          Components: PDModel.AcroForm
    Affects Versions: 1.4.0
         Environment: Mac OSX, Java6
            Reporter: Pär Wenåker


When using Swedish characters to fill in a form, they show up garbled in the PDF. This seems to be related to the PDAppearance class. When calling setValue on the field, the value seems to be set correctly, since COSString handles characters outside ASCII in its writePDF method. When PDAppearance writes the value in insertGeneratedAppearance, it does not do the same check. If the same check is done, it seems to work for PDAppearance too (see patch below). Since I do not know very much about the PDF format, I don't know if this is the right way to do it...

        PDDocument document = PDDocument.load(<pdf-file>);
        PDDocumentCatalog docCatalog = document.getDocumentCatalog();
        PDAcroForm form = docCatalog.getAcroForm();
        PDField field = form.getField(<field name>);
        field.setValue("åäö");


@@ -400,9 +401,32 @@
         {
             throw new IOException( "Error: Unknown justification value:" + q );
         }
-        printWriter.println("(" + value + ") Tj");
-        printWriter.println("ET" );
-        printWriter.flush();
+        boolean outsideASCII = false;
+        byte[] bytes = value.getBytes("ISO-8859-1");
+        int length = bytes.length; 
+        
+        for( int i=0; i<length && !outsideASCII; i++ )
+        {
+            //if the byte is negative then it is an eight bit byte and is
+            //outside the ASCII range.
+            outsideASCII = bytes[i] <0;
+        }
+        if(!outsideASCII) {
+            printWriter.println("(" + value + ") Tj");
+            printWriter.println("ET" );
+            printWriter.flush();            
+        } else {
+            printWriter.print("<");
+            for(int i=0; i<length; i++ )
+            {
+                String val = COSHEXTable.HEX_TABLE[ (bytes[i]+256)%256 ];           
+                printWriter.write(val);
+            }
+            printWriter.println("> Tj");
+            printWriter.println("ET" );
+            printWriter.flush();            
+        }
     }
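
For readers who want to try the idea outside PDFBox, the check in the patch can be sketched as a standalone helper. This is only an illustration under stated assumptions: the class PdfStringSketch and the method toPdfStringOperand are hypothetical names and not part of PDFBox, the hex digits come from String.format rather than COSHEXTable, and the backslash escaping of parentheses in the ASCII branch is an addition that the patch itself does not make.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper, not part of PDFBox: renders text as the operand of a Tj operator. */
final class PdfStringSketch {

    private PdfStringSketch() {
    }

    /** Returns a literal string "(...)" for pure ASCII input, otherwise a hex string "<...>". */
    static String toPdfStringOperand(String value) {
        // Same encoding as the patch; characters ISO-8859-1 cannot represent become '?'.
        byte[] bytes = value.getBytes(StandardCharsets.ISO_8859_1);

        // A negative byte has its high bit set, so it lies outside the 7-bit ASCII range.
        boolean outsideAscii = false;
        for (byte b : bytes) {
            if (b < 0) {
                outsideAscii = true;
                break;
            }
        }

        StringBuilder out = new StringBuilder();
        if (!outsideAscii) {
            out.append('(');
            for (byte b : bytes) {
                char c = (char) b;
                // Assumption beyond the patch: escape the delimiters of a literal string.
                if (c == '(' || c == ')' || c == '\\') {
                    out.append('\\');
                }
                out.append(c);
            }
            out.append(')');
        } else {
            out.append('<');
            for (byte b : bytes) {
                // Mask to 0..255 before formatting, like (bytes[i]+256)%256 in the patch.
                out.append(String.format("%02X", b & 0xFF));
            }
            out.append('>');
        }
        return out.toString();
    }
}
```

With this sketch, the value from the example above would be written as <E5E4F6> followed by Tj, while a plain ASCII value such as abc would still be written as (abc). Hex strings in PDF accept upper- or lower-case digits, so the case used here should not matter.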
 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] [Commented] (PDFBOX-932) Swedish characters are garbled in form

Posted by "Jens Kleemann (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PDFBOX-932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13183981#comment-13183981 ] 

Jens Kleemann commented on PDFBOX-932:
--------------------------------------

I agree with the submitter. This is a major drawback of the PDFBox form-filling mechanism. I applied the patch to 1.7.0-SNAPSHOT and it also works for German umlauts.
                
