Posted to dev@pdfbox.apache.org by "John Hewson (JIRA)" <ji...@apache.org> on 2014/12/12 05:50:13 UTC

[jira] [Resolved] (PDFBOX-1242) Handle non ISO-8859-1 chars with drawString

     [ https://issues.apache.org/jira/browse/PDFBOX-1242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

John Hewson resolved PDFBOX-1242.
---------------------------------
    Resolution: Fixed

This rather large commit removes the use of COSString when handling content streams. It also overhauls the internals of COSString, restricting where and when encoding or decoding occurs. The PDF spec actually defines three types of string: "text strings", "ASCII strings" and "byte strings", all of which COSString now handles. The previous code assumed that all strings were "text strings", which are encoded in PDFDocEncoding or UTF-16BE, but this is not the case.
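
To illustrate the "text string" case only, here is a minimal sketch of decoding a text string as described above: a leading 0xFE 0xFF byte order mark means UTF-16BE, otherwise the bytes are PDFDocEncoding. The class and method names are hypothetical (this is not COSString's API), and ISO-8859-1 is used as a stand-in for PDFDocEncoding, which is only an approximation.

```java
import java.nio.charset.StandardCharsets;

public class TextStringDecoding {

    // Assumption: ISO-8859-1 stands in for PDFDocEncoding in this sketch. The two
    // agree on printable ASCII but differ for several other byte values.
    static String decodeTextString(byte[] bytes) {
        boolean hasBom = bytes.length >= 2
                && (bytes[0] & 0xFF) == 0xFE
                && (bytes[1] & 0xFF) == 0xFF;
        if (hasBom) {
            // Skip the 2-byte byte order mark and decode the rest as UTF-16BE
            return new String(bytes, 2, bytes.length - 2, StandardCharsets.UTF_16BE);
        }
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] utf16 = {(byte) 0xFE, (byte) 0xFF, 0x04, 0x1F, 0x00, 0x41};
        byte[] plain = {0x48, 0x69};
        // Byte strings (e.g. binary IDs) should not go through this method at all;
        // they stay as raw bytes, since decoding them as text can corrupt them.
        System.out.println(decodeTextString(utf16).codePointAt(0) == 0x041F); // true
        System.out.println(decodeTextString(plain)); // Hi
    }
}
```

Keeping byte strings out of this path is the point of distinguishing the three string types: only text strings carry a text encoding.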

However, the modified drawString method still uses ISO-8859-1 and needs to be changed to use the font's encoding. This is being addressed in PDFBOX-922 for simple encodings and in PDFBOX-2524 for full Unicode. I'm resolving this issue as "Fixed" because work was done here, and the remaining work will be done in those two issues.
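
A minimal sketch of the simple-encoding direction, assuming a font with a single-byte encoding: text is mapped through the font's own code table rather than through ISO-8859-1, and characters the font cannot encode are rejected. SimpleEncodingSketch and its tiny table (printable ASCII plus the WinAnsiEncoding Euro sign at 0x80) are hypothetical and are not PDFBox API.

```java
import java.io.ByteArrayOutputStream;
import java.util.HashMap;
import java.util.Map;

public class SimpleEncodingSketch {

    // Hypothetical code table: Unicode code point -> single-byte font code
    private final Map<Integer, Integer> unicodeToCode = new HashMap<>();

    SimpleEncodingSketch() {
        // Printable ASCII maps to itself in WinAnsiEncoding
        for (int c = 0x20; c <= 0x7E; c++) {
            unicodeToCode.put(c, c);
        }
        // WinAnsiEncoding puts the Euro sign at 0x80, which ISO-8859-1 cannot represent
        unicodeToCode.put(0x20AC, 0x80);
    }

    /** Encodes text to font codes, failing loudly for characters the font cannot show. */
    byte[] encode(String text) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        text.codePoints().forEach(cp -> {
            Integer code = unicodeToCode.get(cp);
            if (code == null) {
                throw new IllegalArgumentException(
                        String.format("U+%04X is not available in this encoding", cp));
            }
            out.write(code);
        });
        return out.toByteArray();
    }

    /** Writes the font codes as a hex string operand followed by the Tj operator. */
    String showTextOperator(String text) {
        StringBuilder sb = new StringBuilder("<");
        for (byte b : encode(text)) {
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.append("> Tj").toString();
    }

    public static void main(String[] args) {
        SimpleEncodingSketch enc = new SimpleEncodingSketch();
        System.out.println(enc.showTextOperator("5 \u20AC")); // <352080> Tj
    }
}
```

Writing the operand as a hex string avoids escaping parentheses and backslashes. The full Unicode case would instead need a composite font with multi-byte codes, which this sketch does not cover.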

> Handle non ISO-8859-1 chars with drawString
> -------------------------------------------
>
>                 Key: PDFBOX-1242
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-1242
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Writing
>    Affects Versions: 1.5.0, 1.6.0
>            Reporter: Peter Andersen
>            Assignee: John Hewson
>             Fix For: 2.0.0
>
>
> PDPageContentStream.drawString takes a String as its argument and constructs a COSString from the input.
> If the input contains characters above 255, the COSString is prefixed with 0xFE 0xFF and its bytes are taken from the
> input encoded as "UTF-16BE".
> Back in the drawString method, this UTF-16 encoded COSString is appended as "ISO-8859-1":
> 	appendRawCommands( new String( buffer.toByteArray(), "ISO-8859-1"));
>  
> As a result, a line containing UTF-16 characters is shown prefixed with "þÿ", with a double space between the other characters.
> Characters above 255 are shown as the two ISO-8859-1 characters corresponding to their two bytes.
> As a side question: is there an alternative way to use PDFBox to support UTF-16?
>  
>  
>  
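
The symptom in the report can be reproduced with the JDK alone. The hypothetical misdecode method below builds the bytes the report describes (0xFE 0xFF followed by UTF-16BE) and then decodes them as ISO-8859-1, mirroring the quoted appendRawCommands line. For "A€" it yields U+00FE U+00FF ("þÿ"), a U+0000 before "A", and the Euro sign split into U+0020 and U+00AC. Those zero bytes before each Latin character would plausibly account for the "double space".

```java
import java.nio.charset.StandardCharsets;
import java.util.stream.Collectors;

public class MisdecodedUtf16 {

    static String misdecode(String input) {
        byte[] utf16 = input.getBytes(StandardCharsets.UTF_16BE); // no BOM written
        byte[] withBom = new byte[utf16.length + 2];
        withBom[0] = (byte) 0xFE; // byte order mark, first byte
        withBom[1] = (byte) 0xFF; // byte order mark, second byte
        System.arraycopy(utf16, 0, withBom, 2, utf16.length);
        // The faulty step: treating UTF-16BE bytes as ISO-8859-1 characters
        return new String(withBom, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String codes = misdecode("A\u20AC").chars()
                .mapToObj(c -> String.format("U+%04X", c))
                .collect(Collectors.joining(" "));
        System.out.println(codes); // U+00FE U+00FF U+0000 U+0041 U+0020 U+00AC
    }
}
```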



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)