
Posted to dev@pdfbox.apache.org by Manfred Pock <po...@gmail.com> on 2015/04/22 13:32:09 UTC

€ String encoding

Hi,

With PDFBox 2.0 (trunk) I have the problem that the € sign is not encoded
correctly. I had a look at PDFDocEncoding, and there may be a bug in the
method getBytes(String text):

public static byte[] getBytes(String text)
{
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    for (char c : text.toCharArray())
    {
        Integer code = UNI_TO_CODE.get(c);
        if (code == null)
        {
            out.write(0);
        }
        else
        {
            out.write(c);
        }
    }
    return out.toByteArray();
}

This method looks up the required character in the UNI_TO_CODE map, but
it does not write the encoded Integer it found to the output stream; it
writes the character itself.

Is that correct?
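
To illustrate the suspected effect, here is a minimal sketch (not PDFBox code; the class EuroWriteDemo and the helper writtenByte are hypothetical names). ByteArrayOutputStream.write(int) stores only the low-order byte of its argument, so passing the char € (U+20AC) stores 0xAC, whereas the PDFDocEncoding table in the PDF specification assigns the Euro sign the code 0xA0:

```java
import java.io.ByteArrayOutputStream;

class EuroWriteDemo
{
    // Hypothetical helper: returns the single byte that
    // ByteArrayOutputStream.write(int) stores when handed a char.
    static int writtenByte(char c)
    {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(c); // the char widens to int; only the low 8 bits are kept
        return out.toByteArray()[0] & 0xFF;
    }

    public static void main(String[] args)
    {
        // U+20AC (EURO SIGN) is truncated to its low byte
        System.out.printf("0x%02X%n", writtenByte('\u20AC')); // prints 0xAC
    }
}
```

For characters below U+0100 the truncation is harmless, which may be why the problem only shows up with characters such as €.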

Regards, Manfred

Re: € String encoding

Posted by Maruan Sahyoun <sa...@fileaffairs.de>.

Hi,

> On 22.04.2015 at 13:59, Maruan Sahyoun <sa...@fileaffairs.de> wrote:
> 
> Hi,
> 
>> On 22.04.2015 at 13:32, Manfred Pock <po...@gmail.com> wrote:
>> 
>> Hi,
>> 
>> With PDFBox 2.0 (trunk) I have the problem that the € sign is not encoded correctly. I had a look at PDFDocEncoding, and there may be a bug in the method getBytes(String text):
>> 
> 
> There is already an issue for that - https://issues.apache.org/jira/browse/PDFBOX-2771
> 
> 
>> public static byte[] getBytes(String text)
>>   {
>>       ByteArrayOutputStream out = new ByteArrayOutputStream();
>>       for (char c : text.toCharArray())
>>       {
>>           Integer code = UNI_TO_CODE.get(c);
>>           if (code == null)
>>           {
>>               out.write(0);
>>           }
>>           else
>>           {
>>               out.write(c);
>>           }
>>       }
>>       return out.toByteArray();
>>   }
>> 
>> This method looks up the required character in the UNI_TO_CODE map, but it does not write the encoded Integer it found to the output stream; it writes the character itself.
>> 
>> Is that correct?

No, it wasn't, and I've fixed it accordingly.
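
For readers following along, a correction of this kind could look like the sketch below. This is an assumption about the shape of the fix, not the change actually committed for PDFBOX-2771. The class name PdfDocEncodingSketch and the two-entry UNI_TO_CODE map are hypothetical stand-ins for the real PDFDocEncoding table; the only difference from the quoted method is that the mapped code is written instead of the char.

```java
import java.io.ByteArrayOutputStream;
import java.util.HashMap;
import java.util.Map;

class PdfDocEncodingSketch
{
    // Hypothetical stand-in for the real table; two entries for illustration only.
    private static final Map<Character, Integer> UNI_TO_CODE =
            new HashMap<Character, Integer>();
    static
    {
        UNI_TO_CODE.put('A', 0x41);
        UNI_TO_CODE.put('\u20AC', 0xA0); // Euro sign is code 0xA0 in PDFDocEncoding
    }

    public static byte[] getBytes(String text)
    {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (char c : text.toCharArray())
        {
            Integer code = UNI_TO_CODE.get(c);
            if (code == null)
            {
                out.write(0); // unmapped character, same fallback as the quoted method
            }
            else
            {
                out.write(code); // write the mapped code, not the char itself
            }
        }
        return out.toByteArray();
    }
}
```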

>> 
>> Regards, Manfred
> 



Re: € String encoding

Posted by Maruan Sahyoun <sa...@fileaffairs.de>.

Hi,

> On 22.04.2015 at 13:32, Manfred Pock <po...@gmail.com> wrote:
> 
> Hi,
> 
> With PDFBox 2.0 (trunk) I have the problem that the € sign is not encoded correctly. I had a look at PDFDocEncoding, and there may be a bug in the method getBytes(String text):
> 

There is already an issue for that - https://issues.apache.org/jira/browse/PDFBOX-2771


> public static byte[] getBytes(String text)
>    {
>        ByteArrayOutputStream out = new ByteArrayOutputStream();
>        for (char c : text.toCharArray())
>        {
>            Integer code = UNI_TO_CODE.get(c);
>            if (code == null)
>            {
>                out.write(0);
>            }
>            else
>            {
>                out.write(c);
>            }
>        }
>        return out.toByteArray();
>    }
> 
> This method looks up the required character in the UNI_TO_CODE map, but it does not write the encoded Integer it found to the output stream; it writes the character itself.
> 
> Is that correct?
> 
> Regards, Manfred