Posted to dev@pdfbox.apache.org by "Tilman Hausherr (JIRA)" <ji...@apache.org> on 2017/07/11 14:57:00 UTC

[jira] [Resolved] (PDFBOX-3864) UTF16 encoded string to PDFDocEncoding

     [ https://issues.apache.org/jira/browse/PDFBOX-3864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tilman Hausherr resolved PDFBOX-3864.
-------------------------------------
    Resolution: Fixed

> UTF16 encoded string to PDFDocEncoding
> --------------------------------------
>
>                 Key: PDFBOX-3864
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-3864
>             Project: PDFBox
>          Issue Type: Bug
>          Components: PDModel
>    Affects Versions: 2.0.6
>            Reporter: Tilman Hausherr
>            Assignee: Tilman Hausherr
>             Fix For: 2.0.7, 3.0.0
>
>
> From [~torakiki] in the mailing list:
> {quote}
> Hi, we came across this case where we are basically cloning outline items
> where the original outline title is a UTF16BE encoded text string
> containing the value 00A0 (non break space). We later use the string to
> assign the title in a new outline item and the A0 is recognised as a € sign.
> Here is a simple test:
> {code}
>         COSString victim = COSString
>                 .parseHex("FEFF004300680061007000740065007200A0");
>         PDOutlineItem node = new PDOutlineItem();
>         node.setTitle(victim.getString());
> {code}
> If you look at the node dictionary you'll see that the title value is
> Chapter€
> {quote}
> The cause is that the initialization of PDFDocEncoding did not account for the "holes" in the 0..255 sequence (for example, PDFDocEncoding maps code 0xA0 to € rather than to U+00A0). I'll add handling for that, plus a test.
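The hex string in the test above is a PDF text string. Its leading FE FF bytes are the UTF-16BE byte order mark, followed by "Chapter" and U+00A0. The Java sketch below uses only the JDK to decode it and confirm that the original title really ends in a non-breaking space, not a euro sign. The class and helper names (`DecodeTitleSketch`, `parseHex`, `decodeUtf16Title`) are hypothetical and are not PDFBox API.

```java
import java.nio.charset.StandardCharsets;

public class DecodeTitleSketch {
    // Hypothetical helper: turns "FEFF0043..." into raw bytes, two hex digits per byte.
    static byte[] parseHex(String hex) {
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    // Hypothetical helper: a PDF text string starting with FE FF is UTF-16BE.
    // Strings without this byte order mark are PDFDocEncoding, which is not handled here.
    static String decodeUtf16Title(byte[] bytes) {
        if (bytes.length >= 2 && (bytes[0] & 0xFF) == 0xFE && (bytes[1] & 0xFF) == 0xFF) {
            return new String(bytes, 2, bytes.length - 2, StandardCharsets.UTF_16BE);
        }
        throw new IllegalArgumentException("missing FE FF byte order mark");
    }

    public static void main(String[] args) {
        String title = decodeUtf16Title(parseHex("FEFF004300680061007000740065007200A0"));
        System.out.println(title.length());                  // 8
        System.out.printf("U+%04X%n", (int) title.charAt(7)); // U+00A0
    }
}
```

In PDFBox the equivalent decoding happens inside `COSString.getString()`, so the string handed to `setTitle` is already correct. The corruption only appears when that string is encoded again for output.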

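The "holes" explanation can be illustrated with a toy encoding table. `DocEncodingSketch` is hypothetical and is not PDFBox's `PDFDocEncoding` source. The sketch assumes a table that is first filled with the ISO-8859-1 identity mapping and then patched with deviations. Only the 0xA0 → U+20AC (€) deviation is modelled. The set of "hole" codes in `isHole` is an assumption based on the PDFDocEncoding table in ISO 32000-1 Annex D.

If the holes are not skipped, the reverse map keeps U+00A0 → 0xA0. The non-breaking space then looks encodable and is written as a byte that reads back as €. If the holes are skipped, U+00A0 has no single-byte mapping, so a writer has to fall back to UTF-16BE.

```java
import java.util.HashMap;
import java.util.Map;

public class DocEncodingSketch {
    private final char[] codeToUni = new char[256];
    private final Map<Character, Integer> uniToCode = new HashMap<>();

    private void set(int code, char uni) {
        codeToUni[code] = uni;
        uniToCode.put(uni, code);
    }

    // Hypothetical: codes whose PDFDocEncoding meaning differs from ISO-8859-1
    // or is undefined (assumed from the ISO 32000-1 Annex D table).
    static boolean isHole(int code) {
        return (code >= 0x18 && code <= 0x1F) || (code >= 0x7F && code <= 0xA0) || code == 0xAD;
    }

    DocEncodingSketch(boolean skipHoles) {
        for (int i = 0; i < 256; i++) {
            if (skipHoles && isHole(i)) {
                continue;
            }
            set(i, (char) i); // identity fill, as in ISO-8859-1
        }
        // Only the deviation relevant to this bug is modelled; the others are omitted.
        // Without skipping, this overwrites codeToUni[0xA0] but leaves U+00A0 -> 0xA0 in uniToCode.
        set(0xA0, '\u20AC');
    }

    // Returns single-byte codes, or null if some character has no mapping
    // (a writer would then fall back to UTF-16BE).
    byte[] encode(String s) {
        byte[] out = new byte[s.length()];
        for (int i = 0; i < s.length(); i++) {
            Integer code = uniToCode.get(s.charAt(i));
            if (code == null) {
                return null;
            }
            out[i] = code.byteValue();
        }
        return out;
    }

    String decode(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(codeToUni[b & 0xFF]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String title = "Chapter\u00A0";
        DocEncodingSketch buggy = new DocEncodingSketch(false);
        DocEncodingSketch fixed = new DocEncodingSketch(true);
        // Buggy table: the round trip turns the non-breaking space into a euro sign.
        System.out.println(buggy.decode(buggy.encode(title)).equals("Chapter\u20AC")); // true
        // Fixed table: U+00A0 is unmappable, so the title must be written as UTF-16BE.
        System.out.println(fixed.encode(title) == null); // true
    }
}
```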


--
This message was sent by Atlassian JIRA
(v6.4.14#64029)