Posted to dev@pdfbox.apache.org by "John Hewson (JIRA)" <ji...@apache.org> on 2014/06/17 22:55:07 UTC

[jira] [Commented] (PDFBOX-755) Wrong translation of capital letters with combining diacritics

    [ https://issues.apache.org/jira/browse/PDFBOX-755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14034361#comment-14034361 ] 

John Hewson commented on PDFBOX-755:
------------------------------------

Confirmed. Here's what I get with PDFBox 2.0 trunk:
{code}
S. KALABUSˇIC´ ANDM. R. S. KULENOVIC´
{code}

and with Adobe Acrobat:
{code}
S. KALABUˇSI´C AND M. R. S. KULENOVI´C
{code}

> Wrong translation of capital letters with combining diacritics
> --------------------------------------------------------------
>
>                 Key: PDFBOX-755
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-755
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>    Affects Versions: 1.2.0
>         Environment: Mac OS X 10.6.4
>            Reporter: Thomas Fischer
>         Attachments: 139-p.1+3.pdf, 139-p.1+3.txt
>
>
> S. KALABUˇSI ´C ANDM. R. S. KULENOVI ´C
> vs.
> S. KALABUŠIĆ AND M. R. S. KULENOVIĆ 
> 1.  ´ before vs.  ́ behind the letter (\x20 \xB4 vs. \x301)
> 2. ˇ before vs. ̌ behind the letter (\x2C7 vs. \x30C)
> 3. ANDM. : space missing
> Note:
> S. Kalabušić is translated correctly
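
Points 1 and 2 above could be worked around after extraction along the following lines. This is only a rough sketch, not PDFBox code and not the fix for this issue: the class name DiacriticSketch and the method fix are made up for illustration, and it assumes that a spacing acute (U+00B4) or caron (U+02C7) directly in front of a letter, optionally preceded by one stray space, belongs to that letter. It does nothing about the missing space in "ANDM." (point 3).

```java
import java.text.Normalizer;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of PDFBox.
class DiacriticSketch {

    // Optional stray space, then a spacing acute (U+00B4) or caron (U+02C7),
    // then the letter the diacritic is assumed to belong to.
    private static final Pattern SPACING_MARK_BEFORE_LETTER =
            Pattern.compile(" ?([\u00B4\u02C7])(\\p{L})");

    static String fix(String text) {
        Matcher m = SPACING_MARK_BEFORE_LETTER.matcher(text);
        StringBuilder sb = new StringBuilder();
        while (m.find()) {
            // Map the spacing diacritic to its combining counterpart:
            // U+00B4 -> U+0301 (combining acute), U+02C7 -> U+030C (combining caron).
            char combining = m.group(1).charAt(0) == '\u00B4' ? '\u0301' : '\u030C';
            // Emit the letter first, then the combining mark behind it.
            m.appendReplacement(sb, Matcher.quoteReplacement(m.group(2) + combining));
        }
        m.appendTail(sb);
        // NFC composes letter + combining mark into one code point where one exists.
        return Normalizer.normalize(sb, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        // The reporter's extracted line from the issue description.
        System.out.println(fix("S. KALABU\u02C7SI \u00B4C ANDM. R. S. KULENOVI \u00B4C"));
    }
}
```

Dropping the space in front of the diacritic is a guess that happens to match this sample; in other text it could merge two words, so a real fix would more likely belong in the text stripper, where glyph positions are known.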



--
This message was sent by Atlassian JIRA
(v6.2#6252)