Posted to dev@pdfbox.apache.org by "John Hewson (JIRA)" <ji...@apache.org> on 2014/10/01 08:40:34 UTC

[jira] [Commented] (PDFBOX-2272) Can't extract vertical text correctly

    [ https://issues.apache.org/jira/browse/PDFBOX-2272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14154453#comment-14154453 ] 

John Hewson commented on PDFBOX-2272:
-------------------------------------

1.8 can extract Unicode text in general but fails for this particular font. The 2.0 trunk can successfully extract the text for this font. Neither version can handle the vertical layout correctly, so the text comes out in the wrong order. 
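
To illustrate what "wrong order" means here, the following is a minimal sketch of the reading order a vertical layout calls for: columns from right to left, top to bottom within each column. It is not PDFBox code and not how either version orders text. The class VerticalOrderSketch, the Glyph type, and the columnTolerance parameter are hypothetical names made up for this example, and it assumes each glyph has a known position with y growing downward.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration only; none of these names exist in PDFBox.
class VerticalOrderSketch {

    /** A glyph and the top-left corner of its box (assumption: y grows downward). */
    static final class Glyph {
        final String text;
        final double x;
        final double y;

        Glyph(String text, double x, double y) {
            this.text = text;
            this.x = x;
            this.y = y;
        }
    }

    /**
     * Joins glyphs in vertical reading order: columns right to left,
     * top to bottom within a column. A glyph joins the current column
     * when its x is within columnTolerance of that column's first glyph.
     */
    static String readingOrder(List<Glyph> glyphs, double columnTolerance) {
        List<Glyph> sorted = new ArrayList<>(glyphs);
        // Rightmost glyphs first, so columns are visited right to left.
        sorted.sort(Comparator.comparingDouble((Glyph g) -> g.x).reversed());

        List<List<Glyph>> columns = new ArrayList<>();
        for (Glyph g : sorted) {
            List<Glyph> last = columns.isEmpty() ? null : columns.get(columns.size() - 1);
            if (last != null && Math.abs(last.get(0).x - g.x) <= columnTolerance) {
                last.add(g);
            } else {
                List<Glyph> column = new ArrayList<>();
                column.add(g);
                columns.add(column);
            }
        }

        StringBuilder out = new StringBuilder();
        for (List<Glyph> column : columns) {
            // Within one column, read from top to bottom.
            column.sort(Comparator.comparingDouble((Glyph g) -> g.y));
            for (Glyph g : column) {
                out.append(g.text);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Two columns of two glyphs each, listed in horizontal (row-first) order.
        List<Glyph> glyphs = List.of(
                new Glyph("C", 100, 10), new Glyph("A", 120, 10),
                new Glyph("D", 100, 30), new Glyph("B", 120, 30));
        System.out.println(readingOrder(glyphs, 5.0));
    }
}
```

A row-first (horizontal) pass over the same glyphs would yield "CADB", which is the kind of scrambling described above. The fixed tolerance is a simplification; real pages with mixed horizontal and vertical text would need more than this.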

> Can't extract vertical text correctly
> -------------------------------------
>
>                 Key: PDFBOX-2272
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-2272
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>    Affects Versions: 1.8.6, 2.0.0
>            Reporter: Biligsaikhan Batjargal
>         Attachments: test.pdf, test.txt
>
>
> - 1.8.6 can't extract the Unicode text because it fails to map the UCS2 CMap for 90ms-RKSJ-V.
> - 2.0 extracts the text but can't handle the vertical layout.
> Also see the file from PDFBOX-2294, which contains both horizontal and vertical text.


