Posted to dev@pdfbox.apache.org by "Andreas Meier (JIRA)" <ji...@apache.org> on 2015/07/14 07:56:06 UTC

[jira] [Commented] (PDFBOX-2272) Can't extract vertical text correctly

    [ https://issues.apache.org/jira/browse/PDFBOX-2272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14625868#comment-14625868 ] 

Andreas Meier commented on PDFBOX-2272:
---------------------------------------

I made some small changes to PDFTextStripper.java. This is only a workaround: text extraction should be rewritten for special cases, with lastPositions and other attributes saved in a separate helper object.
The direction of each TextPosition is now taken into account.
The rotated vertical text in the test file for PDFBOX-800 is now extracted correctly.
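
To illustrate the general idea (this is not the actual patch), here is a minimal sketch that groups glyphs by direction before ordering them. It assumes a hypothetical stand-in class Glyph instead of the real TextPosition, with fields that play the role of getDir(), getXDirAdj() and getYDirAdj(); the class name DirectionGrouping and the method extractLines are also made up for this example, and it simplifies by emitting one string per direction.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class DirectionGrouping {

    // Hypothetical stand-in for a text position: the text, its direction in
    // degrees (0, 90, 180 or 270), and coordinates already adjusted for
    // that direction.
    static final class Glyph {
        final String text;
        final int dir;
        final float x;
        final float y;

        Glyph(String text, int dir, float x, float y) {
            this.text = text;
            this.dir = dir;
            this.x = x;
            this.y = y;
        }
    }

    // Groups glyphs by direction, then orders each group by adjusted y and
    // adjusted x, so rotated text is not interleaved with unrotated text.
    // Simplification: each direction yields exactly one output string.
    static List<String> extractLines(List<Glyph> glyphs) {
        Map<Integer, List<Glyph>> byDir = new TreeMap<>();
        for (Glyph g : glyphs) {
            byDir.computeIfAbsent(g.dir, k -> new ArrayList<>()).add(g);
        }
        List<String> out = new ArrayList<>();
        for (List<Glyph> group : byDir.values()) {
            group.sort(Comparator.comparingDouble((Glyph g) -> g.y)
                                 .thenComparingDouble(g -> g.x));
            StringBuilder sb = new StringBuilder();
            for (Glyph g : group) {
                sb.append(g.text);
            }
            out.add(sb.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        List<Glyph> glyphs = new ArrayList<>();
        glyphs.add(new Glyph("b", 0, 20f, 100f));
        glyphs.add(new Glyph("Y", 90, 20f, 50f));
        glyphs.add(new Glyph("a", 0, 10f, 100f));
        glyphs.add(new Glyph("X", 90, 10f, 50f));
        System.out.println(extractLines(glyphs)); // prints [ab, XY]
    }
}
```

A real implementation would also have to split each direction group into lines and words, which this sketch leaves out.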

Since there is no evidence in "problemdoc.pdf" that the unrotated vertical text belongs together, the result should be OK for now.
Can anybody confirm that?

> Can't extract vertical text correctly
> -------------------------------------
>
>                 Key: PDFBOX-2272
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-2272
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>    Affects Versions: 1.8.6, 2.0.0
>            Reporter: Biligsaikhan Batjargal
>         Attachments: test.pdf, test.txt
>
>
> - -1.8.6 can't extract the Unicode due to failing to map the UCS2 CMap for 90ms-RKSJ-V.-
> - 2.0 extracts the text but can't handle the vertical layout
> Also see the file from PDFBOX-2294 which contains both horizontal and vertical text.


