Posted to dev@pdfbox.apache.org by "Andreas Lehmkühler (JIRA)" <ji...@apache.org> on 2010/12/14 20:03:00 UTC

[jira] Updated: (PDFBOX-751) Text Extraction truncates last character when image page has sideways text

     [ https://issues.apache.org/jira/browse/PDFBOX-751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andreas Lehmkühler updated PDFBOX-751:
--------------------------------------

    Attachment: PDFBOX751-getimage1.txt

> Text Extraction truncates last character when image page has sideways text
> --------------------------------------------------------------------------
>
>                 Key: PDFBOX-751
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-751
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>    Affects Versions: 1.1.0
>         Environment: HP UX 11iV1
>            Reporter: Chris Chadwick
>         Attachments: getimage1.pdf, PDFBOX751-getimage1.txt
>
>
> When using unsorted text extraction on a PDF that contains a horizontal page (normal orientation text) followed by a page where all the text is rotated 90 degrees (landscape), the last character of each word is forced onto a new line. For example:
> Thi
> s
> erro
> r
> wa
> s
> logge
> d
> toda
> y
> It is only the last letter of each phrase that is affected, and it is only affected on the rotated page.
> Selecting the text directly from the image (in Adobe, 'Select All' and then cut) produces the correct results, as do other tools, so the text layer in the PDF file appears to be correct.
> Also, could you please announce when V1.2 will be ready, as it may resolve this issue? Is it available as a beta?
>  
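
The symptom described above (a word followed by a line holding only its last character) can be checked for in already-extracted text. The following is a minimal sketch, not part of PDFBox and not a fix for the underlying bug: the class name SplitWordCheck and the method rejoinTrailingCharacters are hypothetical, and the rule that a lone letter or digit on its own line belongs to the previous line is an assumption that will also merge legitimate one-character lines.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not a PDFBox class): rejoins a lone trailing
// character onto the line before it, mirroring the pattern in the report.
public class SplitWordCheck {

    static String rejoinTrailingCharacters(String extracted) {
        // Keep trailing empty strings so the line count is preserved.
        String[] lines = extracted.split("\\r?\\n", -1);
        List<String> out = new ArrayList<>();
        // True when the last output line already absorbed a lone character,
        // so that at most one character is appended per line.
        boolean previousWasMerged = false;

        for (String line : lines) {
            String trimmed = line.trim();
            // Assumption: a single letter or digit alone on a line is a
            // split-off last character rather than real content.
            boolean isLoneCharacter = trimmed.length() == 1
                    && Character.isLetterOrDigit(trimmed.charAt(0));
            boolean canMerge = isLoneCharacter
                    && !out.isEmpty()
                    && !previousWasMerged
                    && !out.get(out.size() - 1).trim().isEmpty();

            if (canMerge) {
                int last = out.size() - 1;
                out.set(last, out.get(last) + trimmed);
                previousWasMerged = true;
            } else {
                out.add(line);
                previousWasMerged = false;
            }
        }
        return String.join("\n", out);
    }
}
```

Passing the text returned by the extractor through rejoinTrailingCharacters and comparing the result with the input gives a quick indication of whether a page shows the pattern. Text without lone-character lines is returned unchanged apart from line endings being normalized to "\n".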

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.