Posted to dev@pdfbox.apache.org by "Andreas Lehmkühler (JIRA)" <ji...@apache.org> on 2010/12/14 20:03:00 UTC
[jira] Updated: (PDFBOX-751) Text Extraction truncates last character when image page has sideways text
[ https://issues.apache.org/jira/browse/PDFBOX-751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andreas Lehmkühler updated PDFBOX-751:
--------------------------------------
Attachment: PDFBOX751-getimage1.txt
> Text Extraction truncates last character when image page has sideways text
> --------------------------------------------------------------------------
>
> Key: PDFBOX-751
> URL: https://issues.apache.org/jira/browse/PDFBOX-751
> Project: PDFBox
> Issue Type: Bug
> Components: Text extraction
> Affects Versions: 1.1.0
> Environment: HP UX 11iV1
> Reporter: Chris Chadwick
> Attachments: getimage1.pdf, PDFBOX751-getimage1.txt
>
>
> When using unsorted text extraction on a PDF that contains a horizontal page (normal-orientation text) followed by a page where all the text is rotated 90 degrees (landscape), the last character of each word is forced onto a new line. For example:
> Thi
> s
> erro
> r
> wa
> s
> logge
> d
> toda
> y
> It is only the last letter of each phrase that is affected, and it is only affected on the rotated page.
> Selecting the text from the image directly (in Adobe, 'Select All' then cut) produces the correct results, as do other tools, so the text layer in the PDF file appears to be correct.
> Also, could you please announce when V1.2 will be ready, as it may resolve this issue? Is it available as a beta?
>
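The symptom described above (a word fragment followed by its last character alone on the next line) can be checked for mechanically in extracted output. The following is only a minimal sketch of such a check, not part of PDFBox: the function name find_stranded_letters is hypothetical, and the heuristic assumes that any single alphanumeric character on its own line directly after a line ending in an alphanumeric character is a split-off letter, so it can also flag legitimate one-letter lines.

```python
def find_stranded_letters(text):
    """Return (line_index, rejoined_word) pairs for single characters
    that sit alone on the line right after a word fragment.

    Hypothetical helper for inspecting extracted text; a heuristic only.
    """
    lines = text.splitlines()
    hits = []
    for i in range(1, len(lines)):
        prev = lines[i - 1].strip()
        cur = lines[i].strip()
        # A lone alphanumeric character following a line that ends in an
        # alphanumeric character is treated as a split-off last letter.
        if len(cur) == 1 and cur.isalnum() and prev and prev[-1].isalnum():
            last_word = prev.split()[-1]
            hits.append((i, last_word + cur))
    return hits


# The broken output quoted in the report, one fragment per line.
SAMPLE = "Thi\ns\nerro\nr\nwa\ns\nlogge\nd\ntoda\ny\n"

if __name__ == "__main__":
    for index, word in find_stranded_letters(SAMPLE):
        print(index, word)
```

Running the check against output from the first (unrotated) page and the rotated page separately would show whether only the rotated page is affected, as the report states.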
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.