Posted to dev@pdfbox.apache.org by "Chris Chadwick (JIRA)" <ji...@apache.org> on 2010/06/16 15:45:23 UTC
[jira] Created: (PDFBOX-751) Text Extraction truncates last character when image page has sideways text
Text Extraction truncates last character when image page has sideways text
--------------------------------------------------------------------------
Key: PDFBOX-751
URL: https://issues.apache.org/jira/browse/PDFBOX-751
Project: PDFBox
Issue Type: Bug
Components: Text extraction
Affects Versions: 1.1.0
Environment: HP UX 11iV1
Reporter: Chris Chadwick
When using unsorted text extraction on a PDF that contains a horizontal page (normal orientation text) followed by a page where all the text is rotated 90 degrees (landscape), the last character of each word is forced onto a new line. For example:
Thi
s
erro
r
wa
s
logge
d
toda
y
It is only the last letter of each phrase that is affected, and it is only affected on the rotated page.
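The symptom above can be checked for mechanically, which may help to confirm which pages are affected. The following is only a minimal sketch and not part of PDFBox: the class and method names are hypothetical, and it assumes the extracted text (for example, the string returned by PDFTextStripper) is already available as a String. It counts single-character lines that directly follow a longer line.

```java
public class OrphanCharCheck {

    /**
     * Counts lines that consist of exactly one character and that directly
     * follow a line of two or more characters. This is a heuristic for the
     * "last character forced onto a new line" symptom, not a PDFBox API.
     */
    public static int countOrphanedCharacters(String extractedText) {
        // Accept both Unix and Windows line endings.
        String[] lines = extractedText.split("\\r?\\n");
        int count = 0;
        for (int i = 1; i < lines.length; i++) {
            String previous = lines[i - 1].trim();
            String current = lines[i].trim();
            if (current.length() == 1 && previous.length() > 1) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // The sample output quoted in this report.
        String sample = "Thi\ns\nerro\nr\nwa\ns\nlogge\nd\ntoda\ny\n";
        System.out.println(countOrphanedCharacters(sample)); // prints 5
    }
}
```

A count close to the number of words on a page would suggest that the page shows the problem. Normal text that happens to contain a one-letter line would also be counted, so the result is only an indication.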
Selecting the text from the image directly (in Adobe, do 'Select All', then cut) produces the correct results, as do other tools, so the text layer in the PDF file appears to be correct.
Also, could you please say when V1.2 will be ready, as it may resolve this issue? Is it available as a beta?
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (PDFBOX-751) Text Extraction truncates last character when image page has sideways text
Posted by "Andreas Lehmkühler (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PDFBOX-751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879673#action_12879673 ]
Andreas Lehmkühler commented on PDFBOX-751:
-------------------------------------------
Can you provide us with a sample pdf?
[jira] Commented: (PDFBOX-751) Text Extraction truncates last character when image page has sideways text
Posted by "Chris Chadwick (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PDFBOX-751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879791#action_12879791 ]
Chris Chadwick commented on PDFBOX-751:
---------------------------------------
Hi, I have asked our customer whether we can include the image or not. In the meantime, can you comment on whether this issue has been seen before?