Posted to dev@pdfbox.apache.org by "John Hewson (JIRA)" <ji...@apache.org> on 2016/10/18 06:51:58 UTC
[jira] [Updated] (PDFBOX-2425) Extracted text has extra spaces
[ https://issues.apache.org/jira/browse/PDFBOX-2425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
John Hewson updated PDFBOX-2425:
--------------------------------
Summary: Extracted text has extra spaces (was: Extracted OCR text has extra spaces)
> Extracted text has extra spaces
> -------------------------------
>
> Key: PDFBOX-2425
> URL: https://issues.apache.org/jira/browse/PDFBOX-2425
> Project: PDFBox
> Issue Type: Bug
> Components: Text extraction
> Affects Versions: 1.8.7, 1.8.10, 1.8.11, 2.0.0
> Reporter: John Hewson
> Attachments: WooLam93c-Visible-p1.pdf, WooLam93c.pdf
>
>
> This is a very old issue, originally from PDFBOX-37. The attached file has extra spaces inserted in the title text by PDFTextStripper.
> {code}
> A Framework for D i s t r i bu t ed Au thor i z a t i on*
> (Extended Abstract)
> Thoma s Y .C . Woo S imon S. L am
> Depa r tmen t of Compu t e r Sc i ences
> Th e Un i v e r s i t y of T ex a s a t Au s t i n
> Au s t i n , T exa s 78712-1188
> 1 In t r oduc t i on
> {code}
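The report does not say why PDFTextStripper inserts these spaces. One way such output can arise, offered here only as an assumption, is that a text extractor decides where words break by comparing the horizontal gap between consecutive glyphs with a tolerance derived from the width of a space. If the widths it uses for the glyphs are smaller than the distance the glyphs actually advance, small gaps inside a word can cross that threshold. The Java sketch below illustrates the idea. The names `SpacingSketch`, `Glyph`, `join`, `spaceWidth` and `spacingTolerance` are hypothetical, and this is not PDFBox's implementation.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified word-break heuristic for illustration only.
// It is NOT the code used by PDFBox's PDFTextStripper.
final class SpacingSketch {

    // One positioned glyph: its text, its starting x position and the
    // advance width the extractor believes it has (same units as x).
    static final class Glyph {
        final String text;
        final float x;
        final float width;

        Glyph(String text, float x, float width) {
            this.text = text;
            this.x = x;
            this.width = width;
        }
    }

    // Concatenates glyph text, inserting a space whenever the gap between the
    // end of the previous glyph (x + width) and the start of the next glyph is
    // larger than spaceWidth * spacingTolerance.
    static String join(List<Glyph> glyphs, float spaceWidth, float spacingTolerance) {
        float threshold = spaceWidth * spacingTolerance;
        StringBuilder out = new StringBuilder();
        Glyph prev = null;
        for (Glyph g : glyphs) {
            if (prev != null) {
                float gap = g.x - (prev.x + prev.width);
                if (gap > threshold) {
                    out.append(' ');
                }
            }
            out.append(g.text);
            prev = g;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Same glyph positions in both lists; only the believed widths differ.
        List<Glyph> accurate = Arrays.asList(
                new Glyph("W", 0f, 10f), new Glyph("o", 10f, 6f), new Glyph("o", 16f, 6f));
        List<Glyph> tooNarrow = Arrays.asList(
                new Glyph("W", 0f, 10f), new Glyph("o", 10f, 3f), new Glyph("o", 16f, 3f));
        System.out.println(join(accurate, 5f, 0.5f));  // Woo
        System.out.println(join(tooNarrow, 5f, 0.5f)); // Wo o
    }
}
```

With identical glyph positions, under-reported widths leave a 3-unit gap between the two "o" glyphs, which exceeds the assumed 2.5-unit threshold and produces "Wo o", much like the broken words in the output above. Raising the tolerance would hide this case but could also merge genuinely separate words, so a heuristic of this kind involves a trade-off.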
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)