Posted to dev@pdfbox.apache.org by "Vitalie Bureanu (JIRA)" <ji...@apache.org> on 2014/01/22 12:02:19 UTC
[jira] [Updated] (PDFBOX-1858) Extracted text does not have spaces
[ https://issues.apache.org/jira/browse/PDFBOX-1858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Vitalie Bureanu updated PDFBOX-1858:
------------------------------------
Attachment: (was: Untitled-1.jpg)
> Extracted text does not have spaces
> -----------------------------------
>
> Key: PDFBOX-1858
> URL: https://issues.apache.org/jira/browse/PDFBOX-1858
> Project: PDFBox
> Issue Type: Bug
> Components: Parsing, Text extraction
> Affects Versions: 1.8.3
> Environment: Linux 64bit, Java
> Reporter: Vitalie Bureanu
> Attachments: Screenshot.jpg, test.pdf
>
> Original Estimate: 3h
> Remaining Estimate: 3h
>
> Extracted text does not have spaces between some words.
> To test, please use the string on line 74a... in the attached test.pdf.
> It will be extracted as: "74a Amount of line73youwant refunded toyou . If Form8888 isattached , checkhere"
> The result does not look right: the words are "glued" together.
> I tried using the PDFTextStripper class, but the result remains the same.
> Can this be fixed, please?
> With respect,
> Vitalie
--
This message was sent by Atlassian JIRA
(v6.1.5#6160)