Posted to dev@pdfbox.apache.org by "Vitalie Bureanu (JIRA)" <ji...@apache.org> on 2014/01/22 12:02:19 UTC

[jira] [Updated] (PDFBOX-1858) Extracted text does not have spaces

     [ https://issues.apache.org/jira/browse/PDFBOX-1858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vitalie Bureanu updated PDFBOX-1858:
------------------------------------

    Attachment:     (was: Untitled-1.jpg)

> Extracted text does not have spaces
> -----------------------------------
>
>                 Key: PDFBOX-1858
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-1858
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Parsing, Text extraction
>    Affects Versions: 1.8.3
>         Environment: Linux 64bit, Java
>            Reporter: Vitalie Bureanu
>         Attachments: Screenshot.jpg, test.pdf
>
>   Original Estimate: 3h
>  Remaining Estimate: 3h
>
> Extracted text does not have spaces between some words.
> To test, please use the string on line 74a... in the attached test.pdf.
> It will be extracted as: "74a Amount of line73youwant refunded toyou . If Form8888 isattached , checkhere"
> The result is incorrect: the words are "glued" together.
> I tried using the PDFTextStripper class, but the result remains the same.
> Can it be solved, please?
> With respect,
> Vitalie



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)