Posted to dev@pdfbox.apache.org by "Jukka Zitting (JIRA)" <ji...@apache.org> on 2009/10/21 12:05:59 UTC

[jira] Updated: (PDFBOX-61) Spaces in extracted file

     [ https://issues.apache.org/jira/browse/PDFBOX-61?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jukka Zitting updated PDFBOX-61:
--------------------------------

         Priority: Blocker
         Reporter: Jukka Zitting
    Fix Version/s: 0.8.0-incubator

> Spaces in extracted file
> ------------------------
>
>                 Key: PDFBOX-61
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-61
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>            Reporter: Jukka Zitting
>            Priority: Blocker
>             Fix For: 0.8.0-incubator
>
>
> [imported from SourceForge]
> http://sourceforge.net/tracker/index.php?group_id=78314&atid=552832&aid=1208824
> Originally submitted by nobody on 2005-05-25 16:40.
> While trying to integrate with Lucene, I was having 
> problems.  The Lucene people suggested that I check 
> the output of the extract utility on one of my test PDFs.  
> When I did, I saw spaces placed inside many of the 
> words.  I was on version 0.7.0, so I downloaded 0.7.1 
> and saw the same results.
> One of the test files where I see this issue is attached.
> [attachment on SourceForge]
> http://sourceforge.net/tracker/download.php?group_id=78314&atid=552832&aid=1208824&file_id=135995
> Tom_3.pdf (application/pdf), 10145 bytes
> Test PDF file.

-- 
This message is automatically generated by JIRA.