Posted to dev@pdfbox.apache.org by "Ashok Chigullapally (JIRA)" <ji...@apache.org> on 2011/02/08 01:54:57 UTC

[jira] Updated: (PDFBOX-957) Text extraction using ExtractText (pdf file is input file) generates some weird characters

     [ https://issues.apache.org/jira/browse/PDFBOX-957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashok Chigullapally updated PDFBOX-957:
---------------------------------------

    Attachment: Resume1.pdf

Resume file in PDF format from which text cannot be extracted.

> Text extraction using ExtractText (pdf file is input file) generates some weird characters
> ------------------------------------------------------------------------------------------
>
>                 Key: PDFBOX-957
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-957
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>    Affects Versions: 1.4.0
>         Environment: Windows 7
>            Reporter: Ashok Chigullapally
>            Priority: Critical
>              Labels: pdfbox, textExtraction
>         Attachments: Resume1.pdf, Resume2.pdf
>
>
> When I try to extract text from a PDF document, it generates gibberish text.
> ExtractText.exe "\Jobvite\Resumes\Resume-Boston.pdf Resume-Boston.txt
> I will provide the PDF documents on request; I could not find a way to include attachments.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira