Posted to dev@pdfbox.apache.org by "Chris Bamford (JIRA)" <ji...@apache.org> on 2014/01/02 18:10:08 UTC
[jira] [Commented] (PDFBOX-1502) Not Extracting Text from PDF Document
[ https://issues.apache.org/jira/browse/PDFBOX-1502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13860341#comment-13860341 ]
Chris Bamford commented on PDFBOX-1502:
---------------------------------------
Hi Andreas
I'm puzzled as to why this issue was closed. It says you were unable to reproduce the fault Deepak described. Does this mean that you were able to extract the tokens "C23445", "Mimecast", "Fred" and "Box" from the *edited* version of the PDF (Renewal_Advice_Edited.pdf)?
If so, that's fantastic. Please advise how you did it, as we have had no luck (our current version of PDFBox is 1.8.2).
Thanks so much.
- Chris
> Not Extracting Text from PDF Document
> -------------------------------------
>
> Key: PDFBOX-1502
> URL: https://issues.apache.org/jira/browse/PDFBOX-1502
> Project: PDFBox
> Issue Type: Bug
> Components: Text extraction
> Affects Versions: 0.8.0-incubator, 1.7.1, 1.8.0
> Environment: Mac OS, JDK 1.7
> Reporter: deepak
> Assignee: Andreas Lehmkühler
> Attachments: PDFBOX1502-RenewalAdvice.txt, Renewal Advice .pdf, Renewal_Advice_Edited.pdf, Renewal_Advice_Edited_Extracted_Text.txt
>
>
> PDDocument document = PDDocument.load(inputStream);
> PDFTextStripper stripper = new PDFTextStripper();
>
> stripper.getText(document) does not return some of the text content in the attached PDF document. It returns only the form fields, and their values are empty. The bug is reproducible in both the 1.8.0-SNAPSHOT and 1.7.1 codebases.
> Please help in resolving the issue.
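
A minimal sketch of how the extraction result could be checked for the tokens named in the comment above. It uses only the Java standard library. The class name TokenCheck and the method missingTokens are hypothetical names introduced for illustration, not part of PDFBox. The text to check is assumed to come from the PDDocument.load / PDFTextStripper.getText calls quoted in the issue, shown here only as comments.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of PDFBox: reports which expected tokens
// are absent from text returned by PDFTextStripper.getText(document).
class TokenCheck {

    /** Returns the expected tokens that do not occur in the extracted text. */
    static List<String> missingTokens(String extractedText, List<String> expected) {
        List<String> missing = new ArrayList<String>();
        for (String token : expected) {
            if (!extractedText.contains(token)) {
                missing.add(token);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // In a real run the text would come from the calls quoted in the issue:
        //   PDDocument document = PDDocument.load(inputStream);
        //   String text = new PDFTextStripper().getText(document);
        //   document.close();
        // Here it is passed on the command line so the sketch has no dependencies.
        String text = args.length > 0 ? args[0] : "";

        // Tokens named in the comment as missing from Renewal_Advice_Edited.pdf.
        List<String> expected = Arrays.asList("C23445", "Mimecast", "Fred", "Box");

        System.out.println("Missing tokens: " + missingTokens(text, expected));
    }
}
```

An empty result list would indicate that all four tokens were extracted. Any token still listed is a concrete detail that can be attached to the issue when asking for it to be reopened.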