
Posted to dev@pdfbox.apache.org by "Andreas Lehmkühler (JIRA)" <ji...@apache.org> on 2010/05/18 21:05:42 UTC

[jira] Commented: (PDFBOX-586) Text Extraction Regression ?

    [ https://issues.apache.org/jira/browse/PDFBOX-586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12868790#action_12868790 ] 

Andreas Lehmkühler commented on PDFBOX-586:
-------------------------------------------

Works like a charm with 1.1.0 (using ExtractText -sort -encoding utf-8). Find my results attached to this issue.

What exactly goes wrong when you try to extract the text? Do you get an exception? What are the differences between the older 0.7.4 results and those produced with the more recent version of PDFBox?
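One way to answer the last two questions concretely is to save the 0.7.4 and 1.1.0 outputs for the same page and compare them. The following is a minimal sketch, assuming both outputs were written as UTF-8 text files (e.g. with -encoding utf-8). The function names compare_extractions and compare_files are hypothetical helpers, not part of PDFBox. The sketch reports whether either side is empty ("doesn't retrieve any text"), the character counts, a similarity ratio, and a unified diff ("the extraction is worse").

```python
import difflib
import sys
from pathlib import Path


def compare_extractions(old_text: str, new_text: str,
                        old_label: str = "0.7.4",
                        new_label: str = "1.1.0") -> dict:
    """Summarize how two text extractions of the same page differ (hypothetical helper)."""
    old_lines = old_text.splitlines()
    new_lines = new_text.splitlines()
    diff = list(difflib.unified_diff(old_lines, new_lines,
                                     fromfile=old_label, tofile=new_label,
                                     lineterm=""))
    return {
        # True when a version produced no text, or only whitespace
        "old_empty": not old_text.strip(),
        "new_empty": not new_text.strip(),
        "old_chars": len(old_text),
        "new_chars": len(new_text),
        # Ratio in [0, 1]; 1.0 means the two outputs are identical
        "similarity": difflib.SequenceMatcher(None, old_text, new_text).ratio(),
        # Unified-diff lines: "-" only in the old output, "+" only in the new one
        "diff": diff,
    }


def compare_files(old_path: str, new_path: str) -> dict:
    # Assumption: both extraction outputs were saved as UTF-8 text files
    old_text = Path(old_path).read_text(encoding="utf-8")
    new_text = Path(new_path).read_text(encoding="utf-8")
    return compare_extractions(old_text, new_text, old_path, new_path)


if __name__ == "__main__" and len(sys.argv) == 3:
    report = compare_files(sys.argv[1], sys.argv[2])
    print(f"empty output: old={report['old_empty']} new={report['new_empty']}")
    print(f"characters:   old={report['old_chars']} new={report['new_chars']}")
    print(f"similarity:   {report['similarity']:.2f}")
    print("\n".join(report["diff"]))
```

Run as, for example, `python compare_extractions.py old-0.7.4.txt new-1.1.0.txt` (file names hypothetical). An empty "new" side points to text not being found at all, while a low similarity with both sides non-empty points to ordering or encoding differences.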

> Text Extraction Regression ?
> ----------------------------
>
>                 Key: PDFBOX-586
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-586
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>    Affects Versions: 1.1.0
>         Environment: Windows XP + Eclipse + PDFBox sources
>            Reporter: Bernard
>         Attachments: ASEB-Camping_Car_ou_Bateau.pdf, Eval.pdf, internals.pdf, PDFBOX586-ASEB-Camping_Car_ou_Bateau.txt, PDFBOX586-Eval.txt, PDFBOX586-internals.txt
>
>
> Hi,
> I have noticed that I can extract text from some PDF files with PDFBox 0.7.4, but for the same file and the same page, PDFBox 1.1.0 doesn't retrieve any text, or the extraction is worse.
> Am I the only one who thinks there is a regression in text extraction?

-- 
This message is automatically generated by JIRA.