Posted to dev@pdfbox.apache.org by "Maruan Sahyoun (JIRA)" <ji...@apache.org> on 2013/04/06 17:53:15 UTC

[jira] [Closed] (PDFBOX-294) Text extraction gives incorrect results for attached PDF

     [ https://issues.apache.org/jira/browse/PDFBOX-294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Maruan Sahyoun closed PDFBOX-294.
---------------------------------

    Resolution: Not A Problem

Trying to extract the text using Adobe Reader gives the same results as using ExtractText. The text cannot be exported in a meaningful way.
                
> Text extraction gives incorrect results for attached PDF
> --------------------------------------------------------
>
>                 Key: PDFBOX-294
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-294
>             Project: PDFBox
>          Issue Type: Bug
>            Priority: Minor
>
> [imported from SourceForge]
> http://sourceforge.net/tracker/index.php?group_id=78314&atid=552832&aid=1768715
> Originally submitted by dougcook on 2007-08-06 10:16.
> The attached PDF does not render correctly in 0.7.3 -- extracted text is simply a bunch of garbage characters.
> [attachment on SourceForge]
> http://sourceforge.net/tracker/download.php?group_id=78314&atid=552832&aid=1768715&file_id=240109
> 11.pdf (application/pdf), 99627 bytes
> Example for extraction problems
> [comment on SourceForge]
> Originally sent by dougcook.
> Logged In: YES 
> user_id=1851816
> Originator: YES
> Yes, text extraction fails, and I can't find snippets from this doc in either Google or Yahoo!, meaning that their text extraction libraries probably also fail on this doc.
> Yet it renders perfectly well visually -- not sure whether this is simply a limitation of all the existing text extraction algorithms or whether the doc is constructed in some way that makes text extraction impossible.
> [comment on SourceForge]
> Originally sent by carlemac_2007.
> Logged In: YES 
> user_id=1933815
> Originator: NO
> Try the same file with the free Adobe Acrobat Reader 8.1.1:
> 1. Save as text - the resulting text file also has "unprintable" characters.
> A PDF construction error?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira