Posted to dev@pdfbox.apache.org by "Andreas Lehmkühler (JIRA)" <ji...@apache.org> on 2013/05/06 21:22:19 UTC

[jira] [Closed] (PDFBOX-1410) Error while converting pdf version 1.3 to text

     [ https://issues.apache.org/jira/browse/PDFBOX-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andreas Lehmkühler closed PDFBOX-1410.
--------------------------------------

    Resolution: Not A Problem
      Assignee: Andreas Lehmkühler

I'm afraid the text can't be extracted because the PDF doesn't provide any readable mapping from character codes to Unicode. Even the Adobe test fails; see http://pdfbox.apache.org/userguide/faq.html#no_text_extraction
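
To illustrate why a missing mapping makes extraction impossible, here is a toy sketch in plain Java. It is not PDFBox code and does not reflect how PDFBox is implemented: the class name ToUnicodeSketch, the decode method, and the sample character codes are all made up for illustration, and a font's code-to-Unicode table is assumed to be a simple Map.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only (hypothetical names, not the PDFBox API):
// a font's code-to-Unicode table is represented as a plain Map.
class ToUnicodeSketch {

    // Translates character codes into text using the given mapping.
    // A code without an entry becomes U+FFFD (the replacement character),
    // because nothing in the input says which character it stands for.
    static String decode(int[] codes, Map<Integer, String> toUnicode) {
        StringBuilder text = new StringBuilder();
        for (int code : codes) {
            String unicode = toUnicode.get(code);
            text.append(unicode != null ? unicode : "\uFFFD");
        }
        return text.toString();
    }

    public static void main(String[] args) {
        int[] codes = {1, 2, 3}; // arbitrary codes as they might appear in a content stream

        Map<Integer, String> mapping = new HashMap<>();
        mapping.put(1, "P");
        mapping.put(2, "D");
        mapping.put(3, "F");

        // With a mapping, the codes decode to readable text.
        System.out.println(decode(codes, mapping));

        // Without a mapping, only replacement characters come out.
        System.out.println(decode(codes, new HashMap<>()));
    }
}
```

The glyphs can still be drawn correctly on screen in the second case, which is why such a file looks fine in a viewer but yields no usable text.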
                
> Error while converting pdf version 1.3 to text
> ----------------------------------------------
>
>                 Key: PDFBOX-1410
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-1410
>             Project: PDFBox
>          Issue Type: Bug
>          Components: PDFReader
>    Affects Versions: 1.7.1
>            Reporter: Avinash Jadhav
>            Assignee: Andreas Lehmkühler
>         Attachments: 321.pdf
>
>
> I am getting an error when trying to extract text from a PDF 1.3 file using the command-line ExtractText tool:
> SEVERE: Error: Could not parse predefined CMAP file for 'Adobe-Identity-UCS'
> I can extract text from PDF 1.5 files.
> Is this issue fixed in PDFBox? If so, in which version?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira