Posted to dev@pdfbox.apache.org by "Manuel Aristaran (JIRA)" <ji...@apache.org> on 2014/01/18 17:25:19 UTC

[jira] [Created] (PDFBOX-1853) Bad character mapping in text extraction

Manuel Aristaran created PDFBOX-1853:
----------------------------------------

             Summary: Bad character mapping in text extraction
                 Key: PDFBOX-1853
                 URL: https://issues.apache.org/jira/browse/PDFBOX-1853
             Project: PDFBox
          Issue Type: Bug
          Components: Text extraction
    Affects Versions: 1.8.3
         Environment: Mac OS, Java 6
            Reporter: Manuel Aristaran
         Attachments: Anuario_de_Estadisticas_Universitarias_2010.pdf

PDFBox returns wrong characters for text set in some of the embedded fonts in the attached PDF.

pdftohtml (poppler) and mudraw (mupdf) show a similar issue. The problem does not occur in Mac OS X Preview.app.
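
To make "wrong characters" easier to quantify when comparing extractors, the sketch below counts code points in already-extracted text that rarely belong in ordinary prose: U+FFFD, private-use, unassigned, lone surrogate, and non-whitespace control characters. It assumes the text was obtained beforehand (for example from PDFBox's PDFTextStripper.getText). The class and method names are hypothetical and not part of PDFBox. This heuristic is only a rough aid: it cannot detect a glyph that was mapped to a different but otherwise valid character, which may well be the failure mode in this report.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of PDFBox: flags code points that
// are unlikely to appear in correctly extracted prose.
final class SuspiciousChars {

    // Heuristic test for a single Unicode code point.
    static boolean isSuspicious(int cp) {
        if (cp == 0xFFFD) {
            return true; // Unicode replacement character
        }
        int type = Character.getType(cp);
        if (type == Character.PRIVATE_USE
                || type == Character.UNASSIGNED
                || type == Character.SURROGATE) {
            return true;
        }
        if (type == Character.CONTROL) {
            // Ordinary line breaks and tabs are expected in extracted text.
            return !(cp == '\n' || cp == '\r' || cp == '\t');
        }
        return false;
    }

    // Maps each suspicious code point found in the text to its count.
    static Map<Integer, Integer> histogram(String text) {
        Map<Integer, Integer> counts = new TreeMap<Integer, Integer>();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (isSuspicious(cp)) {
                Integer old = counts.get(cp);
                counts.put(cp, old == null ? 1 : old + 1);
            }
            i += Character.charCount(cp);
        }
        return counts;
    }
}
```

Calling SuspiciousChars.histogram(extractedText) on the output of each tool gives a per-code-point count that can be compared across PDFBox, pdftohtml, and mudraw. An empty map does not prove the mapping is correct.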



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)