
Posted to dev@pdfbox.apache.org by "Stefan Postema (JIRA)" <ji...@apache.org> on 2014/10/24 10:00:52 UTC

[jira] [Created] (PDFBOX-2451) Only gibberish extracted from certain PDF files

Stefan Postema created PDFBOX-2451:
--------------------------------------

             Summary: Only gibberish extracted from certain PDF files
                 Key: PDFBOX-2451
                 URL: https://issues.apache.org/jira/browse/PDFBOX-2451
             Project: PDFBox
          Issue Type: Bug
            Reporter: Stefan Postema


I was asked to report this bug here. Text extraction fails for certain Dutch-language PDF files. The problem was originally reported as TIKA-1095 (https://issues.apache.org/jira/browse/TIKA-1095) and can still be reproduced with the latest Tika version.

The extracted text consists only of gibberish (in other cases, of question marks and incorrect characters).

It was suggested that this may be a font problem. Could someone look into it?

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)