Posted to dev@pdfbox.apache.org by "Stefan Postema (JIRA)" <ji...@apache.org> on 2014/10/24 10:00:52 UTC
[jira] [Created] (PDFBOX-2451) Only gibberish extracted from certain PDF files
Stefan Postema created PDFBOX-2451:
--------------------------------------
Summary: Only gibberish extracted from certain PDF files
Key: PDFBOX-2451
URL: https://issues.apache.org/jira/browse/PDFBOX-2451
Project: PDFBox
Issue Type: Bug
Reporter: Stefan Postema
I was told to report this bug here. Text extraction fails for certain PDF files in Dutch. The problem was originally reported as TIKA-1095 (https://issues.apache.org/jira/browse/TIKA-1095) and can be reproduced with the latest Tika version.
The extracted text shows only gibberish (or, in other cases, question marks and incorrect characters).
It was suggested that this could be a font problem. Could this be looked into?
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)