Posted to dev@pdfbox.apache.org by "Hesham (JIRA)" <ji...@apache.org> on 2013/03/26 14:04:14 UTC
[jira] [Created] (PDFBOX-1552) Uppercase letters are read in lowercase manner
Hesham created PDFBOX-1552:
------------------------------
Summary: Uppercase letters are read in lowercase manner
Key: PDFBOX-1552
URL: https://issues.apache.org/jira/browse/PDFBOX-1552
Project: PDFBox
Issue Type: Bug
Components: Text extraction
Affects Versions: 1.7.1
Environment: Windows XP
Reporter: Hesham
Attachments: pdf_with_uppercase_letters.pdf
I have a PDF from which PDFBox extracts some uppercase letters as lowercase when I read its contents. For example:
- Word "Testing" is read as "testing"
- Word "Eve" is read as "eve"
- Word "Deuteronomy" is read as "deuteronomy"
Andreas commented on this: "The pdf uses marked content to replace a string (14.9.4 Replacement Text of the PDF specs provides a simple example). And yes, PDFBox doesn't support it, yet."
Please check the attached 1-page sample PDF.
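To illustrate the behaviour Andreas describes, here is a minimal, self-contained sketch of how replacement text on a marked-content span could change the extracted string. It does not use the PDFBox API: the class ActualTextSketch, the Run type, and both extract methods are hypothetical names invented for this illustration. It also assumes, without having inspected the attached file, that the glyph shown for the first letter decodes to "t" while the span's replacement text says "T".

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of marked-content runs; not a PDFBox class. */
final class ActualTextSketch {

    /** One run of shown glyphs, optionally carrying replacement text. */
    static final class Run {
        final String shownText;   // what the glyph codes decode to
        final String actualText;  // replacement text for the span, or null if absent

        Run(String shownText, String actualText) {
            this.shownText = shownText;
            this.actualText = actualText;
        }
    }

    /** Extraction that ignores replacement text (the behaviour in this report). */
    static String extractIgnoringActualText(List<Run> runs) {
        StringBuilder sb = new StringBuilder();
        for (Run run : runs) {
            sb.append(run.shownText);
        }
        return sb.toString();
    }

    /** Extraction that prefers replacement text whenever a run carries it. */
    static String extractHonoringActualText(List<Run> runs) {
        StringBuilder sb = new StringBuilder();
        for (Run run : runs) {
            sb.append(run.actualText != null ? run.actualText : run.shownText);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Assumed layout: the first glyph decodes to "t" but is tagged with "T".
        List<Run> runs = new ArrayList<>();
        runs.add(new Run("t", "T"));
        runs.add(new Run("esting", null));

        System.out.println(extractIgnoringActualText(runs)); // prints "testing"
        System.out.println(extractHonoringActualText(runs)); // prints "Testing"
    }
}
```

If the sample PDF is structured this way, an extractor that only decodes the shown glyphs would produce the lowercase words listed above, while one that honors the replacement text would return the expected capitalization.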