Posted to dev@pdfbox.apache.org by "Hesham (JIRA)" <ji...@apache.org> on 2013/03/26 14:04:14 UTC

[jira] [Created] (PDFBOX-1552) Uppercase letters are read in lowercase manner

Hesham created PDFBOX-1552:
------------------------------

             Summary: Uppercase letters are read in lowercase manner
                 Key: PDFBOX-1552
                 URL: https://issues.apache.org/jira/browse/PDFBOX-1552
             Project: PDFBox
          Issue Type: Bug
          Components: Text extraction
    Affects Versions: 1.7.1
         Environment: Windows XP
            Reporter: Hesham
         Attachments: pdf_with_uppercase_letters.pdf

I have a PDF in which some uppercase letters are extracted as lowercase when I read its contents with PDFBox. For example:
- Word "Testing" is read as "testing"
- Word "Eve" is read as "eve"
- Word "Deuteronomy" is read as "deuteronomy"

Andreas commented: "The PDF uses marked content to replace a string (section 14.9.4, Replacement Text, of the PDF specification gives a simple example). And yes, PDFBox doesn't support it yet."
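To illustrate the mechanism Andreas describes, here is a minimal sketch. In a content stream, a marked-content sequence (BDC ... EMC) can carry a property list with an /ActualText entry, for example /Span <</ActualText (T)>> BDC ... EMC, and that text is meant to replace whatever the glyphs inside the sequence decode to. The operator model below (the hypothetical Op record, the extract and extractIgnoringActualText methods) is invented for illustration. It is NOT PDFBox API and does not parse real PDF syntax. It also assumes, purely for illustration, that the sample file draws glyphs that decode to "t" and supplies "T" via ActualText; the attached PDF has not been inspected to confirm this.

```java
import java.util.List;

/**
 * Hypothetical, simplified model of content-stream text operators (not PDFBox API).
 * It shows how ActualText on a marked-content sequence replaces the decoded glyph text.
 */
public class ActualTextDemo {

    public enum Kind { BDC, TJ, EMC }

    // BDC carries the ActualText value (null if the property list has none),
    // TJ carries the text decoded from the shown glyphs, EMC carries nothing.
    public record Op(Kind kind, String text) {
        public static Op bdc(String actualText) { return new Op(Kind.BDC, actualText); }
        public static Op tj(String shown) { return new Op(Kind.TJ, shown); }
        public static Op emc() { return new Op(Kind.EMC, null); }
    }

    /** Extraction that honors ActualText: the outermost replacement wins until its EMC. */
    public static String extract(List<Op> ops) {
        StringBuilder out = new StringBuilder();
        int depth = 0;            // current marked-content nesting level
        int replacingDepth = -1;  // level of the open sequence whose ActualText is in effect
        for (Op op : ops) {
            switch (op.kind()) {
                case BDC -> {
                    depth++;
                    if (replacingDepth < 0 && op.text() != null) {
                        out.append(op.text()); // emit the replacement text once
                        replacingDepth = depth;
                    }
                }
                case TJ -> {
                    if (replacingDepth < 0) {
                        out.append(op.text()); // no replacement active: use glyph text
                    }
                }
                case EMC -> {
                    if (depth == replacingDepth) {
                        replacingDepth = -1; // replacement ends with its own EMC
                    }
                    depth--;
                }
            }
        }
        return out.toString();
    }

    /** Extraction that ignores marked content, i.e. the behavior reported in this issue. */
    public static String extractIgnoringActualText(List<Op> ops) {
        StringBuilder out = new StringBuilder();
        for (Op op : ops) {
            if (op.kind() == Kind.TJ) {
                out.append(op.text());
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Illustrative stream: the first glyph decodes to "t", ActualText says "T".
        List<Op> ops = List.of(Op.bdc("T"), Op.tj("t"), Op.emc(), Op.tj("esting"));
        System.out.println(extract(ops));                   // Testing
        System.out.println(extractIgnoringActualText(ops)); // testing
    }
}
```

The naive method reproduces the symptom in this report ("testing"), while honoring ActualText yields "Testing". Letting the outermost ActualText cover nested sequences is a design choice made for this sketch.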


Please see the attached one-page sample PDF (pdf_with_uppercase_letters.pdf).
