Posted to users@pdfbox.apache.org by "Hesham G." <he...@gmail.com> on 2013/03/18 15:43:25 UTC

Uppercase letters are read in lowercase manner

Hello,

I have a PDF in which some uppercase letters are read as lowercase when I extract its contents with PDFBox. Please check this 1-page sample PDF:
http://www.4shared.com/office/JXrLadN8/pdf_with_uppercase_letters.html

For example:
- Word "Testing" is read as "testing"
- Word "Eve" is read as "eve"
- Word "Deuteronomy" is read as "deuteronomy"

Is there a reason for this?

Best regards,
Hesham

Re: Uppercase letters are read in lowercase manner

Posted by "Hesham G." <he...@gmail.com>.
Done.
I have reported this with a sample file: https://issues.apache.org/jira/browse/PDFBOX-1552

Best regards,
Hesham

Re: Uppercase letters are read in lowercase manner

Posted by Andreas Lehmkuehler <an...@lehmi.de>.
Hi,

On 23.03.2013 09:11, Hesham G. wrote:
> Andreas,
>
> Thank you for your answer :)
> Should I add this to Jira, or is it already there?
Yes, please, and don't forget to attach a sample PDF.

TIA
Andreas Lehmkühler


Re: Uppercase letters are read in lowercase manner

Posted by "Hesham G." <he...@gmail.com>.
Andreas,

Thank you for your answer :)
Should I add this to Jira, or is it already there?

Best regards,
Hesham

Re: Uppercase letters are read in lowercase manner

Posted by Andreas Lehmkuehler <an...@lehmi.de>.
Hi,

On 21.03.2013 08:08, Maruan Sahyoun wrote:
> Hi Hesham,
>
> the text in question is defined as marked content in the PDF, not as 'regular text'.
> I think it's handled incorrectly or not fully supported in PDFBox (I don't know what
> the implementation status is), as in some other apps I tested with, but it is handled
> correctly in Adobe Reader.
That's correct: the PDF uses marked content to replace a string (section 14.9.4,
"Replacement Text", of the PDF specification provides a simple example). And
yes, PDFBox doesn't support it yet.

BR
Andreas Lehmkühler
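
The replacement-text mechanism from section 14.9.4 works roughly like this: a marked-content sequence (`BDC` ... `EMC`) carries an `/ActualText` entry in its property list, and a text extractor is expected to output that string instead of the text of the glyphs painted inside the sequence. The Python sketch below only illustrates the idea on an already decompressed content stream; it is not PDFBox code, the function `extract_text` and its `use_actual_text` flag are hypothetical, and the toy tokenizer handles only inline property lists, simple literal strings and the `Tj`/`TJ` operators.

```python
import re

# Toy tokenizer for a decompressed content stream (assumption: literal strings
# contain no escapes or nested parentheses; hex strings, comments and inline
# images are not handled). Brackets of TJ arrays are simply skipped.
TOKEN = re.compile(rb"\(([^()\\]*)\)|<<|>>|/[^\s/<>()\[\]]+|[^\s/<>()\[\]]+")
KEYWORD_OPERANDS = {b"true", b"false", b"null"}


def extract_text(content: bytes, use_actual_text: bool = True) -> str:
    """Concatenate Tj/TJ strings, optionally honouring /ActualText (hypothetical helper)."""
    out: list[str] = []
    operands: list = []  # str for string operands, bytes for every other token
    stack: list = []     # one entry per open BMC/BDC: its ActualText or None
    for m in TOKEN.finditer(content):
        tok = m.group(0)
        if m.group(1) is not None:
            operands.append(m.group(1).decode("latin-1"))
        elif tok in (b"BDC", b"BMC"):
            actual = None
            # Look for an inline property list: << ... /ActualText (text) ... >> BDC
            if use_actual_text and b"/ActualText" in operands:
                i = operands.index(b"/ActualText")
                if i + 1 < len(operands) and isinstance(operands[i + 1], str):
                    actual = operands[i + 1]
            # Emit the replacement once, unless an enclosing sequence already did.
            if actual is not None and all(s is None for s in stack):
                out.append(actual)
            stack.append(actual)
            operands.clear()
        elif tok == b"EMC":
            if stack:
                stack.pop()
            operands.clear()
        elif tok in (b"Tj", b"TJ"):
            # Painted glyph text is dropped while any enclosing sequence has ActualText.
            if all(s is None for s in stack):
                out.extend(o for o in operands if isinstance(o, str))
            operands.clear()
        elif tok[:1].isalpha() and tok not in KEYWORD_OPERANDS:
            operands.clear()  # any other operator (BT, Tf, Td, ...) consumes its operands
        else:
            operands.append(tok)  # number, name, "<<" or ">>"
    return "".join(out)
```

For a stream such as `BT /F1 12 Tf 72 700 Td /Span <</ActualText (Testing)>> BDC (testing) Tj EMC ( the word) Tj ET`, the default call returns `Testing the word`, while `use_actual_text=False` returns the painted glyph text `testing the word`, which is the kind of result described in this thread. A real extractor would also have to resolve property lists referenced by name through the page's `/Properties` resources and decode UTF-16 text strings.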

Re: Uppercase letters are read in lowercase manner

Posted by "Hesham G." <he...@gmail.com>.
Maruan,

Sorry, I did not mean to be rude. Thank you for checking this :)

Best regards,
Hesham

Re: Uppercase letters are read in lowercase manner

Posted by Maruan Sahyoun <sa...@fileaffairs.de>.
Hi Hesham,

I know my explanation is not a solution to the issue. But since you asked '… is there a reason for that?', I thought I'd provide the reason :-)

BTW, Mac Preview has the same issue as PDFBox, so at least we are not alone.

Maruan Sahyoun


Re: Uppercase letters are read in lowercase manner

Posted by "Hesham G." <he...@gmail.com>.
Maruan,

That is exactly why I sent this question. The text appears fine in Adobe Reader: when I copy and paste it with the mouse, I get the correct capitalization as it appears in the file, but PDFBox returns lowercase letters.

Best regards,
Hesham

Re: Uppercase letters are read in lowercase manner

Posted by Maruan Sahyoun <sa...@fileaffairs.de>.
Hi Hesham,

the text in question is defined as marked content in the PDF, not as 'regular text'. I think it's handled incorrectly or not fully supported in PDFBox (I don't know what the implementation status is), as in some other apps I tested with, but it is handled correctly in Adobe Reader.

Kind regards

Maruan Sahyoun
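
One rough way to check this kind of diagnosis on a given file is to inflate a page's content stream and list the `/ActualText` values it contains. A minimal Python sketch, assuming the raw stream bytes have already been pulled out of the PDF and are FlateDecode-compressed (common, but not guaranteed); the name `find_actual_text` is hypothetical. It only recognises simple literal strings, and property lists stored in the page's `/Properties` resources rather than inline will not be found, so an empty result does not rule marked content out.

```python
import re
import zlib

# Assumption: /ActualText values are plain literal strings such as (Testing);
# hex strings like <FEFF0054...> and escaped parentheses are not matched.
ACTUAL_TEXT = re.compile(rb"/ActualText\s*\(([^()\\]*)\)")


def find_actual_text(stream: bytes, compressed: bool = True) -> list[str]:
    """Return every inline literal /ActualText value found in a page content stream."""
    data = zlib.decompress(stream) if compressed else stream
    return [m.group(1).decode("latin-1") for m in ACTUAL_TEXT.finditer(data)]
```

For instance, a stream containing `/Span <</ActualText (Testing)>> BDC BT (testing) Tj ET EMC` yields `['Testing']`; the painted string `(testing)` is not reported because it is not an `/ActualText` value.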


Re: Uppercase letters are read in lowercase manner

Posted by Andreas Lehmkuehler <an...@lehmi.de>.
Hi,

On 21.03.2013 07:05, Hesham G. wrote:
> Andreas,
>
> I apologize for this!
> Please download the PDF from here:
> https://dl.dropbox.com/u/10111483/downloads/pdfbox/pdf_with_uppercase_letters.pdf
No need to worry, everything is fine now.

BR
Andreas Lehmkühler


Re: Uppercase letters are read in lowercase manner

Posted by "Hesham G." <he...@gmail.com>.
Andreas,

I apologize for this!
Please download the PDF from here:
https://dl.dropbox.com/u/10111483/downloads/pdfbox/pdf_with_uppercase_letters.pdf

Best regards,
Hesham

Re: Uppercase letters are read in lowercase manner

Posted by Andreas Lehmkuehler <an...@lehmi.de>.
Hi,

On 18.03.2013 15:43, Hesham G. wrote:
> Hello,
>
> I have a PDF in which some uppercase letters are read as lowercase when I extract its contents with PDFBox. Please check this 1-page sample PDF:
> http://www.4shared.com/office/JXrLadN8/pdf_with_uppercase_letters.html
Do I have to sign up to download the PDF, or did I miss the "magic" download button?

> For example:
> - Word "Testing" is read as "testing"
> - Word "Eve" is read as "eve"
> - Word "Deuteronomy" is read as "deuteronomy"
>
> Is there a reason for this?
>
> Best regards,
> Hesham


BR
Andreas Lehmkühler