Posted to users@pdfbox.apache.org by "Hesham G." <he...@gmail.com> on 2013/03/18 15:43:25 UTC
Uppercase letters are read in lowercase manner
Hello,
I have a PDF in which some uppercase letters are read as lowercase when I extract its contents with PDFBox. Please check this 1-page sample PDF:
http://www.4shared.com/office/JXrLadN8/pdf_with_uppercase_letters.html
For example:
- Word "Testing" is read as "testing"
- Word "Eve" is read as "eve"
- Word "Deuteronomy" is read as "deuteronomy"
Is there a reason for this?
Best regards,
Hesham
Re: Uppercase letters are read in lowercase manner
Posted by "Hesham G." <he...@gmail.com>.
Done.
I have reported this with a sample file: https://issues.apache.org/jira/browse/PDFBOX-1552
Best regards,
Hesham
Re: Uppercase letters are read in lowercase manner
Posted by Andreas Lehmkuehler <an...@lehmi.de>.
Hi,
On 23.03.2013 09:11, Hesham G. wrote:
> Andreas,
>
> Thank you for your answer :)
> Should I add this to Jira, or is it already out there?
Yes, please, and don't forget to add a sample PDF.
TIA
Andreas Lehmkühler
Re: Uppercase letters are read in lowercase manner
Posted by "Hesham G." <he...@gmail.com>.
Andreas,
Thank you for your answer :)
Should I add this to Jira, or is it already out there?
Best regards,
Hesham
Re: Uppercase letters are read in lowercase manner
Posted by Andreas Lehmkuehler <an...@lehmi.de>.
Hi,
On 21.03.2013 08:08, Maruan Sahyoun wrote:
> Hi Hesham,
>
> The text in question is defined as marked content in the PDF and not as 'regular text'.
> I think it's wrongly handled/not fully supported (I don't know what the implementation status is)
> in PDFBox (and some other apps I tested with) but is correctly handled in Adobe Reader.
That's correct: the PDF uses marked content to replace a string (section 14.9.4, "Replacement Text", of the PDF specification provides a simple example). And yes, PDFBox doesn't support it yet.
BR
Andreas Lehmkühler
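A toy sketch of the mechanism Andreas and Maruan describe, written in Java because PDFBox is a Java library. It assumes (this is not taken from the sample PDF, which is not reproduced here) that each affected word is drawn from a lowercase string inside a marked-content sequence whose property list carries an /ActualText entry with the correctly cased word, e.g. /Span << /ActualText (Testing) >> BDC (testing) Tj EMC. The class ActualTextDemo and its methods are hypothetical: they are neither PDFBox code nor the PDFBox API, and the tokenizer understands only a tiny subset of content-stream syntax (no string escapes, no TJ, ' or " operators, no property lists referenced from page resources).

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical toy model of /ActualText handling in a PDF content stream.
 * Understands only unescaped literal strings, inline property dictionaries,
 * BMC/BDC/EMC and Tj. This is NOT PDFBox code or the PDFBox API.
 */
public class ActualTextDemo {

    // Literal strings, dictionary delimiters, names, and bare words (numbers/operators).
    private static final Pattern TOKEN =
            Pattern.compile("\\([^()]*\\)|<<|>>|/[^\\s/()<>\\[\\]]+|[^\\s/()<>\\[\\]]+");

    static List<String> tokenize(String stream) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(stream);
        while (m.find()) tokens.add(m.group());
        return tokens;
    }

    // Bare words starting with a letter are operators; true/false/null are operands.
    static boolean isOperator(String t) {
        return Character.isLetter(t.charAt(0)) && !t.matches("true|false|null");
    }

    // Naive extraction: emit the last string before every Tj, ignoring marked content.
    static String naiveExtract(String stream) {
        StringBuilder out = new StringBuilder();
        String last = "";
        for (String t : tokenize(stream)) {
            if (t.startsWith("(")) last = t.substring(1, t.length() - 1);
            else if (t.equals("Tj")) out.append(last);
        }
        return out.toString();
    }

    // ActualText-aware extraction: a BDC whose property list has /ActualText emits that
    // text once, and every Tj up to the matching EMC (nested sequences included) is dropped.
    static String actualTextExtract(String stream) {
        StringBuilder out = new StringBuilder();
        Deque<Boolean> replaced = new ArrayDeque<>(); // one flag per open BMC/BDC
        List<String> operands = new ArrayList<>();
        for (String t : tokenize(stream)) {
            if (!isOperator(t)) {
                operands.add(t); // still collecting operands for the next operator
                continue;
            }
            boolean inReplaced = !replaced.isEmpty() && replaced.peek();
            if (t.equals("BDC")) {
                String actual = findActualText(operands);
                if (actual != null && !inReplaced) out.append(actual);
                replaced.push(inReplaced || actual != null);
            } else if (t.equals("BMC")) {
                replaced.push(inReplaced); // nested sequences inherit the enclosing flag
            } else if (t.equals("EMC")) {
                if (!replaced.isEmpty()) replaced.pop();
            } else if (t.equals("Tj") && !inReplaced && !operands.isEmpty()) {
                String s = operands.get(operands.size() - 1);
                if (s.startsWith("(")) out.append(s, 1, s.length() - 1);
            }
            operands.clear(); // every operator consumes the operands before it
        }
        return out.toString();
    }

    // Returns the string that follows /ActualText among the BDC operands, or null.
    private static String findActualText(List<String> operands) {
        for (int i = 0; i + 1 < operands.size(); i++) {
            String next = operands.get(i + 1);
            if (operands.get(i).equals("/ActualText") && next.startsWith("(")) {
                return next.substring(1, next.length() - 1);
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Hypothetical content stream, NOT extracted from the sample PDF in this thread.
        String stream = String.join("\n",
                "BT /F1 12 Tf 72 700 Td",
                "/Span << /ActualText (Testing) >> BDC (testing) Tj EMC",
                "( and ) Tj",
                "/Span << /ActualText (Eve) >> BDC (eve) Tj EMC",
                "ET");
        System.out.println("naive:      " + naiveExtract(stream));      // testing and eve
        System.out.println("ActualText: " + actualTextExtract(stream)); // Testing and Eve
    }
}
```

Running main prints "naive:      testing and eve" followed by "ActualText: Testing and Eve", which mirrors the difference Hesham saw between PDFBox and copy/paste in Adobe Reader. The stack is the design point worth noting: replacement text covers the whole BDC ... EMC sequence, including nested sequences, so a sketch like this suppresses every Tj until the matching EMC rather than only the next one.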
Re: Uppercase letters are read in lowercase manner
Posted by "Hesham G." <he...@gmail.com>.
Maruan,
Sorry, I did not mean to be rude. Thank you for checking this :)
Best regards,
Hesham
Re: Uppercase letters are read in lowercase manner
Posted by Maruan Sahyoun <sa...@fileaffairs.de>.
Hi Hesham,
I know my explanation is not a solution to the issue. But as you asked '… is there a reason for that?', I thought I'd provide the reason :-)
BTW, Mac Preview has the same issue as PDFBox, so at least we are not alone.
Maruan Sahyoun
On 21.03.2013 at 12:34, Hesham G. <he...@gmail.com> wrote:
> Maruan,
>
> And that is why I sent this question. The text appears fine in Adobe Reader: when I copy and paste it with the mouse, the letter case matches what appears in the file, but PDFBox returns lowercase letters.
>
> Best regards,
> Hesham
Re: Uppercase letters are read in lowercase manner
Posted by "Hesham G." <he...@gmail.com>.
Maruan,
And that is why I sent this question. The text appears fine in Adobe Reader: when I copy and paste it with the mouse, the letter case matches what appears in the file, but PDFBox returns lowercase letters.
Best regards,
Hesham
Re: Uppercase letters are read in lowercase manner
Posted by Maruan Sahyoun <sa...@fileaffairs.de>.
Hi Hesham,
The text in question is defined as marked content in the PDF and not as 'regular text'. I think it's wrongly handled/not fully supported (I don't know what the implementation status is) in PDFBox (and some other apps I tested with) but is correctly handled in Adobe Reader.
Kind regards
Maruan Sahyoun
On 21.03.2013 at 07:05, Hesham G. <he...@gmail.com> wrote:
> Andreas,
>
> I apologize for this!
> Please download the PDF from here:
> https://dl.dropbox.com/u/10111483/downloads/pdfbox/pdf_with_uppercase_letters.pdf
>
> Best regards,
> Hesham
Re: Uppercase letters are read in lowercase manner
Posted by Andreas Lehmkuehler <an...@lehmi.de>.
Hi,
On 21.03.2013 07:05, Hesham G. wrote:
> Andreas,
>
> I apologize for this!
> Please download the PDF from here:
> https://dl.dropbox.com/u/10111483/downloads/pdfbox/pdf_with_uppercase_letters.pdf
No need to worry, everything is fine now.
BR
Andreas Lehmkühler
Re: Uppercase letters are read in lowercase manner
Posted by "Hesham G." <he...@gmail.com>.
Andreas,
I apologize for this!
Please download the PDF from here:
https://dl.dropbox.com/u/10111483/downloads/pdfbox/pdf_with_uppercase_letters.pdf
Best regards,
Hesham
Re: Uppercase letters are read in lowercase manner
Posted by Andreas Lehmkuehler <an...@lehmi.de>.
Hi,
On 18.03.2013 15:43, Hesham G. wrote:
> Hello,
>
> I have a PDF in which some uppercase letters are read as lowercase when I extract its contents with PDFBox. Please check this 1-page sample PDF:
> http://www.4shared.com/office/JXrLadN8/pdf_with_uppercase_letters.html
Do I have to sign up to download the PDF, or did I miss the "magic" download button?
BR
Andreas Lehmkühler