Posted to dev@pdfbox.apache.org by Sébastien Dailly <se...@elettermail.eu> on 2013/03/20 11:45:34 UTC
What's wrong with this font?
Hello,
I've got a problem reading the attached document. (It has been deflated and anonymised, its text has been removed and its characters shuffled.)
Text extraction works fine with some PDF readers (I tried Acrobat and Evince), but the text read by pdfbox is not the expected one, as if pdfbox were using the wrong font description to read it. Instead of
> 60CO L4PU7L
> 03D4 DR DVGWEWNER5L STLERC
> MLIPHOAP6 AE0TE
I've got
> UvIKGMuK6RuN0TN
> 0 E4RREDRRRElPéNéOND5vRRrTvNDp
> 60pMRRRv4KS7v
I'm using pdfbox 1.6.0 for that.
Is the document invalid? What can I do to read the document correctly?
Thanks!
--
Sébastien Dailly
+33 1 56 29 78 67
ELETTERMAIL
Re: What's wrong with this font?
Posted by Andreas Lehmkühler <an...@lehmi.de>.
Hi,
"Sébastien Dailly" <se...@elettermail.eu> wrote on 20 March 2013 at 11:45:
> Hello,
>
> I've got a problem reading the attached document. (It has been
> deflated and anonymised, its text has been removed and its characters
> shuffled.)
>
> Text extraction works fine with some PDF readers (I tried Acrobat and
> Evince), but the text read by pdfbox is not the expected one, as if pdfbox
> were using the wrong font description to read it. Instead of
>
>
> > 60CO L4PU7L
> > 03D4 DR DVGWEWNER5L STLERC
> > MLIPHOAP6 AE0TE
>
> I've got
>
> > UvIKGMuK6RuN0TN
> > 0 E4RREDRRRElPéNéOND5vRRrTvNDp
> > 60pMRRRv4KS7v
>
>
> I'm using pdfbox 1.6.0 for that.
Please update to a more recent version such as 1.7.1, or wait a few more days, as the
release process for the all-new 1.8.0 version started just yesterday.
> Is the document invalid? What can I do to read the document correctly?
If the issue still persists after upgrading to a more recent version, create an
issue on JIRA [1] and attach the PDF in question to it.
P.S.: Make sure that you are correctly subscribed to the mailing list [2];
otherwise you won't get any answers.
BR
Andreas Lehmkühler
[1] https://issues.apache.org/jira/browse/PDFBOX
[2] http://pdfbox.apache.org/mail-lists.html
Re: What's wrong with this font?
Posted by Sébastien Dailly <se...@elettermail.eu>.
On 20/03/2013 11:57, Maruan Sahyoun wrote:
> Hi,
>
> Using the latest version of pdfbox (1.7.1), this is what I get:
>
> MLIPHOAP6 AE0TE
> 03D4 DR DVGWEWNER5L STLERC
> 60CO L4PU7L
>
> Please give it a try.
>
Thanks for answering so quickly.
Sorry for the noise; I should have started with the latest pdfbox version.
I'll upgrade and run some tests with the new library.
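A note on such tests: the 1.7.1 output quoted above contains exactly the three expected lines, but in reverse order. The following is a minimal Java sketch, not part of pdfbox, that checks whether two extraction results contain the same lines regardless of their order. It assumes the extracted text is already available as a String (for example, from pdfbox's `PDFTextStripper.getText`); the class and method names are hypothetical, and the sample strings are the outputs shown in this thread.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of pdfbox: compares extracted text line by line.
public class ExtractionCheck {

    // Splits text on any line break and keeps the trimmed, non-empty lines.
    public static List<String> lines(String text) {
        List<String> result = new ArrayList<>();
        for (String line : text.split("\\R")) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty()) {
                result.add(trimmed);
            }
        }
        return result;
    }

    // True if both texts contain the same lines, ignoring their order.
    public static boolean sameLinesAnyOrder(String expected, String actual) {
        List<String> e = lines(expected);
        List<String> a = lines(actual);
        Collections.sort(e);
        Collections.sort(a);
        return e.equals(a);
    }

    public static void main(String[] args) {
        String expected = "60CO L4PU7L\n03D4 DR DVGWEWNER5L STLERC\nMLIPHOAP6 AE0TE\n";
        // Output reported with pdfbox 1.7.1 (same lines, reverse order).
        String with171 = "MLIPHOAP6 AE0TE\n03D4 DR DVGWEWNER5L STLERC\n60CO L4PU7L\n";
        // Output reported with pdfbox 1.6.0 (wrong characters).
        String with160 = "UvIKGMuK6RuN0TN\n0 E4RREDRRRElP\u00e9N\u00e9OND5vRRrTvNDp\n60pMRRRv4KS7v\n";
        System.out.println(sameLinesAnyOrder(expected, with171)); // true
        System.out.println(sameLinesAnyOrder(expected, with160)); // false
    }
}
```

Whether an order-insensitive comparison is appropriate depends on the use case; if line order matters, compare the two lists directly instead. pdfbox's `PDFTextStripper` also has a `setSortByPosition` option that may change the output order, though that has not been checked against this document.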
Re: What's wrong with this font?
Posted by Maruan Sahyoun <sa...@fileaffairs.de>.
Hi,
Using the latest version of pdfbox (1.7.1), this is what I get:
MLIPHOAP6 AE0TE
03D4 DR DVGWEWNER5L STLERC
60CO L4PU7L
Please give it a try.
Maruan Sahyoun