Posted to dev@pdfbox.apache.org by Sébastien Dailly <se...@elettermail.eu> on 2013/03/20 11:45:34 UTC

What's wrong with this font ?

Hello,

I've got a problem reading the attached document. (It has been
deflated and anonymised; text has been removed and characters shuffled.)

Text extraction works fine with some PDF readers (I tried
Acrobat and Evince), but the text read by pdfbox is not the expected
one, as if pdfbox were using the wrong font description to read the
text. Instead of


> 60CO L4PU7L
> 03D4 DR DVGWEWNER5L STLERC
> MLIPHOAP6 AE0TE

I've got

> UvIKGMuK6RuN0TN
> 0 E4RREDRRRElPéNéOND5vRRrTvNDp
> 60pMRRRv4KS7v


I'm using pdfbox 1.6.0 for that.

Is the document invalid? What can I do to read the document correctly?

Thanks !

-- 
Sébastien Dailly
+33 1 56 29 78 67
ELETTERMAIL

Re: What's wrong with this font ?

Posted by Andreas Lehmkühler <an...@lehmi.de>.
Hi,


"Sébastien Dailly" <se...@elettermail.eu> wrote on 20 March 2013 at 11:45:
> Hello,
>
> I've got a problem while reading the attached document. (It has been
> deflated, anonymised, text has been removed, and character shuffled).
>
> The text extraction works fine with some pdf reader (I tried with
> Acrobat and Evince), but the text read by pdfbox is not the expected
> one, as if pdfbox is using a wrong font description for reading the text
> : instead of
>
>
> > 60CO L4PU7L
> > 03D4 DR DVGWEWNER5L STLERC
> > MLIPHOAP6 AE0TE
>
> I've got
>
> > UvIKGMuK6RuN0TN
> > 0 E4RREDRRRElPéNéOND5vRRrTvNDp
> > 60pMRRRv4KS7v
>
>
> I'm using pdfbox 1.6.0 for that.
Please update to a more recent version such as 1.7.1, or wait a few more days, as the
release process for the all-new 1.8.0 version started just yesterday.

> Is the document invalid ? What can I do for reading correctly the document ?
If the issue still persists after upgrading to a more recent version, create an
issue on JIRA [1] and attach the PDF in question to it.

P.S.: Ensure that you are correctly subscribed to the mailing list [2]; otherwise
you won't get any answers.

> Thanks !
>
> --
> Sébastien Dailly
> +33 1 56 29 78 67
> ELETTERMAIL

BR
Andreas Lehmkühler
[1] https://issues.apache.org/jira/browse/PDFBOX
[2] http://pdfbox.apache.org/mail-lists.html

Re: What's wrong with this font ?

Posted by Sébastien Dailly <se...@elettermail.eu>.
On 20/03/2013 at 11:57, Maruan Sahyoun wrote:
> Hi,
>
> using the latest version of pdfbox (1.7.1) that's what I got
>
> MLIPHOAP6 AE0TE
> 03D4  DR   DVGWEWNER5L  STLERC
> 60CO   L4PU7L
>
> Please give it a try.
>

Thanks for answering so quickly.

Sorry for the noise, I should have started with the latest pdfbox version.
I'll upgrade and run some tests with the new library.


-- 
Sébastien Dailly
+33 1 56 29 78 67
ELETTERMAIL
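
When re-testing after an upgrade, note that the 1.7.1 output quoted in this thread lists the same three lines as the expected text, but in a different order and with extra spaces between words. A minimal sketch of a comparison that tolerates both differences could look like the following. It uses only the Java standard library and no PDFBox calls; the class and method names (ExtractionCheck, normalize, sameContent) are hypothetical, and treating line order as irrelevant is an assumption that may not suit every document.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class ExtractionCheck {

    // Hypothetical helper: collapse runs of whitespace, drop blank lines,
    // and sort the lines, so that two extractions differing only in
    // spacing or line order compare equal.
    public static List<String> normalize(String text) {
        List<String> lines = new ArrayList<String>();
        for (String line : text.split("\\r?\\n")) {
            String cleaned = line.trim().replaceAll("\\s+", " ");
            if (!cleaned.isEmpty()) {
                lines.add(cleaned);
            }
        }
        Collections.sort(lines);
        return lines;
    }

    // Assumption: line order does not matter for this check.
    public static boolean sameContent(String expected, String extracted) {
        return normalize(expected).equals(normalize(extracted));
    }

    public static void main(String[] args) {
        // Strings copied from the messages in this thread.
        String expected = "60CO L4PU7L\n03D4 DR DVGWEWNER5L STLERC\nMLIPHOAP6 AE0TE\n";
        String from171 = "MLIPHOAP6 AE0TE\n03D4  DR   DVGWEWNER5L  STLERC\n60CO   L4PU7L\n";
        String from160 = "UvIKGMuK6RuN0TN\n0 E4RREDRRRElP\u00e9N\u00e9OND5vRRrTvNDp\n60pMRRRv4KS7v\n";

        System.out.println(sameContent(expected, from171)); // prints "true"
        System.out.println(sameContent(expected, from160)); // prints "false"
    }
}
```

In a real test the extracted string would come from the library's text extraction rather than from a literal.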

Re: What's wrong with this font ?

Posted by Maruan Sahyoun <sa...@fileaffairs.de>.
Hi,

using the latest version of pdfbox (1.7.1) that's what I got

MLIPHOAP6 AE0TE
03D4  DR   DVGWEWNER5L  STLERC
60CO   L4PU7L

Please give it a try.

Maruan Sahyoun

