Posted to users@pdfbox.apache.org by Darcy Dechene <da...@gmail.com> on 2013/08/08 16:58:47 UTC

PDFToImage results in missing text

Hi,

When converting the attached PDF file using the command line tool
PDFToImage, the resulting image is missing all the text (attached
test_J25.jpg).

This is being run on a Windows system and occurs when using Java 7 (update
21 or 25). When running on the same Windows system using Java 7 update 11
the resulting image is just fine and contains all the text (attached
test_J11.jpg).

Does anyone know what might cause this or has ideas for further debugging?

Thanks!
Darcy

Re: PDFToImage results in missing text

Posted by Darcy Dechene <da...@gmail.com>.

Yes, that is the issue. I'll monitor PDFBOX-1608, and for now we will roll
back to a Java version prior to 1.7.0_21.
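
For anyone applying the same workaround, a start-up check along these lines
might help flag a JVM in the range reported here. This is only a sketch: the
class and method names are made up for illustration, and it assumes that
Java 7 update 21 and every later Java 7 update behave the same way. The
thread only confirms updates 21 and 25 as failing and update 11 as working;
the first affected update in between is not stated.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of PDFBox.
class Java7UpdateCheck {
    // Matches legacy version strings such as "1.7.0_21" or "1.7.0_25-b17".
    private static final Pattern LEGACY =
            Pattern.compile("^1\\.(\\d+)\\.\\d+(?:_(\\d+))?.*$");

    /** Returns the update number of a Java 7 version string, or -1 if it is not Java 7. */
    static int java7Update(String version) {
        Matcher m = LEGACY.matcher(version);
        if (!m.matches() || !"7".equals(m.group(1))) {
            return -1;
        }
        // A string without "_NN" (for example "1.7.0") is treated as update 0.
        return m.group(2) == null ? 0 : Integer.parseInt(m.group(2));
    }

    /** Assumption: update 21 and later Java 7 updates are affected. */
    static boolean reportedAsAffected(String version) {
        return java7Update(version) >= 21;
    }

    public static void main(String[] args) {
        String v = System.getProperty("java.version");
        System.out.println(v + (reportedAsAffected(v)
                ? " : in the range reported as affected in this thread"
                : " : not in the reported range"));
    }
}
```

The threshold of 21 is taken from the rollback target above and may need
adjusting once PDFBOX-1608 is resolved.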

Thanks for the quick response.

Darcy


On Thu, Aug 8, 2013 at 10:05 AM, Andreas Lehmkuehler <an...@lehmi.de> wrote:

> Hi,
>
> On 08.08.2013 16:58, Darcy Dechene wrote:
>
>> Hi,
>>
>> When converting the attached PDF file using the command line tool
>> PDFToImage, the resulting image is missing all the text (attached
>> test_J25.jpg).
>
> Your attachment didn't make it due to some restrictions on the mailing
> list. But I guess we won't need an image demonstrating that something is
> missing. ;-)
>
>> This is being run on a Windows system and occurs when using Java 7
>> (update 21 or 25). When running on the same Windows system using Java 7
>> update 11 the resulting image is just fine and contains all the text
>> (attached test_J11.jpg).
>
> Sounds like your PDF uses Type1C/CFF fonts. That's a known issue and
> PDFBOX-1608 [1] deals with it.
>
>> Does anyone know what might cause this or has ideas for further debugging?
>
> You may double check if I'm right. Open the PDF in question using Acrobat
> and have a look at the document's properties (File -> Properties). The
> Fonts tab lists all used fonts. You may see one or more entries for an
> embedded subset of a Type 1 font.
>
>> Thanks!
>> Darcy
>
> BR
> Andreas Lehmkühler
> [1] https://issues.apache.org/jira/browse/PDFBOX-1608

Re: PDFToImage results in missing text

Posted by Andreas Lehmkuehler <an...@lehmi.de>.

Hi,

On 08.08.2013 16:58, Darcy Dechene wrote:
> Hi,
>
> When converting the attached pdf file using the command line tool PDFToImage the
> resulting image is missing all the text (attached test_J25.jpg).
Your attachment didn't make it due to some restrictions on the mailing list.
But I guess we won't need an image demonstrating that something is missing. ;-)

> This is being run on a Windows system and occurs when using Java 7 (update 21 or
> 25). When running on the same Windows system using Java 7 update 11 the
> resulting image is just fine and contains all the text (attached test_J11.jpg).
Sounds like your PDF uses Type1C/CFF fonts. That's a known issue and
PDFBOX-1608 [1] deals with it.

> Does anyone know what might cause this or has ideas for further debugging?
You may double check if I'm right. Open the PDF in question using Acrobat and
have a look at the document's properties (File -> Properties). The Fonts tab
lists all used fonts. You may see one or more entries for an embedded subset of
a Type 1 font.
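
If Acrobat is not at hand, a rough check can also be scripted. The sketch
below is not a PDFBox API; the class and method names are made up for
illustration. It only searches the raw file bytes for the names the PDF
specification uses for compact font programs (/FontFile3, /Type1C,
/CIDFontType0C). It is a hint rather than a proof in both directions:
/FontFile3 may also hold an OpenType font, and in files that store their
objects in compressed object streams the names will not be visible at all.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper, not part of PDFBox.
class CffFontHint {
    // Names used in font descriptors and font file streams for CFF data.
    private static final String[] MARKERS = {"/FontFile3", "/Type1C", "/CIDFontType0C"};

    /** True if any CFF-related name appears in the uncompressed parts of the file. */
    static boolean mentionsCff(byte[] pdfBytes) {
        // ISO-8859-1 maps every byte to exactly one char, so binary data survives.
        String raw = new String(pdfBytes, StandardCharsets.ISO_8859_1);
        for (String marker : MARKERS) {
            if (raw.contains(marker)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) throws IOException {
        // args[0] is the path of the PDF to inspect.
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        System.out.println(mentionsCff(data)
                ? "CFF/Type1C markers found"
                : "No CFF markers in uncompressed objects");
    }
}
```

A positive result is consistent with the Type1C/CFF explanation above; a
negative one does not rule it out.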

> Thanks!
> Darcy

BR
Andreas Lehmkühler
[1] https://issues.apache.org/jira/browse/PDFBOX-1608