Posted to users@pdfbox.apache.org by Juan M Uys <op...@gmail.com> on 2015/02/25 15:12:05 UTC

font errors when reading PDF (not writing)

Hello,

I'm extracting text from PDFs using PDFTextStripperByArea and get a lot of
these in the log:

Feb 25, 2015 2:01:44 PM org.apache.pdfbox.pdmodel.font.ExternalFonts getTrueTypeFallbackFont
SEVERE: No TTF fallback font for 'Helvetica'
Feb 25, 2015 2:01:44 PM org.apache.pdfbox.pdmodel.font.PDTrueTypeFont <init>
WARNING: Using fallback font 'LiberationSans' for 'ArialMT'

I've searched the documentation for font-related advice, which seems to
pertain to WRITING PDFs, whereas I'm merely extracting text.

Please let me know how to get around this problem.

Do I need to install extra font packages?
If so, how? Where from?

At the very least, I'd like to know how to remove these statements from my
log. (I've tried throwing logback.xml and log4j.properties into my
resources folder, setting the org.apache.pdfbox package to INFO, to no avail.)
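A possible way to quiet these messages, sketched under an assumption: the "SEVERE:"/"WARNING:" lines above are in java.util.logging's default console format, which suggests PDFBox's Apache Commons Logging calls are being routed to java.util.logging rather than to Logback or Log4j, so a logback.xml or log4j.properties file alone would have no effect. Note also that an INFO threshold would still let WARNING and SEVERE through. The class name QuietPdfBox is hypothetical; the only API used is java.util.logging.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class QuietPdfBox {
    // Keep a strong reference: java.util.logging holds loggers weakly, so a level
    // set on an unreferenced Logger can be lost after garbage collection.
    private static final Logger PDFBOX_LOGGER = Logger.getLogger("org.apache.pdfbox");

    /** Sets the threshold for every java.util.logging logger under org.apache.pdfbox. */
    public static Logger silencePdfBox(Level threshold) {
        PDFBOX_LOGGER.setLevel(threshold);
        return PDFBOX_LOGGER;
    }

    public static void main(String[] args) {
        // Level.OFF drops everything, including SEVERE and WARNING.
        silencePdfBox(Level.OFF);
        // ... run PDFTextStripperByArea as before ...
    }
}
```

Using Level.SEVERE instead of Level.OFF would hide the fallback-font warnings while still showing the "No TTF fallback font" errors.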

The system running my extractor code is stock Ubuntu 14.04 with Azul Zulu
OpenJDK 7 (see
https://registry.hub.docker.com/u/azul/zulu-openjdk/dockerfile/)

Thanks,
Juan

Re: font errors when reading PDF (not writing)

Posted by John Hewson <jo...@jahewson.com>.
Yep, it’ll work fine. If you’re using AES-256, you’ll need the “Java unlimited security” files (the JCE Unlimited Strength Jurisdiction Policy Files) installed in your JVM.

— John
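To check the AES-256 point on a particular server, a small preflight sketch could report whether the JVM is running headless and how long an AES key the installed JCE policy allows. It uses only JDK APIs (java.awt.GraphicsEnvironment and javax.crypto.Cipher), does not exercise PDFBox itself, and the class name EncryptionPreflight is hypothetical.

```java
import java.awt.GraphicsEnvironment;
import java.security.NoSuchAlgorithmException;
import javax.crypto.Cipher;

public class EncryptionPreflight {
    /** Maximum AES key length (in bits) permitted by the installed JCE policy. */
    public static int maxAesKeyLength() {
        try {
            // 128 under the limited policy; Integer.MAX_VALUE when unlimited.
            return Cipher.getMaxAllowedKeyLength("AES");
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to support AES, so this should not happen.
            throw new IllegalStateException("AES not available", e);
        }
    }

    /** True if AES keys of at least 256 bits are allowed. */
    public static boolean aes256Allowed() {
        return maxAesKeyLength() >= 256;
    }

    public static void main(String[] args) {
        // Headless mode only affects AWT (displays, windows); it does not change
        // what javax.crypto permits.
        System.out.println("headless: " + GraphicsEnvironment.isHeadless());
        System.out.println("max AES key length: " + maxAesKeyLength());
        System.out.println("AES-256 allowed: " + aes256Allowed());
    }
}
```

A maximum of 128 means the limited default policy is in effect; later Java releases enable the unlimited policy by default.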

> On 26 Feb 2015, at 12:35, Steve Antoch <SA...@Yuzu.com> wrote:
> 
> 
> John-
> 
> (sorry to hijack -  I think this is related enough that it warrants asking here)
> 
> 
> If we run pdfbox on a headless server, will the Encrypt() class still function properly?  We do not render anything, just encrypt the document.
> 
> My suspicion is that this is not an issue, though it would be nice to be sure.
> 
> Thanks-
> Steve
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe@pdfbox.apache.org
> For additional commands, e-mail: users-help@pdfbox.apache.org
> 


Re: font errors when reading PDF (not writing)

Posted by Steve Antoch <SA...@Yuzu.com>.
John-

(sorry to hijack; I think this is related enough that it warrants asking here)


If we run pdfbox on a headless server, will the Encrypt() class still function properly?  We do not render anything, just encrypt the document.

My suspicion is that this is not an issue, though it would be nice to be sure.

Thanks-
Steve

________________________________________
From: John Hewson <jo...@jahewson.com>
Sent: Wednesday, February 25, 2015 1:27 PM
To: users@pdfbox.apache.org
Subject: Re: font errors when reading PDF (not writing)

Are you running on a headless system, such as a server? If so, you probably don’t have any fonts installed. Even though you’re just doing text extraction, this matters because the dimensions of the characters need to be taken into account and many PDFs do not embed the fonts which they depend on.

At a bare minimum I’d recommend installing the liberation fonts and whichever Microsoft fonts are available in your distribution’s package manager.

— John


Re: font errors when reading PDF (not writing)

Posted by John Hewson <jo...@jahewson.com>.
Are you running on a headless system, such as a server? If so, you probably don’t have any fonts installed. Even though you’re just doing text extraction, this matters: the extractor needs the width of each character to work out where the text sits on the page, and many PDFs do not embed the fonts they depend on.
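To illustrate why character dimensions matter even when only extracting text, here is a simplified sketch of how glyph widths turn into positions. It assumes the basic PDF rule that a simple font's glyph width w0 (in thousandths of a text-space unit) advances the pen by w0 / 1000 × font size, and it ignores character spacing, word spacing and horizontal scaling. The widths in main are hypothetical stand-ins for one font's metrics versus a substitute's, and GlyphAdvance is a hypothetical name.

```java
public class GlyphAdvance {
    /** Advance of one glyph: (w0 / 1000) * fontSize, default spacing and scaling assumed. */
    static double advance(int widthInThousandths, double fontSize) {
        return widthInThousandths / 1000.0 * fontSize;
    }

    /** Origin x of each glyph in a run, followed by the x where the run ends. */
    static double[] positions(int[] widths, double fontSize, double startX) {
        double[] xs = new double[widths.length + 1];
        xs[0] = startX;
        for (int i = 0; i < widths.length; i++) {
            xs[i + 1] = xs[i] + advance(widths[i], fontSize);
        }
        return xs;
    }

    public static void main(String[] args) {
        // Hypothetical: the same two glyphs measured with two different fonts at 12 pt.
        double[] original = positions(new int[] {556, 278}, 12, 0);
        double[] substitute = positions(new int[] {600, 600}, 12, 0);
        System.out.println("run ends at " + original[2] + " vs " + substitute[2]);
    }
}
```

If a substitute font's widths are used in place of the real ones, every later position in the run shifts, which can move or hide the gaps an extractor relies on to separate words.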

At a bare minimum, I’d recommend installing the Liberation fonts and whichever Microsoft fonts are available in your distribution’s package manager.

— John
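One way to see which font families the JVM itself can find on a headless box is sketched below. It is a rough diagnostic only: whether this PDFBox build locates fonts through AWT or by scanning font directories on its own is an assumption we have not checked, and on a very bare system the JVM's font setup may fail outright. The family names are examples, not a list PDFBox requires, and FontCheck is a hypothetical name. On Ubuntu, the Liberation fonts are usually packaged as fonts-liberation and the Microsoft core fonts as ttf-mscorefonts-installer (in multiverse); treat those package names as assumptions to verify for your release.

```java
import java.awt.GraphicsEnvironment;
import java.util.Arrays;
import java.util.List;

public class FontCheck {
    /** Case-insensitive lookup of a font family name in a list of installed families. */
    static boolean hasFamily(String[] installed, String wanted) {
        for (String family : installed) {
            if (family.equalsIgnoreCase(wanted)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Only enumerates fonts; nothing is drawn, so this is meant to run headless.
        String[] installed = GraphicsEnvironment.getLocalGraphicsEnvironment()
                .getAvailableFontFamilyNames();
        // Example family names to look for (hypothetical selection).
        List<String> wanted = Arrays.asList("Liberation Sans", "Liberation Serif", "Arial");
        for (String name : wanted) {
            System.out.println(name + ": " + (hasFamily(installed, name) ? "found" : "missing"));
        }
    }
}
```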
