Posted to users@pdfbox.apache.org by Christopher Schultz <ch...@christopherschultz.net> on 2017/03/01 22:55:05 UTC

Problem with unsupported characters in a font

All,

I'm getting an error when preparing text to write to a PDF document:

java.lang.IllegalArgumentException: U+2265 ('greaterequal') is not available in this font's encoding: WinAnsiEncoding
        at org.apache.pdfbox.pdmodel.font.PDType1Font.encode(PDType1Font.java:345)
        at org.apache.pdfbox.pdmodel.font.PDFont.encode(PDFont.java:286)
        at org.apache.pdfbox.pdmodel.font.PDFont.getStringWidth(PDFont.java:315)


It's obvious that the ≥ symbol isn't available in the font we are using
(probably the default set of fonts... we aren't doing anything fancy at
this point).

Is there a good way to "sanitize" a string for the current font?

I can just start building a character-by-character replacement table,
but that's a little too whack-a-mole for my tastes. I'd prefer to do
something like ask the API what characters aren't okay, replace them
with something that IS okay (like "?") and log a warning. Then we can
collect the warnings and map the characters in a nicer way later.

Is there any way to do that kind of thing with PDFBox?

Thanks,
-chris


Re: Problem with unsupported characters in a font

Posted by Tilman Hausherr <TH...@t-online.de>.
On 01.03.2017 at 23:55, Christopher Schultz wrote:
> All,
>
> I'm getting an error when preparing text to write to a PDF document:
>
> java.lang.IllegalArgumentException: U+2265 ('greaterequal') is not available in this font's encoding: WinAnsiEncoding
>         at org.apache.pdfbox.pdmodel.font.PDType1Font.encode(PDType1Font.java:345)
>         at org.apache.pdfbox.pdmodel.font.PDFont.encode(PDFont.java:286)
>         at org.apache.pdfbox.pdmodel.font.PDFont.getStringWidth(PDFont.java:315)
>
>
> It's obvious that the ≥ symbol isn't available in the font we are using
> (probably the default set of fonts... we aren't doing anything fancy at
> this point).
>
> Is there a good way to "sanitize" a string for the current font?

You could call PDFont.encode() for each character, catch the
IllegalArgumentException, and replace the offending character with
whatever you like.
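
A minimal sketch of that per-character approach follows. It assumes, as
the stack trace above suggests, that encoding an unsupported character
throws IllegalArgumentException. The names FontSanitizer, Encoder and
sanitize are hypothetical and not part of PDFBox; Encoder is only a
stand-in with the same shape as PDFont.encode(String), so the sketch
compiles without PDFBox on the classpath.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.List;

// Hypothetical helper, not part of PDFBox.
final class FontSanitizer {

    /** Stand-in for PDFont.encode(String); with PDFBox, pass font::encode. */
    @FunctionalInterface
    interface Encoder {
        byte[] encode(String text) throws IOException;
    }

    /**
     * Returns a copy of text in which every code point the encoder rejects
     * is replaced by the given replacement character. Each rejected code
     * point is recorded in warnings as "U+XXXX" so it can be mapped later.
     */
    static String sanitize(String text, Encoder encoder, char replacement,
                           List<String> warnings) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); ) {
            // Work on code points so surrogate pairs are tested as one unit.
            int cp = text.codePointAt(i);
            String ch = new String(Character.toChars(cp));
            try {
                encoder.encode(ch);   // result is discarded; we only probe
                out.append(ch);
            } catch (IllegalArgumentException e) {
                // Assumption: this is how an unsupported character is reported.
                out.append(replacement);
                warnings.add(String.format("U+%04X", cp));
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }
}
```

With a real font this would be called along the lines of
FontSanitizer.sanitize(text, font::encode, '?', warnings) before passing
the result to getStringWidth() or showText(). The replacement character
itself has to be one the font can encode.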

By the way, the symbol may well be available in the font; the limit is
that you're using WinAnsiEncoding. If you load a font file (call
PDType0Font.load()), you can use many more glyphs.

Tilman


