Posted to users@pdfbox.apache.org by Daniel Skiles <ds...@docfinity.com.INVALID> on 2023/05/24 15:28:10 UTC

PDF with mangled font rendering in some environments

All,
I'm trying to convert a PDF to an image and I'm encountering problems with
some font rendering on some Linux systems.  If anyone could provide any
ideas on how to fix this I'd appreciate it.

The PDF is too large to attach, so it's available at this link:
https://drive.google.com/file/d/1dNXgHsfn0cy2Gx9HxhSTQdeWAAjaDplk/view?usp=sharing

So far as I can tell, the attached file comes from some sort of mail
merge-style application that is injecting text into a template.  The
injected text uses a different font than the rest of the document.

On Windows systems, this works fine, but on Linux systems, PDFBox renders
the text as gibberish glyphs in a way that I've never seen before.

When I reproduce the issue with logging increased to trace, I get the
following line in the log.

15:55:15.622 [main] WARN org.apache.pdfbox.pdmodel.font.PDCIDFontType2 -
Using non-embedded GIDs in font Calibri

When I list the fonts in the PDF, Calibri is listed as both an embedded
*and* an Identity-H font.  Given that we have to substitute Carlito for
Calibri, this may be relevant.

In the source code
<https://github.com/apache/pdfbox/blob/d6ebddf07f99bcc04f5b106c84623048b697bee7/pdfbox/src/main/java/org/apache/pdfbox/pdmodel/font/PDCIDFontType2.java#L241>,
a comment line suggests there's a mismatch that involves GIDs, CIDs, and
embedded vs non-embedded fonts.

Has anyone here ever seen behavior like this before?  Is this a bug?  If it
is a bug, what is the procedure to report it?

If it's not a bug, does anyone have any suggestions on what I might need to
fix in my environment?

Any input that anyone might have would be helpful.

Thank you,
Daniel

Re: PDF with mangled font rendering in some environments

Posted by Tres Finocchiaro <tr...@gmail.com>.
Hi,

Does this describe the problem?

https://github.com/openjdk/jdk/pull/3631
https://bugs.openjdk.org/browse/JDK-8265761

If so, updating Java may fix it (I believe the patch landed in
11.0.13).
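
The version comparison suggested here can be sketched in a few lines of
Java. This is only an illustration: the 11.0.13 threshold is the
unverified figure from this message, the class and method names are
hypothetical, and the check deliberately says nothing about release
lines other than 11.

```java
/** Sketch: is the running JVM an 11.x build older than the assumed fix? */
final class JdkPatchLevelCheck {

    // Assumption: the threshold quoted in the thread, not verified here
    static final Runtime.Version ASSUMED_FIX = Runtime.Version.parse("11.0.13");

    /** True only for Java 11 builds that sort before the assumed fix version. */
    static boolean predatesAssumedFix(Runtime.Version version) {
        return version.feature() == 11 && version.compareTo(ASSUMED_FIX) < 0;
    }

    public static void main(String[] args) {
        Runtime.Version running = Runtime.version();
        System.out.println("Running on Java " + running);
        if (predatesAssumedFix(running)) {
            System.out.println("Older than " + ASSUMED_FIX + "; an update may be worth trying.");
        }
    }
}
```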


- Tres.Finocchiaro@gmail.com



Re: PDF with mangled font rendering in some environments

Posted by Tilman Hausherr <TH...@t-online.de>.
On 30.05.2023 19:41, Daniel Skiles wrote:
> That seems like it fixes it, at least for my use case.
>
> Will this also end up in the 3.0 code line?


Yes it's in both.

https://repository.apache.org/content/groups/snapshots/org/apache/pdfbox/pdfbox-app/3.0.0-SNAPSHOT/

Tilman






---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@pdfbox.apache.org
For additional commands, e-mail: users-help@pdfbox.apache.org


Re: PDF with mangled font rendering in some environments

Posted by Daniel Skiles <ds...@docfinity.com.INVALID>.
That seems like it fixes it, at least for my use case.

Will this also end up in the 3.0 code line?


Re: PDF with mangled font rendering in some environments

Posted by Tilman Hausherr <TH...@t-online.de>.
I have hopefully fixed it. Please try a snapshot:
https://repository.apache.org/content/groups/snapshots/org/apache/pdfbox/pdfbox-app/2.0.29-SNAPSHOT/

If it doesn't work, please report the log messages.

Tilman



Re: PDF with mangled font rendering in some environments

Posted by Tilman Hausherr <TH...@t-online.de>.
On 25.05.2023 14:30, Daniel Skiles wrote:
> Am I reading your response right that this requires a code change inside
> PDFBox?  Is that something that could end up in a future release?

IMHO it is a bug in our code. I created an issue in JIRA, but I don't
yet know how to fix it.

https://issues.apache.org/jira/browse/PDFBOX-5612





Re: PDF with mangled font rendering in some environments

Posted by Daniel Skiles <ds...@docfinity.com.INVALID>.
Thank you for digging into it.

Am I reading your response right that this requires a code change inside
PDFBox?  Is that something that could end up in a future release?


Re: PDF with mangled font rendering in some environments

Posted by Tilman Hausherr <TH...@t-online.de>.
On 25.05.2023 01:02, Daniel Skiles wrote:
> 2023-04-06 14:05:51,245 [admin] [53088500EB38627B70BCA6FA9037CD0C]
> [172.18.0.1] [service={}] DEBUG [catalina-exec-1]
> (FileSystemFontProvider.java:196)
> - Loaded Calibri-Regular from
> /usr/share/fonts/fonts_compat/ms/Calibri/Carlito-Regular.ttf

It is using Carlito instead of Calibri. However, the GIDs of the two
fonts are not identical: I opened both in DTL OTMaster Light 3.7 and
they differ.

I tried to simulate this by making it look as if I don't have Calibri,
but I failed: it didn't use Carlito. Instead it used Liberation Sans,
and this also produced garbled text.

I also tried changing the code so that it avoids the "Using
non-embedded GIDs in font" segment, and that worked.

I think the problem is that the logic in that part of the code is
correct only when the font found is exactly the one that should have
been embedded.
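
A toy model may help show why reusing GIDs across two different fonts
produces gibberish. The sketch below is not PDFBox code: the class name,
the glyph orders, and the sample text are invented for illustration. It
only assumes that the text is stored as glyph indices of the original
font, which is then looked up in a substitute whose glyphs sit at
different positions.

```java
import java.util.List;

/** Toy model of glyph-ID (GID) lookup; hypothetical, not PDFBox code. */
final class GidMismatchDemo {

    /** Resolves each GID against a font's glyph order, as index-based text does. */
    static String render(int[] gids, List<String> glyphOrder) {
        StringBuilder out = new StringBuilder();
        for (int gid : gids) {
            // GID 0 is .notdef by convention; out-of-range GIDs are shown as "?" too
            out.append(gid > 0 && gid < glyphOrder.size() ? glyphOrder.get(gid) : "?");
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Invented glyph orders: same glyphs, different positions
        List<String> original = List.of(".notdef", "H", "e", "l", "o");
        List<String> substitute = List.of(".notdef", "e", "o", "H", "l");

        // "Hello" expressed as GIDs of the original font
        int[] gids = {1, 2, 3, 3, 4};

        System.out.println(render(gids, original));   // Hello
        System.out.println(render(gids, substitute)); // eoHHl
    }
}
```

The same indices are only meaningful for the font they were produced
from, which matches the observation that the substitute renders garbage
even though it contains all the needed glyphs.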

Tilman




Re: PDF with mangled font rendering in some environments

Posted by Daniel Skiles <ds...@docfinity.com.INVALID>.
Tilman,
Here is some additional logging that I was able to capture.  It looks like
it's trying to use Carlito, and actually succeeds for some of the text on
the page.  It seems like the text that's stored as "Identity-H" is the
problem?

2023-04-06 14:05:51,138 [admin] [53088500EB38627B70BCA6FA9037CD0C]
[172.18.0.1] [service={}] DEBUG [catalina-exec-1] (FontMapperImpl.java:469)
- getFont('TTF','LiberationMono') returns LiberationMono (TTF, mac: 0x0,
os/2: 0x805, cid: null)
/usr/share/fonts/liberation-mono/LiberationMono-Regular.ttf
2023-04-06 14:05:51,152 [admin] [53088500EB38627B70BCA6FA9037CD0C]
[172.18.0.1] [service={}] DEBUG [catalina-exec-1]
(FileSystemFontProvider.java:196)
- Loaded LiberationMono from
/usr/share/fonts/liberation-mono/LiberationMono-Regular.ttf
2023-04-06 14:05:51,245 [admin] [53088500EB38627B70BCA6FA9037CD0C]
[172.18.0.1] [service={}] DEBUG [catalina-exec-1] (FontMapperImpl.java:469)
- getFont('TTF','Calibri-Regular') returns Calibri-Regular (TTF, mac: 0x0,
os/2: 0x0, cid: null)
/usr/share/fonts/fonts_compat/ms/Calibri/Carlito-Regular.ttf
2023-04-06 14:05:51,245 [admin] [53088500EB38627B70BCA6FA9037CD0C]
[172.18.0.1] [service={}] DEBUG [catalina-exec-1]
(FileSystemFontProvider.java:196)
- Loaded Calibri-Regular from
/usr/share/fonts/fonts_compat/ms/Calibri/Carlito-Regular.ttf
2023-04-06 14:05:51,293 [admin] [53088500EB38627B70BCA6FA9037CD0C]
[172.18.0.1] [service={}] DEBUG [catalina-exec-1] (FontMapperImpl.java:469)
- getFont('TTF','CalibriBold') returns Calibri-Bold (TTF, mac: 0x1, os/2:
0x0, cid: null) /usr/share/fonts/fonts_compat/ms/Calibri/Carlito-Bold.ttf
2023-04-06 14:05:51,294 [admin] [53088500EB38627B70BCA6FA9037CD0C]
[172.18.0.1] [service={}] DEBUG [catalina-exec-1]
(FileSystemFontProvider.java:196)
- Loaded Calibri-Bold from
/usr/share/fonts/fonts_compat/ms/Calibri/Carlito-Bold.ttf
2023-04-06 14:05:51,732 [admin] [53088500EB38627B70BCA6FA9037CD0C]
[172.18.0.1] [service={}] DEBUG [catalina-exec-10] (FontMapperImpl.java:469)
- getFont('TTF','LiberationMono') returns LiberationMono (TTF, mac: 0x0,
os/2: 0x805, cid: null)
/usr/share/fonts/liberation-mono/LiberationMono-Regular.ttf
2023-04-06 14:05:51,736 [admin] [53088500EB38627B70BCA6FA9037CD0C]
[172.18.0.1] [service={}] DEBUG [catalina-exec-10] (FontMapperImpl.java:469)
- getFont('TTF','Calibri-Regular') returns Calibri-Regular (TTF, mac: 0x0,
os/2: 0x0, cid: null)
/usr/share/fonts/fonts_compat/ms/Calibri/Carlito-Regular.ttf
2023-04-06 14:05:51,739 [admin] [53088500EB38627B70BCA6FA9037CD0C]
[172.18.0.1] [service={}] DEBUG [catalina-exec-10] (FontMapperImpl.java:469)
- getFont('TTF','CalibriBold') returns Calibri-Bold (TTF, mac: 0x1, os/2:
0x0, cid: null) /usr/share/fonts/fonts_compat/ms/Calibri/Carlito-Bold.ttf
2023-04-06 14:05:51,850 [admin] [53088500EB38627B70BCA6FA9037CD0C]
[172.18.0.1] [service={}] DEBUG [catalina-exec-7] (FontMapperImpl.java:469)
- getFont('TTF','LiberationMono') returns LiberationMono (TTF, mac: 0x0,
os/2: 0x805, cid: null)
/usr/share/fonts/liberation-mono/LiberationMono-Regular.ttf
2023-04-06 14:05:51,852 [admin] [53088500EB38627B70BCA6FA9037CD0C]
[172.18.0.1] [service={}] DEBUG [catalina-exec-7] (FontMapperImpl.java:469)
- getFont('TTF','Calibri-Regular') returns Calibri-Regular (TTF, mac: 0x0,
os/2: 0x0, cid: null)
/usr/share/fonts/fonts_compat/ms/Calibri/Carlito-Regular.ttf
2023-04-06 14:05:51,855 [admin] [53088500EB38627B70BCA6FA9037CD0C]
[172.18.0.1] [service={}] DEBUG [catalina-exec-7] (FontMapperImpl.java:469)
- getFont('TTF','CalibriBold') returns Calibri-Bold (TTF, mac: 0x1, os/2:
0x0, cid: null) /usr/share/fonts/fonts_compat/ms/Calibri/Carlito-Bold.ttf
2023-04-06 14:05:53,318 [admin] [53088500EB38627B70BCA6FA9037CD0C]
[172.18.0.1] [service={}] WARN [catalina-exec-7] (PDCIDFontType2.java:242)
- Using non-embedded GIDs in font Calibri
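
For anyone scanning a similar log, the substitutions can be picked out
mechanically. The following is a rough sketch, assuming the "Loaded
<name> from <path>" message format shown above; the class name and the
"file name should contain the family name" heuristic are my own
assumptions and will miss or misreport some cases.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sketch: flag "Loaded X from path" entries whose file name does not match X. */
final class FontSubstitutionLogScan {

    // \s+ before the path tolerates the line wrapping seen in mailed logs
    private static final Pattern LOADED = Pattern.compile("Loaded (\\S+) from\\s+(\\S+)");

    static List<String> findSubstitutions(String log) {
        List<String> hits = new ArrayList<>();
        Matcher m = LOADED.matcher(log);
        while (m.find()) {
            String fontName = m.group(1);
            String path = m.group(2);
            // Heuristic: compare the part before the first '-' with the file name
            String family = fontName.split("-", 2)[0].toLowerCase();
            String fileName = path.substring(path.lastIndexOf('/') + 1).toLowerCase();
            if (!fileName.contains(family)) {
                hits.add(fontName + " <- " + path);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        String log = String.join("\n",
                "- Loaded LiberationMono from",
                "/usr/share/fonts/liberation-mono/LiberationMono-Regular.ttf",
                "- Loaded Calibri-Regular from",
                "/usr/share/fonts/fonts_compat/ms/Calibri/Carlito-Regular.ttf");
        // Prints only the Calibri entry, which is served from a Carlito file
        findSubstitutions(log).forEach(System.out::println);
    }
}
```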



Re: PDF with mangled font rendering in some environments

Posted by Tilman Hausherr <TH...@t-online.de>.
Hi,

The problem is the PDF itself: it references a font that isn't embedded.
PDFBox then tries to find such a font on the local system.

To confirm this, (temporarily) copy the Calibri font from another system
to the Linux system. If it works, buy the Calibri font. If not, delete it.

Is there more log output? You mention "Given that we have to substitute
Carlito for Calibri, this may be relevant." Does PDFBox attempt to use
Carlito for Calibri?

Tilman


