Posted to users@pdfbox.apache.org by Daniel Skiles <ds...@docfinity.com.INVALID> on 2023/05/24 15:28:10 UTC
PDF with mangled font rendering in some environments
All,
I'm trying to convert a PDF to an image and I'm encountering problems with
some font rendering on some Linux systems. If anyone could provide any
ideas on how to fix this I'd appreciate it.
The PDF is too large to attach, so it's available at this link:
https://drive.google.com/file/d/1dNXgHsfn0cy2Gx9HxhSTQdeWAAjaDplk/view?usp=sharing
So far as I can tell, the linked file comes from some sort of mail
merge-style application that is injecting text into a template. The
injected text uses a different font than the rest of the document.
On Windows systems, this works fine, but on Linux systems, PDFBox renders
the text as gibberish glyphs in a way that I've never seen before.
When I reproduce the issue with logging increased to trace, I get the
following line in the log.
15:55:15.622 [main] WARN org.apache.pdfbox.pdmodel.font.PDCIDFontType2 - Using non-embedded GIDs in font Calibri
When I list the fonts in the PDF, Calibri is listed as both an embedded
*and* an Identity-H font. Given that we have to substitute Carlito for
Calibri, this may be relevant.
In the source code
<https://github.com/apache/pdfbox/blob/d6ebddf07f99bcc04f5b106c84623048b697bee7/pdfbox/src/main/java/org/apache/pdfbox/pdmodel/font/PDCIDFontType2.java#L241>,
a comment line suggests there's a mismatch that involves GIDs, CIDs, and
embedded vs non-embedded fonts.
Has anyone here ever seen behavior like this before? Is this a bug? If it
is a bug, what is the procedure to report it?
If it's not a bug, does anyone have any suggestions on what I might need to
fix in my environment?
Any input that anyone might have would be helpful.
Thank you,
Daniel
Re: PDF with mangled font rendering in some environments
Posted by Tres Finocchiaro <tr...@gmail.com>.
Hi,
Does this describe the problem?
https://github.com/openjdk/jdk/pull/3631
https://bugs.openjdk.org/browse/JDK-8265761
If so, it may be fixed by updating Java (11.0.13 I believe is where the
patch landed)
- Tres.Finocchiaro@gmail.com
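To check whether a given runtime is at or past the release mentioned above, version strings can be compared with the standard `Runtime.Version` class. This is only a minimal sketch: the threshold 11.0.13 is taken from the recollection above ("I believe") and is not verified here, and the class and method names are hypothetical.

```java
// Sketch: compare a Java version string against a minimum version.
// ASSUMPTION: "11.0.13" is the threshold recalled in the message above; it is unverified.
final class JavaVersionCheck {

    /** Returns true if {@code actual} is the same as or newer than {@code minimum}. */
    static boolean isAtLeast(String actual, String minimum) {
        Runtime.Version a = Runtime.Version.parse(actual);
        Runtime.Version m = Runtime.Version.parse(minimum);
        // Ignore the optional vendor suffix (for example "-LTS") when comparing.
        return a.compareToIgnoreOptional(m) >= 0;
    }

    public static void main(String[] args) {
        String running = Runtime.version().toString();
        System.out.println("Running on Java " + running);
        System.out.println("At least 11.0.13: " + isAtLeast(running, "11.0.13"));
    }
}
```

Running this on the Linux host that shows the problem and on a host that does not would tell you whether the Java version differs between them at all.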
On Wed, May 24, 2023 at 2:34 PM Tilman Hausherr <TH...@t-online.de>
wrote:
> Hi,
>
> The problem is the PDF itself: it references a font that isn't embedded.
> PDFBox then tries to find such a font on the local system.
>
> To confirm this, (temporarily) copy the Calibri font from another system
> to the Linux system. If it works, buy the Calibri font. If not, delete it.
>
> Is there more log output? You mention "Given that we have to substitute
> Carlito for Calibri, this may be relevant." Does PDFBox attempt to use
> Carlito for Calibri?
>
> Tilman
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe@pdfbox.apache.org
> For additional commands, e-mail: users-help@pdfbox.apache.org
RE: PDF with mangled font rendering in some environments
Posted by Robin Doyce Jenkins <RJ...@usiinc.com.INVALID>.
unsubscribe
-----Original Message-----
From: Tilman Hausherr <TH...@t-online.de>
Sent: Thursday, May 25, 2023 1:59 AM
To: users@pdfbox.apache.org
Subject: Re: PDF with mangled font rendering in some environments
Re: PDF with mangled font rendering in some environments
Posted by Tilman Hausherr <TH...@t-online.de>.
On 30.05.2023 19:41, Daniel Skiles wrote:
> That seems like it fixes it, at least for my use case.
>
> Will this also end up in the 3.0 code line?
Yes, it's in both.
https://repository.apache.org/content/groups/snapshots/org/apache/pdfbox/pdfbox-app/3.0.0-SNAPSHOT/
Tilman
Re: PDF with mangled font rendering in some environments
Posted by Daniel Skiles <ds...@docfinity.com.INVALID>.
That seems like it fixes it, at least for my use case.
Will this also end up in the 3.0 code line?
On Sat, May 27, 2023 at 1:10 PM Tilman Hausherr <TH...@t-online.de>
wrote:
> I have hopefully fixed it; please try a snapshot:
>
> https://repository.apache.org/content/groups/snapshots/org/apache/pdfbox/pdfbox-app/2.0.29-SNAPSHOT/
>
> If it doesn't work, please report the log messages.
>
> Tilman
Re: PDF with mangled font rendering in some environments
Posted by Tilman Hausherr <TH...@t-online.de>.
I have hopefully fixed it; please try a snapshot:
https://repository.apache.org/content/groups/snapshots/org/apache/pdfbox/pdfbox-app/2.0.29-SNAPSHOT/
If it doesn't work, please report the log messages.
Tilman
On 27.05.2023 14:21, Tilman Hausherr wrote:
> On 25.05.2023 14:30, Daniel Skiles wrote:
>> Am I reading your response right that this requires a code change inside
>> PDFBox? Is that something that could end up in a future release?
>
> IMHO it is a bug in our code. I created an issue in JIRA, but I don't
> yet know how to fix it.
>
> https://issues.apache.org/jira/browse/PDFBOX-5612
Re: PDF with mangled font rendering in some environments
Posted by Tilman Hausherr <TH...@t-online.de>.
On 25.05.2023 14:30, Daniel Skiles wrote:
> Am I reading your response right that this requires a code change inside
> PDFBox? Is that something that could end up in a future release?
IMHO it is a bug in our code. I created an issue in JIRA, but I don't
yet know how to fix it.
https://issues.apache.org/jira/browse/PDFBOX-5612
Re: PDF with mangled font rendering in some environments
Posted by Daniel Skiles <ds...@docfinity.com.INVALID>.
Thank you for digging into it.
Am I reading your response right that this requires a code change inside
PDFBox? Is that something that could end up in a future release?
Re: PDF with mangled font rendering in some environments
Posted by Tilman Hausherr <TH...@t-online.de>.
On 25.05.2023 01:02, Daniel Skiles wrote:
> 2023-04-06 14:05:51,245 [admin] [53088500EB38627B70BCA6FA9037CD0C]
> [172.18.0.1] [service={}] DEBUG [catalina-exec-1]
> (FileSystemFontProvider.java:196)
> - Loaded Calibri-Regular from
> /usr/share/fonts/fonts_compat/ms/Calibri/Carlito-Regular.ttf
It is using Carlito instead of Calibri. However, the GIDs of the two
fonts are not identical: I opened both in DTL OTMaster light 3.7 and
they differ.
I tried to simulate this by making it look as if I don't have Calibri,
but I failed: PDFBox didn't use Carlito. Instead it used Liberation Sans,
and this also produced garbled text.
I also tried changing the code so that it avoids the "Using
non-embedded GIDs in font" segment, and that worked.
I think the problem is that the logic in that part of the code is valid
only when the font found is the exact same one that should have been
embedded.
Tilman
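The effect described above can be illustrated with a toy model. This is a sketch under stated assumptions, not PDFBox code: the two glyph orders are invented for illustration (real fonts hold thousands of glyphs), and the class and method names are hypothetical. It assumes, as the thread suggests, that the codes stored for the Identity-H text end up being used directly as glyph IDs in whatever font is found on the system.

```java
import java.util.List;

// Toy model: the same glyph IDs select different glyphs in two fonts
// whose glyph orders differ. Both glyph orders below are made up.
final class GidMismatchSketch {

    // HYPOTHETICAL glyph order of the font the PDF was written with.
    static final List<String> ORIGINAL_ORDER = List.of(".notdef", "H", "e", "l", "o");
    // HYPOTHETICAL glyph order of a substitute font with the same characters.
    static final List<String> SUBSTITUTE_ORDER = List.of(".notdef", "e", "o", "H", "l");

    /** Encodes text as glyph IDs of the given font; unknown characters map to GID 0. */
    static int[] encode(String text, List<String> glyphOrder) {
        int[] gids = new int[text.length()];
        for (int i = 0; i < text.length(); i++) {
            int gid = glyphOrder.indexOf(String.valueOf(text.charAt(i)));
            gids[i] = Math.max(gid, 0);
        }
        return gids;
    }

    /** "Draws" glyph IDs by looking each one up in the given font's glyph order. */
    static String render(int[] gids, List<String> glyphOrder) {
        StringBuilder sb = new StringBuilder();
        for (int gid : gids) {
            boolean drawable = gid > 0 && gid < glyphOrder.size();
            sb.append(drawable ? glyphOrder.get(gid) : "?");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        int[] gids = encode("Hello", ORIGINAL_ORDER);
        System.out.println(render(gids, ORIGINAL_ORDER));   // same font: text survives
        System.out.println(render(gids, SUBSTITUTE_ORDER)); // substitute font: scrambled
    }
}
```

In this model the glyph IDs are only meaningful for the font they were produced from, which matches the observation that reusing them is safe only when the system font is exactly the font that should have been embedded.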
Re: PDF with mangled font rendering in some environments
Posted by Daniel Skiles <ds...@docfinity.com.INVALID>.
Tilman,
Here is some additional logging that I was able to capture. It looks like
it's trying to use Carlito, and actually succeeds for some of the text on
the page. It seems like the text that's stored as "Identity-H" is the
problem?
2023-04-06 14:05:51,138 [admin] [53088500EB38627B70BCA6FA9037CD0C] [172.18.0.1] [service={}] DEBUG [catalina-exec-1] (FontMapperImpl.java:469) - getFont('TTF','LiberationMono') returns LiberationMono (TTF, mac: 0x0, os/2: 0x805, cid: null) /usr/share/fonts/liberation-mono/LiberationMono-Regular.ttf
2023-04-06 14:05:51,152 [admin] [53088500EB38627B70BCA6FA9037CD0C] [172.18.0.1] [service={}] DEBUG [catalina-exec-1] (FileSystemFontProvider.java:196) - Loaded LiberationMono from /usr/share/fonts/liberation-mono/LiberationMono-Regular.ttf
2023-04-06 14:05:51,245 [admin] [53088500EB38627B70BCA6FA9037CD0C] [172.18.0.1] [service={}] DEBUG [catalina-exec-1] (FontMapperImpl.java:469) - getFont('TTF','Calibri-Regular') returns Calibri-Regular (TTF, mac: 0x0, os/2: 0x0, cid: null) /usr/share/fonts/fonts_compat/ms/Calibri/Carlito-Regular.ttf
2023-04-06 14:05:51,245 [admin] [53088500EB38627B70BCA6FA9037CD0C] [172.18.0.1] [service={}] DEBUG [catalina-exec-1] (FileSystemFontProvider.java:196) - Loaded Calibri-Regular from /usr/share/fonts/fonts_compat/ms/Calibri/Carlito-Regular.ttf
2023-04-06 14:05:51,293 [admin] [53088500EB38627B70BCA6FA9037CD0C] [172.18.0.1] [service={}] DEBUG [catalina-exec-1] (FontMapperImpl.java:469) - getFont('TTF','CalibriBold') returns Calibri-Bold (TTF, mac: 0x1, os/2: 0x0, cid: null) /usr/share/fonts/fonts_compat/ms/Calibri/Carlito-Bold.ttf
2023-04-06 14:05:51,294 [admin] [53088500EB38627B70BCA6FA9037CD0C] [172.18.0.1] [service={}] DEBUG [catalina-exec-1] (FileSystemFontProvider.java:196) - Loaded Calibri-Bold from /usr/share/fonts/fonts_compat/ms/Calibri/Carlito-Bold.ttf
2023-04-06 14:05:51,732 [admin] [53088500EB38627B70BCA6FA9037CD0C] [172.18.0.1] [service={}] DEBUG [catalina-exec-10] (FontMapperImpl.java:469) - getFont('TTF','LiberationMono') returns LiberationMono (TTF, mac: 0x0, os/2: 0x805, cid: null) /usr/share/fonts/liberation-mono/LiberationMono-Regular.ttf
2023-04-06 14:05:51,736 [admin] [53088500EB38627B70BCA6FA9037CD0C] [172.18.0.1] [service={}] DEBUG [catalina-exec-10] (FontMapperImpl.java:469) - getFont('TTF','Calibri-Regular') returns Calibri-Regular (TTF, mac: 0x0, os/2: 0x0, cid: null) /usr/share/fonts/fonts_compat/ms/Calibri/Carlito-Regular.ttf
2023-04-06 14:05:51,739 [admin] [53088500EB38627B70BCA6FA9037CD0C] [172.18.0.1] [service={}] DEBUG [catalina-exec-10] (FontMapperImpl.java:469) - getFont('TTF','CalibriBold') returns Calibri-Bold (TTF, mac: 0x1, os/2: 0x0, cid: null) /usr/share/fonts/fonts_compat/ms/Calibri/Carlito-Bold.ttf
2023-04-06 14:05:51,850 [admin] [53088500EB38627B70BCA6FA9037CD0C] [172.18.0.1] [service={}] DEBUG [catalina-exec-7] (FontMapperImpl.java:469) - getFont('TTF','LiberationMono') returns LiberationMono (TTF, mac: 0x0, os/2: 0x805, cid: null) /usr/share/fonts/liberation-mono/LiberationMono-Regular.ttf
2023-04-06 14:05:51,852 [admin] [53088500EB38627B70BCA6FA9037CD0C] [172.18.0.1] [service={}] DEBUG [catalina-exec-7] (FontMapperImpl.java:469) - getFont('TTF','Calibri-Regular') returns Calibri-Regular (TTF, mac: 0x0, os/2: 0x0, cid: null) /usr/share/fonts/fonts_compat/ms/Calibri/Carlito-Regular.ttf
2023-04-06 14:05:51,855 [admin] [53088500EB38627B70BCA6FA9037CD0C] [172.18.0.1] [service={}] DEBUG [catalina-exec-7] (FontMapperImpl.java:469) - getFont('TTF','CalibriBold') returns Calibri-Bold (TTF, mac: 0x1, os/2: 0x0, cid: null) /usr/share/fonts/fonts_compat/ms/Calibri/Carlito-Bold.ttf
2023-04-06 14:05:53,318 [admin] [53088500EB38627B70BCA6FA9037CD0C] [172.18.0.1] [service={}] WARN [catalina-exec-7] (PDCIDFontType2.java:242) - Using non-embedded GIDs in font Calibri
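Log output like the above can be scanned for fonts that were loaded from a file with a different family name. The following is a rough sketch, not part of PDFBox: it assumes each log entry has been joined onto a single line, it relies only on the "Loaded <name> from <path>" wording visible in the entries above, and comparing the font name against the file name is a heuristic. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Heuristic scan for "Loaded <font> from <file>" entries where the file name
// does not start with the font's family name (a hint at substitution).
final class FontSubstitutionLogScan {

    // ASSUMPTION: message format as seen in the FileSystemFontProvider entries above.
    private static final Pattern LOADED = Pattern.compile("Loaded (\\S+) from (\\S+)");

    static List<String> suspicious(List<String> logLines) {
        List<String> hits = new ArrayList<>();
        for (String line : logLines) {
            Matcher m = LOADED.matcher(line);
            if (!m.find()) {
                continue; // not a "Loaded ... from ..." entry
            }
            String fontName = m.group(1);
            String path = m.group(2);
            String fileName = path.substring(path.lastIndexOf('/') + 1);
            // Family = part of the font name before the first '-', e.g. "Calibri".
            String family = fontName.split("-")[0].toLowerCase(Locale.ROOT);
            if (!fileName.toLowerCase(Locale.ROOT).startsWith(family)) {
                hits.add(fontName + " <- " + fileName);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
            "DEBUG - Loaded LiberationMono from /usr/share/fonts/liberation-mono/LiberationMono-Regular.ttf",
            "DEBUG - Loaded Calibri-Regular from /usr/share/fonts/fonts_compat/ms/Calibri/Carlito-Regular.ttf");
        suspicious(lines).forEach(System.out::println);
    }
}
```

With the two sample entries in `main`, only the Calibri entry is reported, because its file name starts with "Carlito" rather than "Calibri".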
Re: PDF with mangled font rendering in some environments
Posted by Tilman Hausherr <TH...@t-online.de>.
Hi,
The problem is the PDF itself: it references a font that isn't embedded.
PDFBox then tries to find such a font on the local system.
To confirm this, (temporarily) copy the Calibri font from another system
to the Linux system. If it works, buy the Calibri font. If not, delete it.
Is there more log output? You mention "Given that we have to substitute
Carlito for Calibri, this may be relevant." Does PDFBox attempt to use
Carlito for Calibri?
Tilman
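Before and after copying a font onto the Linux system, it can help to see which font files are actually present. The sketch below walks a directory tree and lists `.ttf`/`.otf` files whose file name contains a given fragment. It is only an illustration under assumptions: `/usr/share/fonts` is the directory seen in the logs in this thread and may differ on other systems, the match is on file names only (a file can carry a different font name internally), and the class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Lists font files under a directory whose file name contains a fragment.
final class FontFileScan {

    /** Returns .ttf/.otf files under {@code root} whose name contains {@code nameFragment} (case-insensitive). */
    static List<Path> find(Path root, String nameFragment) {
        String needle = nameFragment.toLowerCase(Locale.ROOT);
        try (Stream<Path> files = Files.walk(root)) {
            return files
                    .filter(Files::isRegularFile)
                    .filter(p -> {
                        String n = p.getFileName().toString().toLowerCase(Locale.ROOT);
                        return (n.endsWith(".ttf") || n.endsWith(".otf")) && n.contains(needle);
                    })
                    .sorted()
                    .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // ASSUMPTION: default font directory; pass another directory as the first argument.
        Path root = Paths.get(args.length > 0 ? args[0] : "/usr/share/fonts");
        if (!Files.isDirectory(root)) {
            System.out.println("Not a directory: " + root);
            return;
        }
        for (String name : List.of("calibri", "carlito")) {
            System.out.println(name + ": " + find(root, name));
        }
    }
}
```

If the list for "calibri" is empty while "carlito" is not, the system is relying on a substitute for Calibri.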