Posted to users@pdfbox.apache.org by Kjetil Ødegaard <kj...@dcompany.no.INVALID> on 2023/12/04 15:21:29 UTC

Font operation takes a long time with 3.0.1

Hi,

I tried to upgrade an app to PDFBox 3.0.1 and I see a performance issue.

It only affects the first PDF operation (after that it's quite fast), but
it's a bit annoying since it takes about 20 seconds (on my M1 MacBook).

Profiling reveals that this Kotlin code triggers the delay:

    val font = PDType1Font(Standard14Fonts.FontName.COURIER)

The thread dump shows that almost all time is spent in this method:

org.apache.pdfbox.pdmodel.font.FileSystemFontProvider#computeHash

I assume that this is related to PDFBOX-5684.

Is this possible to work around? Or is it possible to fix?

BR Kjetil

Re: Font operation takes a long time with 3.0.1

Posted by Kjetil Ødegaard <kj...@dcompany.no.INVALID>.
Tested the new snapshot. Performance looks good.

Cache file excerpt:

➜  ~ grep -i NotoSansKannada .pdfbox.cache
*skipexception*|TTF||0|0|0|0|0||/System/Library/Fonts/NotoSansKannada.ttc|b930924c|1700331239000

BR Kjetil
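
A quick way to check such a line without reading it by eye is sketched below. It is only an illustration, not PDFBox code: the class and method names are made up, and the only field meaning taken from this thread is that the second-to-last field holds the checksum. The sample lines are the two excerpts posted in the thread.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for inspecting a .pdfbox.cache line; not part of PDFBox.
class CacheLineSketch {

    /** Splits one cache line on '|'; the -1 limit keeps empty fields. */
    static List<String> fields(String line) {
        return Arrays.asList(line.split("\\|", -1));
    }

    /** Returns the second-to-last field, which the thread describes as the checksum. */
    static String checksum(String line) {
        List<String> f = fields(line);
        return f.get(f.size() - 2);
    }

    public static void main(String[] args) {
        // Line written by the first snapshot (checksum field empty).
        String before = "*skipexception*|TTF||0|0|0|0|0||/System/Library/Fonts/NotoSansKannada.ttc||1700331239000";
        // Line written by the second snapshot (checksum field filled).
        String after = "*skipexception*|TTF||0|0|0|0|0||/System/Library/Fonts/NotoSansKannada.ttc|b930924c|1700331239000";
        for (String line : new String[] {before, after}) {
            String sum = checksum(line);
            System.out.println(sum.isEmpty() ? "checksum missing" : "checksum " + sum);
        }
    }
}
```

Splitting with a limit of -1 matters here, because the default split would drop trailing empty fields and shift the positions.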

On Tue, 5 Dec 2023 at 15:10, Tilman Hausherr <TH...@t-online.de> wrote:

> Thanks for the feedback. It turns out that there's another error
> (checksum was empty because MessageDigest doesn't support CRC32), which
> has been fixed now, please test again (delete the file first). The
> second-to-last field should now not be empty.
>
> It also teaches an important lesson: a "// never happens" segment should
> have an output.
>
> Tilman

Re: Font operation takes a long time with 3.0.1

Posted by Tilman Hausherr <TH...@t-online.de>.
Thanks for the feedback. It turns out that there's another error 
(checksum was empty because MessageDigest doesn't support CRC32), which 
has been fixed now, please test again (delete the file first). The 
second-to-last field should now not be empty.

It also teaches an important lesson: a "// never happens" segment should 
have an output.

Tilman
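
For readers who want to see both points in isolation, here is a minimal sketch using only the JDK. It is not the PDFBox fix itself; the class and method names are invented for illustration, and the 8-digit lowercase hex format is an assumption based on the cache excerpt in this thread. It shows that asking MessageDigest for "CRC32" fails on a stock JDK, that java.util.zip.CRC32 is the class to use instead, and that the catch block reports the problem rather than silently returning an empty value.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.zip.CRC32;

// Hypothetical demo class; not taken from the PDFBox sources.
class ChecksumSketch {

    /** Returns true if an installed provider offers a MessageDigest with this name. */
    static boolean digestAvailable(String algorithm) {
        try {
            MessageDigest.getInstance(algorithm);
            return true;
        } catch (NoSuchAlgorithmException ex) {
            // Give the "never happens" branch an output so a wrong name is noticed.
            System.err.println("No MessageDigest for " + algorithm + ": " + ex.getMessage());
            return false;
        }
    }

    /** Computes a CRC-32 checksum with java.util.zip.CRC32 and formats it as 8 hex digits. */
    static String crc32Hex(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data);
        return String.format("%08x", crc.getValue());
    }

    public static void main(String[] args) {
        System.out.println("CRC32 via MessageDigest: " + digestAvailable("CRC32"));
        System.out.println("SHA-256 via MessageDigest: " + digestAvailable("SHA-256"));
        byte[] sample = "123456789".getBytes(StandardCharsets.US_ASCII);
        System.out.println("CRC32 via java.util.zip: " + crc32Hex(sample));
    }
}
```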

On 05.12.2023 11:34, Kjetil Ødegaard wrote:
> Nice! Tested it now and I can confirm that it fixes the issue. I see good
> performance even from the first operation.
>
> Checked the cache file and there is a line for this font there now:
>
> ➜  ~ grep -i NotoSansKannada .pdfbox.cache
> *skipexception*|TTF||0|0|0|0|0||/System/Library/Fonts/NotoSansKannada.ttc||1700331239000
>
> Thanks for the quick response, great work!
>
> BR Kjetil
>
> On Tue, 5 Dec 2023 at 09:55, Tilman Hausherr <TH...@t-online.de> wrote:
>
>> Thanks, new snapshot build here:
>>
>> https://repository.apache.org/content/groups/snapshots/org/apache/pdfbox/pdfbox-app/3.0.2-SNAPSHOT/
>>
>>
>> Ticket:
>> https://issues.apache.org/jira/browse/PDFBOX-5727
>>
>> Tilman
>>
>> On 05.12.2023 08:41, Kjetil Ødegaard wrote:
>>> To clarify, this stack trace is not printed anywhere. I got it from
>>> stepping into the code and invoking printStackTrace() on the exception to
>>> get the whole stack. See complete stack trace below.
>>>
>>> I agree with your theory, it matches what I'm seeing. These fonts are never
>>> added to the cache file, so the cache file is always rebuilt.
>>>
>>> I double checked the cache file again and there is no trace of these two
>>> fonts, but lots of entries for other fonts (of different weights). I see
>>> from the timestamp on the file that it is rebuilt on every run.
>>>
>>> BR Kjetil
>>>
>>> java.io.EOFException
>>>     at org.apache.fontbox.ttf.TTFDataStream.readUnsignedShort(TTFDataStream.java:154)
>>>     at org.apache.fontbox.ttf.TTFDataStream.readUnsignedShortArray(TTFDataStream.java:188)
>>>     at org.apache.fontbox.ttf.GlyphSubstitutionTable.readMultipleSubstitutionSubtable(GlyphSubstitutionTable.java:412)
>>>     at org.apache.fontbox.ttf.GlyphSubstitutionTable.readLookupSubtable(GlyphSubstitutionTable.java:263)
>>>     at org.apache.fontbox.ttf.GlyphSubstitutionTable.readLookupTable(GlyphSubstitutionTable.java:313)
>>>     at org.apache.fontbox.ttf.GlyphSubstitutionTable.readLookupList(GlyphSubstitutionTable.java:247)
>>>     at org.apache.fontbox.ttf.GlyphSubstitutionTable.read(GlyphSubstitutionTable.java:102)
>>>     at org.apache.fontbox.ttf.TrueTypeFont.readTable(TrueTypeFont.java:365)
>>>     at org.apache.fontbox.ttf.TTFParser.parseTables(TTFParser.java:165)
>>>     at org.apache.fontbox.ttf.TTFParser.parse(TTFParser.java:144)
>>>     at org.apache.fontbox.ttf.TrueTypeCollection.getFontAtIndex(TrueTypeCollection.java:127)
>>>     at org.apache.fontbox.ttf.TrueTypeCollection.processAllFonts(TrueTypeCollection.java:109)
>>>     at org.apache.pdfbox.pdmodel.font.FileSystemFontProvider.addTrueTypeCollection(FileSystemFontProvider.java:665)
>>>     at org.apache.pdfbox.pdmodel.font.FileSystemFontProvider.scanFonts(FileSystemFontProvider.java:396)
>>>     at org.apache.pdfbox.pdmodel.font.FileSystemFontProvider.<init>(FileSystemFontProvider.java:367)
>>>     at org.apache.pdfbox.pdmodel.font.FontMapperImpl$DefaultFontProvider.<clinit>(FontMapperImpl.java:139)
>>>     at org.apache.pdfbox.pdmodel.font.FontMapperImpl.getProvider(FontMapperImpl.java:158)
>>>     at org.apache.pdfbox.pdmodel.font.FontMapperImpl.findFont(FontMapperImpl.java:416)
>>>     at org.apache.pdfbox.pdmodel.font.FontMapperImpl.findFontBoxFont(FontMapperImpl.java:379)
>>>     at org.apache.pdfbox.pdmodel.font.FontMapperImpl.getFontBoxFont(FontMapperImpl.java:353)
>>>     at org.apache.pdfbox.pdmodel.font.PDType1Font.<init>(PDType1Font.java:127)
>>>
>>> On Tue, 5 Dec 2023 at 05:03, Tilman Hausherr <THausherr@t-online.de> wrote:
>>>
>>>> Please do also post the full (for pdfbox / fontbox) stack trace. I have
>>>> a theory why it happens, which is that addTrueTypeCollection() does not
>>>> add the font as "*skipexception*" to the cache file because it's not
>>>> done in the exception handler.
>>>>
>>>> Tilman
>>>>
>>>> On 04.12.2023 21:17, Tilman Hausherr wrote:
>>>>> Does the stack trace appear at every start? If yes then it's a bug.
>>>>> The intent of the current code is that bad fonts aren't retried. The
>>>>> font cache file should contain a line with "*skipexception*" for that
>>>>> font. Can you look at it for the two font files?
>>>>>
>>>>> I could change SHA512 to CRC32. It has the advantage that it won't
>>>>> trigger people who heard about MD5 😂
>>>>>
>>>>> I made a test and CRC32 is 20% faster.
>>>>>
>>>>> Tilman
>>>>>
>>>>> On 04.12.2023 18:48, Gili Tzabari wrote:
>>>>>> I think the commit contains a typo:
>>>>>>
>>>>>>
>>>>>> https://svn.apache.org/viewvc/pdfbox/trunk/pdfbox/src/main/java/org/apache/pdfbox/pdmodel/font/FileSystemFontProvider.java?revision=1912514&view=markup&pathrev=1912514#l872
>>>>>>
>>>>>>     private static String computeHash(byte[] ba)
>>>>>>     {
>>>>>>         MessageDigest md;
>>>>>>         try
>>>>>>         {
>>>>>>             md = MessageDigest.getInstance("SHA512");
>>>>>>             byte[] md5 = md.digest(ba);
>>>>>>             return Hex.getString(md5);
>>>>>>         }
>>>>>>         catch (NoSuchAlgorithmException ex)
>>>>>>         {
>>>>>>             // never happens
>>>>>>             return "";
>>>>>>         }
>>>>>>     }
>>>>>>
>>>>>> You shouldn't need to use SHA512 to detect changes by a non-malicious
>>>>>> actor. MD5 should be plenty, and even CRC32 would be enough. I
>>>>>> suggest downgrading the hash complexity.
>>>>>>
>>>>>> Gili
>>>>>>
>>>>>> On 2023-12-04 10:21, Kjetil Ødegaard wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> I tried to upgrade an app to PDFBox 3.0.1 and I see a performance
>>>>>>> issue.
>>>>>>>
>>>>>>> It only affects the first PDF operation (after that it's quite
>>>>>>> fast), but it's a bit annoying since it takes about 20 seconds
>>>>>>> (on my M1 MacBook).
>>>>>>>
>>>>>>> Profiling reveals that this Kotlin code triggers the delay:
>>>>>>>
>>>>>>>        val font = PDType1Font(Standard14Fonts.FontName.COURIER)
>>>>>>>
>>>>>>> The thread dump shows that almost all time is spent in this method:
>>>>>>>
>>>>>>> org.apache.pdfbox.pdmodel.font.FileSystemFontProvider#computeHash
>>>>>>>
>>>>>>> I assume that this is related to PDFBOX-5684.
>>>>>>>
>>>>>>> Is this possible to work around? Or is it possible to fix?
>>>>>>>
>>>>>>> BR Kjetil
>>>>>>>


---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@pdfbox.apache.org
For additional commands, e-mail: users-help@pdfbox.apache.org


Re: Font operation takes a long time with 3.0.1

Posted by Kjetil Ødegaard <kj...@dcompany.no.INVALID>.
Nice! Tested it now and I can confirm that it fixes the issue. I see good
performance even from the first operation.

Checked the cache file and there is a line for this font there now:

➜  ~ grep -i NotoSansKannada .pdfbox.cache
*skipexception*|TTF||0|0|0|0|0||/System/Library/Fonts/NotoSansKannada.ttc||1700331239000

Thanks for the quick response, great work!

BR Kjetil
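
The behaviour confirmed above, where a font that fails to parse gets a *skipexception* line so it is not parsed again on the next start, can be sketched as a small negative cache. This is a simplified illustration only and not the PDFBox implementation: the class, the FontParser interface and the in-memory map are all hypothetical stand-ins for the real scanner and cache file.

```java
import java.io.EOFException;
import java.io.IOException;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of caching parse failures; names are not from PDFBox.
class SkipCacheSketch {
    static final String SKIP = "*skipexception*";

    /** Stand-in for a font parser: returns a font name or throws on a broken file. */
    interface FontParser {
        String parse(String path) throws IOException;
    }

    private final Map<String, String> cache = new LinkedHashMap<>();
    int parseAttempts = 0;

    /** Parses only paths that are not cached yet; failures are cached under the skip marker. */
    void scan(List<String> paths, FontParser parser) {
        for (String path : paths) {
            if (cache.containsKey(path)) {
                continue; // known good or known bad, so no second parse
            }
            parseAttempts++;
            try {
                cache.put(path, parser.parse(path));
            } catch (IOException ex) {
                // Recording the failure inside the handler is what prevents the retry.
                cache.put(path, SKIP);
            }
        }
    }

    String entry(String path) {
        return cache.get(path);
    }

    public static void main(String[] args) {
        // Fake parser: pretend every .ttc file is truncated.
        FontParser parser = path -> {
            if (path.endsWith(".ttc")) {
                throw new EOFException("truncated table in " + path);
            }
            return "Font:" + path;
        };
        SkipCacheSketch scanner = new SkipCacheSketch();
        List<String> paths = Arrays.asList("Good.ttf", "Broken.ttc");
        scanner.scan(paths, parser);
        scanner.scan(paths, parser); // second run hits the cache for both files
        System.out.println("parse attempts: " + scanner.parseAttempts);
        System.out.println("Broken.ttc entry: " + scanner.entry("Broken.ttc"));
    }
}
```

If the put in the catch block is left out, the broken file is parsed again on every scan, which matches the symptom reported earlier in the thread.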

On Tue, 5 Dec 2023 at 09:55, Tilman Hausherr <TH...@t-online.de> wrote:

> Thanks, new snapshot build here:
>
> https://repository.apache.org/content/groups/snapshots/org/apache/pdfbox/pdfbox-app/3.0.2-SNAPSHOT/
>
>
> Ticket:
> https://issues.apache.org/jira/browse/PDFBOX-5727
>
> Tilman

Re: Font operation takes a long time with 3.0.1

Posted by Tilman Hausherr <TH...@t-online.de>.
Thanks, new snapshot build here:
https://repository.apache.org/content/groups/snapshots/org/apache/pdfbox/pdfbox-app/3.0.2-SNAPSHOT/ 


Ticket:
https://issues.apache.org/jira/browse/PDFBOX-5727

Tilman



---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@pdfbox.apache.org
For additional commands, e-mail: users-help@pdfbox.apache.org


Re: Font operation takes a long time with 3.0.1

Posted by Kjetil Ødegaard <kj...@dcompany.no.INVALID>.
To clarify, this stack trace is not printed anywhere. I got it from
stepping into the code and invoking printStackTrace() on the exception to
get the whole stack. See complete stack trace below.

I agree with your theory; it matches what I'm seeing. These fonts are never
added to the cache file, so the cache file is always rebuilt.

I double-checked the cache file and there is no trace of these two
fonts, but lots of entries for other fonts (of different weights). The
timestamp on the file shows that it is rebuilt on every run.

BR Kjetil

java.io.EOFException
    at org.apache.fontbox.ttf.TTFDataStream.readUnsignedShort(TTFDataStream.java:154)
    at org.apache.fontbox.ttf.TTFDataStream.readUnsignedShortArray(TTFDataStream.java:188)
    at org.apache.fontbox.ttf.GlyphSubstitutionTable.readMultipleSubstitutionSubtable(GlyphSubstitutionTable.java:412)
    at org.apache.fontbox.ttf.GlyphSubstitutionTable.readLookupSubtable(GlyphSubstitutionTable.java:263)
    at org.apache.fontbox.ttf.GlyphSubstitutionTable.readLookupTable(GlyphSubstitutionTable.java:313)
    at org.apache.fontbox.ttf.GlyphSubstitutionTable.readLookupList(GlyphSubstitutionTable.java:247)
    at org.apache.fontbox.ttf.GlyphSubstitutionTable.read(GlyphSubstitutionTable.java:102)
    at org.apache.fontbox.ttf.TrueTypeFont.readTable(TrueTypeFont.java:365)
    at org.apache.fontbox.ttf.TTFParser.parseTables(TTFParser.java:165)
    at org.apache.fontbox.ttf.TTFParser.parse(TTFParser.java:144)
    at org.apache.fontbox.ttf.TrueTypeCollection.getFontAtIndex(TrueTypeCollection.java:127)
    at org.apache.fontbox.ttf.TrueTypeCollection.processAllFonts(TrueTypeCollection.java:109)
    at org.apache.pdfbox.pdmodel.font.FileSystemFontProvider.addTrueTypeCollection(FileSystemFontProvider.java:665)
    at org.apache.pdfbox.pdmodel.font.FileSystemFontProvider.scanFonts(FileSystemFontProvider.java:396)
    at org.apache.pdfbox.pdmodel.font.FileSystemFontProvider.<init>(FileSystemFontProvider.java:367)
    at org.apache.pdfbox.pdmodel.font.FontMapperImpl$DefaultFontProvider.<clinit>(FontMapperImpl.java:139)
    at org.apache.pdfbox.pdmodel.font.FontMapperImpl.getProvider(FontMapperImpl.java:158)
    at org.apache.pdfbox.pdmodel.font.FontMapperImpl.findFont(FontMapperImpl.java:416)
    at org.apache.pdfbox.pdmodel.font.FontMapperImpl.findFontBoxFont(FontMapperImpl.java:379)
    at org.apache.pdfbox.pdmodel.font.FontMapperImpl.getFontBoxFont(FontMapperImpl.java:353)
    at org.apache.pdfbox.pdmodel.font.PDType1Font.<init>(PDType1Font.java:127)


Re: Font operation takes a long time with 3.0.1

Posted by Tilman Hausherr <TH...@t-online.de>.
Please also post the full stack trace (the pdfbox / fontbox part). My
theory is that addTrueTypeCollection() does not add the font as
"*skipexception*" to the cache file because that isn't done in the
exception handler.
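
To illustrate the theory, here is a minimal sketch of a scan loop that records a failed font from inside the exception handler, so that the file is no longer seen as new on the next start. This is not the PDFBox code: the class, the FontParser interface and the two-field cache line are hypothetical simplifications (the real cache line has more fields).

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

final class SkipOnFailureSketch
{
    /** Hypothetical stand-in for a font parser; returns the font name. */
    interface FontParser
    {
        String parse(String path) throws IOException;
    }

    /** Builds simplified cache lines, one per scanned file. */
    static List<String> scan(List<String> paths, FontParser parser)
    {
        List<String> cacheLines = new ArrayList<>();
        for (String path : paths)
        {
            try
            {
                String name = parser.parse(path);
                cacheLines.add(name + "|" + path);
            }
            catch (IOException ex)
            {
                // Record the failure here, in the handler, so the file
                // still gets a cache entry and is not retried every start.
                cacheLines.add("*skipexception*|" + path);
            }
        }
        return cacheLines;
    }
}
```

The point of the sketch is only where the marker gets written: if the handler logs and moves on without adding a line, the file has no cache entry at all.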

Tilman


Re: Font operation takes a long time with 3.0.1

Posted by Tilman Hausherr <TH...@t-online.de>.
Does the stack trace appear at every start? If yes then it's a bug. The 
intent of the current code is that bad fonts aren't retried. The font 
cache file should contain a line with "*skipexception*" for that font. 
Can you look at it for the two font files?

I could change SHA512 to CRC32. It has the advantage that it won't 
trigger people who heard about MD5 😂

I made a test and CRC32 is 20% faster.
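
For anyone who wants to try this: in the JDK, CRC32 is not a MessageDigest algorithm but lives in java.util.zip. A minimal sketch follows, with a hypothetical class and helper name; it is not the code that ends up in PDFBox.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

final class Crc32HashSketch
{
    /** Returns the CRC32 checksum of the data as 8 lowercase hex digits. */
    static String crc32Hex(byte[] data)
    {
        CRC32 crc = new CRC32();
        crc.update(data);
        // getValue() holds the 32-bit checksum in the low bits of a long
        return String.format("%08x", crc.getValue());
    }

    public static void main(String[] args)
    {
        byte[] sample = "123456789".getBytes(StandardCharsets.US_ASCII);
        // standard CRC-32 check value for this input: cbf43926
        System.out.println(crc32Hex(sample));
    }
}
```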

Tilman


Re: Font operation takes a long time with 3.0.1

Posted by Gili Tzabari <co...@gmail.com>.
I think the commit contains a typo:


FileSystemFontProvider.java, revision 1912514, lines 872-886:
<https://svn.apache.org/viewvc/pdfbox/trunk/pdfbox/src/main/java/org/apache/pdfbox/pdmodel/font/FileSystemFontProvider.java?revision=1912514&view=markup&pathrev=1912514#l872>

    private static String computeHash(byte[] ba)
    {
        MessageDigest md;
        try
        {
            md = MessageDigest.getInstance("SHA512");
            byte[] md5 = md.digest(ba);
            return Hex.getString(md5);
        }
        catch (NoSuchAlgorithmException ex)
        {
            // never happens
            return "";
        }
    }

You shouldn't need to use SHA512 to detect changes by a non-malicious 
actor. MD5 should be plenty, and even CRC32 would be enough. I suggest 
downgrading the hash complexity.
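
As a rough sketch of what a cheaper, swappable digest could look like: the variant below takes the algorithm name as a parameter and, unlike the method quoted above, throws instead of returning an empty string when the name is not available. The class name and the choice of IllegalStateException are assumptions for illustration, not a proposal for the actual patch.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class LoudHashSketch
{
    /** Hashes the data with the named MessageDigest algorithm, as lowercase hex. */
    static String computeHash(String algorithm, byte[] data)
    {
        try
        {
            MessageDigest md = MessageDigest.getInstance(algorithm);
            byte[] digest = md.digest(data);
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest)
            {
                // mask to 0..255 so negative bytes format as two hex digits
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        }
        catch (NoSuchAlgorithmException ex)
        {
            // fail loudly: a silent "" would look like a valid (empty) hash
            throw new IllegalStateException("Digest algorithm not available: " + algorithm, ex);
        }
    }
}
```

Calling computeHash("MD5", bytes) or computeHash("SHA-512", bytes) then lets the two be timed against each other on the same font files.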

Gili


Re: Font operation takes a long time with 3.0.1

Posted by Kjetil Ødegaard <kj...@dcompany.no.INVALID>.
It happens for me each time I restart the app. The first time it takes ~20
seconds, the next time it's much faster. This is using openjdk 21.0.1 on
macOS Sonoma 14.1.2 (not using Docker).

Checked ~/.pdfbox.cache and it gets written every run. Did some debugging
and it looks like I'm hitting this case:

    // re-build the entire cache if we encounter un-cached fonts (could be optimised)
    LOG.warn(pending.size() + " new fonts found, font cache will be re-built");

pending contains these values:

/System/Library/Fonts/NotoSansKannada.ttc
/System/Library/Fonts/NotoSerifMyanmar.ttc

Seems like these fonts fail to load and are not added to the disk cache
file. See stack trace below.
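
My understanding of the logic, as a simplified sketch (the names are made up and this is not the actual PDFBox code): the pending set is whatever is on disk but has no entry in the cache file, and a non-empty pending set triggers the rebuild. A font that fails to load and never gets an entry therefore stays pending on every start.

```java
import java.util.Collection;
import java.util.Set;
import java.util.TreeSet;

final class PendingFontsSketch
{
    /** Returns the font files found on disk that have no cache entry. */
    static Set<String> pending(Collection<String> filesOnDisk, Collection<String> cachedPaths)
    {
        Set<String> result = new TreeSet<>(filesOnDisk);
        result.removeAll(cachedPaths);
        return result;
    }

    /** A rebuild is triggered as soon as anything is pending. */
    static boolean needsRebuild(Collection<String> filesOnDisk, Collection<String> cachedPaths)
    {
        return !pending(filesOnDisk, cachedPaths).isEmpty();
    }
}
```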

BR Kjetil

java.io.EOFException
    at org.apache.fontbox.ttf.TTFDataStream.readUnsignedShort(TTFDataStream.java:154)
    at org.apache.fontbox.ttf.TTFDataStream.readUnsignedShortArray(TTFDataStream.java:188)
    at org.apache.fontbox.ttf.GlyphSubstitutionTable.readMultipleSubstitutionSubtable(GlyphSubstitutionTable.java:412)
    at org.apache.fontbox.ttf.GlyphSubstitutionTable.readLookupSubtable(GlyphSubstitutionTable.java:263)
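
To illustrate why a font that fails to parse could trigger a rebuild on every start, here is a simplified model. It assumes that the cache only records fonts that parsed successfully, and that any font on disk that is missing from the cache triggers a full rebuild. FontCacheModel, recordFailures and the other names are hypothetical; the real FileSystemFontProvider logic is more involved.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

// Simplified, hypothetical model of a font cache. Names and structure are
// illustrative and do not mirror FileSystemFontProvider.
final class FontCacheModel {
    // Paths recorded in the cache file after the last rebuild.
    private final Set<String> cached = new HashSet<>();
    // If true, fonts that fail to parse still get a marker entry.
    private final boolean recordFailures;
    // Number of full (slow) rebuilds performed so far.
    int rebuilds = 0;

    FontCacheModel(boolean recordFailures) {
        this.recordFailures = recordFailures;
    }

    // Models one application start: collect fonts that are missing from
    // the cache and rebuild the whole cache if there are any.
    void startup(List<String> fontsOnDisk, Predicate<String> parses) {
        List<String> pending = new ArrayList<>();
        for (String path : fontsOnDisk) {
            if (!cached.contains(path)) {
                pending.add(path);
            }
        }
        if (pending.isEmpty()) {
            return; // fast path: cache is complete
        }
        rebuilds++; // slow path: every font is scanned again
        cached.clear();
        for (String path : fontsOnDisk) {
            if (parses.test(path) || recordFailures) {
                cached.add(path);
            }
        }
    }

    public static void main(String[] args) {
        List<String> fonts = List.of("/fonts/Good.ttf",
                "/System/Library/Fonts/NotoSansKannada.ttc");
        Predicate<String> parses = p -> !p.contains("NotoSansKannada");
        FontCacheModel model = new FontCacheModel(false);
        for (int i = 0; i < 3; i++) {
            model.startup(fonts, parses);
        }
        System.out.println("rebuilds without failure entries: " + model.rebuilds);
    }
}
```

In this model the unparseable font is pending on every start unless failures are recorded as well, which would match the behaviour described above.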

BR Kjetil

On Mon, 4 Dec 2023 at 16:41, Tilman Hausherr <TH...@t-online.de> wrote:

> This should happen only once in 3.0.1, unless you're working with a
> container without a font cache file in the image.
>
> The SHA512 checksum is computed only if the file modification date of a
> font file has changed; we then check whether the content has changed.
>
> Tilman
>

Re: Font operation takes a long time with 3.0.1

Posted by Tilman Hausherr <TH...@t-online.de>.
This should happen only once in 3.0.1, unless you're working with a container without a font cache file in the image.

The SHA512 checksum is computed only if the file modification date of a font file has changed; we then check whether the content has changed.
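
The two-step check described above can be sketched roughly as follows: compare the modification time first, and hash the file only when the timestamp differs. This is an illustration under assumptions, not the actual PDFBox implementation; FontChangeCheck, contentChanged and sha512Hex are hypothetical names, and the cached values are assumed to come from the cache file.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical sketch; the method names and cached fields are assumptions.
final class FontChangeCheck {

    // Returns true only if the file content differs from what was cached.
    static boolean contentChanged(Path file, long cachedMillis, String cachedSha512)
            throws IOException, NoSuchAlgorithmException {
        long currentMillis = Files.getLastModifiedTime(file).toMillis();
        if (currentMillis == cachedMillis) {
            return false; // cheap path: timestamp unchanged, no hashing needed
        }
        // Timestamp differs: hash the content to see whether it really changed.
        return !sha512Hex(file).equals(cachedSha512);
    }

    // SHA-512 of a file as lowercase hex, streamed in chunks.
    static String sha512Hex(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("SHA-512");
        byte[] buffer = new byte[8192];
        try (InputStream in = Files.newInputStream(file)) {
            int n;
            while ((n = in.read(buffer)) != -1) {
                md.update(buffer, 0, n);
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }
}
```

With this ordering the expensive hash runs only for files whose timestamp moved, so a normal start with an up-to-date cache should not hash anything.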

Tilman


---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@pdfbox.apache.org
For additional commands, e-mail: users-help@pdfbox.apache.org