Posted to users@pdfbox.apache.org by Joel Hirsh <jo...@gmail.com> on 2019/05/30 00:19:27 UTC

Change in PDFTextStripper return from 2.0.11 to 2.0.15

I have some files that are getting very different results in version 2.0.15
compared to 2.0.11

The files use Type 1 fonts. In 2.0.11, TextPosition.getHeight() returns
6.33 for them, but in 2.0.15 it returns 0.81.

Any idea on what might have changed?  I thought that PDFTextStripper was
part of legacy code that might be ugly and incorrect, but was at least
stable. And BTW, the 6.33 is correct.

I have a series of text size fixups that I first created 5 years ago, and
tweaked when moving to version 2.  And although they are undoubtedly hacks,
they have been stable on version 2, up until now.

Re: Change in PDFTextStripper return from 2.0.11 to 2.0.15

Posted by Tilman Hausherr <TH...@t-online.de>.
Am 30.05.2019 um 02:19 schrieb Joel Hirsh:
> I have some files that are getting very different results in version 2.0.15
> compared to 2.0.11
>
> The files have type1 fonts that in 2.0.11 TextPosition.getHeight() returns
> 6.33
>
> But in 2.0.15 the TextPosition.getHeight() returns  0.81
>
> Any idea on what might have changed?  I thought that PDFTextStripper was
> part of legacy code that might be ugly and incorrect, but was at least
> stable. And BTW, the 6.33 is correct.
>
> I have a series of text size fixups that I first created 5 years ago, and
> tweaked when moving to version 2.  And although they are undoubtedly hacks,
> they have been stable on version 2, up until now.
>
Yes, this has changed from time to time. The change is not in the stripper
itself but in LegacyPDFStreamEngine. It changed again just a few days ago,
so please try the snapshot. This is the relevant code segment:

         // sometimes the bbox has very high values, but CapHeight is OK
         PDFontDescriptor fontDescriptor = font.getFontDescriptor();
         if (fontDescriptor != null)
         {
             float capHeight = fontDescriptor.getCapHeight();
             if (Float.compare(capHeight, 0) != 0 &&
                 (capHeight < glyphHeight || Float.compare(glyphHeight, 0) == 0))
             {
                 glyphHeight = capHeight;
             }
             // PDFBOX-3464, PDFBOX-4480, PDFBOX-4553:
             // sometimes even CapHeight has very high value, but Ascent and Descent are ok
             float ascent = fontDescriptor.getAscent();
             float descent = fontDescriptor.getDescent();
             if (capHeight > ascent && ascent > 0 && descent < 0 &&
                 ((ascent - descent) / 2 < glyphHeight || Float.compare(glyphHeight, 0) == 0))
             {
                 glyphHeight = (ascent - descent) / 2;
             }
         }
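
To see how the two checks above interact, here is a minimal standalone
sketch that restates them as a pure function, without any PDFBox classes.
The class name GlyphHeightSketch, the method name adjust and all of the
numbers are made up for illustration. The starting glyphHeight is passed in
as a parameter because the segment above does not show how it is computed.

```java
public class GlyphHeightSketch {

    /**
     * Restates the two fallback checks from the segment above.
     * The parameters stand in for the values read from the font descriptor;
     * glyphHeight is whatever height was estimated before these checks run.
     */
    public static float adjust(float glyphHeight, float capHeight,
                               float ascent, float descent) {
        // Prefer CapHeight when it is set and smaller than the current
        // estimate, or when there is no estimate at all.
        if (Float.compare(capHeight, 0) != 0
                && (capHeight < glyphHeight || Float.compare(glyphHeight, 0) == 0)) {
            glyphHeight = capHeight;
        }
        // If CapHeight itself looks inflated (larger than Ascent), fall back
        // to half of the ascent-to-descent span when that is smaller.
        if (capHeight > ascent && ascent > 0 && descent < 0
                && ((ascent - descent) / 2 < glyphHeight
                    || Float.compare(glyphHeight, 0) == 0)) {
            glyphHeight = (ascent - descent) / 2;
        }
        return glyphHeight;
    }

    public static void main(String[] args) {
        // Hypothetical descriptor values, chosen only to hit each branch.
        System.out.println(adjust(1100f, 700f, 750f, -250f));  // 700.0: CapHeight wins
        System.out.println(adjust(1100f, 1000f, 750f, -250f)); // 500.0: (750 + 250) / 2
        System.out.println(adjust(1100f, 0f, 750f, -250f));    // 1100.0: no CapHeight, unchanged
    }
}
```

The second call shows why results can differ between releases for the same
file: once the Ascent/Descent check is present, a font whose CapHeight is
larger than its Ascent gets a different height than it would from the
CapHeight check alone.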

Tilman
