Posted to dev@pdfbox.apache.org by "Andreas Lehmkühler (Resolved JIRA)" <ji...@apache.org> on 2012/02/08 20:01:00 UTC
[jira] [Resolved] (PDFBOX-611) PDSimpleFont. Font height reported as zero.
[ https://issues.apache.org/jira/browse/PDFBOX-611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andreas Lehmkühler resolved PDFBOX-611.
---------------------------------------
Resolution: Fixed
Fix Version/s: 1.7.0
I improved the height calculation in revision 1242038, based on Peter's proposal.
Thanks for the contribution!
> PDSimpleFont. Font height reported as zero.
> --------------------------------------------
>
> Key: PDFBOX-611
> URL: https://issues.apache.org/jira/browse/PDFBOX-611
> Project: PDFBox
> Issue Type: Bug
> Components: PDModel
> Affects Versions: 0.8.0-incubator
> Environment: Win and Linux
> Reporter: Peter Costello
> Assignee: Andreas Lehmkühler
> Labels: PDSimpleFont, textExtraction
> Fix For: 1.7.0
>
> Original Estimate: 1h
> Remaining Estimate: 1h
>
> The logic for PDSimpleFont.getFontHeight() can return a value of zero.
> This will corrupt or compromise text extraction and layout.
> In particular, test with page 12 of 'http://www.encana.com/investor/financial/shareholder/pdfs/info-circular-french.pdf'.
> When a PDFontDescriptor is used, the current logic uses:
> 1) an average of xHeight and capHeight.
> xHeight is the height from the baseline to the top of a lower case letter like 'x'.
> capHeight is the height from the baseline to the top of an upper case Latin char.
> 2) xHeight
> 3) capHeight
> 4) ascent
> 5) zero
> This is really bizarre. 'xHeight' is an optional parameter, and 'capHeight' is often missing.
> The font bounding box is a required parameter and is the height that is used by Acrobat Reader when you select a line of text.
> The bounding box is not perfect, because it often overlaps the line above, but it is a consistent value. The problem with the
> current logic is that the reported height varies way too much, and a zero value can be reported.
> I have modified the logic as follows. The goal was to make the nominal values the same as the current logic,
> but return a very similar number when parameters go missing.
>     PDFontDescriptor desc = getFontDescriptor();
>     if( desc != null ) {
>         float height = desc.getCapHeight(); // Top of Cap to baseline (eg 715)
>         if (height==0) {
>             height=desc.getAscent(); // Max height from baseline (eg 715);
>             if (height==0) {
>                 PDRectangle bbox = desc.getFontBoundingBox();
>                 height = bbox.getHeight()/2; // Max height less max depth (eg (1006-(-325))=1331/2=665)
>                 if (height==0) {
>                     height=desc.getXHeight(); // Top of lower-case to baseline (eg 518)
>                     height-=desc.getDescent(); // Depth below baseline (eg 209, to get total of 727)
>                 }
>             }
>         }
>         retval=height;
>     }
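
To make the difference between the two lookup orders concrete, here is a minimal stand-alone sketch. It is not PDFBox code: the class FontHeightSketch and both method names are hypothetical, the font metrics are passed in as plain floats instead of being read from a PDFontDescriptor, and a value of 0 is assumed to mean "missing". The currentHeight method is only one reading of the order listed in the report (average, xHeight, capHeight, ascent, zero); proposedHeight mirrors the nested checks in the snippet above.

```java
// Hypothetical stand-alone sketch; not the PDFBox implementation.
// Metrics are plain floats, and 0 is assumed to mean "missing from the descriptor".
final class FontHeightSketch {

    // Order as described in the report: average of xHeight and capHeight,
    // then xHeight, then capHeight, then ascent, then zero.
    static float currentHeight(float xHeight, float capHeight, float ascent) {
        if (xHeight != 0 && capHeight != 0) {
            return (xHeight + capHeight) / 2;
        }
        if (xHeight != 0) {
            return xHeight;
        }
        if (capHeight != 0) {
            return capHeight;
        }
        return ascent; // still 0 when ascent is missing too
    }

    // Order proposed in the report: capHeight, then ascent, then half the
    // bounding-box height, then xHeight minus descent.
    static float proposedHeight(float capHeight, float ascent, float bboxHeight,
                                float xHeight, float descent) {
        float height = capHeight;
        if (height == 0) {
            height = ascent;
            if (height == 0) {
                height = bboxHeight / 2;
                if (height == 0) {
                    height = xHeight - descent;
                }
            }
        }
        return height;
    }

    public static void main(String[] args) {
        // Descriptor with only a bounding box (1006 - (-325) = 1331 units high).
        System.out.println(currentHeight(0f, 0f, 0f));
        System.out.println(proposedHeight(0f, 0f, 1331f, 0f, 0f));
    }
}
```

With only a bounding box available, the first method yields zero while the second falls back to half the box height, which is the case the report is concerned with.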