Posted to dev@pdfbox.apache.org by "Tilman Hausherr (JIRA)" <ji...@apache.org> on 2017/05/07 17:56:04 UTC

[jira] [Comment Edited] (PDFBOX-3780) Heights of Characters

    [ https://issues.apache.org/jira/browse/PDFBOX-3780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15999951#comment-15999951 ] 

Tilman Hausherr edited comment on PDFBOX-3780 at 5/7/17 5:55 PM:
-----------------------------------------------------------------

The last commit improves getCapHeight() and getXHeight() for fonts whose OS/2 table is version 1. Get a snapshot here:
https://repository.apache.org/content/groups/snapshots/org/apache/pdfbox/pdfbox-app/2.0.6-SNAPSHOT/
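
For context: the OS/2 table only gained its sCapHeight and sxHeight fields in version 2, so older tables need another source for these values. The message above does not say how the commit obtains them; what follows is only a sketch of one possible fallback, assuming the outline of the glyph 'H' is available as a java.awt.geom.Path2D in font units. The names CapHeightEstimate and capHeight, and the scaling to 1000 units per em, are illustrative assumptions and not PDFBox API.

```java
import java.awt.geom.Path2D;

// Hypothetical helper, not part of PDFBox.
final class CapHeightEstimate
{
    private CapHeightEstimate()
    {
    }

    /**
     * Estimates the cap height in 1/1000 em units.
     *
     * @param os2Version  version of the font's OS/2 table
     * @param sCapHeight  OS/2 sCapHeight in font units (only meaningful for version 2 or later)
     * @param outlineOfH  outline of the glyph 'H' in font units, or null if unavailable
     * @param unitsPerEm  font units per em, e.g. 2048 for many TrueType fonts
     */
    static float capHeight(int os2Version, int sCapHeight, Path2D outlineOfH, int unitsPerEm)
    {
        float scaling = 1000f / unitsPerEm;
        if (os2Version >= 2)
        {
            // The field exists, so use it directly.
            return sCapHeight * scaling;
        }
        if (outlineOfH != null)
        {
            // Assumption: the top of a flat capital letter approximates the cap height.
            return (float) outlineOfH.getBounds2D().getMaxY() * scaling;
        }
        // Nothing to go on; 0 signals "unknown" in this sketch.
        return 0f;
    }
}
```

An x-height estimate could follow the same pattern with sxHeight and the outline of 'x'.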

I haven't touched getAscent() and getDescent()... these are almost correct. The only problem is that Adobe defines ascent differently than the font itself: Adobe doesn't want the accents included, but the font does.

getHeight() is notoriously unreliable... here's the implementation for the subsetted font:
{code}
    @Override
    public float getHeight(int code) throws IOException
    {
        // todo: really we want the BBox, (for text extraction:)
        return (ttf.getHorizontalHeader().getAscender() + -ttf.getHorizontalHeader().getDescender())
                / ttf.getUnitsPerEm(); // todo: shouldn't this be the yMax/yMin?
    }
{code}
In text extraction, we know that the height is not always reliable, which is why we use capHeight when getHeight() delivers weird results.
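
A small sketch of the arithmetic and of the fallback idea, under stated assumptions. If the ascender, descender and units-per-em accessors return integer types, Java evaluates the division in the snippet above in integer arithmetic before widening to float, so the fractional part is lost; the sketch divides in float instead. The fallback rule is just one plausible reading of "weird results", not the actual text extraction logic. The class and method names are hypothetical, and the sample metrics in the usage note are illustrative rather than read from the attached fonts.

```java
// Hypothetical helper, not part of PDFBox.
final class GlyphHeightSketch
{
    private GlyphHeightSketch()
    {
    }

    /**
     * Ascender minus descender as a fraction of the em.
     * The hhea descender is normally negative, so subtracting it adds its magnitude.
     */
    static float heightFromHhea(int ascender, int descender, int unitsPerEm)
    {
        // Cast before dividing to avoid integer truncation.
        return (ascender - descender) / (float) unitsPerEm;
    }

    /**
     * Picks a height for text extraction. Both arguments must be in the same units.
     * Assumed rule: prefer capHeight when it is known and the reported height
     * is missing, non-positive, or larger than capHeight.
     */
    static float heightForExtraction(float height, float capHeight)
    {
        boolean capHeightKnown = capHeight > 0f;
        boolean heightSuspicious = Float.isNaN(height) || height <= 0f || height > capHeight;
        if (capHeightKnown && heightSuspicious)
        {
            return capHeight;
        }
        return height;
    }
}
```

With ascender 1901, descender -483 and 2048 units per em, heightFromHhea gives 2384 / 2048 = 1.1640625, whereas integer division of the same operands would give 1.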



> Heights of Characters
> ---------------------
>
>                 Key: PDFBOX-3780
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-3780
>             Project: PDFBox
>          Issue Type: Bug
>          Components: PDModel
>    Affects Versions: 2.0.5
>            Reporter: Uwe Möser
>            Priority: Critical
>         Attachments: DejaVuSansCondensed-Bold.ttf, DejaVuSansCondensed.ttf, PDFBoxHeightTest.java, PDFBoxHeightTest.pdf
>
>
> The functions
> .getFontDescriptor().getCapHeight()
> .getFontDescriptor().getXHeight()
> .getFontDescriptor().getAscent()
> .getFontDescriptor().getDescent()
> getHeight(int code)
> do not work properly, especially for embedded fonts (PDType0Font).
> Please see the attached file PDFBoxHeightTest.pdf, which shows where the line is and where it should be. The fonts were downloaded from http://www.schriftarten-fonts.de/fonts/11283/dejavu_sans_condensed.html


