Posted to dev@pdfbox.apache.org by "Tilman Hausherr (JIRA)" <ji...@apache.org> on 2015/11/01 18:49:27 UTC

[jira] [Commented] (PDFBOX-3078) Text height coming in at half size, regression from 1.8

    [ https://issues.apache.org/jira/browse/PDFBOX-3078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14984469#comment-14984469 ] 

Tilman Hausherr commented on PDFBOX-3078:
-----------------------------------------

In 1.8, the height is taken from the AFM info and is 703 * 0.001 * 9 = 6.327.

In 2.0, the height is taken from the actual font (Arial TT). The BBox height is 1324, half of that is taken (665), and this is multiplied by the actual font matrix, which is 0.000488, i.e. roughly half of 0.001.
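
To make the factor of roughly two visible, the two computations described above can be replayed as plain arithmetic. This is only a sketch: the class and method names are made up for illustration, and it assumes the 2.0 value is scaled by the font size of 9 in the same way as the 1.8 value, which agrees with the reported 2.92 to about two decimals.

```java
public class HeightArithmetic {
    // Glyph-to-text scale that the PDF spec prescribes for non-Type 3 fonts.
    static final float DEFAULT_GLYPH_SCALE = 0.001f;

    /** Glyph-space height times a glyph-to-text scale times the font size. */
    static float scaledHeight(float glyphHeight, float glyphToTextScale, float fontSize) {
        return glyphHeight * glyphToTextScale * fontSize;
    }

    public static void main(String[] args) {
        float fontSize = 9f; // fs=9.0 in the PrintTextLocations output

        // 1.8: AFM value 703, fixed scale of 1/1000.
        float height18 = scaledHeight(703f, DEFAULT_GLYPH_SCALE, fontSize);

        // 2.0: half BBox height 665, scale taken from the font matrix (0.000488).
        float height20 = scaledHeight(665f, 0.000488f, fontSize);

        System.out.println("1.8 height: " + height18);
        System.out.println("2.0 height: " + height20);
        System.out.println("ratio 2.0 / 1.8: " + (height20 / height18));
    }
}
```

The input heights differ only slightly (703 vs. 665), so nearly all of the discrepancy comes from the scale factor.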

From the spec:
{code}
The glyph coordinate system is the space in which an individual character’s glyph is defined. All path coordinates and metrics shall be interpreted in glyph space. For all font types except Type 3, the units of glyph space are one-thousandth of a unit of text space; for a Type 3 font, the transformation from glyph space to text space shall be defined by a font matrix specified in an explicit FontMatrix entry in the font.
{code}

A quick idea would be to replace in PDFTextStreamEngine
{code}
float height = font.getFontMatrix().transformPoint(0, glyphHeight).y;
{code}
with
{code}
        float height;
        if (font instanceof PDType3Font)
        {
            height = font.getFontMatrix().transformPoint(0, glyphHeight).y;
        }
        else
        {
            height = glyphHeight * 0.001f;
        }
{code}
However, this changes the text extraction result for the file from PDFBOX-679 (which is not in the official tests, only in mine).
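
The effect of that branch on the numbers from this issue can be tried without PDFBox on the classpath. The sketch below is an illustration only: FontMatrix is a hypothetical stand-in for the PDFBox matrix class, the isType3 flag replaces the instanceof PDType3Font check, and the matrix entries and glyph height are the values quoted in this comment.

```java
public class GlyphHeightSketch {
    /** Hypothetical stand-in for a PDF font matrix [a b c d e f]; not the PDFBox class. */
    static final class FontMatrix {
        final float a, b, c, d, e, f;

        FontMatrix(float a, float b, float c, float d, float e, float f) {
            this.a = a; this.b = b; this.c = c;
            this.d = d; this.e = e; this.f = f;
        }

        /** y component of the transformed point (x, y): x*b + y*d + f. */
        float transformY(float x, float y) {
            return x * b + y * d + f;
        }
    }

    /** Type 3 fonts use their explicit font matrix, all other fonts the fixed 1/1000 scale. */
    static float textSpaceHeight(boolean isType3, FontMatrix fontMatrix, float glyphHeight) {
        if (isType3) {
            return fontMatrix.transformY(0, glyphHeight);
        }
        return glyphHeight * 0.001f;
    }

    public static void main(String[] args) {
        // Matrix as reported for the embedded Arial TT font.
        FontMatrix arial = new FontMatrix(0.000488f, 0, 0, 0.000488f, 0, 0);
        float glyphHeight = 665f; // half of the BBox height, as quoted above
        float fontSize = 9f;

        float current = arial.transformY(0, glyphHeight) * fontSize;
        float proposed = textSpaceHeight(false, arial, glyphHeight) * fontSize;

        System.out.println("current height:  " + current);
        System.out.println("proposed height: " + proposed);
    }
}
```

With these inputs the proposed branch roughly doubles the height for the TrueType font, while a Type 3 font would still go through its own matrix.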

> Text height coming in at half size, regression from 1.8
> -------------------------------------------------------
>
>                 Key: PDFBOX-3078
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-3078
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>            Reporter: Joel Hirsh
>         Attachments: wrongsize.pdf
>
>
> Running 11/1 Dvlp build.
> PrintTextLocations on attached file has height of 2.9, which is incorrect.
> String[30.699997,144.80005 fs=9.0 xscale=9.0 height=2.9236078 space=2.5020003 width=5.0040016]1
> String[35.704,144.80005 fs=9.0 xscale=9.0 height=2.9236078 space=2.5020003 width=5.003998]2
> String[40.707996,144.80005 fs=9.0 xscale=9.0 height=2.9236078 space=2.5020003 width=5.003998]8
> String[45.711994,144.80005 fs=9.0 xscale=9.0 height=2.9236078 space=2.5020003 width=5.003998]6
> String[50.715992,144.80005 fs=9.0 xscale=9.0 height=2.9236078 space=2.5020003 width=5.003998]2
> String[63.79999,144.80005 fs=9.0 xscale=9.0 height=2.9236078 space=2.5020003 width=4.2210045]^
> Same file, Version 1.8 has height of 6.5, which is about right:
> String[30.699997,144.80005 fs=9.0 xscale=9.0 height=6.327 space=2.5020003 width=5.0040016]1
> String[35.704,144.80005 fs=9.0 xscale=9.0 height=6.327 space=2.5020003 width=5.0040016]2
> String[40.708,144.80005 fs=9.0 xscale=9.0 height=6.4980006 space=2.5020003 width=5.0040016]8
> String[45.712,144.80005 fs=9.0 xscale=9.0 height=6.4980006 space=2.5020003 width=5.0040016]6
> String[50.716003,144.80005 fs=9.0 xscale=9.0 height=6.4980006 space=2.5020003 width=5.0040016]2
> String[63.800007,144.80005 fs=9.0 xscale=9.0 height=3.8160002 space=2.5020003 width=4.220997]^


