Posted to dev@pdfbox.apache.org by "Tilman Hausherr (JIRA)" <ji...@apache.org> on 2015/12/30 05:47:49 UTC

[jira] [Commented] (PDFBOX-3175) PDFTextStreamEngine probably miscalculates text height

    [ https://issues.apache.org/jira/browse/PDFBOX-3175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15074623#comment-15074623 ] 

Tilman Hausherr commented on PDFBOX-3175:
-----------------------------------------

It would be better to explain what you tried to do and how PDFBox failed. Your change makes many of the text extraction tests fail, not just one. You have not given any argument for why your text extraction is better than the existing one.

The tests are at "PDFBox reactor\pdfbox\src\test\resources\input", the output is at "PDFBox reactor\pdfbox\target\test-output".

Regarding correct heights, please run the DrawPrintTextLocations example on your file. The red mark is a helper used for text extraction; the blue mark is the bounding box. Ideally, the red mark should cover small glyphs, e.g. "a", "o", "n", etc. It is not always perfect, but it comes close.

> PDFTextStreamEngine probably miscalculates text height
> ------------------------------------------------------
>
>                 Key: PDFBOX-3175
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-3175
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>    Affects Versions: 2.0.0
>            Reporter: Leo
>
> When parsing a PDF document, TextPosition is created with a constant text height, about 2 times smaller than the character width, regardless of font size.
> The following workaround to calculate dyDisplay fixes the issue:
>         float verticalScaling = 1/1000f;
>         if (font instanceof PDType3Font) {
>             Matrix fontMatrix = font.getFontMatrix();
>             verticalScaling = fontMatrix.getValue(1, 1);
>         }
>         float dyDisplay = bbox.getHeight() * fontSize * verticalScaling;
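
For readers who want to check the arithmetic in the quoted workaround, here is a minimal standalone sketch. It does not call PDFBox at all: the class name TextHeightSketch, the method displayHeight, and the sample numbers (a 700-unit bounding box, a 12 pt font, a Type 3 font matrix entry of 0.01) are hypothetical values chosen for illustration only. It shows what the formula computes, not whether that value is the right one for text extraction, which is the open question in the comment above.

```java
// Standalone illustration of the formula in the quoted workaround.
// Nothing here is PDFBox API; all names and sample values are assumptions.
class TextHeightSketch {

    // Scale assumed by the workaround for fonts that are not Type 3:
    // 1000 glyph-space units per text-space unit.
    static final float DEFAULT_GLYPH_SCALE = 1 / 1000f;

    // Mirrors the quoted line:
    //   dyDisplay = bbox.getHeight() * fontSize * verticalScaling
    static float displayHeight(float bboxHeight, float fontSize, float verticalScaling) {
        return bboxHeight * fontSize * verticalScaling;
    }

    public static void main(String[] args) {
        // Hypothetical non-Type 3 font: bounding box 700 units high, 12 pt.
        float regular = displayHeight(700f, 12f, DEFAULT_GLYPH_SCALE);

        // Hypothetical Type 3 font whose font matrix has 0.01 at (1, 1),
        // so a 70-unit bounding box scales to the same height.
        float type3 = displayHeight(70f, 12f, 0.01f);

        System.out.println("regular font height: " + regular);
        System.out.println("Type 3 font height:  " + type3);
    }
}
```

With these inputs both calls give roughly 8.4 units, i.e. the full bounding box height scaled by the font size. That is larger than a mark meant to cover only small glyphs such as "a" or "o", which may be one reason the existing extraction tests react to the change.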


