Posted to dev@pdfbox.apache.org by "Fred Andrews (JIRA)" <ji...@apache.org> on 2014/11/19 06:03:33 UTC

[jira] [Created] (PDFBOX-2508) Text extraction getting zero font height, bad widths, and ? for text in this PDF with Type 3 Fonts

Fred Andrews created PDFBOX-2508:
------------------------------------

             Summary: Text extraction getting zero font height, bad widths, and ? for text in this PDF with Type 3 Fonts
                 Key: PDFBOX-2508
                 URL: https://issues.apache.org/jira/browse/PDFBOX-2508
             Project: PDFBox
          Issue Type: Bug
    Affects Versions: 1.8.7
            Reporter: Fred Andrews


The attached file contains only the first line of a file in which every piece of text has these problems.  All other lines were removed with Nitro to produce a small test case.

This is the output from the PrintTextLocations example:
String[211.92,356.8801 fs=58.0 xscale=58.0 height=1.75392 space=190528.28 width=1.7052002]?
String[129.84,347.04 fs=58.0 xscale=58.0 height=2.72832 space=288435.66 width=2.679596]?
String[70.32,299.28 fs=58.0 xscale=58.0 height=3.31296 space=349985.12 width=7.0643997]?
String[77.3844,299.28 fs=58.0 xscale=58.0 height=3.31296 space=349985.12 width=4.8720016]?
String[82.2564,299.28 fs=58.0 xscale=58.0 height=3.31296 space=349985.12 width=6.333603]?
String[88.590004,299.28 fs=58.0 xscale=58.0 height=3.31296 space=349985.12 width=6.577202]?
String[95.167206,299.28 fs=58.0 xscale=58.0 height=3.31296 space=349985.12 width=6.0899963]?
String[101.2572,299.28 fs=58.0 xscale=58.0 height=3.31296 space=349985.12 width=6.333603]?
String[107.590805,299.28 fs=58.0 xscale=58.0 height=3.31296 space=349985.12 width=6.0899963]?
String[113.6808,299.28 fs=58.0 xscale=58.0 height=3.31296 space=349985.12 width=4.8720016]?
String[118.5528,299.28 fs=58.0 xscale=58.0 height=3.31296 space=349985.12 width=3.1668015]?
String[121.719604,299.28 fs=58.0 xscale=58.0 height=3.31296 space=349985.12 width=6.333603]?
String[128.0532,299.28 fs=58.0 xscale=58.0 height=3.31296 space=349985.12 width=6.577194]?
String[134.63042,299.28 fs=58.0 xscale=58.0 height=3.31296 space=349985.12 width=6.0899963]?
String[140.72041,299.28 fs=58.0 xscale=58.0 height=3.31296 space=349985.12 width=3.1667938]?
String[522.95984,293.28 fs=58.0 xscale=58.0 height=1.36416 space=150394.36 width=1.4616089]?

The font size is far too large (it should be closer to 8), the value for space is implausibly large, and the height is too small.  In addition, each character is extracted as a '?'.  The original file shows these problems on every piece of text.
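
To make the symptoms above easier to check across a larger file, here is a minimal sketch that parses lines in the output format shown and flags the three problems described. It uses only the Java standard library and does not call PDFBox. The class name TextLocationCheck and both thresholds (space more than 10 times the font size, height less than a quarter of the font size) are assumptions chosen for illustration, not values taken from PDFBox.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: parses PrintTextLocations-style output lines
// and reports values that look implausible.
class TextLocationCheck {

    // Assumed thresholds, picked only for this illustration.
    static final double MAX_SPACE_FACTOR = 10.0;
    static final double MIN_HEIGHT_FACTOR = 0.25;

    private static final String NUM = "([-+0-9.eE]+)";

    // Matches: String[x,y fs=.. xscale=.. height=.. space=.. width=..]text
    private static final Pattern LINE = Pattern.compile(
            "String\\[" + NUM + "," + NUM + " fs=" + NUM + " xscale=" + NUM
            + " height=" + NUM + " space=" + NUM + " width=" + NUM + "\\](.*)");

    static final class Entry {
        final double x, y, fontSize, xScale, height, space, width;
        final String text;

        Entry(double[] v, String text) {
            this.x = v[0];
            this.y = v[1];
            this.fontSize = v[2];
            this.xScale = v[3];
            this.height = v[4];
            this.space = v[5];
            this.width = v[6];
            this.text = text;
        }
    }

    // Returns the parsed entry, or null if the line is not in the expected format.
    static Entry parse(String line) {
        Matcher m = LINE.matcher(line.trim());
        if (!m.matches()) {
            return null;
        }
        double[] v = new double[7];
        for (int i = 0; i < 7; i++) {
            v[i] = Double.parseDouble(m.group(i + 1));
        }
        return new Entry(v, m.group(8));
    }

    // Lists the symptoms from the report that this entry exhibits.
    static List<String> problems(Entry e) {
        List<String> out = new ArrayList<String>();
        if (e.space > MAX_SPACE_FACTOR * e.fontSize) {
            out.add("space width implausibly large");
        }
        if (e.height < MIN_HEIGHT_FACTOR * e.fontSize) {
            out.add("height too small for font size");
        }
        if ("?".equals(e.text)) {
            out.add("character extracted as '?'");
        }
        return out;
    }

    public static void main(String[] args) {
        // First line of the output quoted in this report.
        String line = "String[211.92,356.8801 fs=58.0 xscale=58.0 "
                + "height=1.75392 space=190528.28 width=1.7052002]?";
        Entry e = parse(line);
        System.out.println(problems(e));
    }
}
```

The check cannot tell whether fs=58.0 itself is wrong, since that needs knowledge of the intended rendered size; it only flags values that are inconsistent with the reported font size.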