Posted to users@pdfbox.apache.org by Peter Prusinowski <pe...@gmx.de> on 2016/03/02 14:48:33 UTC
PrintTextLocations 1.8 vs 2.0
Hello,
I have noticed that the PrintTextLocations example in 1.8 and 2.0 gives
different results for text.getHeightDir(). In 1.8 the value seems to be
right, but in 2.0 it is too small. I tried with some PDFBox-created
documents. Is this a bug?
Peter
---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@pdfbox.apache.org
For additional commands, e-mail: users-help@pdfbox.apache.org
Re: PrintTextLocations 1.8 vs 2.0
Posted by Andreas Lehmkühler <an...@lehmi.de>.
Hi,
> On 16 March 2016 at 09:52, Peter Prusinowski <pe...@gmx.de> wrote:
>
>
> Good morning,
>
> thank you for the hints, now I am overriding showGlyph() and trying to
> get the value with
>
> PDSimpleFont sf = (PDSimpleFont) font;
> String name = sf.getEncoding().getName(code);
> sf.getPath(name).getBounds()
>
> but I am getting the same height, no matter which font size is set. This
> happens with Type 1 and TrueType fonts. What am I doing wrong?
The font always provides the same unscaled shapes. You have to take the text
rendering matrix and the font matrix into account. Have a look at
PageDrawer#showFontGlyph to see how to do so.
HTH
Andreas
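Why the height never changes: getPath() returns the outline in unscaled glyph space, and only the font matrix and the text rendering matrix turn it into page units. The minimal sketch below shows this with plain java.awt.geom so it runs without PDFBox. The glyph box (718 units tall, which is what the 1.8 height of 10.052 at 14 pt corresponds to), the 0.001 font matrix and the simplified scale-plus-translate text rendering matrix for unrotated text are all assumed example values, not values read from a real font.

```java
import java.awt.Shape;
import java.awt.geom.AffineTransform;
import java.awt.geom.Rectangle2D;

// Minimal sketch with java.awt.geom only (no PDFBox types). The two matrices stand
// in for textRenderingMatrix.createAffineTransform() and
// font.getFontMatrix().createAffineTransform(); all numbers are assumed examples.
class GlyphScaling {

    // Maps glyph-space bounds to user space: text rendering matrix x font matrix.
    static Rectangle2D toUserSpace(Rectangle2D glyphBounds, double fontSize,
                                   double x, double y) {
        // Simplified text rendering matrix for unrotated text: [fontSize 0 0 fontSize x y]
        AffineTransform at = new AffineTransform(fontSize, 0, 0, fontSize, x, y);
        // Typical font matrix [0.001 0 0 0.001 0 0]: glyph units -> text space
        at.concatenate(AffineTransform.getScaleInstance(0.001, 0.001));
        Shape s = at.createTransformedShape(glyphBounds);
        return s.getBounds2D();
    }

    public static void main(String[] args) {
        // Hypothetical unscaled glyph bounds: the same for every font size
        Rectangle2D glyph = new Rectangle2D.Double(71, 0, 580, 718);
        for (double size : new double[] {10, 14, 28}) {
            Rectangle2D r = toUserSpace(glyph, size, 100, 92);
            System.out.printf("font size %.0f: height %.3f%n", size, r.getHeight());
        }
    }
}
```

Only the two matrices depend on the font size, which is why sf.getPath(name).getBounds() on its own always reports the same height.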
Re: PrintTextLocations 1.8 vs 2.0
Posted by Peter Prusinowski <pe...@gmx.de>.
Sorry for the late response. I tested the code with several documents
and it works well :-) Thank you very much.
On 17.03.2016 at 18:07, Tilman Hausherr wrote:
> I found a case where the strategy you mentioned didn't work for a TT
> font, the file 032431. Here's some updated code.
Re: PrintTextLocations 1.8 vs 2.0
Posted by Tilman Hausherr <TH...@t-online.de>.
I found a case where the strategy you mentioned didn't work for a TT
font, the file 032431. Here's some updated code.
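// Imports used by this method:
import java.awt.Shape;
import java.awt.geom.AffineTransform;
import java.awt.geom.GeneralPath;
import java.io.IOException;
import org.apache.pdfbox.pdmodel.common.PDRectangle;
import org.apache.pdfbox.pdmodel.font.PDCIDFontType2;
import org.apache.pdfbox.pdmodel.font.PDFont;
import org.apache.pdfbox.pdmodel.font.PDSimpleFont;
import org.apache.pdfbox.pdmodel.font.PDTrueTypeFont;
import org.apache.pdfbox.pdmodel.font.PDType0Font;
import org.apache.pdfbox.pdmodel.font.PDType3CharProc;
import org.apache.pdfbox.pdmodel.font.PDType3Font;
import org.apache.pdfbox.pdmodel.font.PDVectorFont;
import org.apache.pdfbox.util.Matrix;

// Returns the bounds of one glyph in PDF user space, or null if they are unknown.
private Shape calculateGlyphBounds(Matrix textRenderingMatrix,
        PDFont font, int code) throws IOException
{
    GeneralPath path = null;
    // glyph space -> user space: text rendering matrix x font matrix
    AffineTransform at = textRenderingMatrix.createAffineTransform();
    at.concatenate(font.getFontMatrix().createAffineTransform());
    if (font instanceof PDType3Font)
    {
        // Type 3 glyphs are content streams, not outlines: use the glyph bbox
        PDType3Font t3Font = (PDType3Font) font;
        PDType3CharProc charProc = t3Font.getCharProc(code);
        if (charProc != null)
        {
            PDRectangle glyphBBox = charProc.getGlyphBBox();
            if (glyphBBox != null)
            {
                path = glyphBBox.toGeneralPath();
            }
        }
    }
    else if (font instanceof PDVectorFont)
    {
        PDVectorFont vectorFont = (PDVectorFont) font;
        path = vectorFont.getPath(code);

        // TrueType outlines are in unitsPerEm units, the font matrix assumes 1000
        if (font instanceof PDTrueTypeFont)
        {
            PDTrueTypeFont ttFont = (PDTrueTypeFont) font;
            int unitsPerEm = ttFont.getTrueTypeFont().getHeader().getUnitsPerEm();
            at.scale(1000d / unitsPerEm, 1000d / unitsPerEm);
        }
        if (font instanceof PDType0Font)
        {
            PDType0Font t0font = (PDType0Font) font;
            if (t0font.getDescendantFont() instanceof PDCIDFontType2)
            {
                int unitsPerEm = ((PDCIDFontType2) t0font.getDescendantFont())
                        .getTrueTypeFont().getHeader().getUnitsPerEm();
                at.scale(1000d / unitsPerEm, 1000d / unitsPerEm);
            }
        }
    }
    else if (font instanceof PDSimpleFont)
    {
        PDSimpleFont simpleFont = (PDSimpleFont) font;

        // these two lines do not always work, e.g. for the TT fonts in file
        // 032431.pdf, which is why PDVectorFont is tried first.
        String name = simpleFont.getEncoding().getName(code);
        path = simpleFont.getPath(name);
    }
    else
    {
        // shouldn't happen, please open issue in JIRA
        System.out.println("Unknown font class: " + font.getClass());
    }
    if (path == null)
    {
        return null;
    }
    return at.createTransformedShape(path.getBounds2D());
}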
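The unitsPerEm correction in calculateGlyphBounds() can be checked with plain numbers. A minimal sketch, assuming (as the code above implies) that the path returned for a TrueType glyph is in font units, i.e. unitsPerEm per em (often 2048 for TrueType), while the 0.001 font matrix assumes 1000 units per em. The 0.7 em glyph height and the 14 pt size are illustrative values only.

```java
import java.awt.geom.AffineTransform;
import java.awt.geom.Point2D;

// Sketch only: shows why calculateGlyphBounds() scales by 1000 / unitsPerEm.
// Font size, unitsPerEm values and glyph height are illustrative assumptions.
class UnitsPerEmScaling {

    // Same order as above: simplified TRM x font matrix x (1000 / unitsPerEm).
    static AffineTransform glyphToUserSpace(double fontSize, int unitsPerEm,
                                            boolean correctForUnitsPerEm) {
        AffineTransform at = AffineTransform.getScaleInstance(fontSize, fontSize);
        at.concatenate(AffineTransform.getScaleInstance(0.001, 0.001)); // font matrix
        if (correctForUnitsPerEm) {
            at.scale(1000d / unitsPerEm, 1000d / unitsPerEm);
        }
        return at;
    }

    // User-space y of a point 'em' ems above the baseline, in a font that stores
    // its outlines with the given unitsPerEm.
    static double heightInUserSpace(double em, double fontSize, int unitsPerEm,
                                    boolean correct) {
        Point2D glyphPoint = new Point2D.Double(0, em * unitsPerEm);
        return glyphToUserSpace(fontSize, unitsPerEm, correct)
                .transform(glyphPoint, null).getY();
    }

    public static void main(String[] args) {
        System.out.println("2048 upem, corrected:   " + heightInUserSpace(0.7, 14, 2048, true));
        System.out.println("1000 upem, corrected:   " + heightInUserSpace(0.7, 14, 1000, true));
        System.out.println("2048 upem, uncorrected: " + heightInUserSpace(0.7, 14, 2048, false));
    }
}
```

With the correction both fonts give the same height; without it, a glyph from a 2048-unit font comes out about twice as tall, which is the kind of error the extra scale() calls above guard against.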
Re: PrintTextLocations 1.8 vs 2.0
Posted by Tilman Hausherr <TH...@t-online.de>.
On 16.03.2016 at 09:52, Peter Prusinowski wrote:
>
>
> thank you for the hints, now I am overriding showGlyph() and trying
> to get the value with
>
> PDSimpleFont sf = (PDSimpleFont) font;
> String name = sf.getEncoding().getName(code);
> sf.getPath(name).getBounds()
>
> but I am getting the same height, no matter which font size is set.
> This happens with Type 1 and TrueType fonts. What am I doing wrong?
Here's some code; use it with the DrawPrintTextLocations example. Please
tell me whether it works and, if possible, upload files where it doesn't.
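// Imports used by this method:
import java.awt.Color;
import java.awt.Shape;
import java.awt.geom.AffineTransform;
import java.awt.geom.GeneralPath;
import java.awt.geom.Rectangle2D;
import java.io.IOException;
import org.apache.pdfbox.pdmodel.common.PDRectangle;
import org.apache.pdfbox.pdmodel.font.PDCIDFontType2;
import org.apache.pdfbox.pdmodel.font.PDFont;
import org.apache.pdfbox.pdmodel.font.PDSimpleFont;
import org.apache.pdfbox.pdmodel.font.PDTrueTypeFont;
import org.apache.pdfbox.pdmodel.font.PDType0Font;
import org.apache.pdfbox.pdmodel.font.PDVectorFont;
import org.apache.pdfbox.util.Matrix;
import org.apache.pdfbox.util.Vector;

// Override inside the DrawPrintTextLocations example, whose g2d field is the
// Graphics2D the page image is drawn into.
@Override
protected void showGlyph(Matrix textRenderingMatrix, PDFont font,
        int code, String unicode, Vector displacement) throws IOException
{
    super.showGlyph(textRenderingMatrix, font, code, unicode, displacement);
    Rectangle2D bounds = null;
    // glyph space -> PDF user space
    AffineTransform at = textRenderingMatrix.createAffineTransform();
    if (font instanceof PDSimpleFont)
    {
        PDSimpleFont simpleFont = (PDSimpleFont) font;
        String name = simpleFont.getEncoding().getName(code);
        GeneralPath path = simpleFont.getPath(name);
        bounds = path.getBounds2D();
        // glyph units -> text space (assumes the usual 1/1000 font matrix)
        at.scale(1/1000f, 1/1000f);
        if (font instanceof PDTrueTypeFont)
        {
            // TrueType outlines are in unitsPerEm, not 1000 units per em
            PDTrueTypeFont ttFont = (PDTrueTypeFont) font;
            int unitsPerEm = ttFont.getTrueTypeFont().getHeader().getUnitsPerEm();
            at.scale(1000d / unitsPerEm, 1000d / unitsPerEm);
        }
    }
    else if (font instanceof PDVectorFont)
    {
        PDVectorFont vectorFont = (PDVectorFont) font;
        GeneralPath path = vectorFont.getPath(code);
        bounds = path.getBounds2D();
        at.scale(1/1000f, 1/1000f);
        if (font instanceof PDType0Font)
        {
            PDType0Font t0font = (PDType0Font) font;
            if (t0font.getDescendantFont() instanceof PDCIDFontType2)
            {
                int unitsPerEm = ((PDCIDFontType2) t0font.getDescendantFont())
                        .getTrueTypeFont().getHeader().getUnitsPerEm();
                at.scale(1000d / unitsPerEm, 1000d / unitsPerEm);
            }
        }
    }
    else
    {
        System.out.println("TODO other: " + font.getClass());
    }
    if (bounds != null)
    {
        Shape s = at.createTransformedShape(bounds);
        // flip y-axis: PDF user space y grows upward, Java2D y grows downward
        AffineTransform flip = new AffineTransform();
        flip.translate(0, getCurrentPage().getBBox().getHeight());
        flip.scale(1, -1);
        s = flip.createTransformedShape(s);
        // compensate for page rotation, restore the transform after drawing
        AffineTransform transform = g2d.getTransform();
        int rotation = getCurrentPage().getRotation();
        if (rotation != 0)
        {
            PDRectangle mediaBox = getCurrentPage().getMediaBox();
            switch (rotation)
            {
                case 90:
                    g2d.translate(mediaBox.getHeight(), 0);
                    break;
                case 270:
                    g2d.translate(0, mediaBox.getWidth());
                    break;
                case 180:
                    g2d.translate(mediaBox.getWidth(), mediaBox.getHeight());
                    break;
                default:
                    break;
            }
            g2d.rotate(Math.toRadians(rotation));
        }
        g2d.setColor(Color.CYAN);
        g2d.draw(s);
        if (rotation != 0)
        {
            g2d.setTransform(transform);
        }
    }
}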
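The "flip y-axis" step in the code above converts PDF user space (origin at the bottom-left, y growing upward) into Java2D coordinates (origin at the top-left, y growing downward). A minimal stdlib-only sketch of just that step; the 792 pt page height (US Letter) and the glyph box are example values, and page rotation is deliberately left out.

```java
import java.awt.Shape;
import java.awt.geom.AffineTransform;
import java.awt.geom.Rectangle2D;

// Sketch of the y-axis flip only; page rotation is not handled here.
class FlipY {

    // Maps a rectangle from PDF user space to Java2D space: y' = pageHeight - y.
    static Rectangle2D pdfToJava2D(Rectangle2D pdfRect, double pageHeight) {
        AffineTransform flip = new AffineTransform();
        flip.translate(0, pageHeight);
        flip.scale(1, -1);
        Shape s = flip.createTransformedShape(pdfRect);
        return s.getBounds2D();
    }

    public static void main(String[] args) {
        double pageHeight = 792; // assumed US Letter page height in points
        // Example glyph box: x = 100, from the baseline y = 92 up to y = 102.052
        Rectangle2D pdfBox = new Rectangle2D.Double(100, 92, 10.108, 10.052);
        Rectangle2D screenBox = pdfToJava2D(pdfBox, pageHeight);
        System.out.println("top-left in Java2D: " + screenBox.getMinX() + ", " + screenBox.getMinY());
        System.out.println("height unchanged:   " + screenBox.getHeight());
    }
}
```

The box keeps its size; only its vertical position is mirrored, so the baseline at y = 92 ends up at y = 700 in Java2D coordinates.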
Re: PrintTextLocations 1.8 vs 2.0
Posted by Peter Prusinowski <pe...@gmx.de>.
Good morning,
thank you for the hints, now I am overriding showGlyph() and trying to
get the value with
PDSimpleFont sf = (PDSimpleFont) font;
String name = sf.getEncoding().getName(code);
sf.getPath(name).getBounds()
but I am getting the same height, no matter which font size is set. This
happens with Type 1 and TrueType fonts. What am I doing wrong?
On 07.03.2016 at 18:16, Tilman Hausherr wrote:
> On 07.03.2016 at 11:46, Peter Prusinowski wrote:
>> Okay, thank you for the information. I tried to get the height with
>> getPath(). If it's one of the 14 standard fonts, I can get the height
>> with PDType1Font.fontName.getPath(text.getUnicode()).getBounds().
>> But I don't know how to get the information from other fonts in a
>> generic way. Do you have a hint for me?
>
> It is not available for all fonts. It is available for all
> PDSimpleFont objects, except for PDType3Font (which doesn't draw just
> vectors).
>
> The best would be to look at the source code, at PageDrawer.java.
>
> createGlyph2D() returns a Glyph2D for the font. That one you can use
> for glyph2D.getPathForCharacterCode(code);
>
> See also showFontGlyph(); you can override that one in a subclass.
>
> Also have a look at showGlyph(), which distinguishes between Type 3
> fonts and others. See also CustomGraphicsStreamEngine.
>
> Tilman
Re: PrintTextLocations 1.8 vs 2.0
Posted by Tilman Hausherr <TH...@t-online.de>.
On 07.03.2016 at 11:46, Peter Prusinowski wrote:
> Okay, thank you for the information. I tried to get the height with
> getPath(). If it's one of the 14 standard fonts, I can get the height
> with PDType1Font.fontName.getPath(text.getUnicode()).getBounds(). But
> I don't know how to get the information from other fonts in a generic
> way. Do you have a hint for me?
It is not available for all fonts. It is available for all PDSimpleFont
objects, except for PDType3Font (which doesn't draw just vectors).
The best would be to look at the source code, at PageDrawer.java.
createGlyph2D() returns a Glyph2D for the font. That one you can use for
glyph2D.getPathForCharacterCode(code);
See also showFontGlyph(); you can override that one in a subclass.
Also have a look at showGlyph(), which distinguishes between Type 3
fonts and others. See also CustomGraphicsStreamEngine.
Tilman
Re: PrintTextLocations 1.8 vs 2.0
Posted by Peter Prusinowski <pe...@gmx.de>.
Okay, thank you for the information. I tried to get the height with
getPath(). If it's one of the 14 standard fonts, I can get the height
with PDType1Font.fontName.getPath(text.getUnicode()).getBounds(). But I
don't know how to get the information from other fonts in a generic way.
Do you have a hint for me?
Peter
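The two height heuristics described in the quoted message below can be checked against the quoted table with a little arithmetic. Back-computing, the 2.0 height of 8.33 at 14 pt is half of a font bounding box 1190 glyph units tall, and the 1.8 heights of 10.052001 and 10.248001 correspond to about 718 and 732 glyph units. The bounding box used in this sketch, [-170 -228 1003 962], is the FontBBox from the Adobe AFM file for Helvetica-Bold; that the test document uses Helvetica-Bold is only an assumption that fits these numbers, since the thread never names the font.

```java
// Worked arithmetic for the two getHeightDir() heuristics quoted below.
// Assumed input: FontBBox [-170 -228 1003 962] (Helvetica-Bold AFM); the thread
// does not say which font the test document uses.
class HeightHeuristics {

    // 2.0 style: half the height of the font bounding box, scaled to the font size.
    static double halfFontBBoxHeight(double llY, double urY, double fontSize) {
        return (urY - llY) / 2.0 * fontSize / 1000.0;
    }

    // Converts a user-space height back into 1000-unit glyph space.
    static double toGlyphUnits(double height, double fontSize) {
        return height / fontSize * 1000.0;
    }

    public static void main(String[] args) {
        double fontSize = 14.0;
        System.out.println("2.0 height: " + halfFontBBoxHeight(-228, 962, fontSize));
        // 1.8 heights copied from the quoted table, expressed in glyph units
        System.out.printf("1.8 height, H..l:    %.1f units%n", toGlyphUnits(10.052001, fontSize));
        System.out.printf("1.8 height, final d: %.1f units%n", toGlyphUnits(10.248001, fontSize));
    }
}
```

So the 2.0 value is half of the font bounding box by design (the full box is usually too high, as noted below), while the 1.8 value follows per-glyph bounding boxes, which is why it changes at the final "d".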
On 06.03.2016 at 17:40, Tilman Hausherr wrote:
>
> In 1.8, for Standard 14 fonts (yours is one), it uses the bounding box of
> each glyph. Within a string it keeps the maximum height, which results
> in the weird effect that the "d" is slightly higher. If the string is
> changed so that another glyph is appended, the larger height is kept.
>
> In 2.0 (and in 1.8 for non-Standard-14 fonts), it uses half the height
> of the bounding box from the font descriptor. The non-halved bounding
> box is usually too high.
>
> Anyway, the 1.8 logic would work for you for Standard 14 fonts, but
> not for all other fonts.
>
> So there is no bug, neither in 1.8 nor in 2.0.
>
> Tilman
>
> On 03.03.2016 at 19:05, Tilman Hausherr wrote:
>> On 03.03.2016 at 09:11, Peter Prusinowski wrote:
>>> Okay, I am trying to replace some words in documents and use
>>> text.height to "delete" these words. Here is an example document:
>>> http://workupload.com/file/G8ipDe8j
>>
>> getHeightDir() is not the best strategy, for the reason I
>> mentioned yesterday. In your case, you should call getPath() on the
>> glyphs and get the bounding box. Or just get the height of the font
>> bounding box (there's a method); however, that one is often too high,
>> so there's a risk that you blank the line above.
>>
>> But thanks for the file, I'll try to find out why it is different.
>> The heights in 1.8 are surprising; usually they are never so
>> "perfect" (as I said yesterday). And for some reason, in 1.8 the
>> height of the last glyph is slightly different although it is all in
>> one string.
>>
>> 1.8:
>> String[100.0,92.0 fs=14.0 xscale=14.0 height=10.052001
>> space=3.8920004 width=10.108002]H
>> String[110.108,92.0 fs=14.0 xscale=14.0 height=10.052001
>> space=3.8920004 width=7.784004]e
>> String[117.892006,92.0 fs=14.0 xscale=14.0 height=10.052001
>> space=3.8920004 width=3.8919983]l
>> String[121.784004,92.0 fs=14.0 xscale=14.0 height=10.052001
>> space=3.8920004 width=3.8919983]l
>> String[125.676,92.0 fs=14.0 xscale=14.0 height=10.052001
>> space=3.8920004 width=8.553993]o
>> String[134.23,92.0 fs=14.0 xscale=14.0 height=10.052001
>> space=3.8920004 width=3.8919983]
>> String[138.122,92.0 fs=14.0 xscale=14.0 height=10.052001
>> space=3.8920004 width=13.216003]W
>> String[151.338,92.0 fs=14.0 xscale=14.0 height=10.052001
>> space=3.8920004 width=8.554001]o
>> String[159.892,92.0 fs=14.0 xscale=14.0 height=10.052001
>> space=3.8920004 width=5.445999]r
>> String[165.338,92.0 fs=14.0 xscale=14.0 height=10.052001
>> space=3.8920004 width=3.8919983]l
>> String[169.23,92.0 fs=14.0 xscale=14.0 *height=10.248001*
>> space=3.8920004 width=8.554001]d <========= ???
>>
>> 2.0:
>> String[100.0,92.0 fs=14.0 xscale=14.0 height=8.33 space=3.8920004
>> width=10.108002]H
>> String[110.108,92.0 fs=14.0 xscale=14.0 height=8.33 space=3.8920004
>> width=7.7839966]e
>> String[117.892,92.0 fs=14.0 xscale=14.0 height=8.33 space=3.8920004
>> width=3.8919983]l
>> String[121.784,92.0 fs=14.0 xscale=14.0 height=8.33 space=3.8920004
>> width=3.8919983]l
>> String[125.675995,92.0 fs=14.0 xscale=14.0 height=8.33
>> space=3.8920004 width=8.554001]o
>> String[134.23,92.0 fs=14.0 xscale=14.0 height=8.33 space=3.8920004
>> width=3.8919983]
>> String[138.122,92.0 fs=14.0 xscale=14.0 height=8.33 space=3.8920004
>> width=13.216003]W
>> String[151.338,92.0 fs=14.0 xscale=14.0 height=8.33 space=3.8920004
>> width=8.554001]o
>> String[159.892,92.0 fs=14.0 xscale=14.0 height=8.33 space=3.8920004
>> width=5.445999]r
>> String[165.338,92.0 fs=14.0 xscale=14.0 height=8.33 space=3.8920004
>> width=3.8919983]l
>> String[169.23,92.0 fs=14.0 xscale=14.0 height=8.33 space=3.8920004
>> width=8.554001]d
>>
>>
>>
>> Tilman
>>
>>>
>>> Peter
>>>
>>> Am 02.03.2016 um 19:24 schrieb Tilman Hausherr:
>>>> Am 02.03.2016 um 14:48 schrieb Peter Prusinowski:
>>>>> Hello,
>>>>>
>>>>> I have noticed that the PrintTextLocations example in 1.8 and 2.0
>>>>> gives different results for text.getHeightDir(). In 1.8 the value
>>>>> seems to be right, but in 2.0 it is too small. I tried with some
>>>>> PDFBox created documents. Is this a bug ?
>>>>
>>>> Maybe, maybe not. The height is a heuristic value to help with text
>>>> extraction, which is sometimes computed differently in 2.0, and it
>>>> is usually about the height of an "a". Please upload the PDF.
>>>>
>>>> Tilman
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: users-unsubscribe@pdfbox.apache.org
>>>> For additional commands, e-mail: users-help@pdfbox.apache.org
>>>>
>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: users-unsubscribe@pdfbox.apache.org
>>> For additional commands, e-mail: users-help@pdfbox.apache.org
>>>
>>
>>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe@pdfbox.apache.org
> For additional commands, e-mail: users-help@pdfbox.apache.org
>
---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@pdfbox.apache.org
For additional commands, e-mail: users-help@pdfbox.apache.org
Re: PrintTextLocations 1.8 vs 2.0
Posted by Tilman Hausherr <TH...@t-online.de>.
In 1.8, for Standard 14 fonts (yours is one), it uses the bounding box of
each glyph. Within a string, it keeps the maximum height seen so far,
which results in the weird effect that the "d" is slightly higher. If the
string is changed so that another glyph is appended, the larger height is
kept.
In 2.0 (and in 1.8 for non-Standard 14 fonts), it uses half the height of
the bounding box from the font descriptor. The full (not halved) bounding
box is usually too high.
Anyway, the 1.8 logic would work for you for Standard 14 fonts, but not
for all other fonts.
So there is no bug, neither in 1.8 nor in 2.0.
Tilman
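The two heuristics described above can be checked against the logged values with a little arithmetic. The Java sketch below assumes the font is Helvetica-Bold at 14 pt (the thread does not say so, but the logged widths fit, e.g. "l" = 278/1000 × 14 = 3.892) and takes that font's AFM metrics as assumed inputs: a font bounding box of -170 -228 1003 962, and glyph boxes spanning 0..718 for "H", -14..546 for "e" and -14..718 for "d". It only illustrates the logic as described in this message; it is not PDFBox's actual code.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration of the two height heuristics described above, NOT PDFBox code.
// All metrics are ASSUMED Helvetica-Bold AFM values in glyph units (1000 per em).
public class HeightHeuristics {

    // 2.0 style (as described): half the font bounding box height,
    // scaled from glyph units to the font size.
    static double heightV20(double bboxLly, double bboxUry, double fontSize) {
        return (bboxUry - bboxLly) / 2.0 * fontSize / 1000.0;
    }

    // 1.8 style (as described, Standard 14 fonts only): each glyph's own box
    // height, but the running maximum within one string is what gets reported.
    static List<Double> heightsV18(double[][] glyphBoxes, double fontSize) {
        List<Double> heights = new ArrayList<>();
        double max = 0;
        for (double[] box : glyphBoxes) {
            // box = {lly, ury} of the glyph in glyph units
            max = Math.max(max, box[1] - box[0]);
            heights.add(max * fontSize / 1000.0);
        }
        return heights;
    }

    public static void main(String[] args) {
        double fontSize = 14.0;
        // Assumed Helvetica-Bold FontBBox: -170 -228 1003 962
        System.out.printf("2.0 height: %.3f%n", heightV20(-228, 962, fontSize));
        // Assumed glyph boxes {lly, ury} for "H", "e", "d"
        double[][] hed = { {0, 718}, {-14, 546}, {-14, 718} };
        System.out.println("1.8 heights: " + heightsV18(hed, fontSize));
    }
}
```

With these inputs the 2.0 rule gives 1190 / 2 × 14 / 1000 = 8.33, while the running maximum stays at 718 × 14 / 1000 = 10.052 for "H" and "e" and rises to 732 × 14 / 1000 = 10.248 at "d", which matches both logs (they print float values, hence 10.052001).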
On 03.03.2016 at 19:05, Tilman Hausherr wrote:
> On 03.03.2016 at 09:11, Peter Prusinowski wrote:
>> Okay, I am trying to replace some words in documents and use
>> text.height to "delete" these words. Here is an example document:
>> http://workupload.com/file/G8ipDe8j
>
> Using getHeightDir() is not the best strategy, for the reason I
> mentioned yesterday. In your case, you should call getPath() on the
> glyphs and get their bounding boxes. Alternatively, use the height of
> the font bounding box (there's a method for it); however, that one is
> often too high, so there's a risk that you blank out the line above.
>
> But thanks for the file; I'll try to find out why it is different.
> The heights in 1.8 are surprising: usually they are never this
> "perfect" (as I said yesterday). And for some reason, in 1.8 the
> height of the last glyph is slightly different, although it is all in
> one string.
> Tilman
Re: PrintTextLocations 1.8 vs 2.0
Posted by Tilman Hausherr <TH...@t-online.de>.
On 03.03.2016 at 09:11, Peter Prusinowski wrote:
> Okay, I am trying to replace some words in documents and use
> text.height to "delete" these words. Here is an example document:
> http://workupload.com/file/G8ipDe8j
Using getHeightDir() is not the best strategy, for the reason I mentioned
yesterday. In your case, you should call getPath() on the glyphs and get
their bounding boxes. Alternatively, use the height of the font bounding
box (there's a method for it); however, that one is often too high, so
there's a risk that you blank out the line above.
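The getPath() route can be sketched with plain java.awt.geom. A glyph outline comes back in unscaled glyph space, so (as Andreas points out elsewhere in this thread) it must pass through the font matrix and then the text rendering matrix before its bounds mean anything on the page. In PDFBox 2.0 those inputs would presumably come from the path returned by getPath(), PDFont#getFontMatrix() and the textRenderingMatrix passed to showGlyph(), each Matrix converted with createAffineTransform(); treat that mapping as an assumption to verify against the API docs. The rectangle outline and matrix values below are hypothetical stand-ins, and the sketch assumes glyph units of 1/1000 em; outlines in other units (TrueType glyphs use the font's unitsPerEm) would need an extra scale first.

```java
import java.awt.Shape;
import java.awt.geom.AffineTransform;
import java.awt.geom.GeneralPath;
import java.awt.geom.Rectangle2D;

// Sketch: map a glyph outline from glyph space to PDF user space and take its
// bounding box. The outline and matrices in main() are HYPOTHETICAL stand-ins
// for what a PDF library would supply.
public class GlyphBounds {

    // fontMatrix:    glyph space -> text space (0.001 scale for typical Type 1 fonts)
    // textRendering: text space  -> user space (font size, text matrix and CTM combined)
    static Rectangle2D userSpaceBounds(Shape glyph,
                                       AffineTransform fontMatrix,
                                       AffineTransform textRendering) {
        AffineTransform at = new AffineTransform(textRendering);
        at.concatenate(fontMatrix); // fontMatrix is applied first, then textRendering
        return at.createTransformedShape(glyph).getBounds2D();
    }

    public static void main(String[] args) {
        // Hypothetical outline: a rectangle spanning an assumed "H" box (71 0 651 718)
        GeneralPath h = new GeneralPath(new Rectangle2D.Double(71, 0, 580, 718));
        AffineTransform fontMatrix = AffineTransform.getScaleInstance(0.001, 0.001);
        // 14 pt text with its origin at (100, 92), no rotation
        AffineTransform textRendering =
                new AffineTransform(14.0, 0.0, 0.0, 14.0, 100.0, 92.0);
        Rectangle2D r = userSpaceBounds(h, fontMatrix, textRendering);
        System.out.printf("x=%.3f y=%.3f w=%.3f h=%.3f%n",
                r.getX(), r.getY(), r.getWidth(), r.getHeight());
    }
}
```

For these values the box runs from x = 100.994 to 109.114 and from y = 92 to 102.052 in PDF user space (y pointing up). Uniting such boxes per word gives a blanking rectangle that hugs the actual glyphs instead of the taller font bounding box.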
But thanks for the file; I'll try to find out why it is different. The
heights in 1.8 are surprising: usually they are never this "perfect" (as
I said yesterday). And for some reason, in 1.8 the height of the last
glyph is slightly different, although it is all in one string.
1.8:
String[100.0,92.0 fs=14.0 xscale=14.0 height=10.052001 space=3.8920004 width=10.108002]H
String[110.108,92.0 fs=14.0 xscale=14.0 height=10.052001 space=3.8920004 width=7.784004]e
String[117.892006,92.0 fs=14.0 xscale=14.0 height=10.052001 space=3.8920004 width=3.8919983]l
String[121.784004,92.0 fs=14.0 xscale=14.0 height=10.052001 space=3.8920004 width=3.8919983]l
String[125.676,92.0 fs=14.0 xscale=14.0 height=10.052001 space=3.8920004 width=8.553993]o
String[134.23,92.0 fs=14.0 xscale=14.0 height=10.052001 space=3.8920004 width=3.8919983]
String[138.122,92.0 fs=14.0 xscale=14.0 height=10.052001 space=3.8920004 width=13.216003]W
String[151.338,92.0 fs=14.0 xscale=14.0 height=10.052001 space=3.8920004 width=8.554001]o
String[159.892,92.0 fs=14.0 xscale=14.0 height=10.052001 space=3.8920004 width=5.445999]r
String[165.338,92.0 fs=14.0 xscale=14.0 height=10.052001 space=3.8920004 width=3.8919983]l
String[169.23,92.0 fs=14.0 xscale=14.0 *height=10.248001* space=3.8920004 width=8.554001]d <========= ???

2.0:
String[100.0,92.0 fs=14.0 xscale=14.0 height=8.33 space=3.8920004 width=10.108002]H
String[110.108,92.0 fs=14.0 xscale=14.0 height=8.33 space=3.8920004 width=7.7839966]e
String[117.892,92.0 fs=14.0 xscale=14.0 height=8.33 space=3.8920004 width=3.8919983]l
String[121.784,92.0 fs=14.0 xscale=14.0 height=8.33 space=3.8920004 width=3.8919983]l
String[125.675995,92.0 fs=14.0 xscale=14.0 height=8.33 space=3.8920004 width=8.554001]o
String[134.23,92.0 fs=14.0 xscale=14.0 height=8.33 space=3.8920004 width=3.8919983]
String[138.122,92.0 fs=14.0 xscale=14.0 height=8.33 space=3.8920004 width=13.216003]W
String[151.338,92.0 fs=14.0 xscale=14.0 height=8.33 space=3.8920004 width=8.554001]o
String[159.892,92.0 fs=14.0 xscale=14.0 height=8.33 space=3.8920004 width=5.445999]r
String[165.338,92.0 fs=14.0 xscale=14.0 height=8.33 space=3.8920004 width=3.8919983]l
String[169.23,92.0 fs=14.0 xscale=14.0 height=8.33 space=3.8920004 width=8.554001]d
Tilman
Re: PrintTextLocations 1.8 vs 2.0
Posted by Peter Prusinowski <pe...@gmx.de>.
Okay, I am trying to replace some words in documents and use text.height
to "delete" these words. Here is an example document:
http://workupload.com/file/G8ipDe8j
Peter
On 02.03.2016 at 19:24, Tilman Hausherr wrote:
> On 02.03.2016 at 14:48, Peter Prusinowski wrote:
>> Hello,
>>
>> I have noticed that the PrintTextLocations example in 1.8 and 2.0
>> gives different results for text.getHeightDir(). In 1.8 the value
>> seems to be right, but in 2.0 it is too small. I tried with some
>> PDFBox-created documents. Is this a bug?
>
> Maybe, maybe not. The height is a heuristic value to help with text
> extraction, which is sometimes computed differently in 2.0, and it is
> usually about the height of an "a". Please upload the PDF.
>
> Tilman
Re: PrintTextLocations 1.8 vs 2.0
Posted by Tilman Hausherr <TH...@t-online.de>.
On 02.03.2016 at 14:48, Peter Prusinowski wrote:
> Hello,
>
> I have noticed that the PrintTextLocations example in 1.8 and 2.0
> gives different results for text.getHeightDir(). In 1.8 the value
> seems to be right, but in 2.0 it is too small. I tried with some
> PDFBox-created documents. Is this a bug?
Maybe, maybe not. The height is a heuristic value to help with text
extraction, which is sometimes computed differently in 2.0, and it is
usually about the height of an "a". Please upload the PDF.
Tilman