Posted to users@pdfbox.apache.org by Branden Visser <mr...@gmail.com> on 2016/01/29 15:46:43 UTC

Getting visual bounds of a glyph

Hi everyone,

I thought we had a process to get the visual bounds of a character
sealed, until we bumped into a few more PDF documents where the
bounding box we're calculating is way bigger than the character itself
(about double the size). We're using the 2.0.0-SNAPSHOT code, but
we've verified this occurs in both RC2 and RC3.

I've written a gist with a Scala/Java-ish sample of our calculation
[1], where the TextPosition being passed in is the one provided by
PDFTextStripper in processTextPosition (I know it's inaccurate, but
we're only using the text rendering matrix from it which I think
should be accurate?)

I'm assuming you may need more information, but first I'm considering
that our logic for getting the bounds is missing some important pieces
that jump out at you all.

Is the gist correct?

Thanks,
Branden

[1] https://gist.github.com/anonymous/063db3b1b5ed040be41c

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@pdfbox.apache.org
For additional commands, e-mail: users-help@pdfbox.apache.org


Re: Getting visual bounds of a glyph

Posted by Branden Visser <mr...@gmail.com>.
Hi John, thanks for your reply!

Unfortunately, your suggested version gives the same result.

On Fri, Jan 29, 2016 at 2:20 PM, John Hewson <jo...@jahewson.com> wrote:
>
>> On 29 Jan 2016, at 06:46, Branden Visser <mr...@gmail.com> wrote:
>>
>> I thought we had a process to get the visual bounds of a character
>> sealed, until we bumped into a few more PDF documents where the
>> bounding box we're calculating is way bigger than the character itself
>> (about double the size). We're using the 2.0.0-SNAPSHOT code, but
>> we've verified this occurs in both RC2 and RC3.
>
> Assuming that you’re using PDFont#getPath(..) then you should be able to
> measure the visual bounds of a glyph accurately.
>

That's what I thought as well, but it appears I've messed something up
converting from glyph space to device space.

While debugging, I've found that PDCIDFontType2 [2] assumes a matrix
scale of 0.001, with a warning that it may not always be the case.
Could it be that I've hit a case where this transform is not accurate?
Aside from that, I must be missing some kind of transformation.
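
To make the question concrete, here is a minimal sketch of the glyph
space to text rendering matrix chain using only java.awt.geom. It is
not taken from the gist or from PDFBox: the square outline, the
2048 units-per-em design space and the matrix values are made-up
assumptions, and nothing here confirms that the document below has
this problem. It only shows how a fixed 0.001 font matrix would
inflate the bounds of an outline that is not in 1000-unit glyph space.

```java
import java.awt.geom.AffineTransform;
import java.awt.geom.GeneralPath;
import java.awt.geom.Rectangle2D;

public class GlyphBoundsSketch {

    /**
     * Maps a glyph outline out of glyph space by applying the font matrix
     * first and the text rendering matrix second, then takes the bounds.
     */
    static Rectangle2D visualBounds(GeneralPath glyphPath,
                                    AffineTransform fontMatrix,
                                    AffineTransform textRenderingMatrix) {
        AffineTransform at = new AffineTransform(textRenderingMatrix);
        // concatenate() makes fontMatrix the transform applied to points first.
        at.concatenate(fontMatrix);
        return at.createTransformedShape(glyphPath).getBounds2D();
    }

    /** Hypothetical square outline standing in for a real glyph path. */
    static GeneralPath squareGlyph(float size) {
        GeneralPath p = new GeneralPath();
        p.moveTo(0f, 0f);
        p.lineTo(size, 0f);
        p.lineTo(size, size);
        p.lineTo(0f, size);
        p.closePath();
        return p;
    }

    public static void main(String[] args) {
        // Made-up text rendering matrix: 12 pt text placed at (100, 700).
        AffineTransform trm = new AffineTransform(12, 0, 0, 12, 100, 700);
        // Assumption: the outline is expressed in a 2048 units-per-em space.
        GeneralPath glyph = squareGlyph(2048f);

        Rectangle2D fixedScale = visualBounds(
                glyph, AffineTransform.getScaleInstance(0.001, 0.001), trm);
        Rectangle2D perEmScale = visualBounds(
                glyph, AffineTransform.getScaleInstance(1.0 / 2048, 1.0 / 2048), trm);

        System.out.println("width with 0.001 scale:  " + fixedScale.getWidth());
        System.out.println("width with 1/2048 scale: " + perEmScale.getWidth());
    }
}
```

With these made-up numbers the 0.001 scale yields a box roughly twice
as wide as the 1/2048 scale, which is the kind of mismatch I'm
wondering about. Whether the path I get back has already been
normalised to 1000 units is exactly what I haven't been able to
establish.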

Here is the document that exhibits the behaviour:
http://digitalarchive.wilsoncenter.org/document/117733.pdf?v=749e35894eaceae628d3ad91751a2fef

Thanks again,
Branden


[2] https://github.com/apache/pdfbox/blob/trunk/pdfbox/src/main/java/org/apache/pdfbox/pdmodel/font/PDCIDFontType2.java#L187-L195

>>
>> [1] https://gist.github.com/anonymous/063db3b1b5ed040be41c
>>



Re: Getting visual bounds of a glyph

Posted by John Hewson <jo...@jahewson.com>.
> On 29 Jan 2016, at 06:46, Branden Visser <mr...@gmail.com> wrote:
> 
> Hi everyone,
> 
> I thought we had a process to get the visual bounds of a character
> sealed, until we bumped into a few more PDF documents where the
> bounding box we're calculating is way bigger than the character itself
> (about double the size). We're using the 2.0.0-SNAPSHOT code, but
> we've verified this occurs in both RC2 and RC3.

Assuming that you’re using PDFont#getPath(..) then you should be able to
measure the visual bounds of a glyph accurately.

> I've written a gist with a Scala/Java-ish sample of our calculation
> [1], where the TextPosition being passed in is the one provided by
> PDFTextStripper in processTextPosition (I know it's inaccurate, but
> we're only using the text rendering matrix from it which I think
> should be accurate?)

> I'm assuming you may need more information, but first I'm considering
> that our logic for getting the bounds is missing some important pieces
> that jump out at you all.
> 
> Is the gist correct?

I’ve commented on your gist on GitHub. You’ve mostly got it right but you
want to end up with device space, not user space.
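
As a rough illustration of that last step, the sketch below maps a
bounding box from PDF user space (origin bottom-left, y up, 72 units
per inch) into a raster-style space with the origin top-left and y
down. This is only one common interpretation of "device space", not
the code from the gist comment. The class and method names are
hypothetical, and it assumes an unrotated page whose crop box starts
at (0, 0).

```java
import java.awt.geom.AffineTransform;
import java.awt.geom.Rectangle2D;

public class DeviceSpaceSketch {

    /**
     * Builds a user-space to raster-space transform: scale by dpi / 72
     * and flip the y axis so that the top of the page becomes y = 0.
     */
    static AffineTransform userToDevice(double pageHeightPt, double dpi) {
        double scale = dpi / 72.0;
        AffineTransform at = AffineTransform.getScaleInstance(scale, -scale);
        // translate() is applied to points first: shifts the page top to y = 0.
        at.translate(0, -pageHeightPt);
        return at;
    }

    /** Transforms a user-space box and returns its axis-aligned bounds. */
    static Rectangle2D toDevice(Rectangle2D userBounds, double pageHeightPt, double dpi) {
        return userToDevice(pageHeightPt, dpi)
                .createTransformedShape(userBounds)
                .getBounds2D();
    }

    public static void main(String[] args) {
        // Made-up glyph box in user space on a 792 pt tall (US Letter) page.
        Rectangle2D userBox = new Rectangle2D.Double(100, 700, 12, 12);
        System.out.println(toDevice(userBox, 792, 72));
        System.out.println(toDevice(userBox, 792, 144));
    }
}
```

A rotated page or a crop box with a non-zero origin would need extra
terms in the transform, which this sketch leaves out.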

— John

> Thanks,
> Branden
> 
> [1] https://gist.github.com/anonymous/063db3b1b5ed040be41c
> 