Posted to dev@pdfbox.apache.org by Alex Shvartz <al...@yahoo.com> on 2009/07/29 00:07:20 UTC

How to get color, font, and position information for text on a PDF page




My name is Alex.

I have just subscribed to the developer mailing list.

 

We are working on converting PDF files to another file
format.

We already know how to extract the text from a specific PDF
page.



To convert a file to another format, we also need the color, font,
and position of the text, not just the text itself. The question
is: how do we get this information?







I think that when I receive a string (I am leaving images aside for now), I
also need to know the position, color, font size, and font name of every
character. Then I can write this information out to a new file in the
other format. But first of all, I need a way to obtain the font, color,
and position information.
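
To make the question more concrete, here is a rough sketch of the
per-character data we have in mind and of how we would group it before
writing the other format. Everything in it (StyledTextRuns, Glyph, Run,
groupRuns) is a hypothetical name of our own, not PDFBox API, and the
sample values are made up. The position and font fields are meant to
correspond to what TextPosition seems to expose through getX(), getY(),
getFontSize() and getFont(); where the color would come from is exactly
the part we do not know.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of PDFBox: models the data we want per character.
final class StyledTextRuns {

    /** One character plus the attributes we want to carry into the target format. */
    static final class Glyph {
        final String text;      // the character(s) this glyph stands for
        final float x, y;       // position on the page
        final String fontName;  // e.g. the base font name
        final float fontSize;   // in points
        final int rgb;          // color packed as 0xRRGGBB (assumption: RGB only)

        Glyph(String text, float x, float y, String fontName, float fontSize, int rgb) {
            this.text = text;
            this.x = x;
            this.y = y;
            this.fontName = fontName;
            this.fontSize = fontSize;
            this.rgb = rgb;
        }
    }

    /** Consecutive glyphs on the same baseline that share font, size and color. */
    static final class Run {
        final StringBuilder text = new StringBuilder();
        final float x, y;       // position of the first glyph in the run
        final String fontName;
        final float fontSize;
        final int rgb;

        Run(Glyph first) {
            this.x = first.x;
            this.y = first.y;
            this.fontName = first.fontName;
            this.fontSize = first.fontSize;
            this.rgb = first.rgb;
            this.text.append(first.text);
        }

        boolean accepts(Glyph g) {
            return y == g.y
                    && fontSize == g.fontSize
                    && rgb == g.rgb
                    && fontName.equals(g.fontName);
        }
    }

    /** Merges glyphs, in reading order, into runs with identical styling. */
    static List<Run> groupRuns(List<Glyph> glyphs) {
        List<Run> runs = new ArrayList<Run>();
        Run current = null;
        for (Glyph g : glyphs) {
            if (current != null && current.accepts(g)) {
                current.text.append(g.text);
            } else {
                current = new Run(g);
                runs.add(current);
            }
        }
        return runs;
    }

    public static void main(String[] args) {
        // Made-up sample data standing in for what a PDF page would give us.
        List<Glyph> glyphs = new ArrayList<Glyph>();
        glyphs.add(new Glyph("H", 72f, 700f, "Helvetica", 12f, 0x000000));
        glyphs.add(new Glyph("i", 80f, 700f, "Helvetica", 12f, 0x000000));
        glyphs.add(new Glyph("!", 84f, 700f, "Helvetica-Bold", 12f, 0xFF0000));

        for (Run r : groupRuns(glyphs)) {
            System.out.println(r.text + " at (" + r.x + ", " + r.y + ") "
                    + r.fontName + " " + r.fontSize + "pt color "
                    + String.format("#%06X", r.rgb));
        }
    }
}
```

If there is a way to fill something like Glyph from each TextPosition,
the rest of the conversion is on our side.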



I’m looking at the TextPosition class from the org.apache.pdfbox.util
package, which has methods that return a PDFont object and so on, but I’m
not sure that I’m on the right track.

 

If anybody has any suggestions, I would really appreciate
the help.

 

Thank you.

Alex