Posted to dev@pdfbox.apache.org by "matija kancijan (JIRA)" <ji...@apache.org> on 2009/06/18 16:43:07 UTC
[jira] Created: (PDFBOX-486) Position of each individual word
Position of each individual word
--------------------------------
Key: PDFBOX-486
URL: https://issues.apache.org/jira/browse/PDFBOX-486
Project: PDFBox
Issue Type: Wish
Components: Text extraction, Utilities
Affects Versions: 0.8.0-incubator
Reporter: matija kancijan
Is it possible to extract the position of each word from the PDF?
Something similar to the PDFHighlighter class, whose output is an XML file
with the page and positions of each word.
With this option you could mark a whole article and, in addition,
produce your own XML file to select it in the PDF file.
If this could also be combined with the PDFText2HTML class,
you would have the structure of the original PDF file and the position
of each word, so selecting articles would be much easier.
This could be useful with bookmarks too.
(I am new to PDFBox, so if someone can point me in the right
direction I would gladly do this... ;) )
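
As a rough illustration of the request, here is a minimal sketch of how per-character positions could be merged into per-word bounding boxes. It deliberately does not call PDFBox: the Glyph and Word types, the groupIntoWords method and the LINE_TOLERANCE value are all hypothetical names made up for this example. In PDFBox the per-character data would presumably come from the TextPosition objects seen by a PDFTextStripper subclass, but that wiring differs between versions and is not shown here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class WordPositions {

    // Assumed tolerance (in page units) for treating two glyphs as being on the same line.
    static final float LINE_TOLERANCE = 2.0f;

    /** One extracted character with its page number and bounding box (hypothetical type). */
    static final class Glyph {
        final String text; final int page; final float x, y, width, height;
        Glyph(String text, int page, float x, float y, float width, float height) {
            this.text = text; this.page = page;
            this.x = x; this.y = y; this.width = width; this.height = height;
        }
    }

    /** One word with the union of its glyph boxes (hypothetical type). */
    static final class Word {
        final String text; final int page; final float x, y, width, height;
        Word(String text, int page, float x, float y, float width, float height) {
            this.text = text; this.page = page;
            this.x = x; this.y = y; this.width = width; this.height = height;
        }
        @Override public String toString() {
            return String.format(Locale.ROOT, "page=%d x=%.1f y=%.1f w=%.1f h=%.1f text=%s",
                    page, x, y, width, height, text);
        }
    }

    /** Groups glyphs (assumed to be in reading order) into words. */
    static List<Word> groupIntoWords(List<Glyph> glyphs) {
        List<Word> words = new ArrayList<>();
        List<Glyph> current = new ArrayList<>();
        for (Glyph g : glyphs) {
            boolean isSpace = g.text.trim().isEmpty();
            if (!current.isEmpty()) {
                Glyph last = current.get(current.size() - 1);
                // A page change or a vertical jump also ends the current word.
                boolean newLine = g.page != last.page
                        || Math.abs(g.y - last.y) > LINE_TOLERANCE;
                if (isSpace || newLine) {
                    words.add(merge(current));
                    current.clear();
                }
            }
            if (!isSpace) {
                current.add(g);
            }
        }
        if (!current.isEmpty()) {
            words.add(merge(current));
        }
        return words;
    }

    /** Builds a word whose box is the union of the boxes of its glyphs. */
    private static Word merge(List<Glyph> glyphs) {
        StringBuilder text = new StringBuilder();
        float minX = Float.MAX_VALUE, minY = Float.MAX_VALUE;
        float maxX = -Float.MAX_VALUE, maxY = -Float.MAX_VALUE;
        for (Glyph g : glyphs) {
            text.append(g.text);
            minX = Math.min(minX, g.x);
            minY = Math.min(minY, g.y);
            maxX = Math.max(maxX, g.x + g.width);
            maxY = Math.max(maxY, g.y + g.height);
        }
        return new Word(text.toString(), glyphs.get(0).page,
                minX, minY, maxX - minX, maxY - minY);
    }

    public static void main(String[] args) {
        List<Glyph> glyphs = new ArrayList<>();
        glyphs.add(new Glyph("H", 1, 10f, 100f, 5f, 8f));
        glyphs.add(new Glyph("i", 1, 15f, 100f, 2f, 8f));
        glyphs.add(new Glyph(" ", 1, 17f, 100f, 3f, 8f));
        glyphs.add(new Glyph("y", 1, 20f, 100f, 4f, 8f));
        glyphs.add(new Glyph("o", 1, 24f, 100f, 4f, 8f));
        for (Word w : groupIntoWords(glyphs)) {
            System.out.println(w);
        }
    }
}
```

Each resulting Word carries the page and rectangle that an XML writer (in the spirit of the PDFHighlighter output mentioned above) would need. Splitting only on whitespace and line changes is a simplification; real PDFs often omit explicit space characters, so a horizontal-gap check would likely be needed as well.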