Posted to dev@pdfbox.apache.org by "matija kancijan (JIRA)" <ji...@apache.org> on 2009/06/18 16:43:07 UTC

[jira] Created: (PDFBOX-486) Position of each individual word

Position of each individual word
--------------------------------

                 Key: PDFBOX-486
                 URL: https://issues.apache.org/jira/browse/PDFBOX-486
             Project: PDFBox
          Issue Type: Wish
          Components: Text extraction, Utilities
    Affects Versions: 0.8.0-incubator
            Reporter: matija kancijan


Is it possible to extract the position of each word from the PDF?
Something similar to the PDFHighlighter class, where the output is an
XML file with the page and positions of each word.

With this option you could mark a whole article and, in addition,
produce your own XML file to select it in the PDF file.

If this could also be combined with the PDFText2HTML class,
you would have the structure of the original PDF file and the position
of each word, so selecting articles would be much easier.

This could be useful with bookmarks too.

(I am new to PDFBox, so if someone can point me in the right
direction I would gladly do this... ;) )
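
To make the request more concrete, here is a rough sketch of the kind of
output I have in mind. It does not use any PDFBox classes: the Glyph and
Word types, the groupIntoWords and toXml methods, and the XML element and
attribute names are all made up for illustration, and it assumes that
per-character positions (page, x, y, width, height) are already available
from the text extraction step. It only shows merging characters into
words with a bounding box and writing them out as XML.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class WordBoxes {

    /** Hypothetical per-character record: one character and its box on a page. */
    public static final class Glyph {
        final int page; final String ch; final float x, y, width, height;
        public Glyph(int page, String ch, float x, float y, float width, float height) {
            this.page = page; this.ch = ch; this.x = x; this.y = y;
            this.width = width; this.height = height;
        }
    }

    /** Hypothetical word record: the text plus the box enclosing all its characters. */
    public static final class Word {
        final int page; final String text; final float x, y, width, height;
        public Word(int page, String text, float x, float y, float width, float height) {
            this.page = page; this.text = text; this.x = x; this.y = y;
            this.width = width; this.height = height;
        }
    }

    /** Splits the glyph stream at whitespace and page changes, tracking each word's box. */
    public static List<Word> groupIntoWords(List<Glyph> glyphs) {
        List<Word> words = new ArrayList<>();
        StringBuilder text = new StringBuilder();
        int page = 0;
        float minX = 0, minY = 0, maxX = 0, maxY = 0;
        for (Glyph g : glyphs) {
            boolean blank = g.ch.trim().isEmpty();
            boolean pageChanged = text.length() > 0 && g.page != page;
            if ((blank || pageChanged) && text.length() > 0) {
                // Close the word collected so far.
                words.add(new Word(page, text.toString(), minX, minY, maxX - minX, maxY - minY));
                text.setLength(0);
            }
            if (blank) {
                continue; // whitespace only separates words
            }
            if (text.length() == 0) {
                // First character of a new word starts a new box.
                page = g.page;
                minX = g.x; minY = g.y;
                maxX = g.x + g.width; maxY = g.y + g.height;
            } else {
                // Grow the box to include this character.
                minX = Math.min(minX, g.x); minY = Math.min(minY, g.y);
                maxX = Math.max(maxX, g.x + g.width); maxY = Math.max(maxY, g.y + g.height);
            }
            text.append(g.ch);
        }
        if (text.length() > 0) {
            words.add(new Word(page, text.toString(), minX, minY, maxX - minX, maxY - minY));
        }
        return words;
    }

    /** Minimal escaping so the word text is safe as XML element content. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /** Writes one element per word; the element and attribute names are invented here. */
    public static String toXml(List<Word> words) {
        StringBuilder sb = new StringBuilder("<words>\n");
        for (Word w : words) {
            sb.append(String.format(Locale.ROOT,
                "  <word page=\"%d\" x=\"%.1f\" y=\"%.1f\" width=\"%.1f\" height=\"%.1f\">%s</word>\n",
                w.page, w.x, w.y, w.width, w.height, escape(w.text)));
        }
        return sb.append("</words>\n").toString();
    }

    public static void main(String[] args) {
        List<Glyph> glyphs = new ArrayList<>();
        glyphs.add(new Glyph(1, "H", 10f, 20f, 6f, 10f));
        glyphs.add(new Glyph(1, "i", 16f, 20f, 3f, 10f));
        glyphs.add(new Glyph(1, " ", 19f, 20f, 3f, 10f));
        glyphs.add(new Glyph(1, "y", 22f, 20f, 5f, 10f));
        glyphs.add(new Glyph(1, "o", 27f, 20f, 5f, 10f));
        glyphs.add(new Glyph(1, "u", 32f, 20f, 5f, 10f));
        System.out.print(toXml(groupIntoWords(glyphs)));
    }
}
```

A real implementation would also have to deal with line breaks, words
that are not separated by an explicit space character, and rotated text,
none of which this sketch attempts.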

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.