Posted to dev@pdfbox.apache.org by "John Hewson (JIRA)" <ji...@apache.org> on 2014/10/10 23:53:33 UTC
[jira] [Updated] (PDFBOX-207) Better metadata in conversion to HTML
[ https://issues.apache.org/jira/browse/PDFBOX-207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
John Hewson updated PDFBOX-207:
-------------------------------
Fix Version/s: 1.7.1
> Better metadata in conversion to HTML
> -------------------------------------
>
> Key: PDFBOX-207
> URL: https://issues.apache.org/jira/browse/PDFBOX-207
> Project: PDFBox
> Issue Type: New Feature
> Components: Text extraction
> Priority: Minor
> Fix For: 1.7.0
>
>
> [imported from SourceForge]
> http://sourceforge.net/tracker/index.php?group_id=78314&atid=552835&aid=1576966
> Originally submitted by nobody on 2006-10-13 17:18.
> It would be great to have better support for metadata
> in conversion to HTML.
> - Being able to create an HTML page with the proper
> document title (not one simply guessed from the
> text of the document).
> - Author, keywords, category etc. extracted from the
> document and placed into meta fields in the HTML.
> - Chosen encoding included in the HTML header.
> I am using PDFBox in conjunction with mnoGoSearch to
> index PDFs on a site. This additional metadata would
> be extremely handy, since it would form a part of the
> indexed details for the documents.
> Even if a simple tool could be created that would
> *just* extract the metadata from a document [into
> some kind of text format], that would be great.
> External tools could then be built around that, e.g.
> a templating tool that could create a final format of
> any form, using the extracted text and the extracted
> metadata.
> [comment on SourceForge]
> Originally sent by nobody.
> Logged In: NO
> BTW I've not used Java before, so don't have any code to
> contribute, but if I do come up with anything, I'll post
> it here.
> -- Jason
> (sorry - mislaid my login too)
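
The three requests in the description (a real document title, metadata placed in meta fields, and the chosen encoding in the header) can be illustrated with a minimal sketch. It is not PDFBox code and not the change made for this issue: the class HtmlMetaSketch and the methods escape and buildHead are hypothetical names, and the sketch uses only the Java standard library. In a real conversion the values would presumably come from the PDF's document information dictionary, which PDFBox exposes through PDDocument.getDocumentInformation() with getters such as getTitle(), getAuthor() and getKeywords().

```java
import java.util.Map;

// Hypothetical helper, not part of PDFBox: builds an HTML <head> from
// metadata values that the caller has already read from the PDF.
final class HtmlMetaSketch {

    // Escape the characters that are unsafe in HTML text and attribute values.
    static String escape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    // title:    the document title from the metadata, may be null or empty
    // encoding: the charset name chosen for the HTML output, e.g. "UTF-8"
    // meta:     name/value pairs such as author or keywords; empty values are skipped
    static String buildHead(String title, String encoding, Map<String, String> meta) {
        StringBuilder sb = new StringBuilder();
        sb.append("<head>\n");
        // Declare the chosen encoding in the header.
        sb.append("<meta http-equiv=\"Content-Type\" content=\"text/html; charset=")
          .append(escape(encoding)).append("\">\n");
        // Emit the title only when the metadata actually provides one.
        if (title != null && !title.isEmpty()) {
            sb.append("<title>").append(escape(title)).append("</title>\n");
        }
        // One <meta> element per non-empty metadata field, in map iteration order.
        for (Map.Entry<String, String> e : meta.entrySet()) {
            String value = e.getValue();
            if (value == null || value.isEmpty()) {
                continue;
            }
            sb.append("<meta name=\"").append(escape(e.getKey()))
              .append("\" content=\"").append(escape(value)).append("\">\n");
        }
        sb.append("</head>\n");
        return sb.toString();
    }
}
```

A caller would pass an insertion-ordered map (for example a LinkedHashMap) if the order of the meta elements matters. Fields that are missing from the PDF are simply left out rather than guessed from the text.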