Posted to dev@poi.apache.org by Antony Bowesman <ad...@teamware.com> on 2009/01/05 03:15:22 UTC
Text extractor meta data
I'm using POI 3.5 beta 4 with ExtractorFactory to get an extractor for various
types of MS document. I see that OOXML does not yet support meta data, but for
the OLE variants I'm having trouble getting the meta data in a simple way.
The only method in the returned POITextExtractor is getText(), which gives a
line-delimited String of PID_XXX = value entries, so I have to parse the strings
out and match them against the PropertyIDMap names.
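
For illustration, the parsing described above could look roughly like the sketch below. It assumes every line of the getText() output has the form NAME = value on a single line; PropertyTextParser and the sample values are hypothetical names, not part of POI.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of POI: parses "PID_XXX = value" lines.
class PropertyTextParser {

    /** Splits each "NAME = value" line at the first " = " and collects the pairs in order. */
    static Map<String, String> parse(String text) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String line : text.split("\\r?\\n")) {
            int eq = line.indexOf(" = ");
            if (eq <= 0) {
                continue; // skip blank lines and lines without a name
            }
            String name = line.substring(0, eq).trim();
            String value = line.substring(eq + 3).trim();
            result.put(name, value);
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up sample text standing in for the extractor output.
        String sample = "PID_TITLE = Quarterly report\nPID_AUTHOR = A. Writer\n";
        System.out.println(parse(sample).get("PID_AUTHOR"));
    }
}
```

A value that itself spans several lines would be cut off by this approach, which is one reason string parsing is awkward here.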
Alternatively, I can cast the returned extractor to POIOLE2TextExtractor and
then get the SI and DSI from there, but I simply then want to get certain
properties from that. I don't want to have to write code to do things like
getAuthor(), as the required properties are driven from external config.
The getProperty() method is protected for some reason, but getProperties()
is not.
What's the recommended way to get the properties I want?
Cheers
Antony
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org
Re: Text extractor meta data
Posted by Nick Burch <ni...@torchbox.com>.
On Mon, 5 Jan 2009, Antony Bowesman wrote:
> Alternatively, I can cast the returned extractor to POIOLE2TextExtractor
> and then get the SI and DSI from there, but I simply then want to get
> certain properties from that. I don't want to have to write code to do
> things like getAuthor(), as the required properties are driven from
> external config.
Your best bet then is something like
getSummaryInformation().getProperties(), and then picking out the ones you
want from there. Either that, or use a little reflection to automatically
turn the name "author" in your properties file into a call to getAuthor().
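
A minimal sketch of the reflection idea, assuming the configured names map directly onto public no-argument getters. GetterLookup and FakeSummary are hypothetical names used only for illustration; with POI you would pass in the object returned by getSummaryInformation() instead of the stand-in.

```java
import java.lang.reflect.Method;

// Hypothetical helper, not part of POI: maps a configured name to a getter call.
class GetterLookup {

    /**
     * Turns a non-empty name such as "author" into "getAuthor" and invokes that
     * public no-argument getter on the target object.
     */
    static Object read(Object target, String name) {
        String getter = "get" + Character.toUpperCase(name.charAt(0)) + name.substring(1);
        try {
            Method method = target.getClass().getMethod(getter);
            return method.invoke(target);
        } catch (ReflectiveOperationException e) {
            // Covers a missing getter as well as a getter that throws.
            throw new IllegalArgumentException("No readable property '" + name + "'", e);
        }
    }

    /** Stand-in for a summary object, so the sketch runs without POI on the classpath. */
    public static class FakeSummary {
        public String getAuthor() { return "A. Writer"; }
        public String getTitle() { return "Quarterly report"; }
    }

    public static void main(String[] args) {
        Object summary = new FakeSummary();
        // The names would normally come from the external config.
        for (String name : new String[] {"author", "title"}) {
            System.out.println(name + " = " + read(summary, name));
        }
    }
}
```

A name with no matching getter surfaces as an IllegalArgumentException, so a typo in the config fails loudly rather than silently returning nothing.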
Nick