Posted to dev@poi.apache.org by Antony Bowesman <ad...@teamware.com> on 2009/01/05 03:15:22 UTC

Text extractor meta data

I'm using POI 3.5b4 with ExtractorFactory to get an extractor for various
types of MS documents.  I see that OOXML does not yet support meta data, but for
the OLE variants I'm having trouble getting the meta data in a simple way.

The only method in the returned POITextExtractor is getText(), which gives a
line-delimited String of PID_XXX = value pairs, so I have to parse the strings
out and match them against the PropertyIDMap names.
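
For illustration, the parsing step described above might look like the
following. This is only a sketch: it assumes each line has exactly the
"PID_XXX = value" shape mentioned above, the class name PropertyTextParser is
hypothetical, and the sample text is made up rather than real extractor output.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of POI.
class PropertyTextParser {

    /** Splits "NAME = value" lines into a map; lines without " = " are skipped. */
    static Map<String, String> parse(String text) {
        Map<String, String> props = new LinkedHashMap<>();
        for (String line : text.split("\\r?\\n")) {
            int sep = line.indexOf(" = ");   // first separator; the value may contain " = "
            if (sep <= 0) {
                continue;                    // blank line or unexpected shape
            }
            props.put(line.substring(0, sep).trim(), line.substring(sep + 3).trim());
        }
        return props;
    }

    public static void main(String[] args) {
        // Made-up sample in the assumed format.
        String sample = "PID_AUTHOR = Jane Doe\nPID_TITLE = Quarterly report\n";
        Map<String, String> props = parse(sample);
        System.out.println(props.get("PID_AUTHOR"));
    }
}
```

The keys can then be looked up by whatever names the external config lists.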

Alternatively, I can cast the returned extractor to POIOLE2TextExtractor and
then get the SI and DSI from there, but then I simply want to get certain
properties from them.  I don't want to have to write code to do things like
getAuthor(), as the required properties are driven from external config.

The getProperty() method is protected for some reason, but getProperties()
is not.

What's the recommended way to get the properties I want?

Cheers
Antony



---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org


Re: Text extractor meta data

Posted by Nick Burch <ni...@torchbox.com>.
On Mon, 5 Jan 2009, Antony Bowesman wrote:
> Alternatively, I can cast the returned extractor to POIOLE2TextExtractor 
> and then get the SI and DSI from there, but I simply then want to get 
> certain properties from that.  I don't want to have to write code to do 
> things like getAuthor(), as the required properties are driven from 
> external config.

Your best bet then is something like
getSummaryInformation().getProperties(), and find the ones you want in
there. Either that, or do a little bit of reflection, and automatically
turn the name "author" in your properties file into a call to
getAuthor().
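
A rough sketch of the reflection idea, to show the shape of it. StubSummary is
a hypothetical stand-in so the example runs without POI on the classpath; in
real use the target would be the object returned by getSummaryInformation().
The ReflectiveGetter class and its method names are likewise made up for
illustration.

```java
import java.lang.reflect.Method;

// Hypothetical helper, not part of POI.
class ReflectiveGetter {

    /** Stand-in for a summary object; exists only so the sketch is runnable. */
    public static class StubSummary {
        public String getAuthor() { return "Jane Doe"; }
        public String getTitle() { return "Quarterly report"; }
    }

    /** Turns a configured name such as "author" into "getAuthor". */
    static String getterName(String property) {
        return "get" + Character.toUpperCase(property.charAt(0)) + property.substring(1);
    }

    /** Calls the public no-arg getter for the property; returns null if there is none. */
    static Object read(Object target, String property) {
        if (property == null || property.isEmpty()) {
            return null;
        }
        try {
            Method getter = target.getClass().getMethod(getterName(property));
            return getter.invoke(target);
        } catch (ReflectiveOperationException e) {
            return null;   // no such getter, or the getter itself failed
        }
    }

    public static void main(String[] args) {
        Object summary = new StubSummary();
        // These names would come from the external config.
        for (String name : new String[] {"author", "title", "missing"}) {
            System.out.println(name + " = " + read(summary, name));
        }
    }
}
```

Returning null for an unknown name keeps a typo in the config from aborting the
whole extraction; throwing instead would be just as reasonable.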

Nick
