Posted to user@poi.apache.org by Gitu <gi...@sap.com> on 2010/05/26 07:55:39 UTC
Embedded word document with web page content doesn't get extracted
http://old.nabble.com/file/p28676810/view.xls view.xls
Hi,
I have attached an Excel file which has a few objects embedded in it. Of
these, the one named 'Retail Plus Chennai' (3rd sheet) doesn't get
extracted. I am using the POI 3.5 jars.
I am unable to identify the problem with this document.
Could you please help clarify this?
Thanks,
Gitu
--
View this message in context: http://old.nabble.com/Embedded-word-document-with-web-page-content-doesn%27t-get-extracted-tp28676810p28676810.html
Sent from the POI - User mailing list archive at Nabble.com.
---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@poi.apache.org
For additional commands, e-mail: user-help@poi.apache.org
Re: Embedded word document with web page content doesn't get extracted
Posted by Nick Burch <ni...@alfresco.com>.
On Tue, 25 May 2010, Gitu wrote:
> I have attached an excel file which has few objects embedded in it.
If your Excel file has other Office documents embedded in it, then these
documents are stored separately and you need to handle each one
individually.
If you're just doing text extraction, take a look at the
getEmbededDocsTextExtractors method on ExtractorFactory:
http://poi.apache.org/apidocs/org/apache/poi/extractor/ExtractorFactory.html#getEmbededDocsTextExtractors(org.apache.poi.POIOLE2TextExtractor)
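A minimal sketch of that approach, assuming the POI 3.5-era API named above (org.apache.poi.extractor.ExtractorFactory, POIOLE2TextExtractor); the class name EmbeddedTextDump is hypothetical. It prints the text of the host workbook, then asks the factory for an extractor per embedded document and recurses into each one. Embedded objects that POI does not recognise may be skipped or raise an exception, and the factory classes moved between packages in later POI releases, so check the Javadoc for the version you use.

```java
import java.io.FileInputStream;
import java.io.InputStream;

import org.apache.poi.POIOLE2TextExtractor;
import org.apache.poi.POITextExtractor;
import org.apache.poi.extractor.ExtractorFactory;

// Hypothetical helper: prints the text of a file and of every embedded
// document that ExtractorFactory can find inside it (POI 3.5-era API assumed).
public class EmbeddedTextDump {

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("Usage: java EmbeddedTextDump <file.xls>");
            return;
        }
        try (InputStream in = new FileInputStream(args[0])) {
            // Extractor for the host document itself (e.g. view.xls)
            POITextExtractor outer = ExtractorFactory.createExtractor(in);
            System.out.println(outer.getText());
            dumpEmbedded(outer, 1);
        }
    }

    // Embedded documents are stored separately, so each one gets its own extractor.
    static void dumpEmbedded(POITextExtractor extractor, int depth) throws Exception {
        // Only the OLE2 (binary .xls/.doc/...) overload is used in this sketch
        if (!(extractor instanceof POIOLE2TextExtractor)) {
            return;
        }
        POITextExtractor[] children =
            ExtractorFactory.getEmbededDocsTextExtractors((POIOLE2TextExtractor) extractor);
        if (children == null) {
            return;
        }
        for (POITextExtractor child : children) {
            System.out.println("--- embedded (depth " + depth + "): "
                + child.getClass().getSimpleName());
            System.out.println(child.getText());
            // Embedded documents can themselves contain embedded documents
            dumpEmbedded(child, depth + 1);
        }
    }
}
```

Run it as "java EmbeddedTextDump view.xls" with the POI jars on the classpath; if an object such as 'Retail Plus Chennai' never shows up in the output, the factory did not produce an extractor for it.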
You might also find the POIFS docs on embedded documents useful:
http://poi.apache.org/poifs/embeded.html
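To see why one particular object is missed, it can help to list what the workbook actually holds. A rough diagnostic sketch, assuming HSSFWorkbook.getAllEmbeddedObjects() and HSSFObjectData behave as described on the embedded-documents page; the class name ListEmbeddedObjects is hypothetical. It prints each object's OLE2 class name and, where the object is stored as its own POIFS directory, the entries inside it; objects with only raw bytes likely need handling other than the usual POI readers.

```java
import java.io.FileInputStream;
import java.io.InputStream;
import java.util.Iterator;

import org.apache.poi.hssf.usermodel.HSSFObjectData;
import org.apache.poi.hssf.usermodel.HSSFWorkbook;
import org.apache.poi.poifs.filesystem.DirectoryEntry;
import org.apache.poi.poifs.filesystem.Entry;
import org.apache.poi.poifs.filesystem.POIFSFileSystem;

// Hypothetical diagnostic: lists each embedded OLE object in an .xls file.
public class ListEmbeddedObjects {

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("Usage: java ListEmbeddedObjects <file.xls>");
            return;
        }
        try (InputStream in = new FileInputStream(args[0])) {
            POIFSFileSystem fs = new POIFSFileSystem(in);
            HSSFWorkbook workbook = new HSSFWorkbook(fs);
            for (HSSFObjectData obj : workbook.getAllEmbeddedObjects()) {
                // OLE2 class name recorded for the object by the creating application
                System.out.println("Object: " + obj.getOLE2ClassName());
                if (obj.hasDirectoryEntry()) {
                    // The object lives in its own POIFS directory; list its entries
                    DirectoryEntry dir = (DirectoryEntry) obj.getDirectory();
                    for (Iterator<?> it = dir.getEntries(); it.hasNext(); ) {
                        Entry entry = (Entry) it.next();
                        System.out.println("  entry: " + entry.getName());
                    }
                } else {
                    // No directory: only the raw object bytes are available
                    System.out.println("  raw data: " + obj.getObjectData().length + " bytes");
                }
            }
        }
    }
}
```

Comparing the class name and entries printed for the 'Retail Plus Chennai' object against those of the objects that do extract should show whether it is stored in a format the extractors handle.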
Nick