Posted to user@poi.apache.org by Gitu <gi...@sap.com> on 2010/05/26 07:55:39 UTC

Embedded word document with web page content doesn't get extracted

http://old.nabble.com/file/p28676810/view.xls view.xls 

Hi,

I have attached an Excel file which has a few objects embedded in it. Of
these, the one named 'Retail Plus Chennai' (3rd sheet) doesn't get
extracted. I am using the POI 3.5 jars.

I am unable to identify the problem with this document.

Could you please help clarify this?

Thanks,
Gitu


---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@poi.apache.org
For additional commands, e-mail: user-help@poi.apache.org


Re: Embedded word document with web page content doesn't get extracted

Posted by Nick Burch <ni...@alfresco.com>.
On Tue, 25 May 2010, Gitu wrote:
> I have attached an Excel file which has a few objects embedded in it.

If your Excel file has other Office documents embedded in it, then these
documents are stored separately and you need to handle each one
individually.

If you're just doing text extraction, take a look at the 
getEmbededDocsTextExtractors method on ExtractorFactory:
http://poi.apache.org/apidocs/org/apache/poi/extractor/ExtractorFactory.html#getEmbededDocsTextExtractors(org.apache.poi.POIOLE2TextExtractor)
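
As a rough sketch of how that method might be used (written against the
POI 3.x API named above; the class name EmbeddedTextDump and the header
helper are made up for illustration, and newer POI releases may spell the
method differently):

```java
import java.io.File;

import org.apache.poi.POIOLE2TextExtractor;
import org.apache.poi.POITextExtractor;
import org.apache.poi.extractor.ExtractorFactory;

public class EmbeddedTextDump {

    // Hypothetical helper: builds the separator line printed before each embedded document.
    static String header(int index, String extractorName) {
        return "--- embedded document " + index + " (" + extractorName + ") ---";
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: EmbeddedTextDump <file.xls>");
            return;
        }

        // The factory inspects the file and returns a matching text extractor.
        POITextExtractor top = ExtractorFactory.createExtractor(new File(args[0]));

        // getEmbededDocsTextExtractors only accepts OLE2-based extractors.
        if (!(top instanceof POIOLE2TextExtractor)) {
            System.err.println("not an OLE2 document: " + args[0]);
            return;
        }

        // One extractor per embedded document the factory was able to open.
        POITextExtractor[] embedded =
                ExtractorFactory.getEmbededDocsTextExtractors((POIOLE2TextExtractor) top);

        System.out.println("extractors returned: " + embedded.length);
        for (int i = 0; i < embedded.length; i++) {
            System.out.println(header(i + 1, embedded[i].getClass().getSimpleName()));
            System.out.println(embedded[i].getText());
        }
    }
}
```

Comparing the number of extractors returned with the number of objects you
can see in the workbook is one way to tell whether a particular embedded
object is being skipped, rather than extracted with empty text.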

You might also find the POIFS docs on embedded documents useful:
 	http://poi.apache.org/poifs/embeded.html

Nick
