Posted to user@poi.apache.org by Suba Suresh <su...@wolfram.com> on 2006/06/27 17:53:37 UTC

hwpf for text extraction

  Hi!

	I just want to extract text from a Word doc to index with Lucene. The
version I downloaded from Apache is dated 2005 (poi-3.0-alpha2005..).
Can you tell me where I can get the current stable build?

thanks,
suba suresh

---------------------------------------------------------------------
To unsubscribe, e-mail: poi-user-unsubscribe@jakarta.apache.org
Mailing List:     http://jakarta.apache.org/site/mail2.html#poi
The Apache Jakarta Poi Project:  http://jakarta.apache.org/poi/


Re: hwpf for text extraction

Posted by Nick Burch <ni...@torchbox.com>.
On Tue, 27 Jun 2006, Suba Suresh wrote:
> 	I just want to extract text from a Word doc to index with Lucene.
> The version I downloaded from Apache is dated 2005 (poi-3.0-alpha2005..).
> Can you tell me where I can get the current stable build?

Your best bet is to use poi-3.0-alpha2 (from your favourite Apache
mirror), and then follow the basic text extraction steps documented at
	http://jakarta.apache.org/poi/hwpf/quick-guide.html
I use this with my own Lucene code, and it works fine.

Nick
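
For anyone reading this thread later, here is a rough sketch of the step between extraction and indexing. It is not taken from the quick guide. It assumes the paragraphs come from something like WordExtractor.getParagraphText() in the HWPF scratchpad jar, which may or may not be present in a given build, so that call appears only in a comment. The class name DocTextForIndexing, the method joinParagraphs, and the sample strings are all made up for illustration.

```java
// Hypothetical helper: turns an array of paragraph strings into one clean
// string that could be handed to an indexer such as Lucene.
final class DocTextForIndexing {

    // Assumption: with POI's scratchpad jar on the classpath, the input
    // array might be obtained along these lines (not compiled here):
    //   WordExtractor extractor =
    //       new WordExtractor(new FileInputStream("some.doc"));
    //   String[] paragraphs = extractor.getParagraphText();

    /** Joins non-empty paragraphs with '\n' after normalizing whitespace. */
    public static String joinParagraphs(String[] paragraphs) {
        StringBuilder sb = new StringBuilder();
        for (String p : paragraphs) {
            if (p == null) {
                continue;
            }
            // Turn control characters (\r, \n, \t, etc.) into spaces,
            // then collapse runs of whitespace and trim the ends.
            String cleaned = p.replaceAll("\\p{Cntrl}", " ")
                              .replaceAll("\\s+", " ")
                              .trim();
            if (cleaned.isEmpty()) {
                continue; // skip paragraphs that held only whitespace
            }
            if (sb.length() > 0) {
                sb.append('\n');
            }
            sb.append(cleaned);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Made-up sample input standing in for extractor output.
        String[] paragraphs = {
            "Hello  world\r\n", "\r\n", "Second\tparagraph\r\n"
        };
        System.out.println(joinParagraphs(paragraphs));
    }
}
```

The resulting string would then go into whatever field your Lucene document uses for body text. Whether control characters need stripping at all depends on the documents and on the analyzer in use, so treat the cleanup rules as a starting point only.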
