Posted to user@poi.apache.org by Stephane James Vaucher <va...@cirano.qc.ca> on 2004/04/16 21:52:47 UTC

POI & Lucene integration

Hi everyone.

I'm planning to use POI to add MS Office document support to my
Lucene-based app. I know there has been work going on to facilitate the
integration. I checked out the latest dist from CVS and ran grep -i lucene
on the *java files, but found nothing. Is the work available somewhere (or
are there any interesting references)?

On another note, I'm trying out TextMining, and I'm a bit confused. It
comes distributed with classes in an org.apache.poi package that I can't
find in the POI dist (poi-bin-2.5-final-20040302.tar.gz), specifically:
org/apache/poi/hwpf/*

What is the relationship between the projects? 

Slightly OT, Ryan, in this message:

http://marc.theaimsgroup.com/?l=lucene-user&m=108030527420219&w=4

you mentioned possibly adding basic support for PowerPoint document text
extraction. Has anyone looked at this?

cheers,
Stephane Vaucher
CIRANO



---------------------------------------------------------------------
To unsubscribe, e-mail: poi-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: poi-user-help@jakarta.apache.org


Re: POI & Lucene integration

Posted by Stephane James Vaucher <va...@cirano.qc.ca>.
Thanks for the info (comments inline),

More generally:

<quote source="http://jakarta.apache.org/poi/">
Lucene for which we'll soon have file format interpretors.
</quote>

I guess I have the latest information on Word support. Any progress on the
Excel format? Is there any alpha-quality code I could test?

cheers,
sv

On Fri, 16 Apr 2004, Ryan Ackley wrote:

> Stephane,
> 
> The textmining.org became sort of a stop gap to support people who wanted to
> extract text from Word docs while I was working on HWPF. However now there
> is a feature in the textmining.org library that I don't plan on adding to
> HWPF and that is support for Word 6.0/95.

Is there a reason for this?
 
> The post I made to lucene-user about PowerPoint to text was a repost from
> poi-user that someone had posted. I haven't gotten around to testing it out
> but I have referred several people to it and I haven't heard back from them,
> so I assume it works.

OK, I'll probably add the original post to the Lucene wiki so that the
information isn't lost.
 
> The relationship between textmining.org and POI is that I am the principal
> author of HWPF and I am the principal author of the textmining.org
> libraries. I should just donate it to lucene because it is becoming a major
> hassle to maintain. Although I don't know...it has gotten me some side work.
> So I don't know what I plan on doing with it.

Side work is good ;) I know of a few people who happily use the package.
As a future user of your contributions, I'd like to thank you in advance.
 
> -Ryan


Re: POI & Lucene integration

Posted by Ryan Ackley <sa...@cfl.rr.com>.
Stephane,

The textmining.org became sort of a stop gap to support people who wanted to
extract text from Word docs while I was working on HWPF. However now there
is a feature in the textmining.org library that I don't plan on adding to
HWPF and that is support for Word 6.0/95.

The post I made to lucene-user about PowerPoint to text was a repost from
poi-user that someone had posted. I haven't gotten around to testing it out
but I have referred several people to it and I haven't heard back from them,
so I assume it works.

The relationship between textmining.org and POI is that I am the principal
author of HWPF and I am the principal author of the textmining.org
libraries. I should just donate it to lucene because it is becoming a major
hassle to maintain. Although I don't know...it has gotten me some side work.
So I don't know what I plan on doing with it.

-Ryan
