Posted to user@poi.apache.org by Stephane James Vaucher <va...@cirano.qc.ca> on 2004/04/16 21:52:47 UTC
POI & Lucene integration
Hi everyone.
I'm planning on using POI to add MS Office document support to my app, which
uses Lucene. I know there's been work going on to facilitate the integration.
I checked out the latest dist from CVS and ran grep -i lucene on the
*.java files, but found nothing. Is the work available somewhere (or are
there any interesting references)?
On another note, I'm trying out TextMining, and I'm a bit confused. It
comes distributed with classes in an org.apache.poi package that I can't find
in the POI dist (poi-bin-2.5-final-20040302.tar.gz), specifically:
org/apache/poi/hwpf/*
What is the relationship between the projects?
Slightly OT, Ryan, in this message:
http://marc.theaimsgroup.com/?l=lucene-user&m=108030527420219&w=4
you mentioned maybe adding basic support for PowerPoint doc text
extraction. Has anyone looked at this?
cheers,
Stephane Vaucher
CIRANO
---------------------------------------------------------------------
To unsubscribe, e-mail: poi-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: poi-user-help@jakarta.apache.org
Re: POI & Lucene integration
Posted by Stephane James Vaucher <va...@cirano.qc.ca>.
Thanks for the info (comments inline),
More generally:
<quote source="http://jakarta.apache.org/poi/">
Lucene for which we'll soon have file format interpretors.
</quote>
I guess I have the latest information on Word support. Any advances for
the Excel format? Is there any alpha-quality code I could test?
cheers,
sv
On Fri, 16 Apr 2004, Ryan Ackley wrote:
> Stephane,
>
> The textmining.org library became sort of a stopgap to support people who
> wanted to extract text from Word docs while I was working on HWPF. However,
> there is now a feature in the textmining.org library that I don't plan on
> adding to HWPF: support for Word 6.0/95.
Is there a reason for this?
> The post I made to lucene-user about PowerPoint to text was a repost of
> something someone had posted to poi-user. I haven't gotten around to testing
> it, but I have referred several people to it and haven't heard back from them,
> so I assume it works.
OK, I'll probably add the original post to the Lucene wiki so that the
information isn't lost.
> The relationship between textmining.org and POI is that I am the principal
> author of both HWPF and the textmining.org libraries. I should just donate
> it to Lucene because it is becoming a major hassle to maintain. Then again,
> it has gotten me some side work, so I don't know what I plan on doing with it.
Side work is good ;) I know of a few people who happily use the package.
As a future user of your contributions, I'd like to thank you in advance.
> -Ryan
Re: POI & Lucene integration
Posted by Ryan Ackley <sa...@cfl.rr.com>.
Stephane,
The textmining.org library became sort of a stopgap to support people who
wanted to extract text from Word docs while I was working on HWPF. However,
there is now a feature in the textmining.org library that I don't plan on
adding to HWPF: support for Word 6.0/95.
The post I made to lucene-user about PowerPoint to text was a repost of
something someone had posted to poi-user. I haven't gotten around to testing
it, but I have referred several people to it and haven't heard back from them,
so I assume it works.
The relationship between textmining.org and POI is that I am the principal
author of both HWPF and the textmining.org libraries. I should just donate
it to Lucene because it is becoming a major hassle to maintain. Then again,
it has gotten me some side work, so I don't know what I plan on doing with it.
-Ryan