Posted to dev@lucene.apache.org by Stephane James Vaucher <va...@cirano.qc.ca> on 2004/08/02 00:08:39 UTC

RE: Powerpoint search using Lucene

I've seen a post on poi-user list with some more code. The links have been
added to the wiki.

http://wiki.apache.org/jakarta-lucene/PowerPoint

sv

On Thu, 29 Jul 2004, Divya S. Jesuraj wrote:

> The second link - does things a bit differently than one would expect.
>
> It creates multiple files "1.txt", "2.txt", so on, extracts the text and
> keeps it only in "1.txt" and doesn't save the name of the initial powerpoint
> file so it can't link to it when you search for it.
>
> What would be ideal is to extract the powerpoint text into an object
> {String?} and create a Lucene Doc that would add it to the index...
>
> I have been playing with the idea of using the code by Mr. Koundinya and
> somehow storing those contents in a String object, which would then get added
> as "content" to the Lucene Doc. The file name (.ppt) and path would get added
> too... will let you folks know how it goes...
>
> ~Divya
>
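The approach described above (pull the slide text into a String, then index it together with the file name and path) might look roughly like the sketch below. It deliberately uses no Lucene or POI classes: SlideTextExtractor and buildFields are hypothetical names, the "filename" and "path" field names are assumptions, and only "content" comes from the message. In a real indexer each map entry would presumably become one field on the Lucene Document.

```java
import java.io.File;
import java.util.LinkedHashMap;
import java.util.Map;

public class PptIndexSketch {

    /**
     * Hypothetical stand-in for whatever PowerPoint text extraction code is
     * used. A real implementation would also need to deal with I/O errors.
     */
    public interface SlideTextExtractor {
        String extractText(File ppt);
    }

    /**
     * Collects what should be indexed for one presentation: the extracted
     * text, plus the original file name and path so that a search hit can
     * link back to the .ppt file instead of to an intermediate "1.txt".
     */
    public static Map<String, String> buildFields(File ppt, SlideTextExtractor extractor) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("content", extractor.extractText(ppt)); // searchable text
        fields.put("filename", ppt.getName());             // e.g. "q3.ppt"
        fields.put("path", ppt.getPath());                 // where the file lives
        return fields;
    }

    public static void main(String[] args) {
        // Fake extractor so the sketch runs without any PowerPoint parser.
        SlideTextExtractor fake = file -> "Quarterly results Roadmap";
        Map<String, String> fields = buildFields(new File("slides/q3.ppt"), fake);
        System.out.println(fields.get("filename") + " -> " + fields.get("content"));
    }
}
```

Keeping the text in memory avoids the temporary text files entirely, and storing the name and path next to the content is what lets a result page point at the original presentation.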
> -----Original Message-----
> From: Stephane James Vaucher [mailto:vauchers@cirano.qc.ca]
> Sent: Wednesday, July 28, 2004 11:41 PM
> To: Lucene Developers List
> Subject: Re: Powerpoint search using Lucene
>
> I haven't, though I've found a few links...
>
> I just saw this on the poi list. I can't confirm whether it works or not (if
> you try it, can you tell us?)
>
> http://www.mail-archive.com/poi-user@jakarta.apache.org/msg04782.html
>
> This is a reference to some code that I found works on some ppts:
> http://nagoya.apache.org/eyebrowse/ReadMsg?listName=poi-dev@jakarta.apache.org&msgNo=4326
>
> sv
>
> On Wed, 28 Jul 2004, Divya S. Jesuraj wrote:
>
> > Hello,
> >
> > I am a VERY new Java Programmer and have now been thrust into development
> > using Lucene. I was able to figure out parsing/indexing of MS Word, MS
> > Excel, RTF, Text files, and PDFs with a lot of reading and using POI & PDF
> > Sandbox. I however haven't been able to do anything with PPTs [or htmls -
> > that is the least of my worries]...
> >
> > I am indexing a directory on my machine and have a user interface with a
> > JSP. Has anyone figured out how to get a Powerpoint search to work? I
> > searched the forums but I can't find anything that would help my situation.
> > Some sample code would be appreciated.
> >
> > Thank you.
> >
> > ~Divya Jesuraj
> > Technical Summer Intern 2004
> > MITRE Corporation
> >
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
> > For additional commands, e-mail: lucene-dev-help@jakarta.apache.org
> >