Posted to java-user@lucene.apache.org by Reyhood Farhan <fa...@bioinf.man.ac.uk> on 2004/08/19 11:51:40 UTC

Re: searchhelp

As far as I remember, the PDFBox release includes some existing code to
index PDFs with Lucene, based on the demo created for Lucene 1.3. In
fact, I think the code only works with Lucene 1.3, something to do with
a change from arrays to vectors in Lucene 1.4. I may be wrong though.

http://www.csh.rit.edu/~ben/projects/pdfbox/javadoc/org/pdfbox/searchengine/lucene/package-summary.html


> Thanks everybody,
> 
> but I didn't get any code or any real help from these links.
> Has anybody performed this kind of search before? If so, please send me the
> code, or tell me what code I have to add to my present Lucene setup.
> ----- Original Message -----
> From: "David Townsend" <da...@magus.co.uk>
> To: "Lucene Users List" <lu...@jakarta.apache.org>
> Sent: Thursday, August 19, 2004 4:17 PM
> Subject: RE: searchhelp
> 
> 
> JGURU FAQ
> http://www.jguru.com/faq/Lucene
> 
> OFFICIAL FAQ
> http://lucene.sourceforge.net/cgi-bin/faq/faqmanager.cgi
> 
> MAIL ARCHIVE
> http://www.mail-archive.com/lucene-user@jakarta.apache.org/
> 
> hope this helps.
> 
> 
> -----Original Message-----
> From: Santosh [mailto:santosh.s@softprosys.com]
> Sent: 19 August 2004 11:25
> To: Lucene Users List
> Subject: Re: searchhelp
> 
> 
> I recently joined the list and haven't gone through any previous mails. If
> you have any mails or related code, please forward them to me.
> ----- Original Message -----
> From: "Chandan Tamrakar" <ch...@ccnep.com.np>
> To: "Lucene Users List" <lu...@jakarta.apache.org>
> Sent: Thursday, August 19, 2004 3:47 PM
> Subject: Re: searchhelp
> 
> 
> > For PDF you need to extract the text from the PDF files using the PDFBox
> > library, and for Word documents you can use the Apache POI APIs. There are
> > messages posted on the Lucene list related to your queries. About the
> > database, I guess someone must have done it. :)
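
On the database point: Lucene does not query a database itself. The usual approach is to read the rows you care about and index the text of each row as its own document, keeping the primary key so that hits can be mapped back to rows. The sketch below only illustrates that idea in plain Java. `RowIndex` is a hypothetical stand-in for a real index, not a Lucene class, and the rows are passed in as strings rather than read over JDBC.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical stand-in for a search index (NOT a Lucene class):
// maps each term to the ids of the rows whose text contains it.
class RowIndex {
    private final Map<String, Set<String>> postings = new HashMap<>();

    // Index one row: 'id' is the primary key, 'text' is the column text to search.
    void addRow(String id, String text) {
        for (String term : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, t -> new TreeSet<>()).add(id);
            }
        }
    }

    // Return the ids of the rows containing the term, in sorted order.
    List<String> search(String term) {
        Set<String> ids = postings.get(term.toLowerCase(Locale.ROOT));
        return ids == null ? new ArrayList<>() : new ArrayList<>(ids);
    }
}
```

After a search, the returned ids would be used to fetch the full rows from the database. With Lucene the same pattern applies: the primary key goes into a stored field of each document.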
> >
> > ----- Original Message -----
> > From: "Santosh" <sa...@softprosys.com>
> > To: <lu...@jakarta.apache.org>
> > Sent: Thursday, August 19, 2004 3:58 PM
> > Subject: searchhelp
> >
> >
> > Hi,
> >
> > I am using the Lucene search engine for my application.
> >
> > I am able to search through text files and HTML files as specified by
> > Lucene.
> >
> > Can you please clarify my doubts?
> >
> > 1. Can Lucene search through PDFs and Word documents? If yes, then how?
> >
> > 2. Can Lucene search through a database? If yes, then how?
> >
> > Thank you,
> >
> > Santosh
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> > For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Re: searchhelp

Posted by Santosh <sa...@softprosys.com>.
Thank you for sending the link.
My current status is this:
I have downloaded the WAR file and built the index using IndexHTML, and I can
successfully search through HTML and text files. Can anybody tell me
what code I should add to search PDFs and Word docs, and where I should
add it?
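
One way to picture where the new code goes: somewhere the indexer decides, per file, how to get plain text out of it, and that is the point where PDF and Word handling would be added. The sketch below is plain Java with no Lucene, PDFBox, or POI calls. `TextExtractor`, `ExtractorRegistry`, and the stub extractors are hypothetical names for illustration only; the stub bodies mark where a real PDFBox or POI call would go.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical interface: turns raw file bytes into plain text for indexing.
interface TextExtractor {
    String extract(byte[] fileBytes);
}

// Hypothetical registry: picks an extractor based on the file extension.
class ExtractorRegistry {
    private final Map<String, TextExtractor> byExtension = new HashMap<>();

    void register(String extension, TextExtractor extractor) {
        byExtension.put(extension.toLowerCase(Locale.ROOT), extractor);
    }

    // Returns the extractor for this file name, or null if the type is unsupported.
    TextExtractor forFile(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return null; // no extension, so we cannot tell the type
        }
        String ext = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return byExtension.get(ext);
    }

    // Stand-in wiring: the pdf and doc entries are stubs, not real extraction.
    static ExtractorRegistry withStubs() {
        ExtractorRegistry registry = new ExtractorRegistry();
        registry.register("txt", bytes -> new String(bytes, StandardCharsets.UTF_8));
        // Assumption: a real version would call a PDF library such as PDFBox here.
        registry.register("pdf", bytes -> "[text extracted by a PDF library]");
        // Assumption: a real version would call a Word library such as Apache POI here.
        registry.register("doc", bytes -> "[text extracted by a Word library]");
        return registry;
    }
}
```

The indexing loop would then ask the registry for an extractor, skip the file if none is found, and otherwise hand the extracted text to Lucene the same way it already does for text files. I am assuming the existing indexer makes its HTML/text decision per file, based on the file name.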

