Posted to java-user@lucene.apache.org by Naraharasetty Ravi Kumar <rk...@triniti.com> on 2003/02/20 17:37:55 UTC

How to search .pdf, .doc, .jsp files using Lucene?

Hi All

I have tried Lucene's demo web application. I am developing a website built with JSPs, servlets, and Java classes, and I
need to add a search engine to the site, for which I am using
Lucene. I have implemented the search with Lucene and can already search
.html and .txt files, but how do I search .pdf, .doc, .jsp, and similar
files with Lucene?

Basically, to implement the search on .html and .txt files, I created the index by running the
following command at the command prompt:

C:\DarrenWebsite\Lucene>java org.apache.lucene.demo.IndexHTML -create -index Index "..\html"
adding ../html/AboutUs/aboutus.htm
adding ../html/AboutUs/milestones.htm
adding ../html/AboutUs/ourmethodology.htm
adding ../html/AboutUs/ourmission.htm
adding ../html/AboutUs/ourpeople.htm
adding ../html/AboutUs/ourwork.htm
adding ../html/Careers/careerpath.htm
adding ../html/Careers/careers.htm
adding ../html/Careers/opportunities.htm
adding ../html/Clients/clients.htm
adding ../html/ContactUs/contactus.htm
adding ../html/Home/legaldisclaim.htm
adding ../html/Home/sitemap.htm
adding ../html/Images/Menu/menu.htm
adding ../html/Partners/partners.htm
adding ../html/Products/edynamo.htm
adding ../html/Products/packaged_edapters.htm
adding ../html/Products/products.htm
adding ../html/Services/bc.htm
adding ../html/Services/crm.htm
adding ../html/Services/eai.htm
adding ../html/Services/erp.htm
adding ../html/Services/scm.htm
adding ../html/Services/services.htm
adding ../html/a.txt
adding ../html/b.txt
Optimizing index...
2625 total milliseconds




In my configuration.jsp I have the following entry:
String indexLocation = "/DarrenWebsite/Lucene/Index";



That is all I did to implement search on .html and .txt
files. How do I now implement search on .pdf, .doc, .jsp,
and similar files?
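
To make the .jsp part of the question concrete, here is a minimal sketch of one possible approach, not anything taken from the Lucene demo: reduce a JSP page to plain text before handing that text to whatever code builds the index. The class MarkupTextExtractor and its methods are hypothetical names invented for this illustration. The sketch assumes that removing <% ... %> blocks and tags is good enough; it does not decode entities such as &amp; or skip script and style content. Binary formats such as .pdf and .doc would presumably need a separate text-extraction library, which is not shown here.

```java
import java.util.Locale;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of Lucene or its demo.
class MarkupTextExtractor {

    // JSP directives, scriptlets, expressions and comments: <% ... %>
    private static final Pattern JSP_BLOCK = Pattern.compile("<%.*?%>", Pattern.DOTALL);
    // Any remaining HTML or JSP custom tag: <...>
    private static final Pattern TAG = Pattern.compile("<[^>]*>");
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    // Returns the visible text of a JSP/HTML source string.
    // JSP blocks are removed first so that a '>' inside a scriptlet
    // is not mistaken for the end of a tag.
    static String stripMarkup(String source) {
        String text = JSP_BLOCK.matcher(source).replaceAll(" ");
        text = TAG.matcher(text).replaceAll(" ");
        return WHITESPACE.matcher(text).replaceAll(" ").trim();
    }

    // True for file types this sketch can turn into text by itself.
    // Assumption: anything else (.pdf, .doc) needs an external extractor.
    static boolean isTextual(String fileName) {
        String lower = fileName.toLowerCase(Locale.ROOT);
        return lower.endsWith(".jsp") || lower.endsWith(".html")
                || lower.endsWith(".htm") || lower.endsWith(".txt");
    }

    public static void main(String[] args) {
        String jsp = "<%@ page language=\"java\" %>\n"
                + "<html><body><h1>About Us</h1>\n"
                + "<% int n = 1; %><p>Our mission</p></body></html>";
        System.out.println(stripMarkup(jsp)); // prints "About Us Our mission"
    }
}
```

The string returned by stripMarkup could then be indexed the same way the text of the .html and .txt files is indexed today.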




Thanks & Regards,
Ravi.

