Posted to user@lucenenet.apache.org by Chirag Patel <ch...@persistent.co.in> on 2008/01/17 07:30:29 UTC

Urgent Help Required

Hello,

This is Chirag Patel.

I want to use Lucene.Net as the search engine for our web application.

We have extensive search requirements, such as searching PDF, DOC, and HTML files.

All files will reside at one location, in one folder.

How do I create an index on that folder so that I can search all the files
under it?

My application will be written in ASP.NET using C#.

Please do the needful. It's very urgent.

Thanks.

Thanks & Regards,
Chirag Patel.
Extn: 6110.

RE: Urgent Help Required

Posted by Nic Wise <Ni...@bbc.com>.
I would add:

1. Read the demos. From memory, there is a demo app in there for creating an index from external docs.

2. Look on codeproject.com for IFilter wrappers. This is a great way to break up Office docs, PDFs etc. into just the words, which Lucene can index. It's not always totally thread-safe, so you may want to put it into a Windows service or otherwise serialize it, but it does work. IFilters come with Windows (DOC, XLS etc.), can be installed separately (PDF, ZIP etc.), or come with SharePoint/Windows SharePoint Services (DOCX, XLSX et al.).
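
The "otherwise serialize it" advice in point 2 could be sketched in C# roughly as follows. This is only an illustration: SerializedExtractor and its Extract method are hypothetical names, and the delegate stands in for whichever IFilter wrapper is actually used (none is shown in this thread). It assumes .NET 3.5 or later for Func<,>.

```csharp
using System;
using System.Threading;

// Hypothetical helper (not part of Lucene.Net or of any IFilter wrapper):
// funnels all calls to a non-thread-safe text extractor through one lock,
// so only one extraction runs at a time however many threads call it.
public sealed class SerializedExtractor
{
    private readonly object _gate = new object();

    // Assumed shape of the extractor: file path in, plain text out.
    private readonly Func<string, string> _extract;

    public SerializedExtractor(Func<string, string> extract)
    {
        if (extract == null) throw new ArgumentNullException("extract");
        _extract = extract;
    }

    public string Extract(string path)
    {
        lock (_gate)
        {
            // At most one thread is inside the wrapped extractor here.
            return _extract(path);
        }
    }
}
```

A lock like this only serializes callers within one process. If several processes (for example, several web worker processes) would otherwise run the extractor, hosting it in a single Windows service, as suggested above, is the other route.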

I've done this as part of Archive Manager (www.quest.com). We indexed all the incoming attachments and messages using Lucene.Net. The largest customer I can recall (this was 12 months ago, and it was still growing) had around 20 million emails and maybe 15 million attachments (we single-instance based on an MD5 hash). Performance of the index was outstanding.
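
For readers unfamiliar with single-instancing, here is a minimal C# sketch of the general idea: hash each attachment's bytes and index it only the first time that hash is seen. How Archive Manager actually implemented this is not described in the thread; SingleInstanceStore, Md5Hex and TryRegister are hypothetical names, and the in-memory dictionary is an assumption (a real system would presumably persist the hashes).

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography;
using System.Text;

// Hypothetical illustration of single-instancing by content hash.
public sealed class SingleInstanceStore
{
    // Hex MD5 digest -> id of the first item seen with that content.
    private readonly Dictionary<string, string> _seen =
        new Dictionary<string, string>();

    // Returns the MD5 digest of the content as 32 lower-case hex characters.
    public static string Md5Hex(byte[] content)
    {
        using (MD5 md5 = MD5.Create())
        {
            byte[] hash = md5.ComputeHash(content);
            StringBuilder sb = new StringBuilder(hash.Length * 2);
            foreach (byte b in hash) sb.Append(b.ToString("x2"));
            return sb.ToString();
        }
    }

    // Returns true if this content has not been seen before (so it should be
    // stored and indexed), false if an identical copy is already registered.
    public bool TryRegister(string id, byte[] content)
    {
        string key = Md5Hex(content);
        if (_seen.ContainsKey(key)) return false;
        _seen.Add(key, id);
        return true;
    }
}
```

With this in place, an attachment sent to many recipients is extracted and indexed once, and later copies only need a reference to the first one.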

It's not as simple as pointing Lucene at your folder of documents, but it's not hard, either. If you want point-and-index, look at the MS index engine, which does that. Of course, it's nowhere near as flexible as Lucene, and harder to integrate.
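
To make the "not hard" part concrete, the folder side of the job could be sketched in C# roughly as below: walk the folder, pick a text extractor by file extension, and hand the extracted text to whatever adds it to the index. Everything here is a hypothetical outline. FolderWalker, the extractors map and the addToIndex callback are illustrative names, and the Lucene.Net calls themselves are deliberately left out (the demo app mentioned above covers those). The callback is where a Lucene document would be built and passed to the index writer.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical outline of the walk-extract-index loop; no Lucene.Net calls.
public static class FolderWalker
{
    // extractors: lower-case extension (e.g. ".html") -> (path -> plain text).
    // addToIndex: receives (path, text); assumed to add one document to the index.
    // Returns the number of files handed to addToIndex.
    public static int Walk(
        string folder,
        IDictionary<string, Func<string, string>> extractors,
        Action<string, string> addToIndex)
    {
        int indexed = 0;
        foreach (string path in
                 Directory.GetFiles(folder, "*", SearchOption.AllDirectories))
        {
            string ext = Path.GetExtension(path).ToLowerInvariant();

            Func<string, string> extract;
            if (!extractors.TryGetValue(ext, out extract))
                continue; // no extractor registered for this file type

            string text = extract(path);
            if (string.IsNullOrEmpty(text))
                continue; // nothing to index

            addToIndex(path, text);
            indexed++;
        }
        return indexed;
    }
}
```

A caller would register one extractor per file type it cares about (an IFilter wrapper for PDF and DOC, a plain file read for text files, and so on) and pass a callback that writes to the index.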

The book is good, though.

-----Original Message-----
From: Dean Harding [mailto:dean.harding@dload.com.au] 
Sent: 17 January 2008 06:39
To: lucene-net-user@incubator.apache.org
Subject: Re: Urgent Help Required

Chirag Patel wrote:
> Hello,
> 
> This is Chirag patel.
> 
> I want to use lucene.net as a search engine with our web application.
> 
> We have extensive search requirements like search on PDF, Doc, HTML etc.

I suggest you pick up a copy of the "Lucene in Action" book 
(http://www.manning.com/hatcher2/).

It explains everything you need to do what you want.

Dean. 

Re: Urgent Help Required

Posted by Dean Harding <de...@dload.com.au>.
Chirag Patel wrote:
> Hello,
> 
> This is Chirag patel.
> 
> I want to use lucene.net as a search engine with our web application.
> 
> We have extensive search requirements like search on PDF, Doc, HTML etc.

I suggest you pick up a copy of the "Lucene in Action" book 
(http://www.manning.com/hatcher2/).

It explains everything you need to do what you want.

Dean.