You are viewing a plain text version of this content. The canonical link for it is here.
Posted to java-user@lucene.apache.org by Mailing Lists Account <ml...@imorph.com> on 2002/09/21 14:40:52 UTC

Is Lucene suitable for one-time index and one-time search ?

Hi,

I need to search a bunch of documents.Each document needs to be searched
only once. That means once I build the index and search it, I have no need
for that index and the document again.

The number of documents to be searched in the above process can be very
large. If this process needs to be repeated on a regular basis, was
wondering if I can use Lucene ?

In that case, would creating the index in memory, run the query and
discarding the in-memory index (RamDirectory) be the right approach ?

thanks
Ramesh





--
To unsubscribe, e-mail:   <ma...@jakarta.apache.org>
For additional commands, e-mail: <ma...@jakarta.apache.org>


Re: Is Lucene suitable for one-time index and one-time search ?

Posted by Mailing Lists Account <ml...@imorph.com>.
Thanks Doug.
I guess, I will try out lucene and get some performance figures.

Not used regexp much earlier, but I am under the impression that it cannot
do
AND and OR of several terms.

regards
Nethi
----- Original Message -----
From: "Doug Cutting" <cu...@lucene.com>
To: "Lucene Users List" <lu...@jakarta.apache.org>
Sent: Monday, September 23, 2002 10:49 PM
Subject: Re: Is Lucene suitable for one-time index and one-time search ?


> Mailing Lists Account wrote:
> > In effect, what I am trying to do is 'Find in a File(s)' but with one or
> > more terms( containing AND/OR/phrases as operators.)
> >
> > It appears to me that the lucene as all pieces to solve this. That is,
> > extract the terms, index the document and run the query to see if the
> > document is returned in the results.
> >
> > If lucene is an overkill for my kind of app, what are the other
approaces to
> > search a document with a query much similar to what Lucene suppports it.
Can
> > that avoid creating indices ? I really appreciate any pointers.
>
> Lucene might be overkill, but indeed it can solve this.  So if it's not
> way too slow, you might just go ahead and use Lucene, as building this
> from scratch could be a fair amount of work.
>
> However, if Lucene is not giving you adequate performance for this task,
> then you might just try rewriting your queries as regular expressions
> and then the text of each document to see if it matches.  This may be
> difficult if you're depending on Lucene's analysis modules to perform
> stemming, etc.
>
> Doug
>
>
> --
> To unsubscribe, e-mail:
<ma...@jakarta.apache.org>
> For additional commands, e-mail:
<ma...@jakarta.apache.org>
>
>


--
To unsubscribe, e-mail:   <ma...@jakarta.apache.org>
For additional commands, e-mail: <ma...@jakarta.apache.org>


Re: Is Lucene suitable for one-time index and one-time search ?

Posted by Doug Cutting <cu...@lucene.com>.
Mailing Lists Account wrote:
> In effect, what I am trying to do is 'Find in a File(s)' but with one or
> more terms( containing AND/OR/phrases as operators.)
> 
> It appears to me that the lucene as all pieces to solve this. That is,
> extract the terms, index the document and run the query to see if the
> document is returned in the results.
> 
> If lucene is an overkill for my kind of app, what are the other approaces to
> search a document with a query much similar to what Lucene suppports it. Can
> that avoid creating indices ? I really appreciate any pointers.

Lucene might be overkill, but indeed it can solve this.  So if it's not 
way too slow, you might just go ahead and use Lucene, as building this 
from scratch could be a fair amount of work.

However, if Lucene is not giving you adequate performance for this task, 
then you might just try rewriting your queries as regular expressions 
and then the text of each document to see if it matches.  This may be 
difficult if you're depending on Lucene's analysis modules to perform 
stemming, etc.

Doug


--
To unsubscribe, e-mail:   <ma...@jakarta.apache.org>
For additional commands, e-mail: <ma...@jakarta.apache.org>


Re: Is Lucene suitable for one-time index and one-time search ?

Posted by Mailing Lists Account <ml...@imorph.com>.
Thanks Doug for the quick reply.


In the simplest case, I have a document and I need to know if it contains
certain terms or a combination of terms with AND/OR/Phrases etc(Much like
Lucene query. Infact, Lucene queries meet my requirements very well).  In
some cases, I may have more than one document. However, the process is
pretty much same. Unlike other lucene apps, unce I find that these
document(s) either contain or donot contain the terms, I have no use for the
index and hence I need to drop all created indices.  Next time I search, the
documents and the query are totally different.

In effect, what I am trying to do is 'Find in a File(s)' but with one or
more terms( containing AND/OR/phrases as operators.)

It appears to me that the lucene as all pieces to solve this. That is,
extract the terms, index the document and run the query to see if the
document is returned in the results.

If lucene is an overkill for my kind of app, what are the other approaces to
search a document with a query much similar to what Lucene suppports it. Can
that avoid creating indices ? I really appreciate any pointers.

thanks & regards
Ramesh

----- Original Message -----
From: "Doug Cutting" <cu...@lucene.com>
To: "Lucene Users List" <lu...@jakarta.apache.org>
Sent: Sunday, September 22, 2002 2:39 AM
Subject: Re: Is Lucene suitable for one-time index and one-time search ?


> Mailing Lists Account wrote:
> > I need to search a bunch of documents.Each document needs to be searched
> > only once. That means once I build the index and search it, I have no
need
> > for that index and the document again.
>
> This does not sound like the problem that Lucene is designed to solve.
> Lucene can solve it, but not in a very efficient manner.
>
> Will you use the same query repeatedly on different sets of documents?
> If so, then it sounds like you might be doing categorization.  An
> efficient way to do this is to build an index of queries and then search
> it with each new document (instead of searching documents with queries).
>   You might be able to get Lucene to work in that manner, but it is not
> ideally suited.
>
> Doug
>
>
> --
> To unsubscribe, e-mail:
<ma...@jakarta.apache.org>
> For additional commands, e-mail:
<ma...@jakarta.apache.org>
>
>


--
To unsubscribe, e-mail:   <ma...@jakarta.apache.org>
For additional commands, e-mail: <ma...@jakarta.apache.org>


Re: Is Lucene suitable for one-time index and one-time search ?

Posted by Doug Cutting <cu...@lucene.com>.
Mailing Lists Account wrote:
> I need to search a bunch of documents.Each document needs to be searched
> only once. That means once I build the index and search it, I have no need
> for that index and the document again.

This does not sound like the problem that Lucene is designed to solve. 
Lucene can solve it, but not in a very efficient manner.

Will you use the same query repeatedly on different sets of documents? 
If so, then it sounds like you might be doing categorization.  An 
efficient way to do this is to build an index of queries and then search 
it with each new document (instead of searching documents with queries). 
  You might be able to get Lucene to work in that manner, but it is not 
ideally suited.

Doug


--
To unsubscribe, e-mail:   <ma...@jakarta.apache.org>
For additional commands, e-mail: <ma...@jakarta.apache.org>