You are viewing a plain text version of this content. The canonical link for it is here.

Posted to java-user@lucene.apache.org by Dror Matalon <dr...@zapatec.com> on 2003/10/14 18:27:35 UTC

Indexing and searching in parallel

Hi Folks,

Is there a recommended strategy to deal with allowing to search an index
that is updated continuously? 

One idea that I thought of is to have two indexes one for searching and
one for indexing. Periodically I would copy new files from the indexed
index to the search index and then I would stop writing to the original
index and then call optimize on both of them. The main advantage to this
is that I would only be copying the new data over. Since our index files
are over 1 gig, this is important because I wouldn't want to suspend
searches for too long. 

The only problem is that I suspect that while I run the optimize() the
index is unavailable for searches.

Any better strategies for this?

This is all on a Unix system.

-- 
Dror Matalon
Zapatec Inc 
1700 MLK Way
Berkeley, CA 94709
http://www.zapatec.com

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org

Re: Indexing and searching in parallel

Posted by Dror Matalon <dr...@zapatec.com>.

Hi,

On Tue, Oct 14, 2003 at 06:40:14PM +0200, Giulio Cesare Solaroli wrote:
> Hi.
> 
> We have no problem at all searching while indexing or optimizing.
> 
> The only problem we had was on the IndexReader; the reader has 
> visibility only on the documents available on the index when it is 
> first loaded. Thus you have to define a policy to drop and recreate the 
> reader periodically.
> This because I think (if I have understood it well) Lucene handles the 
> duplication of the index on its own, if there are active readers while 
> it update and index.

Looks like I misunderstood. I looked through the FAQs and didn't find
much information on race condition with searching and indexing and how
lucene locks things. The race condition that I'm talking about is the
fact that if you do a search while you update the index files, you need
to be careful during the transition from the old data to the new one.
I imagine that the process involves renaming files since that's atomic
on most systems, but It'd be nice to have a clearer picture.


Regards,

Dror

> 
> Hope this helps.
> 
> Giulio Cesare Solaroli
> 
> On Tuesday, Oct 14, 2003, at 18:27 Europe/Rome, Dror Matalon wrote:
> 
> >Hi Folks,
> >
> >Is there a recommended strategy to deal with allowing to search an 
> >index
> >that is updated continuously?
> >
> >One idea that I thought of is to have two indexes one for searching and
> >one for indexing. Periodically I would copy new files from the indexed
> >index to the search index and then I would stop writing to the original
> >index and then call optimize on both of them. The main advantage to 
> >this
> >is that I would only be copying the new data over. Since our index 
> >files
> >are over 1 gig, this is important because I wouldn't want to suspend
> >searches for too long.
> >
> >The only problem is that I suspect that while I run the optimize() the
> >index is unavailable for searches.
> >
> >Any better strategies for this?
> >
> >This is all on a Unix system.
> >
> >-- 
> >Dror Matalon
> >Zapatec Inc
> >1700 MLK Way
> >Berkeley, CA 94709
> >http://www.zapatec.com
> >
> >---------------------------------------------------------------------
> >To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> >For additional commands, e-mail: lucene-user-help@jakarta.apache.org
> >
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-user-help@jakarta.apache.org
> 

-- 
Dror Matalon
Zapatec Inc 
1700 MLK Way
Berkeley, CA 94709
http://www.zapatec.com

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org

Re: Indexing and searching in parallel

Posted by Giulio Cesare Solaroli <sl...@ibn-italy.com>.

Hi.

We have no problem at all searching while indexing or optimizing.

The only problem we had was on the IndexReader; the reader has 
visibility only on the documents available on the index when it is 
first loaded. Thus you have to define a policy to drop and recreate the 
reader periodically.
This because I think (if I have understood it well) Lucene handles the 
duplication of the index on its own, if there are active readers while 
it update and index.

Hope this helps.

Giulio Cesare Solaroli

On Tuesday, Oct 14, 2003, at 18:27 Europe/Rome, Dror Matalon wrote:

> Hi Folks,
>
> Is there a recommended strategy to deal with allowing to search an 
> index
> that is updated continuously?
>
> One idea that I thought of is to have two indexes one for searching and
> one for indexing. Periodically I would copy new files from the indexed
> index to the search index and then I would stop writing to the original
> index and then call optimize on both of them. The main advantage to 
> this
> is that I would only be copying the new data over. Since our index 
> files
> are over 1 gig, this is important because I wouldn't want to suspend
> searches for too long.
>
> The only problem is that I suspect that while I run the optimize() the
> index is unavailable for searches.
>
> Any better strategies for this?
>
> This is all on a Unix system.
>
> -- 
> Dror Matalon
> Zapatec Inc
> 1700 MLK Way
> Berkeley, CA 94709
> http://www.zapatec.com
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-user-help@jakarta.apache.org
>


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org