Posted to java-user@lucene.apache.org by Erick Erickson <er...@gmail.com> on 2006/12/20 19:23:12 UTC

Re: Rebuilding index on a regular basis

Why not switch where the searcher looks rather than copying the index and
restarting? That is, your searcher points at index1, and you build the new
index in a new dir (index2). On some signal, your server closes the searcher
pointing at index1, opens one pointing at index2, and uses that until
tomorrow, when you do the opposite.

You could even warm up the searcher after you open it but before you start
searching with it if you wanted.
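
A minimal sketch of that switch, in plain Java with no Lucene calls. The
class name SearcherHolder and the warmUp callback are made up for
illustration; the type parameter S stands in for whatever searcher object
your server holds (it only needs to be Closeable). Treat it as one possible
shape under those assumptions, not as a Lucene API.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Consumer;

/** Hypothetical holder for the "live" searcher; S is a stand-in type. */
final class SearcherHolder<S extends Closeable> {
    private final AtomicReference<S> live = new AtomicReference<>();

    SearcherHolder(S initial) {
        live.set(initial);
    }

    /** Every query asks for the current searcher instead of caching one. */
    S current() {
        return live.get();
    }

    /**
     * Warm the fresh searcher first, then publish it, then close the old one.
     * Queries keep hitting the old searcher until the publish step.
     */
    void swap(S fresh, Consumer<S> warmUp) throws IOException {
        warmUp.accept(fresh);            // e.g. run a few typical queries
        S old = live.getAndSet(fresh);   // atomic switch to the new index
        if (old != null) {
            old.close();                 // assumes no query is still using it
        }
    }
}
```

Note the assumption in the last step: closing the old searcher right away is
only safe if no in-flight query still holds it. A real server would want
some form of reference counting or a short grace period before the close.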

Or, if you are using Linux, say, your index directory could be a symlink and
your process would be:
1> build/test the new index
2> shut down the server
3> switch the symlink to point at the new index directory
4> start the server.

You'd still have a small interruption for your users, but we're probably
talking 2 seconds plus however long it takes you to stop/start your
server.....
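
If you would rather script step 3 from Java than from the shell, here is a
rough sketch using java.nio.file. The class and method names (IndexLink,
repoint) and the ".tmp" suffix are made up for illustration. It assumes a
filesystem that supports symlinks, and it relies on the rename replacing the
existing link, which the Files.move documentation describes as
implementation specific for ATOMIC_MOVE (it is the usual behaviour on Linux).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical helper for repointing the index symlink. */
final class IndexLink {
    /**
     * Point `link` at `newIndexDir` by creating a temporary symlink next to
     * it and renaming that over the old link, so the link never goes missing.
     */
    static void repoint(Path link, Path newIndexDir) throws IOException {
        Path tmp = link.resolveSibling(link.getFileName() + ".tmp");
        Files.deleteIfExists(tmp);                  // clear any leftover from a failed run
        Files.createSymbolicLink(tmp, newIndexDir); // new link, not yet visible to the server
        // Moves the symlink itself (not its target) over the old link.
        Files.move(tmp, link, StandardCopyOption.ATOMIC_MOVE);
    }
}
```

The server still has to be stopped and started around this call, as in the
steps above, because an already-open searcher keeps reading the old files.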

Erick


On 12/20/06, Scott Sellman <ss...@valueclick.com> wrote:
>
> Note: I have changed the title of this thread to match its content
>
> I am currently facing a similar issue.  I am dealing with a large index
> that is constantly used and needs to be updated on a daily basis.  For
> fear of corruption I would rather rebuild the index each time,
> performing tests against it before using it.  However the problem I am
> having is switching out the old index without causing service
> interruption.  As long as queries are being made against the index I am
> running into locking issues with the index files, preventing me from
> putting the new index in place. Any suggestions?
>
> Thanks,
> Scott
> -----Original Message-----
> From: Erick Erickson [mailto:erickerickson@gmail.com]
> Sent: Wednesday, December 20, 2006 7:59 AM
> To: java-user@lucene.apache.org
> Subject: Re: MultiFieldQueryParser doesn't properly filter out documents
> when the query string specifies to exclude certain terms
>
> My first question is how many documents would you be deleting on a pass
> for option 2? If it's 10 documents out of 10,000, I'd consider just
> deleting them and re-adding (see IndexModifier).
>
> Personally, if possible, I prefer your first option, building a completely
> new index and switching between them. This is especially useful if
> something catastrophic happens to the index as you build it and it winds
> up being unusable (power failures *do* happen). You can keep using your
> old index and be happy.
>
> Another question is how quickly the index builds and how soon do your
> users require that they get up-to-date data?
>
> And remember that no matter what, you must re-open your searcher to see
> the updates.
>
> I'd be really reluctant to remove all the items and re-build the index for
> several reasons...
> 1> You wouldn't get the new data being added until you closed/reopened
>    your searcher.
> 2> The documents you deleted wouldn't be "gone" until you closed/reopened
>    your searcher.
> 3> In the interim, your users wouldn't have access to much of anything....
>
> Best
> Erick
>
> On 12/20/06, Adam Fleming <af...@hotmail.com> wrote:
> >
> >
> > Hello Gentlemen (+Ladies?),
> >
> > I'm integrating Lucene into a Spring web-app, and have found a plethora
> > of great web + print resources to make the integration quick and
> > seamless.  One thing that I have been hard-pressed to find is a good
> > solution for rebuilding the index on a regular basis.
> >
> > I'm curious if you know of a best practice (or have found something
> > personally that works) for rebuilding a Lucene index w/o service
> > interruptions.  The assumptions are a Spring IoC container w/ an
> > IndexFactory bean.  I have the project configured to work with both
> > FSDirectory and RAMDirectory implementations.  If you don't know Spring,
> > you are free to ignore the details - I'll adapt your comments to my
> > code :)
> >
> > So far I tried rebuilding the index on a regular schedule, but foolishly
> > only added duplicate documents to an existing index.
> >
> > Things I have considered are
> > - Using two index directories, and rebuilding one while the other is
> >    in use + switching when the rebuilt index is ready.  This would
> >    cause the app to alternate between two indexes.
> > - Using a single index, and iterating over the index entirely,
> >    deleting documents 1 by 1 and re-adding them with fresh data
> > - Using a single index, and deleting ALL the documents at once
> >    and then adding them all back as quickly as possible.
> >
> >
> > All of my proposed ideas seem to fly in the face of Lucene's simplicity,
> > and I will be so thankful to be pointed in the right direction.
> >
> >
> > Happy Holidays and  a big Thank You to the active list users,
> >
> >
> > Adam Fleming
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-user-help@lucene.apache.org
> >
> >
>