Posted to java-user@lucene.apache.org by Mark Tucker <MT...@infoimage.com> on 2002/01/31 01:00:38 UTC

Moving Index from Crawl/Build Server to Search Server

I am working on a browser-based search application that crawls web and file documents.  I would like to do my crawling and index building on one server and my searching on one or more other servers.  I want maximum uptime and performance on my search servers.  What is the best way to move the index from the build server to the search servers and then change which index a user is searching against?  I am concerned about switching the index while a user is paging through search results.  Ideally, new users will access the new index while current users continue to access the old index.  How will I know when all current users are no longer accessing the old index so that it can be deleted?

Thanks,

Mark Tucker

--
To unsubscribe, e-mail:   <ma...@jakarta.apache.org>
For additional commands, e-mail: <ma...@jakarta.apache.org>


Re: Moving Index from Crawl/Build Server to Search Server

Posted by Ype Kingma <yk...@xs4all.nl>.
Mark,

>I am working on a browser-based search application that crawls web and file documents.  I would like to do my crawling and index building on one server and my searching on one or more other servers.  I want maximum up time and performance on my search servers.  What is the best way to move the index from the build server to the search servers and then change which index a user is searching against?  I am concerned about switching the index while a user is paging through search results.  Ideally new users will access the new index while current users will access the old index.  How will I know when all current users are no longer accessing the old index so that it can be deleted?

Use separate disks for the old and the new index.
Write a class to switch between the two for searching; see
http://www.mail-archive.com/lucene-user@jakarta.apache.org/msg00133.html
for the requirements Lucene imposes.
Every once in a while you'll have to check whether the new index is
available, e.g. by inspecting a file that names the preferred index to use.
When it's time to change over, direct all new users of your index reader to the
new index and wait for all users of the old index to finish their work.
Let the administrator know when all old users have finished.
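
The switching class described above could be sketched roughly as follows. This is only an illustration under assumptions: the names IndexSwitcher, Lease, acquire, release and switchTo are hypothetical, and the type parameter T stands in for whatever object you actually search with (for example a Lucene searcher). It counts the users of each index, so that the old index is reported as drained once the last old user has released it.

```java
import java.util.function.Consumer;

/** Hypothetical sketch: hands out the current index and tracks who still uses the old one. */
final class IndexSwitcher<T> {

    /** One index plus the number of searches still using it. */
    static final class Lease<T> {
        final T index;
        private int users;        // searches currently holding this lease
        private boolean retired;  // true once a newer index has replaced this one

        Lease(T index) { this.index = index; }
    }

    private final Consumer<T> onDrained; // e.g. notify the administrator or delete the old index
    private Lease<T> current;

    IndexSwitcher(T initial, Consumer<T> onDrained) {
        this.current = new Lease<>(initial);
        this.onDrained = onDrained;
    }

    /** New users always get the index that is current at this moment. */
    synchronized Lease<T> acquire() {
        current.users++;
        return current;
    }

    /** Every acquire() must be paired with exactly one release(). */
    synchronized void release(Lease<T> lease) {
        lease.users--;
        if (lease.retired && lease.users == 0) {
            onDrained.accept(lease.index); // last old user has finished
        }
    }

    /** Directs new users to newIndex; the old index drains as its users release it. */
    synchronized void switchTo(T newIndex) {
        Lease<T> old = current;
        current = new Lease<>(newIndex);
        old.retired = true;
        if (old.users == 0) {
            onDrained.accept(old.index); // nobody was using the old index
        }
    }
}
```

How long a "user" holds a lease (one request, or a whole session of paging through results) is an application decision that this sketch does not settle. Holding it per session keeps paging consistent but delays the moment the old index can be deleted.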

Have a look at Doug Lea's util.concurrent library. The writer-preference
reader/writer lock might be handy here, with the writer doing the changeover.
This effectively waits for all old users to finish. When you want old
and new users working together, you'll need something more complex.
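
A minimal sketch of the reader/writer lock idea, assuming the JDK's java.util.concurrent.locks.ReentrantReadWriteLock (the standard-library descendant of util.concurrent) in place of the original library. Fair mode is used here as an approximation of writer preference: newly arriving readers queue behind a waiting writer. The class name LockedIndexHolder and its methods are hypothetical, and T again stands in for the searchable index object.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Function;

/** Hypothetical sketch: searches are readers, the changeover is the single writer. */
final class LockedIndexHolder<T> {
    // Fair mode: readers arriving after a waiting writer queue behind it.
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock(true);
    private T index;

    LockedIndexHolder(T initial) { this.index = initial; }

    /** Runs one search under the read lock; many searches may run concurrently. */
    <R> R search(Function<T, R> query) {
        lock.readLock().lock();
        try {
            return query.apply(index);
        } finally {
            lock.readLock().unlock();
        }
    }

    /** Blocks until all in-flight searches have finished, then swaps the index. */
    T changeOver(T newIndex) {
        lock.writeLock().lock();
        try {
            T old = index;
            index = newIndex;
            return old; // no search is using it any more, so the caller may delete it
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```

With this design new searches wait while the changeover is pending, so old and new users never run side by side. That is the simple case; letting both work together needs per-index user counting instead.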

For maximum performance you'll need to limit the number of concurrent
users of the index (indices) anyway.
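
One way to cap the number of concurrent searches is a counting semaphore. The sketch below assumes java.util.concurrent.Semaphore from the JDK. The class name SearchThrottle and the limit passed to it are hypothetical, and a sensible limit has to be found by measuring your own servers.

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

/** Hypothetical sketch: allows at most maxConcurrent searches to run at once. */
final class SearchThrottle {
    private final Semaphore permits;

    SearchThrottle(int maxConcurrent) {
        // Fair semaphore: waiting searches are admitted in arrival order.
        this.permits = new Semaphore(maxConcurrent, true);
    }

    /** Waits for a free slot, runs the search, and always gives the slot back. */
    <R> R run(Supplier<R> search) {
        permits.acquireUninterruptibly();
        try {
            return search.get();
        } finally {
            permits.release();
        }
    }

    /** Number of search slots currently free. */
    int available() {
        return permits.availablePermits();
    }
}
```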

To move a new index to the search servers, use a low-priority copy
and, if needed, more than one CPU.
Lucene has some facilities to merge indices, which allow you to copy
only the newer parts of an index and then merge locally. This does not
delete old Lucene documents, though.
Once you've changed over you can also update the old index and use
it for more performance, i.e. you can have multiple entries in
the file with the preferred index.
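
The file with the preferred index could be read along these lines. The format is purely an assumption for illustration: one index directory per line, blank lines and lines starting with # ignored. The class name PreferredIndexFile is hypothetical too.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: parses a pointer file that lists the index directories to search. */
final class PreferredIndexFile {

    /** Turns the lines of the pointer file into index directories, in file order. */
    static List<Path> parse(List<String> lines) {
        List<Path> indexes = new ArrayList<>();
        for (String line : lines) {
            String entry = line.trim();
            if (entry.isEmpty() || entry.startsWith("#")) {
                continue; // assumed convention: skip blanks and comments
            }
            indexes.add(Paths.get(entry));
        }
        return indexes;
    }

    /** Reads and parses the pointer file; call this periodically to notice a changeover. */
    static List<Path> read(Path pointerFile) {
        try {
            return parse(Files.readAllLines(pointerFile));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

If the build server writes the pointer file to a temporary name and then renames it into place, the search servers are less likely to read a half-written file.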

As always, lots of choices. Try the simple ones before buying
a mainframe :)


Have fun,
Ype
