Posted to java-user@lucene.apache.org by gloria_white <wh...@gmail.com> on 2006/07/05 23:21:29 UTC

Best way to Add items to Index in Real Time

We have a small Lucene index (about 150k items) that requires
additions/deletions several times a day. We may add or delete 3 to 4k
documents each time we perform these operations.
While we perform these operations, we still need to be 'online' and available
for searching. But if we perform them while a search is going on,
we get conflict errors, causing either the indexing process or the search
operation to fail.
What is the best way to handle this scenario, so that we can add/delete
documents from the index in real time and still allow searches to take place?

thanks a lot! 

Gloria
-- 
View this message in context: http://www.nabble.com/Best-way-to-Add-items-to-Index-in-Real-Time-tf1897254.html#a5189803
Sent from the Lucene - Java Users forum at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Best way to Add items to Index in Real Time

Posted by Chris Hostetter <ho...@fucit.org>.
: I'm a novice with this, so I'll appreciate your patience.
: I'm using a batch program for doing the additions and deletions while a
: separate web app for searching. How can I ensure that these two different
: apps (one making the changes to index and the other just searching) don't
: run into each others' paths? Or am I ignorant of something really basic
: here?

If both of these applications are running on the same machine, then the
built-in locking mechanisms in the Lucene code base should prevent any
errors from happening unless you circumvent the locks (you would have to
go out of your way to do this).  If the applications are on separate
machines writing to some shared disk, then that may be the source of your
problem -- the lock files are by default stored in the system tmp
directory.  Even if you changed the location of the lock files (the mailing
list archive has info on how to do this), there is no guarantee that it will
work, because some "remote filesystems" (notably NFS) have issues with
guaranteeing a consistent "view" of the index directory (again, Google can
fill you in).

Your best bet (assuming I'm right and you are working on two different
machines) is to copy the index to local disk in a new location
(separate from the "old" index currently in use), open a new searcher, and
then swap the references.  This is (approximately) the approach taken by
Solr.
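
The "open a new searcher, then swap the references" step could look
roughly like the sketch below. It uses only the JDK: LocalSearcher is a
hypothetical stand-in for whatever wraps the real searcher, SearcherHolder
is an invented helper name, and the index paths are made up for
illustration. It is not how Solr itself is implemented.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stand-in for a searcher opened on one local copy of the index.
// In a real application this would wrap the actual Lucene searcher.
final class LocalSearcher {
    private final String indexPath;
    private volatile boolean closed = false;

    LocalSearcher(String indexPath) { this.indexPath = indexPath; }

    String indexPath() { return indexPath; }
    boolean isClosed() { return closed; }
    void close() { closed = true; }
}

// Holds the searcher that queries should use and swaps it atomically.
final class SearcherHolder {
    private final AtomicReference<LocalSearcher> current;

    SearcherHolder(LocalSearcher initial) {
        current = new AtomicReference<>(initial);
    }

    // Every query asks the holder for the searcher to use.
    LocalSearcher get() { return current.get(); }

    // Publish a searcher opened on the freshly copied index and return the
    // old one, so the caller can close it once in-flight searches are done.
    LocalSearcher swap(LocalSearcher fresh) { return current.getAndSet(fresh); }
}

class SwapDemo {
    public static void main(String[] args) {
        // Hypothetical paths: each index update is copied to a new directory.
        SearcherHolder holder = new SearcherHolder(new LocalSearcher("/data/index.v1"));

        // ... copy the updated index to /data/index.v2 on local disk, then:
        LocalSearcher old = holder.swap(new LocalSearcher("/data/index.v2"));
        old.close();

        System.out.println("now searching " + holder.get().indexPath());
    }
}
```

The search side never touches the directory that is being written to; it
only sees a complete copy, and the switch is a single reference
assignment.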

If I'm wrong about the multiple machine thing, then I have no idea what's
causing your problem.  Can you post some more details -- preferably
including some stack traces from the various types of crashes?  (You said
both the indexing app and the searching app occasionally crash.)



-Hoss




Re: Best way to Add items to Index in Real Time

Posted by gloria_white <wh...@gmail.com>.
thanks for the response Otis!
I'm a novice with this, so I'll appreciate your patience. 
I'm using a batch program for doing the additions and deletions while a
separate web app for searching. How can I ensure that these two different
apps (one making the changes to index and the other just searching) don't
run into each others' paths? Or am I ignorant of something really basic
here?

Thanks
-- 
View this message in context: http://www.nabble.com/Best-way-to-Add-items-to-Index-in-Real-Time-tf1897254.html#a5191394
Sent from the Lucene - Java Users forum at Nabble.com.




Re: Best way to Add items to Index in Real Time

Posted by Michael Busch <bu...@gmail.com>.
Otis Gospodnetic wrote:
> If you are getting errors while searching and at the same time either adding or deleting documents, chances are you are not using the API correctly and not following the concurrency rules (described many times on this list).  You can search and modify your index at the same time.  Adding and deleting documents is best done in batches (e.g. run your deletions first, close the IndexReader that did the deletions (it could be the same one that is used for searching), open an IndexWriter, add documents, close the writer, and re-open the IndexSearcher/Reader for searching so your changes are visible).
>
> Otis
>
>   

Hi,

check out this nice patch, it could be useful for your use case:

http://issues.apache.org/jira/browse/LUCENE-565
http://www.gossamer-threads.com/lists/lucene/java-dev/35317

It provides better performance when you have a large number of
update/delete operations.

Michael




Re: Best way to Add items to Index in Real Time

Posted by Otis Gospodnetic <ot...@yahoo.com>.
If you are getting errors while searching and at the same time either adding or deleting documents, chances are you are not using the API correctly and not following the concurrency rules (described many times on this list).  You can search and modify your index at the same time.  Adding and deleting documents is best done in batches (e.g. run your deletions first, close the IndexReader that did the deletions (it could be the same one that is used for searching), open an IndexWriter, add documents, close the writer, and re-open the IndexSearcher/Reader for searching so your changes are visible).
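
To illustrate why the searcher has to be re-opened after a batch, here is
a toy model in plain Java. It deliberately does not use the Lucene API:
ToyIndex and its methods are invented for this sketch, and a "searcher"
is modelled as a point-in-time snapshot, on the assumption that an open
searcher keeps seeing the index as it was when it was opened.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.Set;
import java.util.TreeSet;

// Toy in-memory stand-in for an index; NOT Lucene.
final class ToyIndex {
    private final Set<String> docs = new TreeSet<>();

    // One batch: run all deletions first, then all additions.
    synchronized void applyBatch(Collection<String> deletions,
                                 Collection<String> additions) {
        docs.removeAll(deletions);
        docs.addAll(additions);
    }

    // A "searcher" is an immutable snapshot taken at open time.
    synchronized Set<String> openSearcher() {
        return Collections.unmodifiableSet(new TreeSet<>(docs));
    }
}

class BatchDemo {
    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        index.applyBatch(Collections.<String>emptyList(),
                         Arrays.asList("doc1", "doc2"));

        // Searcher opened before the batch keeps serving queries.
        Set<String> searcher = index.openSearcher();

        // Batch update: delete doc1, add doc3.
        index.applyBatch(Arrays.asList("doc1"), Arrays.asList("doc3"));
        System.out.println("stale searcher:    " + searcher);

        // Re-open so the changes become visible.
        searcher = index.openSearcher();
        System.out.println("reopened searcher: " + searcher);
    }
}
```

Searches keep working against the old snapshot for the whole batch, and
the new documents only show up once the searcher is re-opened.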

Otis
