Posted to solr-user@lucene.apache.org by Phillip Farber <pf...@umich.edu> on 2008/04/27 14:33:52 UTC

Queuing adds and commits

A while back Hoss described Solr queuing behavior:

 > searches can go on happily while commits/adds are happening, and
 > multiple adds can happen in parallel, ... but all adds block while a
 > commit is taking place.  i just give all of the clients that update the
 > index a really large timeout value (ie: 60 seconds or so) and don't
 > worry about queuing up indexing requests.

I am worried about those adds queuing up behind an ongoing commit before 
being processed in parallel.  What if a document is added multiple times 
and each time it is added it has different field values?  You'd want the 
newest, last queued version of that document to win, i.e. to be the 
version that represents the document in the index. But processing in 
parallel suggests that the time order of the adds of that document could 
be lost. Does Solr time stamp the documents in the indexing queue to 
prevent an earlier queued version of the document from being the last 
version indexed?

Thanks,

Phil


Re: Queuing adds and commits

Posted by Chris Hostetter <ho...@fucit.org>.
: A while back Hoss described Solr queuing behavior:
: 
: > searches can go on happily while commits/adds are happening, and
: > multiple adds can happen in parallel, ... but all adds block while a
: > commit is taking place.  i just give all of the clients that update the
: > index a really large timeout value (ie: 60 seconds or so) and don't
: > worry about queuing up indexing requests.
: 
: I am worried about those adds queuing up behind an ongoing commit before being

Queuing isn't really the appropriate word.  HTTP requests to add documents 
are handled by threads that your servlet container (e.g. Jetty) creates to 
run the appropriate code for dealing with those adds.  That code involves 
making an addDocument() call on the index, which will block when a segment 
merge is happening (during a "commit" or during some adds if they trigger 
a merge).

My last sentence there ("i just ... don't worry about queuing up 
indexing requests.") was about me using long timeouts with retries 
to avoid needing to write any sort of queuing code in the clients that 
send add or commit messages to Solr.
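
A minimal sketch of that "long timeout plus retry" approach on the client side, written in Python with only the standard library. The function name, the retry count, the backoff values, and the update URL in the usage note are all assumptions made for illustration, not anything Solr prescribes:

```python
import time
import urllib.request


def post_with_retry(url, body, timeout=60.0, retries=3, backoff=1.0,
                    opener=urllib.request.urlopen, sleep=time.sleep):
    """POST an XML update message, retrying on timeouts and connection errors.

    `opener` and `sleep` are injectable only so the retry logic can be
    exercised without a running server (an illustrative choice).
    """
    request = urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
    )
    last_error = None
    for attempt in range(1, retries + 1):
        try:
            # A generous timeout lets the request sit out a blocking commit
            # or segment merge instead of failing fast.
            with opener(request, timeout=timeout) as response:
                return response.status
        except OSError as exc:  # URLError and socket timeouts are OSErrors
            last_error = exc
            if attempt < retries:
                sleep(backoff * attempt)  # simple linear backoff
    raise last_error
```

Assuming a Solr instance at a hypothetical http://localhost:8983/solr/update, a call would look like post_with_retry(url, "<add><doc>...</doc></add>"). Note that this only keeps a single client from giving up early; it does nothing to order adds coming from different clients.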

(Note: some of this blocking should go away when the 
ConcurrentMergeScheduler becomes usable in Solr 1.3.)

: processed in parallel.  What if a document is added multiple times and each
: time it is added it has different field values?  You'd want the newest, last
: queued version of that document to win, i.e. to be the version that represents
: the document in the index. But processing in parallel suggests that the time
: order of the adds of that document could be lost. Does Solr time stamp the

If you have two parallel requests coming in at the same time to add 
two documents with the same uniqueKey field, then there is no guarantee 
which one will make it in, regardless of whether a segment merge or a 
commit is happening.  The JVM decides when to switch between threads, so 
even if client#1 connects to the server 1 second before client#2, the 
servlet container might run the thread corresponding to client#2's 
request first, and then client#1's copy of that document will overwrite 
it.
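
To make the overwrite behavior concrete, here is a toy model in Python. It is not Solr code: the index is just a dict keyed by uniqueKey, and the document id and field values are made up. It only shows that the surviving version is determined by the order in which the adds are executed, not the order in which the clients connected:

```python
def apply_adds(adds):
    """Apply (unique_key, fields) adds in the given order.

    A later add with the same key replaces the earlier one, which is the
    overwrite-by-uniqueKey behavior described above.
    """
    index = {}
    for unique_key, fields in adds:
        index[unique_key] = fields
    return index


# Two versions of the same (hypothetical) document from two clients.
from_client_1 = ("doc-42", {"title": "first draft"})
from_client_2 = ("doc-42", {"title": "second draft"})

# Executed in arrival order: client#2's newer version survives.
arrival_order = apply_adds([from_client_1, from_client_2])

# Executed in the order the threads happened to run: client#1 lands last
# and its older version overwrites the newer one.
scheduled_order = apply_adds([from_client_2, from_client_1])
```

If the order matters, something outside Solr has to enforce it, for example by sending all updates for a given document through a single client.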



-Hoss


Re: Queuing adds and commits

Posted by James Brady <ja...@gmail.com>.
Depending on your application, it might be useful to take control of  
the queueing yourself: it was for me!

I needed quick turnarounds for submitting a document to be indexed,  
which Solr can't guarantee right now. To address it, I wrote a  
persistent queueing server, accessed by XML-RPC, which has the benefit  
of adding a low-cost layer of indirection between client-side and  
server-side stuff, and properly serialises the order in which events  
arrive.
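
A rough sketch of the general idea, in Python. This is not James's server: the XML-RPC front end is left out, SQLite stands in as an assumed persistence layer, and the class, table, and method names are invented for illustration. Producers return as soon as the document is stored, and a single consumer replays documents strictly in the order they were enqueued:

```python
import json
import sqlite3


class AddQueue:
    """A small persistent FIFO of documents waiting to be indexed."""

    def __init__(self, path):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS pending ("
            "seq INTEGER PRIMARY KEY AUTOINCREMENT, doc TEXT NOT NULL)"
        )
        self.db.commit()

    def enqueue(self, doc):
        """Store a document and return its sequence number."""
        with self.db:  # commits on success, rolls back on error
            cursor = self.db.execute(
                "INSERT INTO pending (doc) VALUES (?)", (json.dumps(doc),)
            )
        return cursor.lastrowid

    def drain(self, send):
        """Pass each pending document to send() in enqueue order.

        A row is deleted only after send() returns, so a failed send
        leaves the document at the head of the queue for the next run.
        """
        sent = 0
        while True:
            row = self.db.execute(
                "SELECT seq, doc FROM pending ORDER BY seq LIMIT 1"
            ).fetchone()
            if row is None:
                return sent
            seq, doc = row
            send(json.loads(doc))
            with self.db:
                self.db.execute("DELETE FROM pending WHERE seq = ?", (seq,))
            sent += 1
```

Here send would be whatever function actually posts the add to Solr. Because only one consumer drains the queue, two versions of the same document reach Solr in the order they were submitted.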

James

