Posted to solr-user@lucene.apache.org by Phillip Farber <pf...@umich.edu> on 2008/04/27 14:33:52 UTC
Queuing adds and commits
A while back Hoss described Solr queuing behavior:
> searches can go on happily while commits/adds are happening, and
> multiple adds can happen in parallel, ... but all adds block while a
> commit is taking place. i just give all of the clients that update the
> index a really large timeout value (e.g. 60 seconds or so) and don't
> worry about queuing up indexing requests.
I am worried about those adds queuing up behind an ongoing commit before
being processed in parallel. What if a document is added multiple times
and each time it is added it has different field values? You'd want the
newest, last queued version of that document to win, i.e. to be the
version that represents the document in the index. But processing in
parallel suggests that the time order of the adds of that document could
be lost. Does Solr time stamp the documents in the indexing queue to
prevent an earlier queued version of the document from being the last
version indexed?
Thanks,
Phil
Re: Queuing adds and commits
Posted by Chris Hostetter <ho...@fucit.org>.
: A while back Hoss described Solr queuing behavior:
:
: > searches can go on happily while commits/adds are happening, and
: > multiple adds can happen in parallel, ... but all adds block while a
: > commit is taking place. i just give all of the clients that update the
: > index a really large timeout value (e.g. 60 seconds or so) and don't
: > worry about queuing up indexing requests.
:
: I am worried about those adds queuing up behind an ongoing commit before being
queueing isn't really the appropriate word. HTTP requests to add documents
are handled by threads that your servlet container (e.g. Jetty) creates to
run the appropriate code for dealing with those adds. that code involves
making an addDocument() call on the index, which will block while a segment
merge is happening (during a "commit", or during some adds if they trigger
a merge).
my last sentence there ("i just ... don't worry about queuing up
indexing requests.") was about me using long timeouts with retries
to avoid needing to write any sort of queuing code in the clients that
send add or commit messages to Solr.
(Note: some of this blocking should go away when the
ConcurrentMergeScheduler becomes usable in Solr 1.3)
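The long-timeout-with-retry approach described above could look roughly like the following on the client side. This is only a minimal sketch, not code from the thread: the function names `post_update` and `send_with_retry`, the retry count, and the pause between attempts are all assumptions made for illustration.

```python
import time
import urllib.request


def post_update(url, xml_body, timeout=60.0):
    """POST an XML update message and return the HTTP status code.

    The generous timeout gives a blocked add time to finish once the
    commit or merge it is waiting on completes.
    """
    request = urllib.request.Request(
        url,
        data=xml_body.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


def send_with_retry(send, attempts=3, pause=1.0):
    """Call send() until it succeeds or the attempts run out.

    OSError covers socket timeouts as well as urllib.error.URLError.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return send()
        except OSError as error:
            last_error = error
            if attempt < attempts - 1:
                time.sleep(pause)  # brief pause before trying again
    raise last_error
```

A caller would wrap the request, for example `send_with_retry(lambda: post_update(update_url, "<add>...</add>"))`, where `update_url` is whatever update endpoint your installation exposes (a placeholder here). Note that retrying does nothing to preserve ordering between different clients.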
: processed in parallel. What if a document is added multiple times and each
: time it is added it has different field values? You'd want the newest, last
: queued version of that document to win, i.e. to be the version that represents
: the document in the index. But processing in parallel suggests that the time
: order of the adds of that document could be lost. Does Solr time stamp the
if you have two parallel requests coming in at the same time to add
two documents with the same uniqueKey field then there is no guarantee
which one will make it in -- regardless of whether a segment merge or a
commit is happening. The JVM decides when to switch between threads, so
even if client#1 connects to the server 1 second before client#2, the
servlet container might run the thread corresponding to client#2's
request first, and then client#1's copy of that document will overwrite
it.
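
The scenario above can be modelled with a toy example. This is a sketch only and not Solr code: `ToyIndex` is a hypothetical stand-in for an index keyed by uniqueKey, and the `threading.Event` objects are an assumption used to force the scheduling order that the paragraph describes.

```python
import threading


class ToyIndex:
    """Toy stand-in for an index keyed by uniqueKey: the last add applied wins."""

    def __init__(self):
        self._docs = {}
        self._lock = threading.Lock()

    def add(self, doc):
        with self._lock:
            self._docs[doc["id"]] = doc

    def get(self, doc_id):
        with self._lock:
            return self._docs.get(doc_id)


def handle_add(index, doc, scheduled):
    # Each request runs on its own thread. Waiting on `scheduled` models the
    # thread sitting idle until the scheduler decides to run it.
    scheduled.wait()
    index.add(doc)


def run_demo():
    index = ToyIndex()
    first_go = threading.Event()
    second_go = threading.Event()
    # client#1 connects first, client#2 connects second.
    t1 = threading.Thread(
        target=handle_add,
        args=(index, {"id": "doc1", "title": "from client#1"}, first_go),
    )
    t2 = threading.Thread(
        target=handle_add,
        args=(index, {"id": "doc1", "title": "from client#2"}, second_go),
    )
    t1.start()
    t2.start()
    # The scheduler happens to run client#2's thread first ...
    second_go.set()
    t2.join()
    # ... so client#1's older copy is applied last and overwrites it.
    first_go.set()
    t1.join()
    return index.get("doc1")
```

In this forced ordering `run_demo()` returns the copy sent by client#1, even though client#2 sent its copy later.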
-Hoss
Re: Queuing adds and commits
Posted by James Brady <ja...@gmail.com>.
Depending on your application, it might be useful to take control of
the queueing yourself: it was for me!
I needed quick turnarounds for submitting a document to be indexed,
which Solr can't guarantee right now. To address it, I wrote a
persistent queueing server, accessed by XML-RPC, which has the benefit
of adding a low-cost layer of indirection between client-side and
server-side stuff, and properly serialises the order in which events
arrive.
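
The core idea, a single consumer that applies adds strictly in arrival order, can be sketched as below. This is not the server described above: it is in-memory only (no persistence, no XML-RPC), and the class name `SerialIndexer` and its `send` callback are assumptions made for illustration.

```python
import queue
import threading


class SerialIndexer:
    """Accept documents quickly and forward them one at a time, in order."""

    def __init__(self, send):
        self._send = send      # callable that actually submits a doc to the index
        self.errors = []       # (doc, exception) pairs for failed sends
        self._queue = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def submit(self, doc):
        # Returns immediately; the caller never waits on the index itself.
        self._queue.put(doc)

    def _run(self):
        while True:
            doc = self._queue.get()
            try:
                if doc is None:  # sentinel: stop the worker
                    return
                self._send(doc)
            except Exception as error:
                self.errors.append((doc, error))
            finally:
                self._queue.task_done()

    def close(self):
        # Drain everything already queued, then stop the worker thread.
        self._queue.put(None)
        self._queue.join()
        self._worker.join()
```

Because only one worker thread ever calls `send`, the last version of a document to be submitted is also the last one applied, as long as every client goes through the same queue.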
James