Posted to dev@lucene.apache.org by Jan Høydahl / Cominvent <ja...@cominvent.com> on 2010/09/28 15:32:16 UTC

Indexing and threads

Hi,

How are threads being used when indexing?

Let's say documents A and B are ingested in parallel to XMLUpdateRequestHandler in two separate threads.
How far down the chain is the processing of these documents done in the two separate threads?
Is the full UpdateRequestChain run in the same thread as the incoming request?
Is analysis done in the request thread or in a single indexing thread?
Are ADDs added to the same "commit queue", and is everything from COMMIT down to Lucene segment building then single-threaded?

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Re: Indexing and threads

Posted by Yonik Seeley <yo...@lucidimagination.com>.
On Tue, Sep 28, 2010 at 9:32 AM, Jan Høydahl / Cominvent
<ja...@cominvent.com> wrote:
> How are threads being used when indexing?
>
> Let's say documents A and B are ingested in parallel to XMLUpdateRequestHandler in two separate threads.
> How far down the chain is the processing of these documents done in the two separate threads?

All the way to the Lucene IndexWriter, where they are also processed
in parallel.
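A minimal, stdlib-only Java sketch of what "processed in parallel all the way down" means. It does not use Solr or Lucene: `ParallelIndexingDemo` and its `addDocument` method are hypothetical stand-ins for the shared writer, and the `CyclicBarrier` is only a probe that succeeds if two request threads are inside the writer at the same moment, i.e. if nothing upstream serialized them.

```java
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class ParallelIndexingDemo {
    // Hypothetical stand-in for the shared writer. It takes no global lock,
    // so documents from different request threads can be inside it at once.
    private final AtomicInteger inWriter = new AtomicInteger();
    private final AtomicInteger maxConcurrent = new AtomicInteger();
    private final CyclicBarrier bothInside = new CyclicBarrier(2);

    void addDocument() throws Exception {
        int now = inWriter.incrementAndGet();
        maxConcurrent.accumulateAndGet(now, Math::max);
        // Blocks until the other request thread is also inside the writer.
        // If anything upstream serialized the requests, this would time out.
        bothInside.await(5, TimeUnit.SECONDS);
        inWriter.decrementAndGet();
    }

    // Two "request threads", each carrying its own document all the way down.
    static int run() {
        ParallelIndexingDemo writer = new ParallelIndexingDemo();
        ExecutorService requestThreads = Executors.newFixedThreadPool(2);
        try {
            Future<?> docA = requestThreads.submit(() -> { writer.addDocument(); return null; });
            Future<?> docB = requestThreads.submit(() -> { writer.addDocument(); return null; });
            docA.get();
            docB.get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            requestThreads.shutdown();
        }
        return writer.maxConcurrent.get();
    }
}
```

Calling `ParallelIndexingDemo.run()` returns 2 here; a writer guarded by one global lock would instead fail with a timeout at the barrier.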

> Is the full UpdateRequestChain run in the same thread as the incoming request?

Yes.

> Is analysis done in the request thread or in a single indexing thread?

Request thread.
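To make the two answers above concrete, here is a hypothetical three-stage chain in plain Java (the `Stage` interface and the stage names are illustrative, not Solr's `UpdateRequestProcessor` API). Each stage, including a toy whitespace "analysis" step, records the name of the thread it runs on; because `handleRequest` calls the stages synchronously, they all report the request thread.

```java
import java.util.ArrayList;
import java.util.List;

class RequestThreadChainDemo {
    // Hypothetical processing stage; not Solr's actual UpdateRequestProcessor API.
    interface Stage {
        void process(String doc, List<String> log);
    }

    private static String here() {
        return Thread.currentThread().getName();
    }

    // A toy chain: parse -> analyze -> hand off to the index writer.
    static final List<Stage> CHAIN = List.of(
        (doc, log) -> log.add("parse:" + here()),
        // Toy "analysis": a whitespace tokenizer run inline on the same thread.
        (doc, log) -> log.add("analyze(" + doc.trim().split("\\s+").length + "):" + here()),
        (doc, log) -> log.add("addToWriter:" + here())
    );

    // Runs every stage synchronously on the calling (request) thread;
    // no stage hands the document off to another thread.
    static List<String> handleRequest(String doc) {
        List<String> log = new ArrayList<>();
        for (Stage stage : CHAIN) {
            stage.process(doc, log);
        }
        return log;
    }

    // Simulates one incoming request on a named thread and returns its trace.
    static List<String> runAsRequest(String threadName, String doc) {
        List<List<String>> result = new ArrayList<>();
        Thread request = new Thread(() -> result.add(handleRequest(doc)), threadName);
        request.start();
        try {
            request.join(); // join() also makes the result visible to this thread
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return result.get(0);
    }
}
```

For example, `runAsRequest("request-A", "Hello Lucene world")` returns `[parse:request-A, analyze(3):request-A, addToWriter:request-A]`.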

> Are ADDs added to the same "commit queue", and is everything from COMMIT down to Lucene segment building then single-threaded?

While a commit is happening in Solr, adds are blocked. This is
historical: in the past, Lucene didn't really handle that type of
concurrency for you, so Solr did. We need to improve this at some
point...
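Yonik doesn't say how Solr implements this blocking. One plausible way to get the behavior he describes, and purely an assumption in this sketch, is a read/write lock: every add takes the shared read lock, so adds still run concurrently with each other, while a commit takes the exclusive write lock, so it waits for in-flight adds and blocks new ones until it finishes. `CommitBlocksAddsDemo` and its methods are hypothetical.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class CommitBlocksAddsDemo {
    // Hypothetical model: adds share the read lock, so they can run concurrently
    // with each other; commit takes the write lock, so it excludes all adds.
    private final ReadWriteLock rwl = new ReentrantReadWriteLock();
    private final Lock addLock = rwl.readLock();
    private final Lock commitLock = rwl.writeLock();
    private final AtomicInteger pending = new AtomicInteger();   // added since last commit
    private final AtomicInteger committed = new AtomicInteger(); // made visible by commits

    void add() {
        addLock.lock(); // waits here while a commit is running
        try {
            pending.incrementAndGet();
        } finally {
            addLock.unlock();
        }
    }

    // Non-blocking variant, used only to observe whether an add would have to wait.
    boolean tryAdd() {
        if (!addLock.tryLock()) {
            return false;
        }
        try {
            pending.incrementAndGet();
            return true;
        } finally {
            addLock.unlock();
        }
    }

    // Holds the write lock for the whole (simulated, slow) commit.
    void commit(CountDownLatch started, CountDownLatch finish) throws InterruptedException {
        commitLock.lock();
        try {
            started.countDown();
            finish.await();
            committed.addAndGet(pending.getAndSet(0));
        } finally {
            commitLock.unlock();
        }
    }

    static String demo() {
        CommitBlocksAddsDemo handler = new CommitBlocksAddsDemo();
        handler.add();
        handler.add();
        CountDownLatch started = new CountDownLatch(1);
        CountDownLatch finish = new CountDownLatch(1);
        Thread committer = new Thread(() -> {
            try {
                handler.commit(started, finish);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        committer.start();
        try {
            started.await();                 // the commit now holds the write lock
            boolean duringCommit = handler.tryAdd();
            finish.countDown();              // let the commit complete
            committer.join();
            boolean afterCommit = handler.tryAdd();
            return "addDuringCommit=" + duringCommit + ", addAfterCommit=" + afterCommit
                + ", committed=" + handler.committed.get();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

`CommitBlocksAddsDemo.demo()` returns `addDuringCommit=false, addAfterCommit=true, committed=2`: the two adds made before the commit are committed, an add attempted while the commit holds the write lock is refused, and adds resume once the lock is released.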

-Yonik
http://lucenerevolution.org  Lucene/Solr Conference, Boston Oct 7-8



Re: Indexing and threads

Posted by Michael McCandless <lu...@mikemccandless.com>.
I can't speak to how Solr handles threads, but in Lucene the two docs
are indexed concurrently.

Internally, Lucene's IndexWriter has separate thread states that hold
the RAM buffer of the inverted docs.  Multiple threads work on
separate thread states concurrently.

The one big exception to this is flushing a new segment, which is
currently single-threaded and can be quite a bottleneck (I wrote about
this problem at
http://chbits.blogspot.com/2010/09/lucenes-indexing-is-fast.html).
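A toy Java model of the two properties described above, under stated assumptions: `ThreadStateDemo`, `ThreadState`, `addDocument` and `flush` are hypothetical names, not Lucene's `DocumentsWriter` internals. Each indexing thread gets its own buffer through a `ThreadLocal`, so adds from different threads don't contend with each other; `flush()` is `synchronized`, so only one thread at a time drains all buffers into a single segment. The per-buffer lock is only there so the flushing thread sees the other threads' writes safely.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

class ThreadStateDemo {
    // Hypothetical per-thread RAM buffer of "inverted" docs.
    static final class ThreadState {
        final List<String> bufferedDocs = new ArrayList<>();
    }

    // Every thread state ever created, so flush() can find them (never pruned here).
    private final Queue<ThreadState> allStates = new ConcurrentLinkedQueue<>();
    private final ThreadLocal<ThreadState> myState = ThreadLocal.withInitial(() -> {
        ThreadState state = new ThreadState();
        allStates.add(state); // register so flush() can drain it later
        return state;
    });

    // Called concurrently by many indexing threads; each touches only its own state.
    void addDocument(String doc) {
        // Toy "inversion": lowercase tokens, done outside any shared lock.
        String inverted = String.join(" ", doc.toLowerCase().split("\\s+"));
        ThreadState state = myState.get();
        synchronized (state) { // uncontended except while a flush drains this state
            state.bufferedDocs.add(inverted);
        }
    }

    // Single-threaded flush: one caller at a time drains every state into one segment.
    synchronized List<String> flush() {
        List<String> segment = new ArrayList<>();
        for (ThreadState state : allStates) {
            synchronized (state) {
                segment.addAll(state.bufferedDocs);
                state.bufferedDocs.clear();
            }
        }
        return segment;
    }

    static List<String> demo() {
        ThreadStateDemo writer = new ThreadStateDemo();
        Thread a = new Thread(() -> writer.addDocument("Doc A"));
        Thread b = new Thread(() -> writer.addDocument("Doc B"));
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        List<String> segment = writer.flush();
        Collections.sort(segment); // thread states may be drained in either order
        return segment;
    }
}
```

`ThreadStateDemo.demo()` indexes "Doc A" and "Doc B" from two threads and returns the flushed segment, sorted, as `[doc a, doc b]`.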

Mike
