
Posted to java-user@lucene.apache.org by Marc Sturlese <ma...@gmail.com> on 2011/10/10 13:02:35 UTC

Lucene 3.1 search parallelism per segment doubt

I've read in another thread
(http://lucene.472066.n3.nabble.com/Indexing-slower-in-trunk-td3059836.html#a3062991)
/Since Lucene 2.9, Lucene works on a per segment basis when searching. Since 
Lucene 3.1 it can even parallelize on multiple segments. If you optimize 
your index you only have one segment/
I'm trying to configure Lucene 3.4 to improve performance as much as
possible and make maximum use of the CPU. As far as I understand, the optimal
scenario would be to have as many threads as there are segments in the index.
The problem is that if I rsync the master to the slaves with some
updated documents, the slaves would then have more segments (so there would be
more segments than available threads).
Another question: can I achieve the same search performance with an
index with 5 segments and 5 threads in 3.4 as with an optimized index with
a compound file using Lucene 2.9? (I know the second environment is much
worse at loading field caches, etc., because it does not take advantage of
per-segment readers.)
Can anyone explain a bit how exactly this works, or point me to some
documentation?



--
View this message in context: http://lucene.472066.n3.nabble.com/Lucene-3-1-search-paralelism-per-segment-doubt-tp3409182p3409182.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Re: Lucene 3.1 search parallelism per segment doubt

Posted by Simon Willnauer <si...@googlemail.com>.

On Thu, Oct 27, 2011 at 2:50 PM, Robert Muir <rc...@gmail.com> wrote:
> On Mon, Oct 10, 2011 at 7:02 AM, Marc Sturlese <ma...@gmail.com> wrote:
>> I've read in another thread
>> (http://lucene.472066.n3.nabble.com/Indexing-slower-in-trunk-td3059836.html#a3062991)
>> /Since Lucene 2.9, Lucene works on a per segment basis when searching. Since
>> Lucene 3.1 it can even parallelize on multiple segments. If you optimize
>> your index you only have one segment/
>> I'm trying to configure lucene 3.4 to improve my performance as much as
>> possible and make the maximum CPU usage. As far as I understood, the optimal
>> scenario would be to have as much threads as segments I have in the index.
>
> well are you sure this is optimal?
> Using multithreaded search won't actually increase QPS, just make some
> queries run faster when the machine is idle.

I agree with Robert: using multiple threads won't necessarily make it
faster. Keep in mind that threads bring an additional overhead that is
not minor: your JVM needs to schedule threads, switch contexts, etc.
And when you have, say, 4 CPUs and 8 threads, your OS and the JVM will
move threads from one CPU to another, including all the instructions
and data, which might make things even worse. I don't know what your
environment is, but in general server apps gain concurrency and CPU
utilization from incoming requests. Let's say you have 100 concurrent
users and 8 segments, with 1 thread per segment for each query: you end
up with 800 threads. No good!
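
To put numbers on that, here is a minimal sketch in plain Java (no Lucene
classes) of the thread arithmetic and of the usual way around it: one
bounded pool shared by all requests. The class name SharedSearchPool and
the searchSegment() method are hypothetical stand-ins that only return a
fake hit count. As far as I know, IndexSearcher in 3.1+ can be given an
ExecutorService in its constructor, so a shared pool like this one is
what you would hand to it, but please check the Javadoc for your version.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class SharedSearchPool {

    // Threads needed if every request gets its own thread per segment.
    static int threadsIfUnbounded(int concurrentRequests, int segments) {
        return concurrentRequests * segments;
    }

    // Hypothetical stand-in for searching one segment: returns a fake hit count.
    static int searchSegment(int segmentId) {
        return segmentId + 1;
    }

    // Runs one "query" over all segments on the shared pool and sums the hits.
    // The pool size caps the worker threads no matter how many callers there are.
    static int searchAllSegments(ExecutorService pool, int segments) {
        List<Future<Integer>> futures = new ArrayList<>();
        for (int i = 0; i < segments; i++) {
            final int segmentId = i;
            futures.add(pool.submit(() -> searchSegment(segmentId)));
        }
        int total = 0;
        try {
            for (Future<Integer> f : futures) {
                total += f.get();
            }
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("segment search failed", e);
        }
        return total;
    }

    public static void main(String[] args) {
        // 100 concurrent users * 8 segments, one thread each.
        System.out.println("unbounded threads: " + threadsIfUnbounded(100, 8));

        // One pool for the whole server, sized to the number of cores.
        int cores = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(cores);
        try {
            System.out.println("hits: " + searchAllSegments(pool, 8));
        } finally {
            pool.shutdown();
        }
    }
}
```

With a shared pool the worker thread count stays at the pool size, so
extra segments after an rsync just mean more queued tasks, not more
threads.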
>
> If its a busy server with lots of requests the optimal scenario might
> be to not use it at all, because then its just adding overhead.
>
>> Another question would be, can I achieve the same search performance with an
>> index with 5 segments and 5 threads in 3.4 than an optimized index with
>> compound file using lucene 2.9? (I know the second env mentioned is much
>> worse loading fieldcaches, etc because is not taking advantage of the
>> readers per segments)
>
> I would say in general you can without using threads at all.
>
> I think what Uwe was trying to say there is that optimize in general
> is probably just wasteful. For a lot of people its just not going to
> improve the performance of their search, but it can be very expensive
> to do.
>
> My complaint is the naming, i think its the cause of this:
> https://issues.apache.org/jira/browse/LUCENE-3454
>
>
> --
> lucidimagination.com
>


Re: Lucene 3.1 search parallelism per segment doubt

Posted by Robert Muir <rc...@gmail.com>.

On Mon, Oct 10, 2011 at 7:02 AM, Marc Sturlese <ma...@gmail.com> wrote:
> I've read in another thread
> (http://lucene.472066.n3.nabble.com/Indexing-slower-in-trunk-td3059836.html#a3062991)
> /Since Lucene 2.9, Lucene works on a per segment basis when searching. Since
> Lucene 3.1 it can even parallelize on multiple segments. If you optimize
> your index you only have one segment/
> I'm trying to configure lucene 3.4 to improve my performance as much as
> possible and make the maximum CPU usage. As far as I understood, the optimal
> scenario would be to have as much threads as segments I have in the index.

Well, are you sure this is optimal?
Using multithreaded search won't actually increase QPS; it just makes some
queries run faster when the machine is idle.

If it's a busy server with lots of requests, the optimal scenario might
be to not use it at all, because then it's just adding overhead.
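
A back-of-the-envelope model of the latency versus QPS point, as a
sketch only. It assumes a query's CPU work splits evenly across threads
and that each extra thread costs a fixed overhead. The class name
SearchThroughputModel and all the numbers (4 cores, 40 ms per query,
1 ms overhead per thread) are made up for illustration, not measured.

```java
final class SearchThroughputModel {

    // Best-case latency (ms) of one query on an otherwise idle machine when
    // its CPU work is split evenly across `threads` threads. A fixed overhead
    // is charged once on the critical path when more than one thread is used.
    static double idleLatencyMs(double cpuMsPerQuery, int threads, double overheadMsPerThread) {
        double overhead = threads > 1 ? overheadMsPerThread : 0.0;
        return cpuMsPerQuery / threads + overhead;
    }

    // Upper bound on queries per second once every core is busy: CPU time
    // available per second divided by the total CPU time one query burns,
    // including the overhead paid by each of its threads.
    static double maxQps(int cores, double cpuMsPerQuery, int threads, double overheadMsPerThread) {
        double overhead = threads > 1 ? threads * overheadMsPerThread : 0.0;
        return cores * 1000.0 / (cpuMsPerQuery + overhead);
    }

    public static void main(String[] args) {
        int cores = 4;            // assumed
        double cpuMs = 40.0;      // assumed CPU cost of one query
        double overheadMs = 1.0;  // assumed cost per extra thread

        System.out.println("1 thread : latency " + idleLatencyMs(cpuMs, 1, overheadMs)
                + " ms, max QPS " + maxQps(cores, cpuMs, 1, overheadMs));
        System.out.println("4 threads: latency " + idleLatencyMs(cpuMs, 4, overheadMs)
                + " ms, max QPS " + maxQps(cores, cpuMs, 4, overheadMs));
    }
}
```

In this model, splitting a query over 4 threads cuts its idle latency,
but the QPS ceiling can only stay the same or drop, because the total
CPU work per query never shrinks.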

> Another question would be, can I achieve the same search performance with an
> index with 5 segments and 5 threads in 3.4 than an optimized index with
> compound file using lucene 2.9? (I know the second env mentioned is much
> worse loading fieldcaches, etc because is not taking advantage of the
> readers per segments)

I would say that in general you can, without using threads at all.

I think what Uwe was trying to say there is that optimize in general
is probably just wasteful. For a lot of people it's just not going to
improve the performance of their search, but it can be very expensive
to do.

My complaint is the naming; I think it's the cause of this:
https://issues.apache.org/jira/browse/LUCENE-3454

-- 
lucidimagination.com
