
Posted to dev@lucene.apache.org by "Uwe Schindler (JIRA)" <ji...@apache.org> on 2010/12/30 12:58:46 UTC

[jira] Created: (LUCENE-2840) Multi-Threading in IndexSearcher (after removal of MultiSearcher and ParallelMultiSearcher)

Multi-Threading in IndexSearcher (after removal of MultiSearcher and ParallelMultiSearcher)
-------------------------------------------------------------------------------------------

                 Key: LUCENE-2840
                 URL: https://issues.apache.org/jira/browse/LUCENE-2840
             Project: Lucene - Java
          Issue Type: Sub-task
          Components: Search
            Reporter: Uwe Schindler
            Priority: Minor
             Fix For: 4.0


Spin-off from parent issue:

{quote}
We should discuss how many threads should be spawned. If you have an index with many segments, even small ones, I think only the larger segments should get separate threads, and all others should be handled sequentially. So maybe add a maxThreads count, sort the IndexReaders by maxDoc, and then spawn only maxThreads-1 threads for the bigger readers plus one additional thread for the rest?
{quote}
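A minimal Java sketch of the partitioning proposed in the quote, for illustration only. `SizeBasedPartitioner` and its `SegmentInfo` record are hypothetical stand-ins (not Lucene classes) that reduce a segment to a name and its maxDoc. Each returned group would be searched by one thread.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch: (maxThreads - 1) threads for the largest segments,
// plus one thread that handles all remaining small segments sequentially.
final class SizeBasedPartitioner {

    /** Hypothetical stand-in for a segment reader: just a name and its maxDoc. */
    record SegmentInfo(String name, int maxDoc) {}

    /** Returns one list of segments per thread; at most maxThreads lists. */
    static List<List<SegmentInfo>> partition(List<SegmentInfo> segments, int maxThreads) {
        if (maxThreads < 1) {
            throw new IllegalArgumentException("maxThreads must be >= 1");
        }
        List<SegmentInfo> sorted = new ArrayList<>(segments);
        // Largest segments first.
        sorted.sort(Comparator.comparingInt(SegmentInfo::maxDoc).reversed());

        List<List<SegmentInfo>> groups = new ArrayList<>();
        int dedicated = Math.min(maxThreads - 1, sorted.size());
        for (int i = 0; i < dedicated; i++) {
            groups.add(List.of(sorted.get(i)));   // one thread per big segment
        }
        List<SegmentInfo> rest = sorted.subList(dedicated, sorted.size());
        if (!rest.isEmpty()) {
            groups.add(new ArrayList<>(rest));    // one extra thread for all the rest
        }
        return groups;
    }
}
```

With maxThreads = 1 this degenerates to a single sequential group, which matches searching without an executor.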

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

[jira] Commented: (LUCENE-2840) Multi-Threading in IndexSearcher (after removal of MultiSearcher and ParallelMultiSearcher)

Posted by "Earwin Burrfoot (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-2840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12976027#action_12976027 ] 

Earwin Burrfoot commented on LUCENE-2840:
-----------------------------------------

I use the following scheme:
* There is a fixed pool of threads shared by all searches, that limits total concurrency.
* Each new search takes at most a fixed number of threads from this pool (say, 2-3 of 8 in my setup),
* and these threads churn through segments as through a queue (in maxDoc order, but I think even that is unnecessary).
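One way to sketch this scheme in Java, assuming a hypothetical per-segment `searchSegment` function that returns a hit count (summed here as a stand-in for merging real results). The shared `ExecutorService` limits total concurrency, `perSearchThreads` caps how many tasks one search takes from it, and the workers pull segments from a queue instead of being bound to particular ones. `CappedQueueSearch` is illustrative, not Lucene code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.ToLongFunction;

final class CappedQueueSearch {

    /**
     * Searches all segments using at most perSearchThreads tasks on the shared pool.
     * Workers poll segments from a queue local to this search, so there is no fixed
     * thread-to-segment binding: whichever worker is free takes the next segment.
     */
    static <S> long search(ExecutorService sharedPool, List<S> segments,
                           int perSearchThreads, ToLongFunction<S> searchSegment) {
        // The caller may pre-sort segments by maxDoc; the order is optional.
        ConcurrentLinkedQueue<S> queue = new ConcurrentLinkedQueue<>(segments);
        AtomicLong total = new AtomicLong();
        int workers = Math.max(1, Math.min(perSearchThreads, segments.size()));

        List<Future<?>> futures = new ArrayList<>();
        for (int i = 0; i < workers; i++) {
            futures.add(sharedPool.submit(() -> {
                S seg;
                while ((seg = queue.poll()) != null) {
                    total.addAndGet(searchSegment.applyAsLong(seg));
                }
            }));
        }
        try {
            for (Future<?> f : futures) {
                f.get(); // wait only for this search's own workers
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
        return total.get();
    }
}
```

One caveat of this sketch: calling `search` from a thread of the same shared pool can deadlock once the pool is exhausted, because the caller blocks on `Future.get()`.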

No special smart binding between threads and segments (e.g. 1 thread for each biggie, 1 thread for all of the small ones)
means simpler code and zero possibility of stalling, i.e. of having threads free to run and segments left to search while the binding policy does not connect them.
Using fewer threads per search than the total available is a precaution against biggie searches blocking fast ones.

[jira] Commented: (LUCENE-2840) Multi-Threading in IndexSearcher (after removal of MultiSearcher and ParallelMultiSearcher)

Posted by "Earwin Burrfoot (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-2840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12979276#action_12979276 ] 

Earwin Burrfoot commented on LUCENE-2840:
-----------------------------------------

bq. But doesn't that mean that an app w/ rare queries but each query is massive fails to use all available concurrency?
Yes. But that's not my case. And likely not someone else's.

I think if you want to be super-generic, it's better to defer exact threading to the user, instead of doing a one-size-fits-all solution. Else you risk conjuring another ConcurrentMergeScheduler.
While we're at it, we can throw in a sample implementation, which can satisfy some of the users, but not everyone.
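If exact threading is deferred to the user, the hook could be as small as one method that decides how a search's segments are grouped into tasks. The `SegmentSlicer` interface and both sample policies below are hypothetical, not an existing Lucene API; segments are represented only by their maxDoc values.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical user-supplied policy: how a search's segments are grouped into tasks. */
interface SegmentSlicer {
    /** Each inner list is one slice, searched sequentially by a single task. */
    List<List<Integer>> slice(List<Integer> segmentMaxDocs);
}

/** Sample policy: every segment is its own slice (maximum fan-out). */
final class OneSlicePerSegment implements SegmentSlicer {
    @Override
    public List<List<Integer>> slice(List<Integer> segmentMaxDocs) {
        List<List<Integer>> slices = new ArrayList<>();
        for (Integer maxDoc : segmentMaxDocs) {
            slices.add(List.of(maxDoc));
        }
        return slices;
    }
}

/** Sample policy: deal segments round-robin into at most maxSlices slices. */
final class RoundRobinSlicer implements SegmentSlicer {
    private final int maxSlices;

    RoundRobinSlicer(int maxSlices) {
        if (maxSlices < 1) {
            throw new IllegalArgumentException("maxSlices must be >= 1");
        }
        this.maxSlices = maxSlices;
    }

    @Override
    public List<List<Integer>> slice(List<Integer> segmentMaxDocs) {
        int count = Math.min(maxSlices, segmentMaxDocs.size());
        List<List<Integer>> slices = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            slices.add(new ArrayList<>());
        }
        for (int i = 0; i < segmentMaxDocs.size(); i++) {
            slices.get(i % count).add(segmentMaxDocs.get(i));
        }
        return slices;
    }
}
```

The searcher would then only need to run one task per slice; which policy fits is left to the application.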

[jira] Commented: (LUCENE-2840) Multi-Threading in IndexSearcher (after removal of MultiSearcher and ParallelMultiSearcher)

Posted by "Earwin Burrfoot (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-2840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12979306#action_12979306 ] 

Earwin Burrfoot commented on LUCENE-2840:
-----------------------------------------

A lot of fork-join-type frameworks don't even care, even though scheduling threads is something people supposedly use them for.
Why? I guess that's due to a low yield/cost ratio.
You frequently quote "progress, not perfection" in relation to the code, but why don't we apply this same principle to our threading guarantees?
I don't want to use allowed concurrency fully. That's not realistic. I want 85% of it. That's already a huge leap ahead of single-threaded searches.


[jira] Commented: (LUCENE-2840) Multi-Threading in IndexSearcher (after removal of MultiSearcher and ParallelMultiSearcher)

Posted by "Doron Cohen (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-2840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12979284#action_12979284 ] 

Doron Cohen commented on LUCENE-2840:
-------------------------------------

Is it possible that, with this, searching a large optimized (single-segment) index might be slower than searching an un-optimized index of the same size, since the latter enjoys concurrency? If so, is it too wild to have more than one thread handle that single segment?
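A sketch of what giving one segment to several threads could look like: split the segment's doc ID space into contiguous half-open ranges, one per thread. This assumes, hypothetically, that a per-segment scorer can be restricted to a doc ID range (starting at `from` and stopping before `to`) and that the per-range results are merged afterwards; `DocIdRangeSplitter` is illustrative only.

```java
import java.util.ArrayList;
import java.util.List;

final class DocIdRangeSplitter {

    /** Half-open doc ID range [from, to) inside a single segment. */
    record DocRange(int from, int to) {}

    /** Splits [0, maxDoc) into at most `parts` contiguous ranges whose sizes differ by at most one. */
    static List<DocRange> split(int maxDoc, int parts) {
        if (parts < 1) {
            throw new IllegalArgumentException("parts must be >= 1");
        }
        int n = Math.min(parts, Math.max(1, maxDoc)); // never more ranges than docs
        int base = maxDoc / n;
        int remainder = maxDoc % n; // the first `remainder` ranges get one extra doc
        List<DocRange> ranges = new ArrayList<>(n);
        int from = 0;
        for (int i = 0; i < n; i++) {
            int to = from + base + (i < remainder ? 1 : 0);
            ranges.add(new DocRange(from, to));
            from = to;
        }
        return ranges;
    }
}
```

For example, a 10-document segment split three ways yields [0, 4), [4, 7) and [7, 10).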

[jira] Commented: (LUCENE-2840) Multi-Threading in IndexSearcher (after removal of MultiSearcher and ParallelMultiSearcher)

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-2840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12979293#action_12979293 ] 

Michael McCandless commented on LUCENE-2840:
--------------------------------------------

bq. I think if you want to be super-generic, it's better to defer exact threading to the user, instead of doing a one-size-fits-all solution. Else you risk conjuring another ConcurrentMergeScheduler.

I think something like CMS (ConcurrentMergeScheduler: basically a custom ExecutorService w/ proper thread prio/scheduling) will be necessary here.

Until Java can schedule threads the way an OS schedules processes, we'll need to emulate it ourselves.

You want long-running queries (or merges) to be gracefully deprioritized so that new/fast queries (merges) finish quickly.

And you want searches (merges) to use the allowed concurrency fully.
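To make the "deprioritize long-running queries" idea concrete, here is a single-threaded sketch of a least-attained-service policy, chosen for illustration and not taken from ConcurrentMergeScheduler: the scheduler always runs the next segment task of whichever query has consumed the least work so far. `LeastWorkFirstScheduler` and its integer task costs are hypothetical; a real version would have pool threads calling `runOneTask` under a lock.

```java
import java.util.ArrayDeque;
import java.util.Comparator;
import java.util.Deque;
import java.util.List;
import java.util.PriorityQueue;

final class LeastWorkFirstScheduler {

    /** A query: a queue of per-segment task costs plus the work it has consumed so far. */
    static final class Query {
        final String name;
        final Deque<Integer> taskCosts;
        long workDone;

        Query(String name, List<Integer> taskCosts) {
            this.name = name;
            this.taskCosts = new ArrayDeque<>(taskCosts);
        }
    }

    // Ordered by work consumed: the query that has used the least runs next.
    private final PriorityQueue<Query> ready =
            new PriorityQueue<>(Comparator.comparingLong((Query q) -> q.workDone));

    void submit(Query q) {
        if (!q.taskCosts.isEmpty()) {
            ready.add(q);
        }
    }

    /** Runs one task and returns its query's name, or null if nothing is ready. */
    String runOneTask() {
        Query q = ready.poll(); // removed before mutation, so the heap stays valid
        if (q == null) {
            return null;
        }
        q.workDone += q.taskCosts.removeFirst(); // stand-in for searching one segment
        if (!q.taskCosts.isEmpty()) {
            ready.add(q); // re-insert so its new (lower) priority takes effect
        }
        return q.name;
    }
}
```

A newly arrived query starts with zero consumed work, so it jumps ahead of a long-running one, while the long-running query still progresses whenever nothing cheaper is waiting.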

[jira] Commented: (LUCENE-2840) Multi-Threading in IndexSearcher (after removal of MultiSearcher and ParallelMultiSearcher)

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-2840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12979337#action_12979337 ] 

Michael McCandless commented on LUCENE-2840:
--------------------------------------------

bq. You frequently quote "progress, not perfection" in relation to the code, but why don't we apply this same principle to our threading guarantees?

Oh, we should definitely apply progress not perfection here; in fact we already are. For starters (today), we bind concurrency to segments (so e.g. an "optimized" index has no concurrency), and we just use an ExecutorService (punting this thread scheduling problem to the caller). This is better than nothing, but not good enough; we can do better.
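A rough sketch of the behavior described above as "today": one task per segment, submitted to a caller-supplied `ExecutorService`, with per-segment top-N results merged at the end. Segments are modeled as hypothetical arrays of precomputed scores, and `PerSegmentSearch` is not Lucene's IndexSearcher code.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

final class PerSegmentSearch {

    /** Best n scores of one segment, highest first (a min-heap keeps the current top n). */
    static List<Float> topN(float[] segmentScores, int n) {
        PriorityQueue<Float> heap = new PriorityQueue<>();
        for (float s : segmentScores) {
            heap.add(s);
            if (heap.size() > n) {
                heap.poll(); // drop the lowest score seen so far
            }
        }
        List<Float> out = new ArrayList<>(heap);
        out.sort(Comparator.reverseOrder());
        return out;
    }

    /** One task per segment on the caller's executor; results merged into a global top n. */
    static List<Float> search(ExecutorService executor, List<float[]> segments, int n) {
        List<Callable<List<Float>>> tasks = new ArrayList<>();
        for (float[] seg : segments) {
            tasks.add(() -> topN(seg, n));
        }
        List<Float> merged = new ArrayList<>();
        try {
            for (Future<List<Float>> f : executor.invokeAll(tasks)) {
                merged.addAll(f.get());
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
        merged.sort(Comparator.reverseOrder());
        return new ArrayList<>(merged.subList(0, Math.min(n, merged.size())));
    }
}
```

With a single-segment (optimized) index, `segments` has one element, so only one task is submitted and the executor adds no concurrency.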

There's another quote that applies here: "big dreams, small steps". My comment above is "dreaming", but when it comes time to actually get the real work done / make progress towards that dream, of course we take baby steps / progress not perfection.

Design discussions should start w/ the big dreams but then once you've got a rough sense of where you want to get to in the future you shift back to the baby steps you do today, in the direction of that future goal.

Maybe I should wrap my comments in </dream> tags and </babysteps> tags!

[jira] Commented: (LUCENE-2840) Multi-Threading in IndexSearcher (after removal of MultiSearcher and ParallelMultiSearcher)

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-2840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12976928#action_12976928 ] 

Michael McCandless commented on LUCENE-2840:
--------------------------------------------

bq. Using fewer threads per-search than total available is a precaution against biggie searches blocking fast ones.

But doesn't that mean that an app w/ rare queries but each query is massive fails to use all available concurrency?
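A back-of-the-envelope model of this trade-off, assuming every thread a search grabs stays busy and that a search can use no more threads than it has segments. `ConcurrencyMath` is hypothetical. With a pool of 8 and a cap of 3 per search (within the "2-3 of 8" mentioned earlier), a lone massive query keeps only 3 of 8 threads busy (37.5%), while three concurrent queries can saturate the pool.

```java
final class ConcurrencyMath {

    /** Threads one search can occupy: limited by the pool, the per-search cap, and its segment count. */
    static int usableThreads(int poolSize, int perSearchCap, int segments) {
        return Math.min(poolSize, Math.min(perSearchCap, segments));
    }

    /** Fraction of the pool kept busy by identical concurrent searches (idealized model). */
    static double utilization(int poolSize, int perSearchCap, int concurrentSearches, int segmentsPerSearch) {
        int wanted = concurrentSearches * usableThreads(poolSize, perSearchCap, segmentsPerSearch);
        return (double) Math.min(poolSize, wanted) / poolSize;
    }
}
```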
