Posted to dev@lucene.apache.org by "Shikhar Bhushan (JIRA)" <ji...@apache.org> on 2013/10/21 22:09:43 UTC

[jira] [Comment Edited] (LUCENE-5299) Refactor Collector API for parallelism

    [ https://issues.apache.org/jira/browse/LUCENE-5299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13801019#comment-13801019 ] 

Shikhar Bhushan edited comment on LUCENE-5299 at 10/21/13 8:08 PM:
-------------------------------------------------------------------

I'm planning to work on parallelizing TopFieldCollector in the same way as TopScoreDocCollector, so that the special-casing in IndexSearcher can be removed and searches stay parallelizable even if Solr wraps that collector in something else.

We are going to run load tests and latency measurements on one of our experimental clusters using real traffic logs, and I will report the findings. First, though, I need to do that work on TopFieldCollector, as most of our requests have multiple sort fields.



> Refactor Collector API for parallelism
> --------------------------------------
>
>                 Key: LUCENE-5299
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5299
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Shikhar Bhushan
>         Attachments: benchmarks.txt, LUCENE-5299.patch
>
>
> h2. Motivation
> We should be able to scale up better with Solr/Lucene by utilizing multiple CPU cores, and not have to resort to scaling out by sharding (with all the associated distributed-system pitfalls) when the index size does not warrant it.
> Presently, IndexSearcher has an optional constructor arg for an ExecutorService, which is used to search in parallel on call paths where one of the TopDocCollectors is created internally. The per-atomic-reader search happens in parallel, and then the TopDocs/TopFieldDocs results are merged, with locking around the merge step.
> However, there are some problems with this approach:
> * If arbitrary Collector args come into play, we can't parallelize. Note that even if results ultimately go to a TopDocCollector, it may be wrapped inside e.g. an EarlyTerminatingCollector or a TimeLimitingCollector, or both.
> * The special-casing with parallelism baked on top does not scale: there are many Collectors that could potentially lend themselves to parallelism, and special-casing means the parallelization has to be re-implemented whenever a different permutation of collectors is used.
> h2. Proposal
> A refactoring of collectors that allows for parallelization at the level of the collection protocol. 
> Some requirements that should guide the implementation:
> * easy migration path for collectors that need to remain serial
> * the parallelization should be composable (when collectors wrap other collectors)
> * allow collectors to pick the optimal solution (e.g. there might be memory tradeoffs to be made) by advising the collector about whether a search will be parallelized, so that the serial use-case is not penalized.
> * encourage use of non-blocking constructs and lock-free parallelism; blocking is not advisable in the hot spot of a search, and it also wastes pooled threads.
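
To make the requirements above concrete, here is a minimal sketch of what such a collection protocol could look like. It is not the attached patch and not Lucene's Collector API: SegmentCollector, ParallelizableCollector, TopHitsCollector, CountingCollector and Searcher are hypothetical names, and a segment is modelled as a plain array of scores so the example runs without Lucene on the classpath.

```java
import java.util.*;
import java.util.concurrent.*;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical protocol for illustration only; not Lucene's actual Collector API.
interface SegmentCollector {
    void collect(int doc, float score); // called by a single thread per segment
    void done();                        // segment finished: publish its results
}

interface ParallelizableCollector {
    void setParallel(boolean parallel); // advice, given once before the search starts
    SegmentCollector forSegment(int docBase);
}

final class Hit {
    final int doc; final float score;
    Hit(int doc, float score) { this.doc = doc; this.score = score; }
}

// Keeps the n highest-scoring hits without locking on the collect path.
final class TopHitsCollector implements ParallelizableCollector {
    private static final Comparator<Hit> BY_SCORE = Comparator.comparingDouble((Hit h) -> h.score);
    private final int n;
    private final Queue<PriorityQueue<Hit>> published = new ConcurrentLinkedQueue<>();
    private PriorityQueue<Hit> shared; // non-null only for a serial search

    TopHitsCollector(int n) { this.n = n; }

    @Override public void setParallel(boolean parallel) {
        // Serial: one heap for all segments. Parallel: one private heap per segment.
        shared = parallel ? null : new PriorityQueue<Hit>(BY_SCORE);
    }

    @Override public SegmentCollector forSegment(int docBase) {
        final boolean serial = shared != null;
        final PriorityQueue<Hit> heap = serial ? shared : new PriorityQueue<Hit>(BY_SCORE);
        return new SegmentCollector() {
            @Override public void collect(int doc, float score) {
                heap.add(new Hit(docBase + doc, score));
                if (heap.size() > n) heap.poll(); // evict the current minimum
            }
            // Lock-free hand-off of the finished per-segment heap.
            @Override public void done() { if (!serial) published.add(heap); }
        };
    }

    // Call only after every segment has reported done().
    List<Hit> topHits() {
        List<Hit> all = new ArrayList<>();
        if (shared != null) all.addAll(shared);
        for (PriorityQueue<Hit> heap : published) all.addAll(heap);
        all.sort(BY_SCORE.reversed());
        return new ArrayList<>(all.subList(0, Math.min(n, all.size())));
    }
}

// A wrapper composes with any inner collector: it only decorates per-segment collectors.
final class CountingCollector implements ParallelizableCollector {
    private final ParallelizableCollector inner;
    private final LongAdder hits = new LongAdder();
    CountingCollector(ParallelizableCollector inner) { this.inner = inner; }

    @Override public void setParallel(boolean parallel) { inner.setParallel(parallel); }

    @Override public SegmentCollector forSegment(int docBase) {
        final SegmentCollector delegate = inner.forSegment(docBase);
        return new SegmentCollector() {
            @Override public void collect(int doc, float score) { hits.increment(); delegate.collect(doc, score); }
            @Override public void done() { delegate.done(); }
        };
    }

    long count() { return hits.sum(); }
}

final class Searcher {
    // segments[i][d] is the score of doc d in segment i; pool == null means a serial search.
    static void search(float[][] segments, ParallelizableCollector c, ExecutorService pool) {
        c.setParallel(pool != null);
        List<Future<?>> pending = new ArrayList<>();
        int docBase = 0;
        for (float[] seg : segments) {
            final SegmentCollector sc = c.forSegment(docBase);
            Runnable task = () -> {
                for (int d = 0; d < seg.length; d++) sc.collect(d, seg[d]);
                sc.done();
            };
            if (pool == null) task.run(); else pending.add(pool.submit(task));
            docBase += seg.length;
        }
        for (Future<?> f : pending) {
            try { f.get(); } catch (Exception e) { throw new RuntimeException(e); }
        }
    }
}
```

In this sketch, a search given a thread pool fills one private heap per segment and hands each off through a ConcurrentLinkedQueue, so nothing blocks on the collect path. A search with pool == null reuses a single heap and pays no extra per-segment memory. Wrapping, e.g. new CountingCollector(new TopHitsCollector(10)), keeps the search parallelizable because the wrapper only forwards setParallel and forSegment to the collector it wraps.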


