Posted to dev@lucene.apache.org by "Shikhar Bhushan (JIRA)" <ji...@apache.org> on 2014/11/15 01:18:34 UTC

[jira] [Commented] (LUCENE-5299) Refactor Collector API for parallelism

    [ https://issues.apache.org/jira/browse/LUCENE-5299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14213132#comment-14213132 ] 

Shikhar Bhushan commented on LUCENE-5299:
-----------------------------------------

Slides from my talk at Lucene/Solr Revolution 2014 about this stuff - https://www.dropbox.com/s/h2nqsml0beed0pm/Search-time%20Parallelism.pdf

Some backstory on the recent revival of this issue: the presentation was going to be a failure story, since I had not seen good performance on our test cluster when I tried this out last year.

However, after adding the request-level 'parallelism' throttle and possibly eliminating some bugs while cherry-picking onto the latest trunk, I have seen consistently good results. The replay graphs towards the end show p99 dropping by half, p95 improving by a few hundred ms, and the median looking much better too. CPU usage was higher, as expected, but roughly comparable to (I think less than, though I don't have numbers for it) the overhead we saw from sharding and running all the shards on localhost. We are still sharded in that manner, so as you can see we considered the latency win to be worth it!

> Refactor Collector API for parallelism
> --------------------------------------
>
>                 Key: LUCENE-5299
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5299
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Shikhar Bhushan
>         Attachments: LUCENE-5299.patch, LUCENE-5299.patch, LUCENE-5299.patch, LUCENE-5299.patch, LUCENE-5299.patch, benchmarks.txt
>
>
> h2. Motivation
> We should be able to scale up better with Solr/Lucene by utilizing multiple CPU cores, rather than having to resort to scaling out by sharding (with all the associated distributed-system pitfalls) when the index size does not warrant it.
> Presently, IndexSearcher has an optional constructor arg for an ExecutorService, which gets used to search in parallel on the call paths where one of the TopDocCollectors is created internally. The per-atomic-reader search happens in parallel, and the TopDocs/TopFieldDocs results are then merged, with locking around the merge step.
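> For illustration, a minimal sketch of that existing call path (assuming the Lucene 5.x-era IndexSearcher(IndexReader, ExecutorService) constructor; the index path and field name here are hypothetical):
>
>     import java.nio.file.Paths;
>     import java.util.concurrent.ExecutorService;
>     import java.util.concurrent.Executors;
>     import org.apache.lucene.index.DirectoryReader;
>     import org.apache.lucene.index.IndexReader;
>     import org.apache.lucene.index.Term;
>     import org.apache.lucene.search.IndexSearcher;
>     import org.apache.lucene.search.TermQuery;
>     import org.apache.lucene.search.TopDocs;
>     import org.apache.lucene.store.Directory;
>     import org.apache.lucene.store.FSDirectory;
>
>     public class ParallelSearchSketch {
>       public static void main(String[] args) throws Exception {
>         ExecutorService pool = Executors.newFixedThreadPool(4);
>         Directory dir = FSDirectory.open(Paths.get("/path/to/index")); // hypothetical index location
>         IndexReader reader = DirectoryReader.open(dir);
>         try {
>           // With an executor, IndexSearcher searches each leaf in parallel and
>           // merges the per-leaf TopDocs, with locking around the merge.
>           IndexSearcher searcher = new IndexSearcher(reader, pool);
>           TopDocs hits = searcher.search(new TermQuery(new Term("body", "lucene")), 10);
>           System.out.println("total hits: " + hits.totalHits);
>         } finally {
>           reader.close();
>           pool.shutdown();
>         }
>       }
>     }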
> However there are some problems with this approach:
> * If arbitrary Collector args come into play, we can't parallelize. Note that even if the results ultimately go to a TopDocCollector, it may be wrapped inside e.g. an EarlyTerminatingCollector or a TimeLimitingCollector, or both.
> * The special-casing, with parallelism baked on top, does not scale: there are many Collectors that could potentially lend themselves to parallelism, and special-casing means the parallelization has to be re-implemented whenever a different permutation of collectors is to be used.
> h2. Proposal
> A refactoring of collectors that allows for parallelization at the level of the collection protocol. 
> Some requirements that should guide the implementation:
> * easy migration path for collectors that need to remain serial
> * the parallelization should be composable (when collectors wrap other collectors)
> * allow collectors to pick the optimal strategy (e.g. there might be memory trade-offs to be made) by advising them whether a search will be parallelized, so that the serial use case is not penalized.
> * encourage the use of non-blocking constructs and lock-free parallelism; blocking is not advisable in the hot spot of a search, besides wasting pooled threads.
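> One possible shape for such a protocol, as a rough sketch (the names here are purely illustrative and not from the attached patches): a per-search "manager" hands out one collector per leaf/slice so each thread can collect lock-free, and a reduce step merges the partial results at the end.
>
>     import java.io.IOException;
>     import java.util.Collection;
>     import org.apache.lucene.search.Collector;
>
>     // Illustrative only: one collector instance per thread/slice, merged once at the end.
>     public interface ParallelCollectorProtocol<C extends Collector, R> {
>
>       // Called once per slice; the returned collector is confined to that thread,
>       // so it needs no internal locking.
>       C newSliceCollector() throws IOException;
>
>       // Advises the implementation whether the search will actually run in parallel,
>       // so the serial case can avoid any partitioning overhead (per the requirements above).
>       void setParallelized(boolean parallel);
>
>       // Called once after all slices finish; merges the thread-local partial results.
>       R reduce(Collection<C> sliceCollectors) throws IOException;
>     }
>
> Composability would then mean, e.g., an early-terminating wrapper delegating newSliceCollector() and reduce() to whatever it wraps, rather than special-casing any particular collector.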



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org