You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@lucene.apache.org by "Michael McCandless (JIRA)" <ji...@apache.org> on 2019/02/01 16:12:00 UTC
[jira] [Commented] (LUCENE-8675) Divide Segment Search Amongst Multiple Threads

    [ https://issues.apache.org/jira/browse/LUCENE-8675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16758451#comment-16758451 ] 

Michael McCandless commented on LUCENE-8675:
--------------------------------------------

I think it'd be interesting to explore intra-segment parallelism, but I agree w/ [~jpountz] that there are challenges :)

If you pass an {{ExecutorService}} to {{IndexSearcher}} today you can already use multiple threads to answer one query, but the concurrency is tied to your segment geometry and annoyingly a supposedly "optimized" index gets no concurrency ;)

But if you do have many segments, this can give a nice reduction to query latencies when QPS is well below the searcher's red-line capacity (probably at the expense of some hopefully small loss of red-line throughput because of the added overhead of thread scheduling).  For certain use cases (large index, low typical query rate) this is a powerful approach.

It's true that one can also divide an index into more shards and run each shard concurrently but then you are also multiplying the fixed query setup cost which in some cases can be relatively significant.
{quote}Parallelizing based on ranges of doc IDs is problematic for some queries, for instance the cost of evaluating a range query over an entire segment or only about a specific range of doc IDs is exactly the same given that it uses data-structures that are organized by value rather than by doc ID.
{quote}
Yeah that's a real problem – these queries traverse the BKD tree per-segment while creating the scorer, which is/can be the costly part, and then produce a bit set which is very fast to iterate over.  This phase is not separately visible to the caller, unlike e.g. rewrite that MultiTermQueries use to translate into simpler queries, so it'd be tricky to build intra-segment concurrency on top ...

> Divide Segment Search Amongst Multiple Threads
> ----------------------------------------------
>
>                 Key: LUCENE-8675
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8675
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/search
>            Reporter: Atri Sharma
>            Priority: Major
>
> Segment search is a single threaded operation today, which can be a bottleneck for large analytical queries which index a lot of data and have complex queries which touch multiple segments (imagine a composite query with range query and filters on top). This ticket is for discussing the idea of splitting a single segment into multiple threads based on mutually exclusive document ID ranges.
> This will be a two phase effort, the first phase targeting queries returning all matching documents (collectors not terminating early). The second phase patch will introduce staged execution and will build on top of this patch.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org