Posted to dev@lucene.apache.org by "Adrien Grand (JIRA)" <ji...@apache.org> on 2016/12/23 15:42:58 UTC
[jira] [Updated] (LUCENE-7055) Better execution path for costly queries

     [ https://issues.apache.org/jira/browse/LUCENE-7055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Adrien Grand updated LUCENE-7055:
---------------------------------
    Attachment: LUCENE-7055.patch

I have recently been looking at some slow queries, and they were slow for exactly this reason. They ran a point range query that matched most of the index, intersected with selective queries, which is typically a case where doc values would perform better than points.

So I started exploring how we could improve this and got the following:
 - PointValues gets a new method that computes an estimate of the cost of a visitor. For the Lucene70 codec it basically counts the number of leaf blocks that intersect the visitor and multiplies that count by the number of points per leaf block.
 - A new API on Weight makes it possible to get an estimate of the cost of a Scorer before building it. The underlying idea is that in a conjunction that contains a range, the range should use points if it has the lowest cost (i.e. it will lead the iteration), and doc values otherwise, since its scorer will then only be used to verify that the current document matches.
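
To make the first point concrete, here is a minimal, self-contained sketch of that kind of estimate for a one-dimensional range. It does not use the patch's API: LeafBlock, PointCostEstimator, estimateCost and maxPointsPerLeaf are all hypothetical names made up for illustration, and the real codec walks a tree rather than a flat list of leaves.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a leaf block of a 1D points tree; not a Lucene class.
final class LeafBlock {
    final long minValue;
    final long maxValue;

    LeafBlock(long minValue, long maxValue) {
        this.minValue = minValue;
        this.maxValue = maxValue;
    }

    // True if this block's value range overlaps the query range [lower, upper].
    boolean intersects(long lower, long upper) {
        return minValue <= upper && maxValue >= lower;
    }
}

// Hypothetical helper; the name and signature are assumptions for this sketch.
final class PointCostEstimator {
    // Rough estimate: number of intersecting leaf blocks times points per leaf block.
    static long estimateCost(List<LeafBlock> leaves, int maxPointsPerLeaf,
                             long lower, long upper) {
        long intersecting = 0;
        for (LeafBlock leaf : leaves) {
            if (leaf.intersects(lower, upper)) {
                intersecting++;
            }
        }
        return intersecting * maxPointsPerLeaf;
    }

    public static void main(String[] args) {
        List<LeafBlock> leaves = Arrays.asList(
            new LeafBlock(0, 99), new LeafBlock(100, 199), new LeafBlock(200, 299));
        // The range [150, 250] overlaps the last two blocks only.
        System.out.println(estimateCost(leaves, 1024, 150, 250));
    }
}
```

The estimate is deliberately coarse: it never reads the points themselves, so it stays cheap enough to compute before deciding how to run the query.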

I attached a patch in case someone is interested in looking into how that works. I tried to make it as non-invasive as possible: the new API on Weight is optional, and we do not need to implement giant queries that know how to use both points and doc values. Instead, there is a wrapper query called IndexOrDocValuesQuery that wraps both a point/index query and a doc values query and figures out which one to use based on costs. It is neither complete nor committable, just a proof of concept to trigger some discussion.
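
As a rough illustration of the cost-based decision (not the code in the patch), the choice for a range clause inside a conjunction could be sketched as follows. RangeExecutionChooser, Strategy and choose are hypothetical names, and breaking ties in favor of points is an arbitrary assumption of this sketch.

```java
// Hypothetical helper showing only the decision rule; it is not the patch's API.
final class RangeExecutionChooser {
    enum Strategy { POINTS, DOC_VALUES }

    // rangeCost: estimated cost of running the range through the points index.
    // otherClauseCosts: estimated costs of the other clauses of the conjunction.
    static Strategy choose(long rangeCost, long[] otherClauseCosts) {
        for (long other : otherClauseCosts) {
            if (other < rangeCost) {
                // A cheaper clause will lead the iteration, so the range only
                // has to verify candidate documents: doc values fit better.
                return Strategy.DOC_VALUES;
            }
        }
        // The range has the lowest cost and will lead the iteration.
        return Strategy.POINTS;
    }

    public static void main(String[] args) {
        // A range matching most of the index next to a selective clause.
        System.out.println(choose(1_000_000L, new long[] {50L}));
        // A selective range next to broader clauses.
        System.out.println(choose(10L, new long[] {50L, 2_000L}));
    }
}
```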

> Better execution path for costly queries
> ----------------------------------------
>
>                 Key: LUCENE-7055
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7055
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Assignee: Adrien Grand
>         Attachments: LUCENE-7055.patch
>
>
> In Lucene 5.0, we improved the execution path for queries that run costly operations on a per-document basis, like phrase queries or doc values queries. But there is another class of costly queries: they return efficient iterators, but those iterators are very expensive to build. This is typically the case for queries that leverage DocIdSetBuilder, like TermsQuery, multi-term queries or the new point queries. Intersecting such queries with a selective query is very inefficient, since these queries build a doc id set of matching documents for the entire index.
> Is there something we could do to improve the execution path for these queries?
> One idea that comes to mind is that most of these queries could also run on doc values, so maybe we could come up with something that would help decide how to run a query based on other parts of the query? (Just thinking out loud, other ideas are very welcome)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
