Posted to dev@lucene.apache.org by "Paul Elschot (JIRA)" <ji...@apache.org> on 2015/03/15 21:35:38 UTC

[jira] [Commented] (LUCENE-6360) TermsQuery should rewrite to a ConstantScoreQuery over a BooleanQuery when there are few terms

    [ https://issues.apache.org/jira/browse/LUCENE-6360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14362542#comment-14362542 ] 

Paul Elschot commented on LUCENE-6360:
--------------------------------------

I wonder whether a compressing DocIdSet could also help here.
EliasFanoDocIdSet uses an internal threshold for the number of matching docs, and above that threshold it changes itself to a bitset.
That tradeoff is not directly related to skipping, because building the set still requires visiting all matching docs.
But a small compressing DocIdSet skips/advances faster than a bitset.
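
To make the size side of that threshold concrete, here is a rough sketch of how an Elias-Fano encoding compares to a bitset in space. It is not the code in EliasFanoDocIdSet; the class name, method names, and the safetyFactor parameter are all made up for illustration, and the bit count is only the usual approximation of about 2 + log2(upperBound / numValues) bits per value.

```java
// Hypothetical sketch, not Lucene code: approximate Elias-Fano size vs. a plain bitset.
final class EliasFanoSizeSketch {

    /** Approximate number of bits needed to Elias-Fano encode numValues sorted values in [0, upperBound]. */
    static long estimateEliasFanoBits(long numValues, long upperBound) {
        if (numValues <= 0) {
            return 0;
        }
        long ratio = upperBound / numValues;
        // floor(log2(ratio)) low bits are stored verbatim per value.
        int lowBits = ratio <= 1 ? 0 : 63 - Long.numberOfLeadingZeros(ratio);
        long lowPart = numValues * lowBits;
        // High bits are unary coded: roughly one set bit per value plus one zero bit per bucket.
        long highPart = numValues + (upperBound >>> lowBits) + 1;
        return lowPart + highPart;
    }

    /**
     * True when the compressed form is at least safetyFactor times smaller than a bitset
     * of maxDoc bits. safetyFactor is an assumed tuning knob, not a Lucene constant.
     */
    static boolean preferCompressed(long numValues, long maxDoc, int safetyFactor) {
        return estimateEliasFanoBits(numValues, maxDoc) * safetyFactor < maxDoc;
    }
}
```

With 1,000 matching docs in an index of 1,000,000 docs the compressed form wins easily; with 500,000 matching docs it does not. Where exactly to put the cutoff is the part that needs real measurements.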

Some of this can be estimated in advance by the doc frequencies of the terms involved.
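
As a minimal sketch of such an estimate: the sum of the doc frequencies is an upper bound on the size of the union, so it can drive the choice before any postings are read. Everything below is hypothetical (the class, the Strategy enum, and the maxBooleanTerms and sparseDivisor parameters are placeholders, not existing Lucene settings), and the actual values would have to come from benchmarks.

```java
// Hypothetical sketch, not Lucene code: pick an execution strategy from doc frequencies alone.
final class TermsRewriteSketch {

    enum Strategy { BOOLEAN_QUERY, COMPRESSED_DOC_ID_SET, BIT_SET }

    /** Upper bound on the number of docs matching the union: sum of doc freqs, capped at maxDoc. */
    static long upperBoundMatches(long[] docFreqs, long maxDoc) {
        long sum = 0;
        for (long df : docFreqs) {
            sum += df;
        }
        return Math.min(sum, maxDoc);
    }

    /**
     * Few terms: rewrite to a boolean query so that skipping works.
     * Many terms: build a doc id set, compressed when the estimated union is sparse.
     * Both thresholds are assumed parameters that would need tuning.
     */
    static Strategy choose(long[] docFreqs, long maxDoc, int maxBooleanTerms, int sparseDivisor) {
        if (docFreqs.length <= maxBooleanTerms) {
            return Strategy.BOOLEAN_QUERY;
        }
        long estimate = upperBoundMatches(docFreqs, maxDoc);
        return estimate < maxDoc / sparseDivisor
                ? Strategy.COMPRESSED_DOC_ID_SET
                : Strategy.BIT_SET;
    }
}
```

Since the sum overcounts docs that contain several of the terms, this errs toward the bitset, which seems the safer side to err on.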

To figure out the threshold(s), real-life test cases would be helpful.
Do you have some in mind already?



> TermsQuery should rewrite to a ConstantScoreQuery over a BooleanQuery when there are few terms
> ----------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-6360
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6360
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Assignee: Adrien Grand
>            Priority: Minor
>
> TermsQuery helps when there are a lot of terms from which you would like to compute the union, but it is a bit harmful when you have few terms since it cannot really skip: it always consumes all documents matching the underlying terms.
> It would certainly help to rewrite this query to a ConstantScoreQuery over a BooleanQuery when there are few terms in order to have actual skip support.
> As usual the hard part is probably to figure out the threshold. :)


