Posted to dev@lucene.apache.org by "Robert Muir (JIRA)" <ji...@apache.org> on 2009/07/21 18:14:15 UTC

[jira] Issue Comment Edited: (LUCENE-1644) Enable MultiTermQuery's constant score mode to also use BooleanQuery under the hood

    [ https://issues.apache.org/jira/browse/LUCENE-1644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12733676#action_12733676 ] 

Robert Muir edited comment on LUCENE-1644 at 7/21/09 9:14 AM:
--------------------------------------------------------------

Mike, I am afraid that might hurt some people's performance.
I'm a bit concerned that my index and queries may be abnormal, and I don't want to break the general case.

I'm not too familiar with trie [or what it would do with a really general range query], but a simpler example would be an index with no stopword removal and a wildcard query of "th?" (matching "the").
Maybe it only matches one term, but that term is very common (a dense bitset) and probably "hot".

In this case the filter would be better, even though it's only 1 term.

      was (Author: rcmuir):
    Mike, I am afraid that might hurt some people's performance.
I'm a bit concerned my index/queries are maybe abnormal and don't want to break the general case.

I'm not too familiar with trie [what it would do with a really general range query], but a simpler example would be no stopwords, wildcard query of th*
maybe it only matches one term, but that term is very common / dense bitset and probably "hot".

In this case the filter would be better, even though its 1 term.
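
The wildcard example in the comment above can be made concrete with a small sketch. This is a toy model in plain Java, not Lucene code: the class name, the method names, and the cutoff values are all hypothetical and chosen only for illustration. It shows why a heuristic that looks only at the number of matched terms would pick the BooleanQuery path for a single very common term, while one that also looks at document coverage would pick the filter.

```java
import java.util.Map;

/** Toy model, not Lucene code: every name and threshold here is hypothetical. */
class RewriteChoiceSketch {
    enum Rewrite { BOOLEAN_QUERY, FILTER }

    /** Heuristic 1: look only at how many terms the pattern expands to. */
    static Rewrite byTermCount(Map<String, Integer> matchedTermDocFreqs, int termCutoff) {
        return matchedTermDocFreqs.size() <= termCutoff
                ? Rewrite.BOOLEAN_QUERY
                : Rewrite.FILTER;
    }

    /** Heuristic 2: also look at how many documents the matched terms cover. */
    static Rewrite byTermAndDocCount(Map<String, Integer> matchedTermDocFreqs,
                                     int termCutoff, int maxDoc, double docFraction) {
        long totalDocFreq = 0;
        for (int docFreq : matchedTermDocFreqs.values()) {
            totalDocFreq += docFreq;
        }
        boolean fewTerms = matchedTermDocFreqs.size() <= termCutoff;
        boolean fewDocs = totalDocFreq <= (long) (maxDoc * docFraction);
        // Only a query that is small on both axes stays on the BooleanQuery path.
        return (fewTerms && fewDocs) ? Rewrite.BOOLEAN_QUERY : Rewrite.FILTER;
    }
}
```

With a single matched term "the" that appears in most documents, the first heuristic returns BOOLEAN_QUERY and the second returns FILTER. Whether the filter is actually faster in that case is the claim made in the comment, not something this sketch measures.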
  
> Enable MultiTermQuery's constant score mode to also use BooleanQuery under the hood
> -----------------------------------------------------------------------------------
>
>                 Key: LUCENE-1644
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1644
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: LUCENE-1644.patch
>
>
> When MultiTermQuery is used (via one of its subclasses, eg
> WildcardQuery, PrefixQuery, FuzzyQuery, etc.), you can ask it to use
> "constant score mode", which pre-builds a filter and then wraps that
> filter as a ConstantScoreQuery.
> If you don't set that, it instead builds a [potentially massive]
> BooleanQuery with one SHOULD clause per term.
> There are some limitations of this approach:
>   * The scores returned by the BooleanQuery are often quite
>     meaningless to the app, so, one should be able to use a
>     BooleanQuery yet get constant scores back.  (Though I vaguely
>     remember at least one example someone raised where the scores were
>     useful...).
>   * The resulting BooleanQuery can easily have too many clauses,
>     throwing an extremely confusing exception to newish users.
>   * It'd be better to have the freedom to pick "build filter up front"
>     vs "build massive BooleanQuery", when constant scoring is enabled,
>     because they have different performance tradeoffs.
>   * In constant score mode, an OpenBitSet is always used, yet for
>     sparse bit sets this does not give good performance.
> I think we could address these issues by giving BooleanQuery a
> constant score mode, then empower MultiTermQuery (when in constant
> score mode) to pick & choose whether to use BooleanQuery vs up-front
> filter, and finally empower MultiTermQuery to pick the best (sparse vs
> dense) bit set impl.
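
The two rewrite strategies in the issue description can be sketched side by side. This is a toy model using only the JDK, not the Lucene implementation: postings are plain int arrays, java.util.BitSet stands in for OpenBitSet, and MAX_CLAUSES, the class name, and the method names are assumptions made for illustration. Both paths return the same set of documents, which is what makes a constant score possible for either one.

```java
import java.util.BitSet;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

/** Toy model of the two rewrites described above; not Lucene code. */
class ConstantScoreRewriteSketch {
    /** Hypothetical clause limit, standing in for BooleanQuery's max clause count. */
    static final int MAX_CLAUSES = 1024;

    /** "Build massive BooleanQuery": one SHOULD-like clause per term; fails past the limit. */
    static TreeSet<Integer> viaBooleanClauses(List<String> terms, Map<String, int[]> postings) {
        if (terms.size() > MAX_CLAUSES) {
            throw new IllegalStateException("too many clauses: " + terms.size());
        }
        TreeSet<Integer> hits = new TreeSet<>();
        for (String term : terms) {
            for (int doc : postings.getOrDefault(term, new int[0])) {
                hits.add(doc);
            }
        }
        return hits;
    }

    /** "Build filter up front": OR every term's postings into one bit set sized to the index. */
    static BitSet viaFilter(List<String> terms, Map<String, int[]> postings, int maxDoc) {
        BitSet bits = new BitSet(maxDoc);
        for (String term : terms) {
            for (int doc : postings.getOrDefault(term, new int[0])) {
                bits.set(doc);
            }
        }
        return bits;
    }

    /** Sparse vs dense: prefer a compact representation when few documents match. */
    static boolean preferSparse(BitSet bits, int maxDoc, double sparseFraction) {
        return bits.cardinality() < maxDoc * sparseFraction;
    }

    /** One possible sparse representation: the matching doc ids as a sorted int array. */
    static int[] toSparse(BitSet bits) {
        return bits.stream().toArray();
    }
}
```

The filter path never hits the clause limit but always allocates a structure proportional to the index size, which is the cost the last bullet of the description points at for sparse results.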

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


Re: [jira] Issue Comment Edited: (LUCENE-1644) Enable MultiTermQuery's constant score mode to also use BooleanQuery under the hood

Posted by Robert Muir <rc...@gmail.com>.
Mark, I agree. Plus, MultiTermQuery is pretty complex to benchmark with respect to this issue,

because the behavior of someone using it for Trie is really different
from that of someone like me using it for natural language.
The Trie case will probably fit a different distribution (maybe not Zipf), and
creating a heuristic to meet everyone's needs seems pretty tricky to
me...

Maybe not impossible; I just wanted to raise the question to see ideas
that might prevent possible back-and-forth between
.setConstantScoreRewrite and .setRewriteMethod...
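
A rough illustration of the distribution concern, as a toy model in plain Java. Both distribution shapes are assumptions made purely for contrast: the Zipf-like case takes the k-th most frequent term to appear in about maxDoc / k documents, and the flat case gives every matched term the same document frequency. Neither is measured from a real index, and the class and method names are hypothetical.

```java
/** Toy comparison of document-frequency distributions; both shapes are assumptions. */
class DocFreqDistributionSketch {
    /** Zipf-like: the term at rank k appears in about maxDoc / k documents. */
    static long zipfLikeDocFreqSum(int termCount, int maxDoc) {
        long sum = 0;
        for (int rank = 1; rank <= termCount; rank++) {
            sum += maxDoc / rank;
        }
        return sum;
    }

    /** Flat: every matched term appears in the same number of documents. */
    static long flatDocFreqSum(int termCount, int docsPerTerm) {
        return (long) termCount * docsPerTerm;
    }
}
```

The same number of matched terms can cover very different numbers of documents under the two shapes, so a cutoff tuned on one kind of index may misfire on the other.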

On Tue, Jul 21, 2009 at 12:50 PM, Mark Miller<ma...@gmail.com> wrote:
> It would be great to get some repeatable tests for this type of thing into
> the benchmark contrib. I had started work on that sometime back, but I don't
> think I have it around anymore.
>



-- 
Robert Muir
rcmuir@gmail.com



Re: [jira] Issue Comment Edited: (LUCENE-1644) Enable MultiTermQuery's constant score mode to also use BooleanQuery under the hood

Posted by Mark Miller <ma...@gmail.com>.
It would be great to get some repeatable tests for this type of thing into
the benchmark contrib. I had started work on that sometime back, but I don't
think I have it around anymore.



-- 
-- 
- Mark

http://www.lucidimagination.com