Posted to solr-user@lucene.apache.org by Walter Underwood <wu...@wunderwood.org> on 2014/09/05 01:01:18 UTC

Edismax mm and efficiency

Are there any speed advantages to using “mm”? I can imagine pruning the set of matching documents early, which could help, but is that (or something else) done?

wunder
Walter Underwood
wunder@wunderwood.org
http://observer.wunderwood.org/



Re: Edismax mm and efficiency

Posted by Mikhail Khludnev <mk...@griddynamics.com>.
On Fri, Sep 5, 2014 at 9:34 PM, Walter Underwood <wu...@wunderwood.org>
wrote:

> What would be a high mm value, 75%?


Walter, I suppose that the length of the search result influences the run
time. So, for a particular query and index, a high mm value is one that
significantly reduces the length of the search result.
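
To make the 75% figure concrete, here is a rough sketch of how a positive mm percentage turns into a required number of matching clauses. It assumes the percentage is rounded down, which is how I read the Solr documentation; the function name min_should_match is made up for illustration, and negative or conditional mm specs are deliberately not handled.

```python
def min_should_match(num_clauses: int, mm: str) -> int:
    """Required number of matching optional clauses for a simple mm spec.

    Handles only a positive integer ("1") or a positive percentage ("75%").
    Assumption: the percentage result is rounded down.
    """
    if mm.endswith("%"):
        required = num_clauses * int(mm[:-1]) // 100  # integer floor
    else:
        required = int(mm)
    # Never require more clauses than the query has, and never fewer than 0.
    return max(0, min(required, num_clauses))


if __name__ == "__main__":
    # The 1051-word query mentioned elsewhere in this thread, at mm=75%.
    print(min_should_match(1051, "75%"))
```

Whether that count is "high" still depends on how much it shrinks the result set for the given query and index.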


-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

<http://www.griddynamics.com>
<mk...@griddynamics.com>

Re: Edismax mm and efficiency

Posted by Peter Keegan <pe...@gmail.com>.
Sure. I created SOLR-6502. The tricky part was handling the behavior in a
sharded index. When the index is sharded, the response from each shard
contains a parameter that indicates whether its search results came from the
conjunction of all keywords (mm=100%) or from the disjunction (mm=1). If the
shards return both types, then only the results from the conjunction are
returned. This is necessary in order to get the same results independent
of the number of shards.
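
A minimal sketch of that merge rule, not the code in SOLR-6502: the ShardResponse type and its strict flag are hypothetical stand-ins for the per-shard parameter described above, and the docs are simply concatenated, whereas a real merge would also order them by score.

```python
from typing import List, NamedTuple


class ShardResponse(NamedTuple):
    strict: bool      # hypothetical flag: True if the shard matched with mm=100%
    docs: List[str]   # document ids returned by the shard


def merge_shards(responses: List[ShardResponse]) -> List[str]:
    """Keep only conjunction results if any shard produced them."""
    any_strict = any(r.strict for r in responses)
    merged: List[str] = []
    for r in responses:
        # When both types are present, drop the shards that fell back to mm=1.
        if any_strict and not r.strict:
            continue
        merged.extend(r.docs)
    return merged
```

With this rule, a document that only matches loosely is dropped as soon as any shard has a strict match, which is the same outcome a single unsharded index would give.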

Peter

On Wed, Sep 10, 2014 at 11:07 AM, Walter Underwood <wu...@wunderwood.org>
wrote:

> We do that strict/loose query sequence, but on the client side with two
> requests. Would you consider contributing the QueryComponent?

Re: Edismax mm and efficiency

Posted by Walter Underwood <wu...@wunderwood.org>.
We do that strict/loose query sequence, but on the client side with two requests. Would you consider contributing the QueryComponent?

wunder
Walter Underwood
wunder@wunderwood.org
http://observer.wunderwood.org/


On Sep 10, 2014, at 3:47 AM, Peter Keegan <pe...@gmail.com> wrote:

> I implemented a custom QueryComponent that issues the edismax query with
> mm=100%, and if no results are found, it reissues the query with mm=1. This
> doubled our query throughput (compared to mm=1 always), as we do some
> expensive RankQuery processing. For your very long student queries, mm=100%
> would obviously be too high, so you'd have to experiment.


Re: Edismax mm and efficiency

Posted by Peter Keegan <pe...@gmail.com>.
I implemented a custom QueryComponent that issues the edismax query with
mm=100%, and if no results are found, it reissues the query with mm=1. This
doubled our query throughput (compared to mm=1 always), as we do some
expensive RankQuery processing. For your very long student queries, mm=100%
would obviously be too high, so you'd have to experiment.
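
The strict-then-loose sequence can be sketched roughly as follows. This is not the actual QueryComponent; the search callable is a hypothetical stand-in for whatever client sends the request, and only the defType, q and mm parameters come from the discussion.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical search callable: takes request parameters and returns the
# matching documents. It stands in for a real Solr client call.
SearchFn = Callable[[Dict[str, str]], List[dict]]


def strict_then_loose(search: SearchFn, user_query: str) -> Tuple[str, List[dict]]:
    """Try mm=100% first; reissue with mm=1 only if nothing matched."""
    base = {"defType": "edismax", "q": user_query}
    for mm in ("100%", "1"):
        docs = search({**base, "mm": mm})
        if docs:
            return mm, docs  # report which mm produced the results
    return "1", []  # no hits even with the loose query
```

The saving comes from the common case where the strict query succeeds, so the expensive ranking only runs over the smaller conjunction result set.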

On Fri, Sep 5, 2014 at 1:34 PM, Walter Underwood <wu...@wunderwood.org>
wrote:

> Great!
>
> We have some very long queries, where students paste entire homework
> problems. One of them was 1051 words. Many of them are over 100 words. This
> could help.
>
> In the Jira discussion, I saw some comments about handling the most sparse
> lists first. We did something like that in the Infoseek Ultra engine about
> twenty years ago. Short termlists (documents matching a term) were
> processed first, which kept the in-memory lists of matching docs small. It
> also allowed early short-circuiting for no-hits queries.
>
> What would be a high mm value, 75%?

Re: Edismax mm and efficiency

Posted by Walter Underwood <wu...@wunderwood.org>.
Great!

We have some very long queries, where students paste entire homework problems. One of them was 1051 words. Many of them are over 100 words. This could help.

In the Jira discussion, I saw some comments about handling the most sparse lists first. We did something like that in the Infoseek Ultra engine about twenty years ago. Short termlists (documents matching a term) were processed first, which kept the in-memory lists of matching docs small. It also allowed early short-circuiting for no-hits queries.
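
As a toy illustration of the shortest-lists-first idea (neither the Infoseek code nor what Lucene does internally), here is a sketch of a pure conjunction over in-memory term lists. The postings dictionary and the function name are assumptions made for the example.

```python
from typing import Dict, List, Set


def intersect_shortest_first(postings: Dict[str, Set[int]], terms: List[str]) -> Set[int]:
    """Intersect term lists, smallest first, stopping once nothing can match."""
    lists = []
    for term in terms:
        docs = postings.get(term)
        if not docs:
            return set()  # a term with no documents means a no-hits query
        lists.append(docs)
    if not lists:
        return set()
    lists.sort(key=len)       # shortest term list first
    result = set(lists[0])    # the working set can never grow beyond this
    for docs in lists[1:]:
        result &= docs
        if not result:
            break             # early short-circuit: no document has all terms
    return result
```

Starting from the rarest term keeps the working set small, and a rare term that rules out every candidate ends the loop before the long lists are touched.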

What would be a high mm value, 75%?

wunder
Walter Underwood
wunder@wunderwood.org
http://observer.wunderwood.org/


On Sep 4, 2014, at 11:52 PM, Mikhail Khludnev <mk...@griddynamics.com> wrote:

> indeed https://issues.apache.org/jira/browse/LUCENE-4571
> my feeling is it gives a significant gain in mm high values.


Re: Edismax mm and efficiency

Posted by Mikhail Khludnev <mk...@griddynamics.com>.
Indeed, see https://issues.apache.org/jira/browse/LUCENE-4571
My feeling is that it gives a significant gain at high mm values.



On Fri, Sep 5, 2014 at 3:01 AM, Walter Underwood <wu...@wunderwood.org>
wrote:

> Are there any speed advantages to using “mm”? I can imagine pruning the
> set of matching documents early, which could help, but is that (or
> something else) done?


-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

<http://www.griddynamics.com>
<mk...@griddynamics.com>