Posted to solr-dev@lucene.apache.org by Kaktu Chakarabati <ji...@gmail.com> on 2009/06/05 04:30:10 UTC

MM Parameter and Performance in Solr

Hey guys,
I've been noticing for quite a while that using the minimum-match (mm)
parameter with a value less than 100% alongside the dismax qparser seriously
degrades performance. My particular use case involves using dismax over a set
of 4-6 textual fields, about half of which do *not* filter stop words (so yes,
some of these queries do end up iterating over a large portion of my index).
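Roughly what one of these requests looks like (the field names, boosts and the
50% value below are made-up placeholders, not my real schema):

  from urllib.parse import urlencode

  # Placeholder dismax request: several full-text fields in qf, and an mm
  # value below 100% -- which is the part that hurts.
  params = {
      "q": "red running shoes",
      "defType": "dismax",
      "qf": "title^2.0 subtitle description body",
      "mm": "50%",
      "rows": 10,
  }
  print("/select?" + urlencode(params))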

This is somewhat understandable, since constructing the result set is no
longer a simple intersection, but I do wonder what workarounds or standard
solutions exist for this problem, and which of them are applicable in the
Solr/Lucene environment (e.g. dividing the index into 'primary' / 'secondary'
sections, n-gram indices, cache configuration, sharding -- might any of these
help?).
My corpus is not that large (~20 million documents), yet query processing time
is far longer than I'd like: my goal is for the 90th-percentile QTime to be
around 200ms, and currently it is more than double that (a rough sketch of how
I'm measuring that percentile is below). Can anyone share some knowledge here?
What is the practice at, say, Google or Yahoo? Are there any plans to address
this in Solr/Lucene, or am I just using it wrong?
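Just to pin down that target, here is roughly how I compute the percentile
(the QTime values are invented for illustration):

  # Nearest-rank 90th percentile over a sample of QTime values (ms) pulled
  # from the logs; these particular numbers are made up.
  qtimes = sorted([120, 180, 240, 310, 95, 430, 515, 200, 260, 150])
  p90 = qtimes[int(0.9 * (len(qtimes) - 1))]
  print("90th-percentile QTime: %d ms" % p90)   # -> 430 ms for this sample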

Any feedback appreciated..

Thanks,
-Chak

Re: MM Parameter and Performance in Solr

Posted by Chris Hostetter <ho...@fucit.org>.
: Date: Thu, 4 Jun 2009 19:30:10 -0700
: From: Kaktu Chakarabati
: Subject: MM Parameter and Performance in Solr

Kaktu: It doesn't look like you ever got a reply to your question (possibly
because you sent it to solr-dev; it's more appropriate for solr-user).

I haven't done any specific performance comparisons of dismax with mm=100% 
vs mm=X%, but in truth that would be an apples to oranges comparison.

The lower the percentage, the more permutations of input terms there are that
can produce a match, and the more documents that will match -- in which case
Solr is, by definition, doing more work.
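To put rough numbers on that (this just counts term subsets; it is not how
Solr evaluates the query internally, and it glosses over Solr's exact mm
rounding rules):

  from math import ceil, comb

  def qualifying_subsets(num_terms, mm_percent):
      # Smallest number of query terms a document must contain to satisfy mm,
      # then count every subset of the query terms of at least that size.
      required = ceil(num_terms * mm_percent / 100.0)
      return sum(comb(num_terms, k) for k in range(required, num_terms + 1))

  for mm in (100, 75, 50, 25):
      print("4-term query, mm=%d%%: %d qualifying term combinations"
            % (mm, qualifying_subsets(4, mm)))
  # mm=100% -> 1, mm=75% -> 5, mm=50% -> 11, mm=25% -> 15

Every one of those extra combinations is a set of documents that now has to be
collected and scored.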

Asking for a workaround or best practice for dealing with something like this
is akin to asking for workarounds for queries that are slow because they
contain lots of terms and match lots of documents -- there aren't really a lot
of options, other than preventing your users from executing those queries.

The question I would ask in your shoes is whether the partial matching of
mm=X% is worth the added search time, or if you'd be happier having more exact
matching (mm=100%) and faster searches.
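A quick way to get data on that trade-off is to run the same query both ways
and compare QTime and numFound (the host, core and field names below are
placeholders -- adjust to your setup):

  import json
  from urllib.parse import urlencode
  from urllib.request import urlopen

  def run(q, mm):
      # Same dismax request, varying only mm; rows=0 since we only want counts.
      params = {"q": q, "defType": "dismax",
                "qf": "title description body keywords",
                "mm": mm, "rows": 0, "wt": "json"}
      url = "http://localhost:8983/solr/mycore/select?" + urlencode(params)
      rsp = json.load(urlopen(url))
      return rsp["responseHeader"]["QTime"], rsp["response"]["numFound"]

  for mm in ("100%", "50%"):
      qtime, hits = run("red running shoes", mm)
      print("mm=%s: QTime=%d ms, numFound=%d" % (mm, qtime, hits))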

-Hoss