Posted to solr-user@lucene.apache.org by Greg Pendlebury <gr...@gmail.com> on 2014/07/10 05:39:30 UTC

Phrase Slop relevance tuning

I've received a request from our business area to take a look at
emphasising ~0 phrase matches over ~1 (and greater) more than they are
already. I can't see any doco on the subject, and I'd like to ask if anyone
else has played in this area? Or at least is willing to sanity check my
reasoning before I rush in and code a solution, when I may be reinventing
the wheel?

Looking through the codebase, I can only find hardcoded weightings in a
couple of places, using the formula: "return 1.0f / (distance + 1);" which
results in ~0 getting a weight of 1, and ~1 getting a weight of 0.5.

There are a number of ways I've already considered, but the most flexible
seems to be to expose those two numbers via configuration.
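
As a rough illustration of what "expose those two numbers" could look like,
here is a minimal standalone sketch. It is plain Java with no Lucene or Solr
classes, and the class name and the config keys (slopNumerator, slopOffset)
are made up for the example; the defaults reproduce the hardcoded behaviour.

```java
import java.util.Map;

// Hypothetical holder for the two numbers in "1.0f / (distance + 1)".
public class ConfigurableSlopWeight {
    private final float numerator;
    private final float offset;

    public ConfigurableSlopWeight(float numerator, float offset) {
        // A zero or negative offset would divide by zero (or flip sign) at ~0.
        if (offset <= 0f) {
            throw new IllegalArgumentException("offset must be > 0");
        }
        this.numerator = numerator;
        this.offset = offset;
    }

    // Reads the two values from a generic key/value config.
    // The key names are assumptions, not existing Solr parameters.
    public static ConfigurableSlopWeight fromConfig(Map<String, String> cfg) {
        float n = Float.parseFloat(cfg.getOrDefault("slopNumerator", "1.0"));
        float o = Float.parseFloat(cfg.getOrDefault("slopOffset", "1.0"));
        return new ConfigurableSlopWeight(n, o);
    }

    // Same shape as the hardcoded formula, with both constants configurable.
    public float weight(int distance) {
        return numerator / (distance + offset);
    }
}
```

With an empty config this returns 1 for ~0 and 0.5 for ~1, so existing
behaviour would not change unless someone sets the values.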

We are considering adjusting them in sync with each other (using 1/3
instead of 1 in both places), which alters the overall shape of the
weighting curve while retaining the scale between 1 and 0.

Additionally, we are considering increasing the numerator to increase the
upper scale above 1. Not sure if this is a dumb idea though. Our hope was to
use something like "return 2.0f / (distance + 0.33f);" to give ~0 matches a
real (^2) boost in comparison to other weighting factors, and retain the ~1
(and greater) matches at around their current weight. This remains a
completely untested theory though, since I may be misunderstanding how the
output gets combined outside this method.
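
To sanity check the arithmetic on its own, here is a small standalone sketch
that tabulates the three variants mentioned so far for ~0 through ~5. It is
plain Java with no Lucene classes, the class and method names are made up for
illustration, and it says nothing about how the value gets combined outside
the method.

```java
public class SlopWeights {
    // Generalised form of the hardcoded "1.0f / (distance + 1)".
    static float weight(float numerator, float offset, int distance) {
        return numerator / (distance + offset);
    }

    public static void main(String[] args) {
        float third = 1.0f / 3.0f;
        System.out.println("slop  current  synced(1/3)  2/(d+0.33)");
        for (int d = 0; d <= 5; d++) {
            System.out.printf("~%d    %.3f    %.3f        %.3f%n",
                    d,
                    weight(1.0f, 1.0f, d),    // current hardcoded values
                    weight(third, third, d),  // 1/3 in both places
                    weight(2.0f, 0.33f, d));  // larger numerator variant
        }
    }
}
```

Working it through by hand, 2.0f / (distance + 0.33f) comes out at roughly
6.06 for ~0 and 1.50 for ~1, so ~0 would be about four times ~1 and ~1 would
sit well above its current 0.5. If the aim is ~0 at 2 with ~1 staying at 0.5,
a numerator of 2/3 with 1/3 in the denominator gives exactly that on paper,
assuming the result is used as a plain weight.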

The real technical change though would be to simply get those two numbers
from config. Any advice or suggestions about other ideas we haven't even
considered? The larger picture here is that we are using edismax and the pf
fields are all covered by ps=5.

Ta,
Greg