Posted to solr-user@lucene.apache.org by Chris Hostetter <ho...@fucit.org> on 2011/11/03 20:21:10 UTC
Re: score based on unique words matching
: > q=david bowie changes
: >
: > Problem : If a record mentions david bowie a lot, it beats out something
: > more relevant (more unique matches) ...
: >
: > A. (now appearing david bowie at the cineplex 7pm david bowie goes on stage,
: > then mr. bowie will sign autographs)
: > B. song :david bowie - changes
: >
: > (A) ends up more relevant because of the frequency or number of words in
: > it.. not cool...
: > I want it so the number of words matching will trump density/weight....
debugQuery=true is your friend -- it will show you exactly how the scores
are being computed.
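
For example, a minimal sketch of building such a request in Python. The host, port, and /select path here are assumptions (not from the original question), so adjust them for your install:

```python
from urllib.parse import urlencode

# Hypothetical Solr location; replace with your own host/core/handler.
SOLR_SELECT = "http://localhost:8983/solr/select"

def debug_url(query):
    """Build a select URL that asks Solr to include score explanations."""
    params = {
        "q": query,
        "debugQuery": "true",  # adds the per-document "explain" section
        "fl": "*,score",       # also return the score with each doc
    }
    return SOLR_SELECT + "?" + urlencode(params)

print(debug_url("david bowie changes"))
```

The "explain" output in the response breaks each document's score down into the tf, idf, fieldNorm and coord pieces discussed here.
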
the key factors in something like this are fieldNorm, tf, and the coord
factor.
The fieldNorm includes the length of the field as a factor, so as long as
you have omitNorms="false" configured for this field, doc #A should be
penalized relative to doc #B for being longer -- but if you omit norms then
that won't help you -- so start by checking that.
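
To get a feel for the size of that length penalty, here is a rough sketch assuming the classic default similarity, where the length part of the norm is 1/sqrt(number of terms in the field). The whitespace tokenization is a simplification (your analyzer may drop or split tokens differently), and the real fieldNorm also folds in any index-time boosts and is stored in a lossy one-byte encoding, so treat the numbers as approximate:

```python
import math

def length_norm(num_terms):
    # Classic default: longer fields get a smaller norm.
    return 1.0 / math.sqrt(num_terms)

# The two example docs from the question, tokenized naively on whitespace.
doc_a = ("now appearing david bowie at the cineplex 7pm david bowie "
         "goes on stage then mr bowie will sign autographs")
doc_b = "song david bowie changes"

norm_a = length_norm(len(doc_a.split()))  # 19 terms
norm_b = length_norm(len(doc_b.split()))  # 4 terms

print("doc A norm:", round(norm_a, 3))
print("doc B norm:", round(norm_b, 3))
```

With norms enabled, doc #B's norm is roughly twice doc #A's, which is the penalty that disappears when norms are omitted.
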
The coord factor will penalize documents that don't match all of the
clauses of a boolean query (ie: doc #A only matches 2/3 clauses because it
doesn't match the word "changes") so you could customize your Similarity
implementation to make that coord penalty higher, but that requires some
custom Java code.
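
As an illustration only (this is Python arithmetic, not the Java Similarity class you would actually have to write), the default coord is overlap / maxOverlap. The "power" knob below is a hypothetical way to make the penalty steeper, not something Solr provides out of the box:

```python
def coord(query_terms, doc_text, power=1):
    """Fraction of query terms found in the doc, optionally sharpened."""
    doc_terms = set(doc_text.lower().split())
    overlap = sum(1 for term in query_terms if term in doc_terms)
    # power=1 mirrors the default overlap/maxOverlap; power>1 is a
    # hypothetical customization that punishes partial matches harder.
    return (overlap / len(query_terms)) ** power

query = ["david", "bowie", "changes"]
doc_a = ("now appearing david bowie at the cineplex 7pm david bowie "
         "goes on stage then mr bowie will sign autographs")
doc_b = "song david bowie changes"

print("A default:", coord(query, doc_a))           # matches 2 of 3 terms
print("B default:", coord(query, doc_b))           # matches 3 of 3 terms
print("A steeper:", coord(query, doc_a, power=2))
```

Squaring drops doc #A's factor from about 0.67 to about 0.44 while leaving a full match at 1.0, which is the kind of effect a custom coord would be after.
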
As an extreme option, you could use omitTf to completely eliminate the
term frequency from being a factor in scoring (so the number of times
"bowie" appears won't affect the score, just that it appears at least
once) but that probably isn't what you want: "david bowie changes
some stuff" would get the same score as "david bowie changes david bowie".
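
A small sketch of why those two docs tie, assuming the classic default tf of sqrt(freq) and that omitting term frequencies makes every matching term count as if it appeared once. The omit_tf flag is just a stand-in for the schema setting:

```python
import math

def tf(freq, omit_tf=False):
    """Classic tf factor; with frequencies omitted a match counts once."""
    if freq == 0:
        return 0.0
    return 1.0 if omit_tf else math.sqrt(freq)

doc_1 = "david bowie changes some stuff".split()
doc_2 = "david bowie changes david bowie".split()

for omit in (False, True):
    tf_1 = tf(doc_1.count("bowie"), omit_tf=omit)
    tf_2 = tf(doc_2.count("bowie"), omit_tf=omit)
    print("omit_tf =", omit, "->", tf_1, "vs", tf_2)
```

Both docs have the same number of terms, so once tf is flattened there is nothing left to tell them apart for this query.
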
In general the simplest way to deal with a lot of this type of thing is to
think about how you are structuring your query. Something as simple as
using the dismax parser with your field in both the "qf" and "pf" params
(and a little bit of slop in the "ps" param) may give you exactly what you
want (since it will reward docs where the whole query string appears in
the field).
https://wiki.apache.org/solr/DisMaxQParserPlugin
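
A sketch of what such a request could look like. The field name "title", the ^10 phrase boost, the slop of 2, and the host are all made-up values for illustration, so tune them against your own schema and data:

```python
from urllib.parse import urlencode

# Hypothetical Solr location; replace with your own host/core/handler.
SOLR_SELECT = "http://localhost:8983/solr/select"

params = {
    "defType": "dismax",         # use the dismax query parser
    "q": "david bowie changes",
    "qf": "title",               # field searched for the individual words
    "pf": "title^10",            # same field, boosted when the whole query
                                 # string appears as a (sloppy) phrase
    "ps": "2",                   # phrase slop applied to the pf match
    "debugQuery": "true",        # keep the score explanation while tuning
}

url = SOLR_SELECT + "?" + urlencode(params)
print(url)
```

With this setup doc #B should pick up the pf boost because "david bowie changes" appears (nearly) together in it, while doc #A cannot match the phrase at all.
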
-Hoss