Posted to solr-user@lucene.apache.org by Eustache Felenc <eu...@idilia.com> on 2013/03/19 15:16:28 UTC

Boolean Query Scorer Over-weighting Query Terms With Synonyms

Hi,

I don't understand why the scorer sums the weights of the OR clauses. 
It seems to me that this skews the query scoring toward whichever term 
has more alternatives. To me it would make more sense to take the max 
of the weights of a query term's alternatives.

Here is an example:
I ran this query in the Solr admin interface: gucci (handbag OR purse OR pocketbook)
By clicking debug I can see that the parsed query is as expected: 
"parsedquery":"text:gucci (text:handbag text:purse text:pocketbook)"
The explain field shows that the scorer computes (I simplify a bit 
here): weight(gucci) + sum(weight(handbag), weight(purse), 
weight(pocketbook))
The consequence is that a result containing handbag, purse and 
pocketbook can get a higher score than a result containing gucci and 
handbag. I think this is counter-intuitive. To me the OR means those 
terms are equivalent, not that they are more important. Besides, if I 
wanted that effect I could get it independently with query term boosting.
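
To make the difference concrete, here is a minimal Python sketch of the two ways of combining the OR group. The weights are made-up numbers chosen for illustration, not values from a real explain output, and the score() helper is hypothetical: it ignores everything else Lucene factors in (tf, idf, norms, coord).

```python
# Hypothetical per-term weights, chosen only for illustration.
# Real Lucene/Solr scores also depend on tf, idf, norms and coord.
WEIGHTS = {"gucci": 1.5, "handbag": 1.0, "purse": 1.0, "pocketbook": 1.0}
SYNONYMS = ("handbag", "purse", "pocketbook")


def score(doc_terms, combine):
    """Score a document for: gucci (handbag OR purse OR pocketbook).

    `combine` decides how the matching alternatives of the OR group
    are merged: `sum` mimics the behaviour described above, `max`
    is the behaviour I would have expected.
    """
    brand = WEIGHTS["gucci"] if "gucci" in doc_terms else 0.0
    matched = [WEIGHTS[t] for t in SYNONYMS if t in doc_terms]
    return brand + (combine(matched) if matched else 0.0)


doc_a = {"gucci", "handbag"}                # matches both query concepts
doc_b = {"handbag", "purse", "pocketbook"}  # matches only the synonym group

for name, combine in (("sum", sum), ("max", max)):
    print(name, score(doc_a, combine), score(doc_b, combine))
```

With these assumed weights, summing ranks doc_b (3.0) above doc_a (2.5), while taking the max ranks doc_a (2.5) above doc_b (1.0).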

I experimented with Edismax and it has similar behaviour.

My questions are: am I missing something? Is there a way to have an OR 
clause which preserves the relative "importance" of the query terms? 
(Note that playing with mm in edismax does not solve the issue.)

Thanks!