Posted to solr-user@lucene.apache.org by Amrit Sarkar <sa...@gmail.com> on 2020/10/24 09:36:22 UTC

Difference in scoring for Solr 4.8.x vs 8.6.2 (BoostQuery vs FunctionScoreQuery)

Hi,

We are upgrading a Solr setup from Solr 4.8.x to 8.6.0, and for the
following query the parsed query strings differ between the two versions.
We read through the improvements, and I understand that Solr 4.8.x used
FunctionQuery (BoostQuery) for default scoring, while Solr 8.x uses
FunctionScoreQuery.

*Raw Query*

select?df=dummy_title_only& group.limit=1& start=0& q.op=AND&
sort=dummy_in_stock+desc,termfreq(dummy_boosting_keywords,+'bat')+desc,score+desc,termfreq(dummy_popular_keywords,'bat')+desc,dummy_price+desc&
rows=3& group.ngroups=true& version=2.2& tie=1.0&
q=(((bat*+OR+bat)+OR+(id:"bat"+OR+id:"BAT"))+AND+(dummy_category:"DUMMY
_PRODUCT"+OR+dummy_category:"DUMMY
_PRODUCT")^0)+AND+-(dummy_exclusion_keywords:"bat")& facet.limit=10&
defType=edismax&
qf=dummy_title_only^100+dummy_title^70+dummy_attribute^50+dummy_title_semantics^10++dummy_boosting_keywords+dummy_product_noun^100&
group.facet=true& boost=redis(id,dummysearch:_all:core13459)&
timeAllowed=10000& facet.mincount=1& facet=true& wt=json&
group.field=dummy_item_group& group=true& debug=true& indent=true

*Solr 4.8.x*

ParsedQuery String

"parsedquery_toString": "boost(+(+(+(((dummy_title_only:bat*^100.0 |
dummy_attribute:bat*^50.0 | dummy_title:bat*^70.0 |
dummy_boosting_keywords:bat* | dummy_product_noun:bat*^100.0 |
dummy_title_semantics:bat*^10.0)~1.0 (dummy_title_only:bat^100.0 |
dummy_attribute:bat^50.0 | dummy_title:bat^70.0 |
dummy_boosting_keywords:bat | dummy_product_noun:bat^100.0 |
dummy_title_semantics:bat^10.0)~1.0) (id:bat id:BAT))
+((dummy_category:dummy_product dummy_category:dummy_product)^0.0))
-dummy_exclusion_keywords:bat),redis(id,dummysearch:_all:core13459))",

Scoring for a doc

"137-variantid_122": "\n0.013678619 = (MATCH)
boost(+(+(((ConstantScore(dummy_title_only:bat*^100.0)^100.0 |
ConstantScore(dummy_attribute:bat*^50.0)^50.0 |
ConstantScore(dummy_title:bat*^70.0)^70.0 | ConstantScore() |
ConstantScore()^100.0 |
ConstantScore(dummy_title_semantics:bat*^10.0)^10.0)~1.0
(dummy_title_only:bat^100.0 | dummy_attribute:bat^50.0 |
dummy_title:bat^70.0 | dummy_boosting_keywords:bat |
dummy_product_noun:bat^100.0 | dummy_title_semantics:bat^10.0)~1.0) (id:bat
id:BAT)) +((dummy_category:dummy_product
dummy_category:dummy_product)^0.0))
-dummy_exclusion_keywords:bat,redis(id,dummysearch:_all:core13459)),
product of:\n 0.013678619 = (MATCH) sum of:\n 0.013678619 = (MATCH) sum
of:\n 0.013678619 = (MATCH) product of:\n 0.027357237 = (MATCH) sum of:\n
0.027357237 = (MATCH) product of:\n 0.054714475 = (MATCH) sum of:\n
0.054714475 = (MATCH) max plus 1.0 times others of:\n 0.054714475 = (MATCH)
ConstantScore(dummy_attribute:bat*^50.0)^50.0, product of:\n 50.0 = boost\n
0.0010942895 = queryNorm\n 0.5 = coord(1/2)\n 0.5 = coord(1/2)\n 0.0 =
(MATCH) sum of:\n 0.0 = (MATCH) weight(dummy_category:dummy_product in 38)
[DefaultSimilarity], result of:\n 0.0 = score(doc=38,freq=1.0 =
termFreq=1.0\n), product of:\n 0.0 = queryWeight, product of:\n 1.1147755 =
idf(docFreq=73, maxDocs=83)\n 0.0 = queryNorm\n 1.1147755 = fieldWeight in
38, product of:\n 1.0 = tf(freq=1.0), with freq of:\n 1.0 = termFreq=1.0\n
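
Reading the 4.8.x explain output above, the final score looks like a plain
product of the factors it lists. A minimal Python sketch of that arithmetic
follows. The numbers are copied from the explain; the function name
`score_4x` is hypothetical, and the value 1.0 for the redis() boost is an
assumption (that part of the explain is cut off, but the outer product
equals its first factor, which suggests a multiplier of 1.0).

```python
import math


def score_4x(clause_boost, query_norm, coords, function_boost):
    """Multiply the factors listed in the 4.8.x explain output.

    clause_boost   -- boost of the matching ConstantScore clause
    query_norm     -- the queryNorm value from the explain
    coords         -- every coord(x/y) factor applied on the way up
    function_boost -- value of the boost function (ASSUMED, not shown)
    """
    return clause_boost * query_norm * math.prod(coords) * function_boost


score = score_4x(
    clause_boost=50.0,          # ConstantScore(dummy_attribute:bat*^50.0)^50.0
    query_norm=0.0010942895,    # queryNorm
    coords=[0.5, 0.5],          # coord(1/2) appears twice
    function_boost=1.0,         # ASSUMPTION: redis(...) evaluates to 1.0
)
print(score)
```

In other words, the 50.0 boost is scaled down by queryNorm and by two
coord(1/2) factors before the function boost is applied, which is how the
explain arrives at roughly 0.0137.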

*Solr 8.6.x*

ParsedQuery String

"parsedquery_toString":
"FunctionScoreQuery(+(+(+((((dummy_title_only:bat*)^100.0 |
(dummy_attribute:bat*)^50.0 | (dummy_title:bat*)^70.0 |
dummy_boosting_keywords:bat* | (dummy_product_noun:bat*)^100.0 |
(dummy_title_semantics:bat*)^10.0)~1.0 ((dummy_title_only:bat)^100.0 |
(dummy_attribute:bat)^50.0 | (dummy_title:bat)^70.0 |
dummy_boosting_keywords:bat | (dummy_product_noun:bat)^100.0 |
(dummy_title_semantics:bat)^10.0)~1.0) (id:bat id:BAT))
+(dummy_category:dummy_product dummy_category:dummy_product)^0.0)
-(+dummy_exclusion_keywords:bat)), scored by
boost(redis(id,dummysearch:_all:core13459)))",

Scoring for a doc

"137-variantid_122": "\n50.0 =
weight(FunctionScoreQuery(+(+((dummy_title_only:bat*)^100.0
(dummy_attribute:bat*)^50.0 (dummy_title:bat*)^70.0
dummy_boosting_keywords:bat* (dummy_product_noun:bat*)^100.0
(dummy_title_semantics:bat*)^10.0 (dummy_title_only:bat)^100.0
(dummy_attribute:bat)^50.0 (dummy_title:bat)^70.0
dummy_boosting_keywords:bat (dummy_product_noun:bat)^100.0
(dummy_title_semantics:bat)^10.0 id:bat id:BAT)
+(ConstantScore(dummy_category:dummy_product))^0.0)
-dummy_exclusion_keywords:bat, scored by
boost(redis(id,dummysearch:_all:core13459)))), result of:\n 50.0 = product
of:\n 50.0 = sum of:\n 50.0 = sum of:\n 50.0 = sum of:\n 50.0 =
dummy_attribute:bat*^50.0\n 0.0 =
ConstantScore(dummy_category:dummy_product)^0.0\n 1.0 =
redis(id,dummysearch:_all:core13459)=1.0\n"
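
In the 8.6.x explain above there are no queryNorm or coord lines. The score
appears to be the sum of the matching clause scores multiplied by the
redis() value. A minimal Python sketch of that reading follows. The numbers
are copied from the explain; the function name `score_8x` is hypothetical,
and the formula is only my interpretation of the explain tree, not a
statement of how FunctionScoreQuery is implemented.

```python
def score_8x(clause_scores, function_boost):
    """Sum the clause scores, then multiply by the boost function value."""
    return sum(clause_scores) * function_boost


score = score_8x(
    clause_scores=[
        50.0,  # dummy_attribute:bat*^50.0
        0.0,   # ConstantScore(dummy_category:dummy_product)^0.0
    ],
    function_boost=1.0,  # redis(id,dummysearch:_all:core13459)=1.0
)
print(score)
```

With these inputs the result is 50.0, matching the explain. The same
document therefore scores 50.0 here versus about 0.0137 on 4.8.x.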

I am trying hard to understand how the scoring implementations differ
between these two versions (four major versions apart), and it is taking
us a while to figure out.

I would appreciate any help with this. An explanation of the scoring, or a
pointer to documentation I can read to understand it, would be great.

Amrit Sarkar
Engineer | Search and Kubernetes
https://seamadic.com/
Twitter https://twitter.com/sarkaramrit2
LinkedIn: https://www.linkedin.com/in/sarkaramrit2
Medium: https://medium.com/@sarkaramrit2