Posted to solr-user@lucene.apache.org by Jason Chaffee <jc...@ebates.com> on 2010/02/25 22:02:30 UTC

How to use dismax and boosting properly?

I am using dismax, configured to search three different fields, with
one field getting an extra boost so that matches in that field appear
at the top of the result set.  Then I sort the results by another
field to get the final ordering.

 

My problem is that the scores are being skewed by adding together the
scores from the different fields.  What I really want is for all matches
in the boosted field to have an equal score and to take precedence over
matches from the other fields.  They must have the same score so that
the secondary sort orders them alphabetically.  Because the query term
is found in all three fields with a different number of occurrences,
some of the boosted matches end up with a higher score, which puts them
at the top of my results when, alphabetically, they should be near the
bottom.

 

Here is an example, in case my explanation isn't clear:

 

I have dismax with the following config:

<str name="qf">Field1^3.0 Field2^0.1 Field3^0.1</str>

<str name="sort">score desc, sortField asc</str>

 

Where sortField is the original keyword token, without any processing
except for lowercase.

 

Field1 (the boosted field)

  a, at, at&, at&t
  a, ab, abe, abeb, abebo, abeboo, abebook, abebooks


Field2

  a, at, at&, at&t, a, at, att, a, at, at, at &, at &, at & t
  at&t, at&t, att, at & t, at&t, att, at & t, at&t, att, at & t,
    at&t, att, at & t, at&t, att, at & t, at&t, att, at & t,
    at&t, att, at & t
  abebooks
  a, ab, abe, abeb, abebo, abeboo, abebook, abebooks


Field3

  a, at, at&, at&t, a, at, att, a, at, at, at &, at &, at & t
  at&t, at&t, att, at & t, at&t, att, at & t, at&t, att, at & t,
    at&t, att, at & t, at&t, att, at & t, at&t, att, at & t,
    at&t, att, at & t, at&t, att, at & t
  abebooks
  a, ab, abe, abeb, abebo, abeboo, abebook, abebooks


The user types in the query 'a':  

 

Here is the debugQuery:

 

  <str name="AT&T">
5.4186125 = (MATCH) sum of:
  2.7147598 = (MATCH) max plus 0.1 times others of:
    0.10907243 = (MATCH) weight(Field2:a^0.1 in 80), product of:
      0.01970195 = queryWeight(Field2:a^0.1), product of:
        0.1 = boost
        3.1962826 = idf(docFreq=117, maxDocs=1061)
        0.0616402 = queryNorm
      5.5361238 = (MATCH) fieldWeight(Field2:a in 80), product of:
        1.7320508 = tf(termFreq(Field2:a)=3)
        3.1962826 = idf(docFreq=117, maxDocs=1061)
        1.0 = fieldNorm(field=Field2, doc=80)
    2.7038527 = (MATCH) weight(Field1:a^3.0 in 80), product of:
      0.7071054 = queryWeight(Field1:a^3.0), product of:
        3.0 = boost
        3.8238325 = idf(docFreq=62, maxDocs=1061)
        0.0616402 = queryNorm
      3.8238325 = (MATCH) fieldWeight(Field1:a in 80), product of:
        1.0 = tf(termFreq(Field1:a)=1)
        3.8238325 = idf(docFreq=62, maxDocs=1061)
        1.0 = fieldNorm(field=Field1, doc=80)
  2.7038527 = (MATCH) weight(Field1:a^3.0 in 80), product of:
    0.7071054 = queryWeight(Field1:a^3.0), product of:
      3.0 = boost
      3.8238325 = idf(docFreq=62, maxDocs=1061)
      0.0616402 = queryNorm
    3.8238325 = (MATCH) fieldWeight(Field1:a in 80), product of:
      1.0 = tf(termFreq(Field1:a)=1)
      3.8238325 = idf(docFreq=62, maxDocs=1061)
      1.0 = fieldNorm(field=Field1, doc=80)
</str>

 

  <str name="Abebooks">
5.4140024 = (MATCH) sum of:
  2.71015 = (MATCH) max plus 0.1 times others of:
    0.062973 = (MATCH) weight(edgeNGramStandardField:a^0.1 in 138), product of:
      0.01970195 = queryWeight(edgeNGramStandardField:a^0.1), product of:
        0.1 = boost
        3.1962826 = idf(docFreq=117, maxDocs=1061)
        0.0616402 = queryNorm
      3.1962826 = (MATCH) fieldWeight(edgeNGramStandardField:a in 138), product of:
        1.0 = tf(termFreq(edgeNGramStandardField:a)=1)
        3.1962826 = idf(docFreq=117, maxDocs=1061)
        1.0 = fieldNorm(field=edgeNGramStandardField, doc=138)
    2.7038527 = (MATCH) weight(edgeNGramKeywordField:a^3.0 in 138), product of:
      0.7071054 = queryWeight(edgeNGramKeywordField:a^3.0), product of:
        3.0 = boost
        3.8238325 = idf(docFreq=62, maxDocs=1061)
        0.0616402 = queryNorm
      3.8238325 = (MATCH) fieldWeight(edgeNGramKeywordField:a in 138), product of:
        1.0 = tf(termFreq(edgeNGramKeywordField:a)=1)
        3.8238325 = idf(docFreq=62, maxDocs=1061)
        1.0 = fieldNorm(field=edgeNGramKeywordField, doc=138)
  2.7038527 = (MATCH) weight(edgeNGramKeywordField:a^3.0 in 138), product of:
    0.7071054 = queryWeight(edgeNGramKeywordField:a^3.0), product of:
      3.0 = boost
      3.8238325 = idf(docFreq=62, maxDocs=1061)
      0.0616402 = queryNorm
    3.8238325 = (MATCH) fieldWeight(edgeNGramKeywordField:a in 138), product of:
      1.0 = tf(termFreq(edgeNGramKeywordField:a)=1)
      3.8238325 = idf(docFreq=62, maxDocs=1061)
      1.0 = fieldNorm(field=edgeNGramKeywordField, doc=138)
</str>

 

As you can see, the boosted field scores are identical, so I would like
those results to come above anything found in the non-boosted fields,
but I do not want the non-boosted fields' scores added on.
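
As a rough check of where the difference between the two totals comes
from, the arithmetic in the explain output can be reproduced in a few
lines of Python.  This is only a sketch: the numbers are copied from the
debugQuery output above, the helper name dismax is made up for
illustration, and it assumes that edgeNGramKeywordField and
edgeNGramStandardField in the Abebooks entry correspond to Field1 and
Field2 (their boosts match).

```python
# Reproduce the score arithmetic shown in the debugQuery output.
TIE = 0.1  # the "0.1 times others" factor in the explain output


def dismax(clause_scores, tie=TIE):
    """Best clause score plus tie times the sum of the remaining clauses."""
    best = max(clause_scores)
    return best + tie * (sum(clause_scores) - best)


field1 = 2.7038527        # weight(Field1:a^3.0), identical for both documents
att_field2 = 0.10907243   # weight(Field2:a^0.1) for AT&T (termFreq=3)
abe_field2 = 0.062973     # the same clause for Abebooks (termFreq=1)

# The outer "sum of" adds a second Field1 clause to the dismax result.
att_total = dismax([att_field2, field1]) + field1
abe_total = dismax([abe_field2, field1]) + field1

print(att_total, abe_total)
print("difference:", att_total - abe_total)
```

The only term that differs between the two documents is the low-boost
clause, scaled by the tie factor.  That small difference is enough to
put AT&T ahead of Abebooks before the alphabetical sort is consulted.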

 

Is there anything I can do to get what I want?


RE: How to use dismax and boosting properly?

Posted by Jason Chaffee <jc...@ebates.com>.
I thought I tried that, but I guess I didn't restart Solr to pick up the
configuration.  That did the trick.  

Thanks!

-----Original Message-----
From: Nagelberg, Kallin [mailto:KNagelberg@globeandmail.com] 
Sent: Thursday, February 25, 2010 1:10 PM
To: 'solr-user@lucene.apache.org'
Subject: RE: How to use dismax and boosting properly?

Try setting the boost to 0 for the fields you don't want to contribute
to the score.

Kallin Nagelberg


RE: How to use dismax and boosting properly?

Posted by "Nagelberg, Kallin" <KN...@globeandmail.com>.
Try setting the boost to 0 for the fields you don't want to contribute to the score.

Kallin Nagelberg
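
A small Python sketch of why a zero boost evens out the scores, using
the numbers from the original debugQuery output.  It is a simplified
model rather than Solr's actual scoring code: each clause score is
treated as boost times a boost-free weight, the change in queryNorm
that follows from changing the boosts is ignored (it rescales every
document by the same factor), and the helper names are hypothetical.
The corresponding qf value would be something like
"Field1^3.0 Field2^0 Field3^0".

```python
def dismax(clause_scores, tie=0.1):
    """Best clause score plus tie times the sum of the remaining clauses."""
    best = max(clause_scores)
    return best + tie * (sum(clause_scores) - best)


def total(field1_score, field2_unboosted, field2_boost):
    """Total as in the explain output: the dismax result plus a second Field1 clause."""
    field2_score = field2_boost * field2_unboosted
    return dismax([field2_score, field1_score]) + field1_score


FIELD1 = 2.7038527             # Field1 clause score, same for both documents
ATT_FIELD2 = 0.10907243 / 0.1  # Field2 clause for AT&T with the 0.1 boost divided out
ABE_FIELD2 = 0.062973 / 0.1    # Field2 clause for Abebooks, likewise

# Original boosts: Field2^0.1
before = (total(FIELD1, ATT_FIELD2, 0.1), total(FIELD1, ABE_FIELD2, 0.1))
# Suggested boosts: Field2^0
after = (total(FIELD1, ATT_FIELD2, 0.0), total(FIELD1, ABE_FIELD2, 0.0))

print("scores differ before:", before[0] != before[1])
print("scores equal after:  ", after[0] == after[1])
```

With the zero boost the Field2 clause contributes nothing, so both
documents tie on score and the secondary sort on sortField decides the
order.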
