Posted to solr-user@lucene.apache.org by Andrew Ingram <an...@tangentlabs.co.uk> on 2011/11/25 17:08:09 UTC

Boosted documents not appearing higher than less-boosted ones for equal relevancy.

Hi all,

I have 4 products; let's call them p1, p2, p3 and p4. At indexing time I'm boosting each document as follows (using <doc boost="foo">):

p1 = 2.3434156476491901
p2 = 2.1894875146124502
p3 = 2.51677824126855
p4 = 2.2773491010634999

(Note: these boost values may not be identical to what is currently indexed, because I can't figure out how to get this information from Solr; they simply illustrate what is being fed into the index.)

When I perform a search query, they are all given an equal score of 23.54723 in one example case (see the debugQuery details below). As far as I can tell, the boost I've provided isn't contributing to the score, yet across my overall index the boosting is successfully promoting more popular products over less popular ones (the boost is calculated from a number of factors such as popularity).

So my question is: why are these 4 products all being given the same score? Is the document boosting not being applied correctly?

Additionally I'm sorting by "can_purchase+desc,+score+desc", where can_purchase is a boolean field.

I would greatly appreciate any help with this.

Regards,
Andrew Ingram

> <lst name="debug">
> <str name="rawquerystring">(text:jeffrey AND text:archer)</str>
> <str name="querystring">(text:jeffrey AND text:archer)</str>
> <str name="parsedquery">+(text:JFR text:jeffrey) +(text:ARXR text:archer)</str>
> <str name="parsedquery_toString">+(text:JFR text:jeffrey) +(text:ARXR text:archer)</str>
> <lst name="explain">
> 
> ... (other results) ...
> 
> <str name="catalogue.product.2848634">
> 23.54723 = (MATCH) sum of: 9.63586 = (MATCH) sum of: 4.285661 = (MATCH) weight(text:JFR in 1494239), product of: 0.42661786 = queryWeight(text:JFR), product of: 6.6971116 = idf(docFreq=49173, maxDocs=14654117) 0.06370177 = queryNorm 10.045668 = (MATCH) fieldWeight(text:JFR in 1494239), product of: 1.0 = tf(termFreq(text:JFR)=1) 6.6971116 = idf(docFreq=49173, maxDocs=14654117) 1.5 = fieldNorm(field=text, doc=1494239) 5.3501997 = (MATCH) weight(text:jeffrey in 1494239), product of: 0.47666705 = queryWeight(text:jeffrey), product of: 7.482791 = idf(docFreq=22413, maxDocs=14654117) 0.06370177 = queryNorm 11.224186 = (MATCH) fieldWeight(text:jeffrey in 1494239), product of: 1.0 = tf(termFreq(text:jeffrey)=1) 7.482791 = idf(docFreq=22413, maxDocs=14654117) 1.5 = fieldNorm(field=text, doc=1494239) 13.91137 = (MATCH) sum of: 6.4868336 = (MATCH) weight(text:ARXR in 1494239), product of: 0.52486366 = queryWeight(text:ARXR), product of: 8.239388 = idf(docFreq=10517, maxDocs=14654117) 0.06370177 = queryNorm 12.359083 = (MATCH) fieldWeight(text:ARXR in 1494239), product of: 1.0 = tf(termFreq(text:ARXR)=1) 8.239388 = idf(docFreq=10517, maxDocs=14654117) 1.5 = fieldNorm(field=text, doc=1494239) 7.4245367 = (MATCH) weight(text:archer in 1494239), product of: 0.56151944 = queryWeight(text:archer), product of: 8.814816 = idf(docFreq=5915, maxDocs=14654117) 0.06370177 = queryNorm 13.222225 = (MATCH) fieldWeight(text:archer in 1494239), product of: 1.0 = tf(termFreq(text:archer)=1) 8.814816 = idf(docFreq=5915, maxDocs=14654117) 1.5 = fieldNorm(field=text, doc=1494239)
> </str>
> <str name="catalogue.product.2920808">
> 23.54723 = (MATCH) sum of: 9.63586 = (MATCH) sum of: 4.285661 = (MATCH) weight(text:JFR in 1526040), product of: 0.42661786 = queryWeight(text:JFR), product of: 6.6971116 = idf(docFreq=49173, maxDocs=14654117) 0.06370177 = queryNorm 10.045668 = (MATCH) fieldWeight(text:JFR in 1526040), product of: 1.0 = tf(termFreq(text:JFR)=1) 6.6971116 = idf(docFreq=49173, maxDocs=14654117) 1.5 = fieldNorm(field=text, doc=1526040) 5.3501997 = (MATCH) weight(text:jeffrey in 1526040), product of: 0.47666705 = queryWeight(text:jeffrey), product of: 7.482791 = idf(docFreq=22413, maxDocs=14654117) 0.06370177 = queryNorm 11.224186 = (MATCH) fieldWeight(text:jeffrey in 1526040), product of: 1.0 = tf(termFreq(text:jeffrey)=1) 7.482791 = idf(docFreq=22413, maxDocs=14654117) 1.5 = fieldNorm(field=text, doc=1526040) 13.91137 = (MATCH) sum of: 6.4868336 = (MATCH) weight(text:ARXR in 1526040), product of: 0.52486366 = queryWeight(text:ARXR), product of: 8.239388 = idf(docFreq=10517, maxDocs=14654117) 0.06370177 = queryNorm 12.359083 = (MATCH) fieldWeight(text:ARXR in 1526040), product of: 1.0 = tf(termFreq(text:ARXR)=1) 8.239388 = idf(docFreq=10517, maxDocs=14654117) 1.5 = fieldNorm(field=text, doc=1526040) 7.4245367 = (MATCH) weight(text:archer in 1526040), product of: 0.56151944 = queryWeight(text:archer), product of: 8.814816 = idf(docFreq=5915, maxDocs=14654117) 0.06370177 = queryNorm 13.222225 = (MATCH) fieldWeight(text:archer in 1526040), product of: 1.0 = tf(termFreq(text:archer)=1) 8.814816 = idf(docFreq=5915, maxDocs=14654117) 1.5 = fieldNorm(field=text, doc=1526040)
> </str>
> <str name="catalogue.product.3002864">
> 23.54723 = (MATCH) sum of: 9.63586 = (MATCH) sum of: 4.285661 = (MATCH) weight(text:JFR in 1562638), product of: 0.42661786 = queryWeight(text:JFR), product of: 6.6971116 = idf(docFreq=49173, maxDocs=14654117) 0.06370177 = queryNorm 10.045668 = (MATCH) fieldWeight(text:JFR in 1562638), product of: 1.0 = tf(termFreq(text:JFR)=1) 6.6971116 = idf(docFreq=49173, maxDocs=14654117) 1.5 = fieldNorm(field=text, doc=1562638) 5.3501997 = (MATCH) weight(text:jeffrey in 1562638), product of: 0.47666705 = queryWeight(text:jeffrey), product of: 7.482791 = idf(docFreq=22413, maxDocs=14654117) 0.06370177 = queryNorm 11.224186 = (MATCH) fieldWeight(text:jeffrey in 1562638), product of: 1.0 = tf(termFreq(text:jeffrey)=1) 7.482791 = idf(docFreq=22413, maxDocs=14654117) 1.5 = fieldNorm(field=text, doc=1562638) 13.91137 = (MATCH) sum of: 6.4868336 = (MATCH) weight(text:ARXR in 1562638), product of: 0.52486366 = queryWeight(text:ARXR), product of: 8.239388 = idf(docFreq=10517, maxDocs=14654117) 0.06370177 = queryNorm 12.359083 = (MATCH) fieldWeight(text:ARXR in 1562638), product of: 1.0 = tf(termFreq(text:ARXR)=1) 8.239388 = idf(docFreq=10517, maxDocs=14654117) 1.5 = fieldNorm(field=text, doc=1562638) 7.4245367 = (MATCH) weight(text:archer in 1562638), product of: 0.56151944 = queryWeight(text:archer), product of: 8.814816 = idf(docFreq=5915, maxDocs=14654117) 0.06370177 = queryNorm 13.222225 = (MATCH) fieldWeight(text:archer in 1562638), product of: 1.0 = tf(termFreq(text:archer)=1) 8.814816 = idf(docFreq=5915, maxDocs=14654117) 1.5 = fieldNorm(field=text, doc=1562638)
> </str>
> <str name="catalogue.product.2229176">
> 23.54723 = (MATCH) sum of: 9.63586 = (MATCH) sum of: 4.285661 = (MATCH) weight(text:JFR in 59760), product of: 0.42661786 = queryWeight(text:JFR), product of: 6.6971116 = idf(docFreq=49173, maxDocs=14654117) 0.06370177 = queryNorm 10.045668 = (MATCH) fieldWeight(text:JFR in 59760), product of: 1.0 = tf(termFreq(text:JFR)=1) 6.6971116 = idf(docFreq=49173, maxDocs=14654117) 1.5 = fieldNorm(field=text, doc=59760) 5.3501997 = (MATCH) weight(text:jeffrey in 59760), product of: 0.47666705 = queryWeight(text:jeffrey), product of: 7.482791 = idf(docFreq=22413, maxDocs=14654117) 0.06370177 = queryNorm 11.224186 = (MATCH) fieldWeight(text:jeffrey in 59760), product of: 1.0 = tf(termFreq(text:jeffrey)=1) 7.482791 = idf(docFreq=22413, maxDocs=14654117) 1.5 = fieldNorm(field=text, doc=59760) 13.91137 = (MATCH) sum of: 6.4868336 = (MATCH) weight(text:ARXR in 59760), product of: 0.52486366 = queryWeight(text:ARXR), product of: 8.239388 = idf(docFreq=10517, maxDocs=14654117) 0.06370177 = queryNorm 12.359083 = (MATCH) fieldWeight(text:ARXR in 59760), product of: 1.0 = tf(termFreq(text:ARXR)=1) 8.239388 = idf(docFreq=10517, maxDocs=14654117) 1.5 = fieldNorm(field=text, doc=59760) 7.4245367 = (MATCH) weight(text:archer in 59760), product of: 0.56151944 = queryWeight(text:archer), product of: 8.814816 = idf(docFreq=5915, maxDocs=14654117) 0.06370177 = queryNorm 13.222225 = (MATCH) fieldWeight(text:archer in 59760), product of: 1.0 = tf(termFreq(text:archer)=1) 8.814816 = idf(docFreq=5915, maxDocs=14654117) 1.5 = fieldNorm(field=text, doc=59760)
> </str>


Re: Boosted documents not appearing higher than less-boosted ones for equal relevancy.

Posted by Chris Hostetter <ho...@fucit.org>.
: I don't think there is a way of seeing the "boosts" from the index, as
: those are encoded as "norms" (together with length normalization). You can
: see the norms with Luke if you want to and in the debugQuery output the
	...
: http://lucene.apache.org/java/3_1_0/api/all/org/apache/lucene/search/Similarity.html
: for
: Lucene/Solr 3.1) you can see how the norm is calculated, In your debugQuery
: I can see that all the "fieldNorm" are 1.5 and I'm not so sure of why that
: can happen to you.

I haven't done the math to check, but this is most likely because of the 
precision loss involved when encoding the norm values as a single "byte" 
-- boost values that are small and very close together aren't going to 
provide much differentiation.
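
To make the precision loss concrete, here is a minimal sketch in Python of the one-byte norm encoding, ported from my reading of Lucene 3.x's SmallFloat.floatToByte315 / byte315ToFloat. The function names encode_norm and decode_norm are made up for this illustration, and the sketch feeds in the raw boosts only. The real norm also multiplies in the length normalization, so the stored values will not be the 1.5 seen in the explain output above.

```python
import struct

# Offset between the float32 exponent and the byte exponent
# (assumption: mirrors the constant used by Lucene's floatToByte315).
FZERO = (63 - 15) << 3


def encode_norm(value: float) -> int:
    """Squeeze a float into one byte by dropping most of the mantissa."""
    bits = struct.unpack(">i", struct.pack(">f", value))[0]
    small = bits >> 21  # keep sign, exponent and the top mantissa bits only
    if small <= FZERO:
        return 0 if bits <= 0 else 1  # zero/negative, or underflow
    if small >= FZERO + 0x100:
        return 255  # overflow: clamp to the largest byte
    return small - FZERO


def decode_norm(b: int) -> float:
    """Expand the byte back into the float that scoring will see."""
    if b == 0:
        return 0.0
    bits = ((b & 0xFF) << 21) + ((63 - 15) << 24)
    return struct.unpack(">f", struct.pack(">i", bits))[0]


# Boost values quoted in the original message.
boosts = {
    "p1": 2.3434156476491901,
    "p2": 2.1894875146124502,
    "p3": 2.51677824126855,
    "p4": 2.2773491010634999,
}

for name, boost in boosts.items():
    stored = decode_norm(encode_norm(boost))
    print(f"{name}: fed in {boost:.4f}, stored as {stored}")
```

Under these assumptions p1, p2 and p4 all come back as 2.0 and only p3 lands on the next step, 2.5. Between 2 and 4 the encoding only has steps of 0.5, so boosts that differ in the first or second decimal place are often indistinguishable after indexing.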

Bottom line: if you want fine control over boosting with some per-doc
numeric value, add that value as a field, and then either factor it
directly into the score with a boosting query, or add it as a sort option
(i.e. if you just want to break ties when scores are identical, use
"sort=my_boolean+desc,score+desc,my_boost_field+desc")
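
As a rough sketch of those two options, the Python below builds the request parameters for each. The field name popularity_boost, the helper build_params and the parameter name qq are hypothetical; can_purchase is the boolean field from the original message. The second variant assumes the {!boost} query parser is available in your Solr version, which multiplies the wrapped query's score by the value of a function (here, simply the field).

```python
from urllib.parse import urlencode


def build_params(user_query, boost_field="popularity_boost", tie_break_only=True):
    """Return Solr request parameters for one of the two suggested approaches."""
    if tie_break_only:
        # Option 1: leave the score alone; use the stored value only to
        # order documents whose scores are identical.
        return {
            "q": user_query,
            "sort": f"can_purchase desc,score desc,{boost_field} desc",
        }
    # Option 2: multiply the relevancy score by the stored value at query time.
    return {
        "q": f"{{!boost b={boost_field} v=$qq}}",
        "qq": user_query,
        "sort": "can_purchase desc,score desc",
    }


query = "(text:jeffrey AND text:archer)"
print(urlencode(build_params(query)))
print(urlencode(build_params(query, tie_break_only=False)))
```

Either way the per-document value is kept at full precision in its own field instead of being folded into the one-byte norm.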



-Hoss

Re: Boosted documents not appearing higher than less-boosted ones for equal relevancy.

Posted by Tomás Fernández Löbbe <to...@gmail.com>.
I don't think there is a way of seeing the "boosts" from the index, as
those are encoded as "norms" (together with length normalization). You can
see the norms with Luke if you want to, and in the debugQuery output the
index-time boost should be represented in the "fieldNorm" section. (If you
click "view source" you'll see the explain section of the debugQuery
indented, which is much easier to read.)

In the Similarity javadoc
(http://lucene.apache.org/java/3_1_0/api/all/org/apache/lucene/search/Similarity.html
for Lucene/Solr 3.1) you can see how the norm is calculated. In your
debugQuery I can see that all the "fieldNorm" values are 1.5, and I'm not
sure why that is happening in your case.
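
For reference, that javadoc gives the norm as doc.getBoost() * lengthNorm(field) * (product of the field boosts), and the explain output multiplies it into each term's fieldWeight. The short Python sketch below recomputes the score from the numbers in your debugQuery to show where fieldNorm enters; the names term_score and total_score are made up for this illustration, and the formula is only my reading of the explain tree.

```python
# Values copied from the explain output in the original message.
QUERY_NORM = 0.06370177
FIELD_NORM = 1.5
TF = 1.0  # every term occurs once in the field
IDF = {
    "JFR": 6.6971116,
    "jeffrey": 7.482791,
    "ARXR": 8.239388,
    "archer": 8.814816,
}


def term_score(idf, field_norm=FIELD_NORM):
    """weight = queryWeight * fieldWeight, as laid out in the explain tree."""
    query_weight = idf * QUERY_NORM
    field_weight = TF * idf * field_norm
    return query_weight * field_weight


def total_score(field_norm=FIELD_NORM):
    """Sum the four term weights, as the nested 'sum of' clauses do."""
    return sum(term_score(idf, field_norm) for idf in IDF.values())


print(round(total_score(), 4))      # fieldNorm as reported (1.5)
print(round(total_score(1.75), 4))  # what a larger stored norm would give
```

Because every other factor is the same for the four documents, an identical fieldNorm is enough to produce an identical score, and any difference in the stored norm would scale the score proportionally.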
