Posted to solr-user@lucene.apache.org by escher2k <es...@yahoo.com> on 2007/03/28 03:32:18 UTC
Document boost not as expected...
I am implementing a document boost at indexing time. I read some postings
that seemed to indicate that omitNorms="false" is needed to retain the
document boost for retrieval. After I did that, I am still not able to get
back the boost I originally put in. Instead, I get 1.25 as the score for all
the documents retrieved.
Example:
Input
<doc boost="1.33">
<field name="uniq_id">3557_183970_10179</field>
<field name="login_name">user1</field>
<field name="show_all_flag">Y</field>
</doc>
Schema.xml
<fieldtype name="stringB" class="solr.StrField" sortMissingLast="true"
omitNorms="false"/>
<field name="show_all_flag" type="stringB" indexed="true"
stored="true"/>
Output for
(http://testing:12002/solr/select/?qt=dismax&q=Y&qf=show_all_flag&fl=score,login_name)
<doc>
<float name="score">1.25</float>
<str name="login_name">5webdesign</str>
</doc>
I am not sure how the score changed from 1.33 to 1.25. I have modified the
custom similarity, but that does not give me an explanation of how the score
changed.
--
View this message in context: http://www.nabble.com/Document-boost-not-as-expected...-tf3476653.html#a9704479
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Document boost not as expected...
Posted by escher2k <es...@yahoo.com>.
Thanks for the reply, Mike. I think that was what was causing the issue. I
discovered the effect after I bumped up the numbers a bit. Here's what I see
now.
        Index-time boost   My Custom Similarity   Default Similarity
Doc 1   133226.63          131072                 121359
Doc 2   123194.06          114688                 106189
The difference between the results is because I am ignoring the lengthNorm
(changed it from (float)(1.0 / Math.sqrt(numTerms)) to 1.0f). Thanks once
again.
Re: Document boost not as expected...
Posted by Mike Klaas <mi...@gmail.com>.
On 3/28/07, escher2k <es...@yahoo.com> wrote:
>
> Mike,
> I am not doing anything custom for this test. I am assuming that the
> Default Similarity is used.
> Surprisingly, if I remove the document level boost (set to 1.0) and just
> have a field level boost, the result
> seems to be correct.
Another detail that I forgot to mention is that fieldNorms are encoded
into one-byte floats, so you can experience severe rounding errors.
The possible values are:
0 0.0
1 5.820766E-10
2 6.9849193E-10
3 8.1490725E-10
4 9.313226E-10
5 1.1641532E-9
6 1.3969839E-9
7 1.6298145E-9
8 1.8626451E-9
9 2.3283064E-9
10 2.7939677E-9
11 3.259629E-9
12 3.7252903E-9
13 4.656613E-9
14 5.5879354E-9
15 6.519258E-9
16 7.4505806E-9
17 9.313226E-9
18 1.1175871E-8
19 1.3038516E-8
20 1.4901161E-8
21 1.8626451E-8
22 2.2351742E-8
23 2.6077032E-8
24 2.9802322E-8
25 3.7252903E-8
26 4.4703484E-8
27 5.2154064E-8
28 5.9604645E-8
29 7.4505806E-8
30 8.940697E-8
31 1.0430813E-7
32 1.1920929E-7
33 1.4901161E-7
34 1.7881393E-7
35 2.0861626E-7
36 2.3841858E-7
37 2.9802322E-7
38 3.5762787E-7
39 4.172325E-7
40 4.7683716E-7
41 5.9604645E-7
42 7.1525574E-7
43 8.34465E-7
44 9.536743E-7
45 1.1920929E-6
46 1.4305115E-6
47 1.66893E-6
48 1.9073486E-6
49 2.3841858E-6
50 2.861023E-6
51 3.33786E-6
52 3.8146973E-6
53 4.7683716E-6
54 5.722046E-6
55 6.67572E-6
56 7.6293945E-6
57 9.536743E-6
58 1.1444092E-5
59 1.335144E-5
60 1.5258789E-5
61 1.9073486E-5
62 2.2888184E-5
63 2.670288E-5
64 3.0517578E-5
65 3.8146973E-5
66 4.5776367E-5
67 5.340576E-5
68 6.1035156E-5
69 7.6293945E-5
70 9.1552734E-5
71 1.0681152E-4
72 1.2207031E-4
73 1.5258789E-4
74 1.8310547E-4
75 2.1362305E-4
76 2.4414062E-4
77 3.0517578E-4
78 3.6621094E-4
79 4.272461E-4
80 4.8828125E-4
81 6.1035156E-4
82 7.324219E-4
83 8.544922E-4
84 9.765625E-4
85 0.0012207031
86 0.0014648438
87 0.0017089844
88 0.001953125
89 0.0024414062
90 0.0029296875
91 0.0034179688
92 0.00390625
93 0.0048828125
94 0.005859375
95 0.0068359375
96 0.0078125
97 0.009765625
98 0.01171875
99 0.013671875
100 0.015625
101 0.01953125
102 0.0234375
103 0.02734375
104 0.03125
105 0.0390625
106 0.046875
107 0.0546875
108 0.0625
109 0.078125
110 0.09375
111 0.109375
112 0.125
113 0.15625
114 0.1875
115 0.21875
116 0.25
117 0.3125
118 0.375
119 0.4375
120 0.5
121 0.625
122 0.75
123 0.875
124 1.0
125 1.25
126 1.5
127 1.75
128 2.0
129 2.5
130 3.0
131 3.5
132 4.0
133 5.0
134 6.0
135 7.0
136 8.0
137 10.0
138 12.0
139 14.0
140 16.0
141 20.0
142 24.0
143 28.0
144 32.0
145 40.0
146 48.0
147 56.0
148 64.0
149 80.0
150 96.0
151 112.0
152 128.0
153 160.0
154 192.0
155 224.0
156 256.0
157 320.0
158 384.0
159 448.0
160 512.0
161 640.0
162 768.0
163 896.0
164 1024.0
165 1280.0
166 1536.0
167 1792.0
168 2048.0
169 2560.0
170 3072.0
171 3584.0
172 4096.0
173 5120.0
174 6144.0
175 7168.0
176 8192.0
177 10240.0
178 12288.0
179 14336.0
180 16384.0
181 20480.0
182 24576.0
183 28672.0
184 32768.0
185 40960.0
186 49152.0
187 57344.0
188 65536.0
189 81920.0
190 98304.0
191 114688.0
192 131072.0
193 163840.0
194 196608.0
195 229376.0
196 262144.0
197 327680.0
198 393216.0
199 458752.0
200 524288.0
201 655360.0
202 786432.0
203 917504.0
204 1048576.0
205 1310720.0
206 1572864.0
207 1835008.0
208 2097152.0
209 2621440.0
210 3145728.0
211 3670016.0
212 4194304.0
213 5242880.0
214 6291456.0
215 7340032.0
216 8388608.0
217 1.048576E7
218 1.2582912E7
219 1.4680064E7
220 1.6777216E7
221 2.097152E7
222 2.5165824E7
223 2.9360128E7
224 3.3554432E7
225 4.194304E7
226 5.0331648E7
227 5.8720256E7
228 6.7108864E7
229 8.388608E7
230 1.00663296E8
231 1.17440512E8
232 1.34217728E8
233 1.6777216E8
234 2.01326592E8
235 2.34881024E8
236 2.68435456E8
237 3.3554432E8
238 4.02653184E8
239 4.69762048E8
240 5.3687091E8
241 6.7108864E8
242 8.0530637E8
243 9.395241E8
244 1.07374182E9
245 1.34217728E9
246 1.61061274E9
247 1.87904819E9
248 2.14748365E9
249 2.68435456E9
250 3.22122547E9
251 3.75809638E9
252 4.2949673E9
253 5.3687091E9
254 6.4424509E9
255 7.5161928E9
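The table above can be regenerated, and the rounding reported in this thread reproduced, with a little bit manipulation. What follows is a minimal Python sketch, not Lucene's actual Java code: the names decode_norm, encode_norm and ZERO_EXP_OFFSET are made up for illustration, and the bit layout is an assumption inferred from the table itself (byte 124 stands for 1.0, and every four steps double the value). The sketch truncates rather than rounds, which is consistent with the numbers posted in the thread.

```python
import struct

ZERO_EXP_OFFSET = 48  # assumption: chosen so that byte 124 decodes to 1.0


def _float_to_bits(f):
    """Return the IEEE-754 single-precision bit pattern of f as a signed int."""
    return struct.unpack(">i", struct.pack(">f", f))[0]


def _bits_to_float(bits):
    """Inverse of _float_to_bits."""
    return struct.unpack(">f", struct.pack(">i", bits))[0]


def decode_norm(b):
    """Map a stored byte (0..255) to the float it stands for."""
    if b == 0:
        return 0.0
    return _bits_to_float(((b & 0xFF) << 21) + (ZERO_EXP_OFFSET << 24))


def encode_norm(f):
    """Map a float to the table entry at or below it (tiny positives map to 1)."""
    bits = _float_to_bits(f)
    small = bits >> 21
    floor = ZERO_EXP_OFFSET << 3
    if small <= floor:          # zero, negative, or too small to represent
        return 0 if bits <= 0 else 1
    if small >= floor + 0x100:  # too large: clamp to the biggest entry
        return 255
    return small - floor


if __name__ == "__main__":
    # Boost values taken from the messages in this thread.
    for boost in (1.33, 1.33 * 2.0, 133226.63, 123194.06):
        b = encode_norm(boost)
        print(boost, "->", b, "->", decode_norm(b))
```

Under these assumptions a norm of 1.33 is stored as byte 125 and read back as 1.25, and 133226.63 and 123194.06 come back as 131072.0 and 114688.0.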
Re: Document boost not as expected...
Posted by escher2k <es...@yahoo.com>.
Mike,
I am not doing anything custom for this test. I am assuming that the
default Similarity is used.
Surprisingly, if I remove the document-level boost (set it to 1.0) and just
have a field-level boost, the result seems to be correct.
Mike Klaas wrote:
>
> On 3/28/07, escher2k <es...@yahoo.com> wrote:
>
>> Again, I fail to understand where it is doing a multiplication by 1.25
>> (score (2.5) = field_boost (2.0) * 1.25 ??).
>
> As I said above, lengthNorm is also multiplied in. This will depend
> on your custom Similarity and what value(s) you have in the field.
>
> -Mike
>
>
Re: Document boost not as expected...
Posted by Mike Klaas <mi...@gmail.com>.
On 3/28/07, escher2k <es...@yahoo.com> wrote:
> Again, I fail to understand where it is doing a multiplication by 1.25
> (score (2.5) = field_boost (2.0) * 1.25 ??).
As I said above, lengthNorm is also multiplied in. This will depend
on your custom Similarity and what value(s) you have in the field.
-Mike
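The 2.5 questioned above is consistent with the two boosts being multiplied together and the product then being squeezed into the coarse one-byte norm. The sketch below is Python rather than Lucene's Java, and everything in it is an assumption made for illustration: the helper name quantize_norm, truncation to four representable values per power of two, and a lengthNorm of 1.0 for a field holding a single term.

```python
import math


def quantize_norm(x):
    """Truncate x to the assumed norm precision (4 values per power of two)."""
    if x <= 0.0:
        return 0.0
    mantissa, exponent = math.frexp(x)  # x == mantissa * 2**exponent, 0.5 <= mantissa < 1
    return math.ldexp(math.floor(mantissa * 8) / 8, exponent)


doc_boost = 1.33     # <doc boost="1.33">
field_boost = 2.0    # <field name="show_all_flag" boost="2.0">
length_norm = 1.0    # assumed: one term in the field, so 1/sqrt(1)

raw_norm = doc_boost * field_boost * length_norm   # 2.66 before encoding
field_norm = quantize_norm(raw_norm)               # truncated to 2.5

tf, idf = 1.0, 1.0   # values reported in the explain output
score = tf * idf * field_norm
print(raw_norm, field_norm, score)
```

Read this way, the apparent factor of 1.25 is the document boost of 1.33 after truncation: 2.0 * 1.33 = 2.66, which is stored as 2.5.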
Re: Document boost not as expected...
Posted by escher2k <es...@yahoo.com>.
Chris,
Earlier I was trying to modify the Similarity computation to make it
field dependent (we are trying to change tf based on the field). Now, I have
reverted the custom computation so that the default Similarity is used. For
testing, I boosted a single field in one doc.
<doc boost="1.33">
<field name="show_all_flag" boost="2.0">Y</field>
...
</doc>
This is what I see in the explain -
2.5 = (MATCH) sum of:
  2.5 = (MATCH) fieldWeight(show_all_flag:Y in 17), product of:
    1.0 = tf(termFreq(show_all_flag:Y)=1)
    1.0 = idf(docFreq=36239)
    2.5 = fieldNorm(field=show_all_flag, doc=17)
Again, I fail to understand where it is doing a multiplication by 1.25
(score (2.5) = field_boost (2.0) * 1.25 ??).
Thanks.
Chris Hostetter wrote:
>
>
> Ditto everything Mike said, but i'm also curious what Similarity changes
> you made ... without knowing what that code looks like, all bets are off
> in terms of anyone being able to help you understand the scores you are
> seeing.
>
> : I am not quite sure how the score changed from 1.33 to 1.25. I am not quite
> : sure how this might have happened - I have modified the custom similarity
> : but I don't quite have an explanation of how the score changed.
>
>
> -Hoss
>
>
>
Re: Document boost not as expected...
Posted by Chris Hostetter <ho...@fucit.org>.
Ditto everything Mike said, but I'm also curious what Similarity changes
you made ... without knowing what that code looks like, all bets are off
in terms of anyone being able to help you understand the scores you are
seeing.
: I am not quite sure how the score changed from 1.33 to 1.25. I am not quite
: sure how this might have happened - I have modified the custom similarity
: but I don't quite have an explanation of how the score changed.
-Hoss
Re: Document boost not as expected...
Posted by Mike Klaas <mi...@gmail.com>.
On 3/27/07, escher2k <es...@yahoo.com> wrote:
>
> I am implementing a document boost at indexing time for the documents. I read
> some posting that
> seemed to indicate that omitNorm=false is needed to retain the document
> boosting for retrieval.
> After I did that, it looks like I am not able to get back the boost I
> originally put in. Instead,
> I get 1.25 as the score for all the documents retrieved.
>
> Example:
> Input
> <doc boost="1.33">
> <field name="uniq_id">3557_183970_10179</field>
> <field name="login_name">user1</field>
> <field name="show_all_flag">Y</field>
> </doc>
>
> Schema.xml
> <fieldtype name="stringB" class="solr.StrField" sortMissingLast="true"
> omitNorms="false"/>
> <field name="show_all_flag" type="stringB" indexed="true"
> stored="true"/>
>
> Output for
> (http://testing:12002/solr/select/?qt=dismax&q=Y&qf=show_all_flag&fl=score,login_name)
> <doc>
> <float name="score">1.25</float>
> <str name="login_name">5webdesign</str>
> </doc>
>
> I am not quite sure how the score changed from 1.33 to 1.25. I am not quite
> sure how this might have happened - I have modified the custom similarity
> but I don't quite have an explanation of how the score changed.
Have you looked at the score explanation debug data? The document
boost is incorporated into the fieldNorm and so is modified by the
lengthNorm. Further, at query time the term idf and the queryNorm come
into play.
You shouldn't expect that the document boost will be returned as the
document score (although you should expect it to affect it).
-Mike
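As a rough illustration of how the document boost gets folded into the fieldNorm, here is a small Python sketch. It assumes the default lengthNorm of 1/sqrt(numTerms) quoted elsewhere in this thread; the function name raw_field_norm is hypothetical, and the one-byte encoding applied to the norm afterwards is left out.

```python
import math


def raw_field_norm(doc_boost, field_boost, num_terms):
    """Norm before it is encoded: both boosts times the length normalization."""
    length_norm = 1.0 / math.sqrt(num_terms)  # assumed default lengthNorm
    return doc_boost * field_boost * length_norm


# Same document boost, different field lengths: the stored value mixes the
# boost with the field length, so the boost alone cannot be read back.
for num_terms in (1, 4, 16):
    print(num_terms, raw_field_norm(1.33, 1.0, num_terms))
```

Only the combined product is kept per field, which is one reason the score does not simply echo the boost that was indexed.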