Posted to java-user@lucene.apache.org by prabin meitei <pr...@gmail.com> on 2009/12/18 18:40:38 UTC

document with different index time boost returns same score

Hi,
   I have an index in which documents are inserted with different boosts
during indexing, e.g.:
doc1 has boost 5.64
doc2 has boost 5.25
doc3 has boost 5.10
doc4 has boost 4.8
doc5 has boost 4.4
doc6 has boost 4.0
and so on; some documents have a boost of only 1.0.

When I search the index for a term which occurs only once in each of these
documents, I expect the final scores of the search hits to differ according
to the document boost set during indexing.
But surprisingly I found that doc1, doc2 and doc3 have the same score (even
the raw score), and doc4 and doc5 have the same score.

I even tried changing the index-time boosts to 564, 525, 510 and so on for
doc1, doc2, doc3 and so on, but it still returned the same result.

Can anyone explain what is happening? Why are the scores the same even when
the index-time boosts were different? Is there any other way to get the
result I expect?
Any help will be highly appreciated.

Prabin

Re: document with different index time boost returns same score

Posted by prabin meitei <pr...@gmail.com>.
Thanks to all for the replies. I checked with Luke, and documents whose
index-time boosts differ only slightly have the same fieldNorm. I think
that is what causes the search hits to have the same score.
As Andrzej suggested, I checked the rounding error caused by the encoding.
The result really surprised me: for example, 7.9 rounds to 7.0, 8.0 stays
8.0, and even 9.4 rounds to 8.0.
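
The rounding can also be reproduced outside Luke. Below is a minimal,
stand-alone sketch of the one-byte norm encoding. It is modeled on what I
believe is the 3/15 scheme in org.apache.lucene.util.SmallFloat, but it is
not copied from any particular release; the class and method names
(NormQuantizationDemo, encode, decode, roundTrip) are made up for
illustration. It ignores the length norm, i.e. it assumes a field with a
single term.

```java
// Illustrative sketch only, not Lucene source. Assumes the norm is packed
// into one byte by truncating a float to its exponent plus the top two
// mantissa bits (the "3/15" small-float layout).
class NormQuantizationDemo {

    // Encode a float into one byte. Values are truncated, not rounded to nearest.
    static byte encode(float f) {
        int bits = Float.floatToRawIntBits(f);
        int small = bits >> 21;        // keep sign, exponent, top 2 mantissa bits
        int zero = (63 - 15) << 3;     // offset that re-biases the exponent
        if (small <= zero) {
            // zero or negative input -> 0, positive underflow -> smallest code
            return (bits <= 0) ? (byte) 0 : (byte) 1;
        }
        if (small >= zero + 0x100) {
            return (byte) -1;          // overflow -> largest representable code
        }
        return (byte) (small - zero);
    }

    // Decode a byte produced by encode() back into a float.
    static float decode(byte b) {
        if (b == 0) {
            return 0.0f;
        }
        int bits = (b & 0xff) << 21;
        bits += (63 - 15) << 24;
        return Float.intBitsToFloat(bits);
    }

    // What a boost looks like after being stored and read back.
    static float roundTrip(float f) {
        return decode(encode(f));
    }

    public static void main(String[] args) {
        float[] boosts = {5.64f, 5.25f, 5.10f, 4.8f, 4.4f, 4.0f, 7.9f, 8.0f, 9.4f};
        for (float b : boosts) {
            System.out.println(b + " -> " + roundTrip(b));
        }
    }
}
```

Under these assumptions 5.64, 5.25 and 5.10 all come back as 5.0, and 4.8
and 4.4 come back as 4.0, which is consistent with the tied scores above.
Only four distinct values survive between each pair of successive powers of
two, so boosts need to differ by a fairly large factor to stay
distinguishable.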

Prabin



Re: document with different index time boost returns same score

Posted by Andrzej Bialecki <ab...@getopt.org>.
On 2009-12-18 21:47, Tom Hill wrote:
> The docBoost, IIRC, is stored in a single byte, which combines the doc
> boost, the field boost, and the length norm.
> (
> http://lucene.apache.org/java/2_4_1/api/core/org/apache/lucene/search/Similarity.html#formula_norm
> )
>
> Are the lengths of your documents the same? If not, this could be affecting
> your scoring.
>
> You can run luke (http://code.google.com/p/luke/) , and look at the values
> for fieldNorm. It's on the documents tab.

There's a little widget in Luke to set the value of norm - just display 
the dialog, it shows you the rounding error that this encoding causes 
(and what input values effectively come out the same, once encoded).


-- 
Best regards,
Andrzej Bialecki     <><
  ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: document with different index time boost returns same score

Posted by Tom Hill <so...@worldware.com>.
The docBoost, IIRC, is stored in a single byte, which combines the doc
boost, the field boost, and the length norm.
(
http://lucene.apache.org/java/2_4_1/api/core/org/apache/lucene/search/Similarity.html#formula_norm
)

Are the lengths of your documents the same? If not, this could be affecting
your scoring.

You can run Luke (http://code.google.com/p/luke/) and look at the values
for fieldNorm. It's on the documents tab.

Does the ordering look like it is based on these numbers? If so, length
differences are probably what is affecting your scores.
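
To make the length effect concrete, here is a rough sketch of the norm
before it is packed into the byte, following the formula linked above. It
assumes the default length norm of 1/sqrt(number of terms in the field);
the class and method names are hypothetical, and the term counts are
invented for the example.

```java
// Illustrative sketch only, not Lucene code. Assumes the default length
// norm of 1 / sqrt(numTerms) and a single field per document.
class NormFormulaSketch {

    // Shorter fields get a larger length norm.
    static float lengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    // Unencoded norm: document boost * field boost * length norm.
    static float norm(float docBoost, float fieldBoost, int numTerms) {
        return docBoost * fieldBoost * lengthNorm(numTerms);
    }

    public static void main(String[] args) {
        // Hypothetical term counts: a short field with the lower boost
        // and a longer field with the higher boost.
        float shortDoc = norm(4.4f, 1.0f, 4);
        float longDoc = norm(5.64f, 1.0f, 16);
        System.out.println("boost 4.4, 4 terms:   " + shortDoc);
        System.out.println("boost 5.64, 16 terms: " + longDoc);
    }
}
```

In this made-up case the document with the lower boost ends up with the
higher norm simply because its field is shorter, so the boost alone does
not determine the ordering.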

Tom





Re: document with different index time boost returns same score

Posted by Michael McCandless <lu...@mikemccandless.com>.
The boost is stored in the index using a single byte, i.e., it is heavily
quantized... that may explain what you are seeing?

If you make the boosts wildly different, do you then see a score difference?

Mike

