Posted to solr-user@lucene.apache.org by darul <da...@gmail.com> on 2011/11/08 13:22:55 UTC

Solr dismax scoring and weight

title^1.1 body^1.0 comments^0.5

Could someone explain how to read the following query debug output, and how
the score is computed?

Here are 4 documents with the word "Idée" in the title, body, or comments.

The results come back in this order by score. I do not understand why the
fourth document is not second in the results.

First:
Title: *Idée* intéressante
Body: Solr fonctionne chez vous ou pas ?

Second:
Title: *Idée* du lundi 01112011
Body: Voici le contenu de mon *idée*
Comments:
- commentaire avec le mot *idée*
- bonne *idée* pour un début de semaine

Third:
Title: Une *idée* pas comme les autres d'avant
Body: Ah oui cette *idée* est intéressante

Fourth:
Title: *Idée* intéressante encore
Body: Solr fonctionne chez vous ou pas ?

For example, what does "(MATCH) weight(title:idé^1.1 in 0)" mean?




--
View this message in context: http://lucene.472066.n3.nabble.com/Solr-dismax-scoring-and-weight-tp3490096p3490096.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Solr dismax scoring and weight

Posted by Erick Erickson <er...@gmail.com>.

No, I mean that the number used to hold the length of the field is a byte,
but it's not just a simple byte. It's encoded so that one byte can handle
very long fields, at the cost of some precision. For instance,
and I'm pulling numbers out of thin air here, fields of 1-25 terms may
collapse to the same length value. Same with 26-100 etc. But I really don't
know the details of what the buckets are.
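
For what it's worth, the buckets can be worked out with a short sketch. It
assumes the default similarity of Lucene 3.x, where the norm of an unboosted
field is 1/sqrt(number of terms) and is stored in one byte by a small-float
encoding. The Python function names are made up for illustration, and the bit
layout is reproduced from memory of SmallFloat.floatToByte315, so treat it as
an approximation rather than the actual Lucene code.

```python
import math
import struct
from collections import defaultdict

# Offset used by the one-byte float format (assumed: zero exponent point 15).
_FLOOR = (63 - 15) << 3


def float_to_byte315(value):
    """Squeeze a positive float into one byte by dropping mantissa bits."""
    bits = struct.unpack(">i", struct.pack(">f", value))[0]
    small = bits >> 21  # keep sign, exponent and the top 2 mantissa bits
    if small <= _FLOOR:
        return 0 if bits <= 0 else 1  # underflow: zero or smallest value
    if small >= _FLOOR + 0x100:
        return 255  # overflow: largest representable value
    return small - _FLOOR


def byte315_to_float(encoded):
    """Expand the byte back into a float (the dropped bits stay lost)."""
    if encoded == 0:
        return 0.0
    bits = (encoded << 21) + ((63 - 15) << 24)
    return struct.unpack(">f", struct.pack(">i", bits))[0]


def stored_norm(num_terms):
    """Norm as it would come back from the index for an unboosted field."""
    raw = 1.0 / math.sqrt(num_terms)
    return byte315_to_float(float_to_byte315(raw))


def buckets(max_terms):
    """Group field lengths 1..max_terms by the norm they collapse to."""
    groups = defaultdict(list)
    for n in range(1, max_terms + 1):
        groups[stored_norm(n)].append(n)
    return dict(groups)


if __name__ == "__main__":
    for norm, lengths in sorted(buckets(40).items(), reverse=True):
        print(f"norm {norm:<8} <- {lengths[0]}..{lengths[-1]} terms")
```

Under these assumptions the buckets are much narrower than 1-25 for short
fields: 2 terms get their own value, 3 and 4 terms share one, and the buckets
widen as fields get longer.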

Best
Erick

On Wed, Nov 23, 2011 at 2:47 PM, darul <da...@gmail.com> wrote:
> Thanks a lot, Erick, for this explanation. Do you mean that the words are
> stored in bytes, is that it?

Re: Solr dismax scoring and weight

Posted by darul <da...@gmail.com>.

Thanks a lot, Erick, for this explanation. Do you mean that the words are
stored in bytes, is that it?


Re: Solr dismax scoring and weight

Posted by Erick Erickson <er...@gmail.com>.

Length normalization is an attempt to factor in how long the field is. The idea
is that a token in a field with 10,000 tokens should count less than the same
token in a field of 10 tokens. But since the length of the field is encoded
in a single byte, precision is limited, and the distinction between fields of
similar length (3 and 4 tokens, for example) is pretty much lost.
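
To make the effect concrete, here is a minimal sketch of how the length norm
feeds into the per-field weight that shows up in the debug output. It assumes
the default TF-IDF similarity (tf = sqrt(term frequency), norm = 1/sqrt(number
of tokens), fieldWeight = tf * idf * fieldNorm) and ignores the byte encoding
and any index-time boosts. The function names and the idf value are invented
for illustration.

```python
import math


def length_norm(num_tokens):
    """Unencoded length norm: longer fields get a smaller value."""
    return 1.0 / math.sqrt(num_tokens)


def field_weight(term_freq, idf, num_tokens):
    """tf * idf * fieldNorm for one term in one field (no boosts)."""
    tf = math.sqrt(term_freq)
    return tf * idf * length_norm(num_tokens)


if __name__ == "__main__":
    idf = 1.0  # hypothetical value; the real one depends on the index
    short = field_weight(term_freq=1, idf=idf, num_tokens=10)
    long_ = field_weight(term_freq=1, idf=idf, num_tokens=10_000)
    print(f"one match in 10 tokens:     {short:.4f}")
    print(f"one match in 10,000 tokens: {long_:.4f}")
```

With everything else equal, the single match in the 10-token field weighs
about 30 times more than the same match in the 10,000-token field.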

HTH
Erick

On Wed, Nov 9, 2011 at 3:59 AM, darul <da...@gmail.com> wrote:
> Thanks for the details, but what do you mean by normalization? Can you
> briefly describe the concepts behind it?

Re: Solr dismax scoring and weight

Posted by darul <da...@gmail.com>.

Thanks for the details, but what do you mean by normalization? Can you
briefly describe the concepts behind it?


Re: Solr dismax scoring and weight

Posted by Erick Erickson <er...@gmail.com>.

What does the debugQuery explanation show? The calculations
aren't all that precise for short fields. The length normalization is
encoded and is essentially the same for short fields.
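
In case it helps, this is roughly how a request that returns the explanation
could be put together. The parameters (defType, qf, fl, debugQuery) are
standard Solr request parameters, and the qf value is the one from the
original post. The host, port and handler path are assumptions about a local
default install, and the helper name is made up.

```python
from urllib.parse import urlencode

# Assumed location of a local Solr select handler; adjust to your setup.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def build_debug_url(query, base_url=SOLR_SELECT_URL):
    """Build a dismax request URL that asks Solr to explain each score."""
    params = {
        "q": query,
        "defType": "dismax",
        "qf": "title^1.1 body^1.0 comments^0.5",
        "fl": "*,score",     # return the score next to each document
        "debugQuery": "on",  # adds the per-document explain section
    }
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_debug_url("idée"))
```

The explain section of the response then lists, for each document, the
fieldWeight and fieldNorm values that went into the score.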

Best
Erick
