Posted to solr-user@lucene.apache.org by LOPEZ-CORTES Mariano-ext <ma...@pole-emploi.fr> on 2018/05/23 12:21:06 UTC

Debugging/scoring question

Hi all

I have a collection of 20 documents. In the debug output of a query, we have:

"1000000051":"
20.794415 = max of:
  20.794415 = weight(nomUsageE:jean in 1) [SchemaSimilarity], result of:
    20.794415 = score(doc=1,freq=1.0 = termFreq=1.0
), product of:
      15.0 = boost
      1.3862944 = idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:
        1.0 = docFreq
        5.0 = docCount
      1.0 = tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength)) from:
        1.0 = termFreq=1.0
        1.2 = parameter k1
        0.75 = parameter b
        1.0 = avgFieldLength
        1.0 = fieldLength

  "1000000053":"
21.11246 = max of:
  21.11246 = weight(prenomE:jean in 3) [SchemaSimilarity], result of:
    21.11246 = score(doc=3,freq=1.0 = termFreq=1.0
), product of:
      8.0 = boost
      2.6390574 = idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:
        1.0 = docFreq
        20.0 = docCount
      1.0 = tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength)) from:
        1.0 = termFreq=1.0
        1.2 = parameter k1
        0.75 = parameter b
        1.0 = avgFieldLength
        1.0 = fieldLength

Why is docCount = 5.0 for document 1000000051? Isn't docCount the total number of documents?

Thanks in advance!



Re: Debugging/scoring question

Posted by Erick Erickson <er...@gmail.com>.
Well, first you have to be using that similarity ;)

Since Solr 6.0, BM25 has been the default similarity algorithm.

If you insist, you can modify the score with function queries; see the
docfreq function.
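
To see why the IDF matters for these two documents, here is a rough Python
sketch using the numbers from the debug output. The idf_weight exponent is a
hypothetical knob invented for this illustration, not a Solr or Lucene
parameter: 1.0 reproduces the scores in the debug output and 0.0 ignores the
IDF entirely.

```python
import math


def bm25_idf(doc_freq: float, doc_count: float) -> float:
    """idf as printed in the explain output."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


def score(boost: float, doc_freq: float, doc_count: float,
          idf_weight: float = 1.0) -> float:
    """boost * idf ** idf_weight.

    tfNorm is 1.0 for both documents in the debug output, so it is left out.
    idf_weight is a hypothetical damping exponent, for illustration only.
    """
    return boost * bm25_idf(doc_freq, doc_count) ** idf_weight


# Values copied from the debug output.
nom_usage = dict(boost=15.0, doc_freq=1.0, doc_count=5.0)   # doc 1000000051
prenom = dict(boost=8.0, doc_freq=1.0, doc_count=20.0)      # doc 1000000053

for weight in (1.0, 0.5, 0.0):
    print(weight,
          score(**nom_usage, idf_weight=weight),
          score(**prenom, idf_weight=weight))
```

With the full IDF, the prenomE match (boost 8) outranks the nomUsageE match
(boost 15); with the IDF ignored, the ordering follows the boosts alone.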

Best,
Erick

On Wed, May 23, 2018 at 12:17 PM, LOPEZ-CORTES Mariano-ext
<ma...@pole-emploi.fr> wrote:
> Yes, this makes sense.
>
> I guess you are referring to this doc:
>
> https://lucene.apache.org/core/6_0_1/core/org/apache/lucene/search/similarities/TFIDFSimilarity.html
>
> How can I decrease the effect of the IDF component in my query?
>
> Thanks!!

RE: Debugging/scoring question

Posted by LOPEZ-CORTES Mariano-ext <ma...@pole-emploi.fr>.
Yes, this makes sense.

I guess you are referring to this doc:

https://lucene.apache.org/core/6_0_1/core/org/apache/lucene/search/similarities/TFIDFSimilarity.html

How can I decrease the effect of the IDF component in my query?

Thanks!!

-----Original Message-----
From: Alessandro Benedetti [mailto:a.benedetti@sease.io]
Sent: Wednesday, May 23, 2018 18:05
To: solr-user@lucene.apache.org
Subject: Re: Debugging/scoring question

Hi Mariano,
From the documentation:

docCount = total number of documents containing this field, in the range [1 .. maxDoc()]

In your debug output, the fields involved in the score computation are indeed different (nomUsageE, prenomE).

Does this make sense?

Cheers



-----
---------------
Alessandro Benedetti
Search Consultant, R&D Software Engineer, Director Sease Ltd. - www.sease.io
--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html

Re: Debugging/scoring question

Posted by Alessandro Benedetti <a....@sease.io>.
Hi Mariano,
From the documentation:

docCount = total number of documents containing this field, in the range [1
.. maxDoc()]

In your debug output, the fields involved in the score computation are indeed
different (nomUsageE, prenomE), so each score uses the docCount of its own
field rather than the size of the whole collection.
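
As a quick sanity check, here is a small Python sketch (not Lucene code) that
plugs the numbers from your debug output into the formulas printed there. The
function names are made up for the illustration; the inputs are copied from
the explain output.

```python
import math


def bm25_idf(doc_freq: float, doc_count: float) -> float:
    """log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5))"""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


def bm25_tf_norm(freq: float, k1: float, b: float,
                 field_length: float, avg_field_length: float) -> float:
    """(freq * (k1 + 1)) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength))"""
    return (freq * (k1 + 1)) / (
        freq + k1 * (1 - b + b * field_length / avg_field_length)
    )


# Same tfNorm inputs for both documents in the debug output.
tf_norm = bm25_tf_norm(freq=1.0, k1=1.2, b=0.75,
                       field_length=1.0, avg_field_length=1.0)

# docCount is per field: 5 for nomUsageE, 20 for prenomE.
idf_nom_usage = bm25_idf(doc_freq=1.0, doc_count=5.0)
idf_prenom = bm25_idf(doc_freq=1.0, doc_count=20.0)

# score = boost * idf * tfNorm
score_nom_usage = 15.0 * idf_nom_usage * tf_norm   # doc 1000000051
score_prenom = 8.0 * idf_prenom * tf_norm          # doc 1000000053

print("nomUsageE:", idf_nom_usage, score_nom_usage)
print("prenomE:  ", idf_prenom, score_prenom)
```

Both results should agree with the explain output to the displayed precision
(Lucene computes these values as single-precision floats, so the last digit
may differ). The only input that differs between the two idf values is
docCount.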

Does this make sense?

Cheers



-----
---------------
Alessandro Benedetti
Search Consultant, R&D Software Engineer, Director
Sease Ltd. - www.sease.io
--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html