Posted to dev@lucene.apache.org by "Vanlerberghe, Luc" <Lu...@bvdinfo.com> on 2014/11/26 16:33:59 UTC

Cosmetic: Getting rid of an extra \n in TFIDFSimilarity.explainScore output

TFIDFSimilarity.explainScore currently outputs an annoying (but harmless of course) extra \n.

It occurs because the freq argument is concatenated as is into the description of the top Explanation node,
so its toString() output, which ends with a newline, lands inside the description, whereas freq.getValue() is sufficient.
The full freq Explanation node is included as a detail further on anyway.

I attached a patch generated with git, but it's just:
-    result.setDescription("score(doc="+doc+",freq="+freq+"), product of:");
+    result.setDescription("score(doc="+doc+",freq="+freq.getValue()+"), product of:");

Output like this:

  <lst name="explain">
    <str name="0-764629">
5.5484066 = (MATCH) max of:
  5.5484066 = (MATCH) weight(titreSearch:camus in 4158) [DefaultSimilarity], result of:
    5.5484066 = score(doc=4158,freq=1.0 = termFreq=1.0
), product of:
      0.60149205 = queryWeight, product of:
        9.224405 = idf(docFreq=450, maxDocs=1682636)
        0.065206595 = queryNorm
      9.224405 = fieldWeight in 4158, product of:
        1.0 = tf(freq=1.0), with freq of:
          1.0 = termFreq=1.0
        9.224405 = idf(docFreq=450, maxDocs=1682636)
        1.0 = fieldNorm(doc=4158)
</str>
  </lst>

becomes:

  <lst name="explain">
    <str name="0-764629">
5.5484066 = (MATCH) max of:
  5.5484066 = (MATCH) weight(titreSearch:camus in 4158) [DefaultSimilarity], result of:
    5.5484066 = score(doc=4158,freq=1.0), product of:
      0.60149205 = queryWeight, product of:
        9.224405 = idf(docFreq=450, maxDocs=1682636)
        0.065206595 = queryNorm
      9.224405 = fieldWeight in 4158, product of:
        1.0 = tf(freq=1.0), with freq of:
          1.0 = termFreq=1.0
        9.224405 = idf(docFreq=450, maxDocs=1682636)
        1.0 = fieldNorm(doc=4158)
</str>
  </lst>
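
The extra newline in the first output can be reproduced without Lucene on the classpath. The sketch below is only an illustration: Node is a hypothetical, simplified stand-in for Lucene's Explanation class, assuming that its toString() renders one line per node and terminates each line with "\n". The helper names before() and after() are likewise made up for this example and mirror the two lines of the patch.

```java
import java.util.ArrayList;
import java.util.List;

public class ExplainNewlineDemo {

    /** Hypothetical stand-in for Lucene's Explanation; not the real class. */
    static class Node {
        final float value;
        final String description;
        final List<Node> details = new ArrayList<>();

        Node(float value, String description) {
            this.value = value;
            this.description = description;
        }

        float getValue() {
            return value;
        }

        // Renders this node and its details, one line per node, each ending in "\n".
        String toString(int depth) {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < depth; i++) {
                sb.append("  ");
            }
            sb.append(value).append(" = ").append(description).append("\n");
            for (Node detail : details) {
                sb.append(detail.toString(depth + 1));
            }
            return sb.toString();
        }

        @Override
        public String toString() {
            return toString(0);
        }
    }

    // Old behaviour: string concatenation calls freq.toString(), newline included.
    static String before(int doc, Node freq) {
        return "score(doc=" + doc + ",freq=" + freq + "), product of:";
    }

    // Patched behaviour: only the numeric value is concatenated.
    static String after(int doc, Node freq) {
        return "score(doc=" + doc + ",freq=" + freq.getValue() + "), product of:";
    }

    public static void main(String[] args) {
        Node freq = new Node(1.0f, "termFreq=1.0");
        System.out.println(before(4158, freq)); // description is split over two lines
        System.out.println(after(4158, freq));  // description stays on one line
    }
}
```

Running main prints the description broken across two lines for before() and on a single line for after(), matching the two explain outputs above.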

Re: Cosmetic: Getting rid of an extra \n in TFIDFSimilarity.explainScore output

Posted by Michael McCandless <lu...@mikemccandless.com>.

Aha, excellent!  I will commit.  Thank you.

Mike McCandless

http://blog.mikemccandless.com


On Wed, Nov 26, 2014 at 11:04 AM, Vanlerberghe, Luc
<Lu...@bvdinfo.com> wrote:
> The "freq" explanation itself is still included as a detail a bit lower in the code (line 798 in my version)
> so no information gets lost!
>
> See:
>>           1.0 = termFreq=1.0
>
> Luc

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

RE: Cosmetic: Getting rid of an extra \n in TFIDFSimilarity.explainScore output

Posted by "Vanlerberghe, Luc" <Lu...@bvdinfo.com>.

The "freq" explanation itself is still included as a detail a bit lower in the code (line 798 in my version)
so no information gets lost!

See:
>           1.0 = termFreq=1.0

Luc
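
The point above, that the freq description survives as a detail, can be sketched for the PhraseQuery case raised in the quoted message. This is a hedged illustration only: Node is a hypothetical stand-in for Lucene's Explanation, explainScore() is a made-up helper that builds a much smaller tree than the real method (no idf, queryWeight or fieldNorm nodes), and the doc id and phraseFreq value are invented. The square-root tf is the DefaultSimilarity formula.

```java
import java.util.ArrayList;
import java.util.List;

public class FreqDetailDemo {

    /** Hypothetical stand-in for Lucene's Explanation; not the real class. */
    static class Node {
        final float value;
        final String description;
        final List<Node> details = new ArrayList<>();

        Node(float value, String description) {
            this.value = value;
            this.description = description;
        }

        float getValue() {
            return value;
        }

        // One line per node, indented by depth, each terminated by "\n".
        String toString(int depth) {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < depth; i++) {
                sb.append("  ");
            }
            sb.append(value).append(" = ").append(description).append("\n");
            for (Node detail : details) {
                sb.append(detail.toString(depth + 1));
            }
            return sb.toString();
        }

        @Override
        public String toString() {
            return toString(0);
        }
    }

    // Simplified tree: the top node mentions only freq.getValue(), while the
    // full freq node (with its own description) is attached below the tf node.
    static Node explainScore(int doc, Node freq) {
        float tfValue = (float) Math.sqrt(freq.getValue());
        Node tf = new Node(tfValue, "tf(freq=" + freq.getValue() + "), with freq of:");
        tf.details.add(freq);
        // The top value reuses tf here because the other factors are omitted.
        Node result = new Node(tfValue,
                "score(doc=" + doc + ",freq=" + freq.getValue() + "), product of:");
        result.details.add(tf);
        return result;
    }

    public static void main(String[] args) {
        Node freq = new Node(4.0f, "phraseFreq=4.0");
        System.out.print(explainScore(7, freq));
    }
}
```

In this sketch the "phraseFreq=4.0" label still appears once, on the innermost line, so a reader can tell the freq was not a plain term frequency even though the top description only carries the number.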

-----Original Message-----
From: Michael McCandless [mailto:lucene@mikemccandless.com] 
Sent: Wednesday, 26 November 2014 16:59
To: Lucene/Solr dev; Vanlerberghe, Luc
Subject: Re: Cosmetic: Getting rid of an extra \n in TFIDFSimilarity.explainScore output

Thank you for the patch!  I agree that is annoying.

It makes me a little nervous, losing possibly important explanation
about how that freq itself was computed?

E.g. a PhraseQuery will have "phraseFreq=X" as the explanation for
that freq, telling you this wasn't just a simple term freq ... I
wonder whether other queries want to explain an interesting freq?

Mike McCandless

http://blog.mikemccandless.com



Re: Cosmetic: Getting rid of an extra \n in TFIDFSimilarity.explainScore output

Posted by Michael McCandless <lu...@mikemccandless.com>.

Thank you for the patch!  I agree that is annoying.

It makes me a little nervous, losing possibly important explanation
about how that freq itself was computed?

E.g. a PhraseQuery will have "phraseFreq=X" as the explanation for
that freq, telling you this wasn't just a simple term freq ... I
wonder whether other queries want to explain an interesting freq?

Mike McCandless

http://blog.mikemccandless.com


