Posted to dev@lucene.apache.org by "Michael McCandless (JIRA)" <ji...@apache.org> on 2009/02/02 18:07:59 UTC

[jira] Created: (LUCENE-1534) idf(t) is not actually squared during scoring?

idf(t) is not actually squared during scoring?
----------------------------------------------

                 Key: LUCENE-1534
                 URL: https://issues.apache.org/jira/browse/LUCENE-1534
             Project: Lucene - Java
          Issue Type: Bug
          Components: Query/Scoring
    Affects Versions: 2.4, 2.3.2, 2.3.1, 2.3, 2.2, 2.1
            Reporter: Michael McCandless
            Assignee: Michael McCandless
            Priority: Minor
             Fix For: 2.9


The javadocs for Similarity:

  http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/search/Similarity.html

show idf(t) as being squared when computing net query score.  But from
looking at the sources, I don't think it is actually squared?  Maybe
it used to be; see e.g. this interesting discussion:

  http://markmail.org/message/k5pl7scmiac5wosb

Or am I missing something?  We just need to fix the javadocs to take
away the "squared"...

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1534) idf(t) is not actually squared during scoring?

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12669794#action_12669794 ] 

Michael McCandless commented on LUCENE-1534:
--------------------------------------------

bq. But if we feel that over-emphasizes terms with large idfs, then we should not remove an idf factor from one vector, but rather rework our weight heuristic, perhaps replacing idf with sqrt(idf), no?

I agree, that should be the approach if we decide idf^2 is too much, but I don't have an opinion (yet!) on whether it's too much (but that thread referenced above is nonetheless interesting).




[jira] Resolved: (LUCENE-1534) idf(t) is not actually squared during scoring?

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless resolved LUCENE-1534.
----------------------------------------

    Resolution: Invalid



RE: [jira] Commented: (LUCENE-1534) idf(t) is not actually squared during scoring?

Posted by Uwe Schindler <uw...@thetaphi.de>.
A question about "term frequency normalization" and changing
Similarity also came up on java-user.

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de

> -----Original Message-----
> From: Grant Ingersoll [mailto:gsingers@apache.org]
> Sent: Tuesday, February 03, 2009 10:59 PM
> To: java-dev@lucene.apache.org
> Subject: Re: [jira] Commented: (LUCENE-1534) idf(t) is not actually
> squared during scoring?
> 
> We should probably have that discussion (changing default Sim) on
> java-dev, not on a particular JIRA issue.  I, for one, am interested in
> having the discussion.


Re: [jira] Commented: (LUCENE-1534) idf(t) is not actually squared during scoring?

Posted by Grant Ingersoll <gs...@apache.org>.
We should probably have that discussion (changing default Sim) on
java-dev, not on a particular JIRA issue.  I, for one, am interested in
having the discussion.


On Feb 3, 2009, at 3:35 PM, Doug Cutting (JIRA) wrote:

>
>    [ https://issues.apache.org/jira/browse/LUCENE-1534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12670103 
> #action_12670103 ]
>
> Doug Cutting commented on LUCENE-1534:
> --------------------------------------
>
> Now, on the cusp of 3.0, might be a good time to think about  
> changing the default ranking algorithm.  This is potentially a  
> disruptive change, but we can easily provide a back-compatible  
> Similarity implementation.  Are there other changes to the default  
> Similarity that may be of general utility?  Or do folks think it's
> better to leave this alone?


[jira] Commented: (LUCENE-1534) idf(t) is not actually squared during scoring?

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12670103#action_12670103 ] 

Doug Cutting commented on LUCENE-1534:
--------------------------------------

Now, on the cusp of 3.0, might be a good time to think about changing the default ranking algorithm.  This is potentially a disruptive change, but we can easily provide a back-compatible Similarity implementation.  Are there other changes to the default Similarity that may be of general utility?  Or do folks think it's better to leave this alone?



[jira] Commented: (LUCENE-1534) idf(t) is not actually squared during scoring?

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12669663#action_12669663 ] 

Mark Miller commented on LUCENE-1534:
-------------------------------------

Hmmm... we do multiply it in twice, but a bit happens in between: we multiply by idf(t) in sumOfSquaredWeights() and then again in normalize(float queryNorm).

Technically that is boost * idf(t) * norm * idf(t), right? Which gives idf(t)^2 * boost * norm? And then that gets multiplied by tf in the scorer...
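
The order of operations described above can be traced with a small sketch. This is plain Python that mirrors the two steps as described in this comment, not Lucene's actual source; the class name `TermWeightSketch` and the sample idf and boost values are assumptions made up for illustration.

```python
import math


class TermWeightSketch:
    """Hypothetical stand-in for a per-term query weight (not Lucene's class)."""

    def __init__(self, idf, boost=1.0):
        self.idf = idf
        self.boost = boost
        self.query_weight = 0.0
        self.value = 0.0

    def sum_of_squared_weights(self):
        # First idf factor: the query weight is boost * idf, returned squared.
        self.query_weight = self.boost * self.idf
        return self.query_weight ** 2

    def normalize(self, query_norm):
        # Second idf factor: the normalized query weight is multiplied by idf again.
        self.query_weight *= query_norm
        self.value = self.query_weight * self.idf


w = TermWeightSketch(idf=3.0, boost=2.0)           # assumed sample values
query_norm = 1.0 / math.sqrt(w.sum_of_squared_weights())
w.normalize(query_norm)
# w.value is boost * idf * query_norm * idf, i.e. idf^2 * boost * query_norm
```

For a single term the norm is 1 / (boost * idf), so one idf factor cancels in the final value; the squared form only becomes visible when the norm is shared across several terms.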



[jira] Commented: (LUCENE-1534) idf(t) is not actually squared during scoring?

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12669750#action_12669750 ] 

Doug Cutting commented on LUCENE-1534:
--------------------------------------

I've always found "idf squared" an unhelpful description.  We're computing a dot-product of two vectors, which (for normalized vectors) is the cosine of the angle between them.  Terms are dimensions.  The magnitude in each dimension is the weight of the term in a query or document.  Our heuristic for computing weights is (sqrt(tf)*idf)/norm.  Put all that together, and you do indeed get an "idf squared" factor in each addend of the score.  But if we feel that over-emphasizes terms with large idfs, then we should not remove an idf factor from one vector, but rather rework our weight heuristic, perhaps replacing idf with sqrt(idf), no?
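
A minimal sketch of that dot product may make the "idf squared in each addend" point concrete. It uses the weight heuristic quoted above, (sqrt(tf)*idf)/norm, for both vectors; the function names, the dictionaries, and all numbers are hypothetical and are not taken from Lucene.

```python
import math


def term_weight(tf, idf, norm):
    # Weight heuristic from the comment above: (sqrt(tf) * idf) / norm.
    return math.sqrt(tf) * idf / norm


def dot_product(query, doc, idf, query_norm, doc_norm):
    # query and doc map term -> term frequency; idf maps term -> idf value.
    score = 0.0
    for term, q_tf in query.items():
        if term in doc:
            q_w = term_weight(q_tf, idf[term], query_norm)
            d_w = term_weight(doc[term], idf[term], doc_norm)
            score += q_w * d_w  # this addend contains idf[term] ** 2
    return score


idf = {"solr": 3.0, "the": 1.0}        # assumed idf values
query = {"solr": 1, "the": 1}
doc = {"solr": 4, "the": 9}
score = dot_product(query, doc, idf, query_norm=1.0, doc_norm=1.0)
```

With these numbers the "solr" addend is 3 * (2 * 3) = 18 and the "the" addend is 1 * (3 * 1) = 3, so the rarer term dominates through its idf appearing once per vector.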



[jira] Commented: (LUCENE-1534) idf(t) is not actually squared during scoring?

Posted by "Mike Klaas (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12669843#action_12669843 ] 

Mike Klaas commented on LUCENE-1534:
------------------------------------

[quote]But if we feel that over-emphasizes terms with large idfs, then we should not remove an idf factor from one vector, but rather rework our weight heuristic, perhaps replacing idf with sqrt(idf), no?[/quote]

FWIW, having implemented web search on a large (500m) corpus, we found the stock idf factor in Lucene is too high, and ended up sqrt()'ing it in Similarity.

That said, much of the score in this system came from anchor text, link analysis scores, and term proximity, so it's hard to measure the impact of the idf change independently.
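
To illustrate what taking the square root changes, here is a small arithmetic sketch. It is not the code used in the system described above, and the function name and the idf value of 9.0 are assumptions chosen only to keep the numbers round.

```python
import math


def idf_factor_in_addend(idf, dampen=False):
    # Each score addend multiplies the per-vector idf weight in twice.
    # With dampen=True that weight is sqrt(idf), so the addend carries idf
    # instead of idf squared.
    w = math.sqrt(idf) if dampen else idf
    return w * w


stock = idf_factor_in_addend(9.0)                  # idf used as-is
dampened = idf_factor_in_addend(9.0, dampen=True)  # sqrt(idf) used instead
```

Under these assumptions the stock factor is 81 while the dampened factor is 9, which is the sense in which sqrt() reduces the emphasis on high-idf terms.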



[jira] Commented: (LUCENE-1534) idf(t) is not actually squared during scoring?

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12669673#action_12669673 ] 

Yonik Seeley commented on LUCENE-1534:
--------------------------------------

Right... and explain() shows this as one idf factor in the queryWeight and one in the fieldWeight:

{code}
0.6433005 = (MATCH) weight(text:solr in 14), product of:
  0.99999994 = queryWeight(text:solr), product of:
    3.6390574 = idf(docFreq=1, numDocs=26)
    0.27479643 = queryNorm
  0.64330053 = (MATCH) fieldWeight(text:solr in 14), product of:
    1.4142135 = tf(termFreq(text:solr)=2)
    3.6390574 = idf(docFreq=1, numDocs=26)
    0.125 = fieldNorm(field=text, doc=14)
{code}
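
The products in the explain output above can be rechecked with a few lines of arithmetic. This sketch only multiplies the numbers printed there; the variable names are mine, and it does not attempt to derive idf or fieldNorm from the index statistics.

```python
# Values copied from the explain output above.
idf = 3.6390574
query_norm = 0.27479643
tf = 1.4142135           # sqrt(termFreq=2)
field_norm = 0.125

query_weight = idf * query_norm         # first idf factor, cancelled by the norm
field_weight = tf * idf * field_norm    # second idf factor
score = query_weight * field_weight
```

Because this is a single-term query, queryNorm is effectively 1/idf, so queryWeight comes out at about 1.0 and the final score is numerically the fieldWeight.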




[jira] Commented: (LUCENE-1534) idf(t) is not actually squared during scoring?

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12669754#action_12669754 ] 

Doug Cutting commented on LUCENE-1534:
--------------------------------------

sumOfSquaredWeights properly normalizes query vectors to the unit sphere.  We can't easily do that with document vectors, since idfs change as the collection changes.  So we instead use a heuristic to normalize documents, sqrt(numTokens), which is usually a good approximation.  Regardless of how it's normalized, the global term weight factors twice in each addend, once from each vector.
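
A rough sketch of the two normalizations described above, assuming the query norm is 1/sqrt(sum of squared weights) and the document heuristic is 1/sqrt(numTokens). The function names and sample weights are hypothetical; Lucene's stored field norms are additionally encoded with reduced precision, which this sketch ignores.

```python
import math


def query_norm(weights):
    # 1 / sqrt(sum of squared weights): scales the query vector to unit length.
    return 1.0 / math.sqrt(sum(w * w for w in weights))


def doc_length_norm(num_tokens):
    # Heuristic from the comment above: approximate the document vector's
    # length by sqrt(numTokens) instead of recomputing idf-based weights.
    return 1.0 / math.sqrt(num_tokens)


raw = [3.0, 4.0]                        # assumed boost * idf per query term
qn = query_norm(raw)
unit = [w * qn for w in raw]
length = math.sqrt(sum(w * w for w in unit))
```

The normalized query vector has length 1 regardless of the idf values, whereas the document norm depends only on token count, so it stays valid as idfs drift with the collection.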



[jira] Commented: (LUCENE-1534) idf(t) is not actually squared during scoring?

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12669756#action_12669756 ] 

Yonik Seeley commented on LUCENE-1534:
--------------------------------------

{quote}
EG for a single TermQuery, the queryWeight will always be 1.0 (except
for roundoff errors), cancelling out that idf factor, leaving only one
idf factor?
{quote}

Yes, for a score returned to the user only one idf factor remains because of the normalization.
*But* the more important part of the scoring is how terms are scored relative to each other in the same query - and that is still idf**2
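
A two-term example may help show both halves of this point. The sketch below follows the queryWeight * fieldWeight structure seen in explain output, with tf and fieldNorm fixed at 1 so that only idf varies; the function name and the idf values 4.0 and 2.0 are assumptions for illustration.

```python
import math


def term_contribution(idf, query_norm, tf=1.0, field_norm=1.0):
    query_weight = idf * query_norm                   # one idf factor
    field_weight = math.sqrt(tf) * idf * field_norm   # second idf factor
    return query_weight * field_weight


idf_rare, idf_common = 4.0, 2.0                       # hypothetical idf values

# Two-term query: both terms share one queryNorm, so it cancels in the ratio.
qn = 1.0 / math.sqrt(idf_rare ** 2 + idf_common ** 2)
rare = term_contribution(idf_rare, qn)
common = term_contribution(idf_common, qn)
ratio = rare / common                                 # (4.0 / 2.0) ** 2

# Single-term query: queryNorm is 1 / idf, leaving a single idf factor.
single = term_contribution(idf_rare, 1.0 / idf_rare)
```

Here the rare term contributes four times as much as the common one even though its idf is only twice as large, while the single-term score equals idf once.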



[jira] Commented: (LUCENE-1534) idf(t) is not actually squared during scoring?

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12669791#action_12669791 ] 

Michael McCandless commented on LUCENE-1534:
--------------------------------------------

bq. But the more important part of the scoring is how terms are scored relative to each other in the same query - and that is still idf**2

Ahh OK, now I get it -- idf is indeed factored in twice.  A single TermQuery is a somewhat degenerate case; queries with more than one term will show the effect.  Thanks for clarifying ;)



[jira] Commented: (LUCENE-1534) idf(t) is not actually squared during scoring?

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12669743#action_12669743 ] 

Michael McCandless commented on LUCENE-1534:
--------------------------------------------

But sumOfSquaredWeights is only used as a fixed normalization across
all sub-queries in the Query?

EG for a single TermQuery, the queryWeight will always be 1.0 (except
for roundoff errors), cancelling out that idf factor, leaving only one
idf factor?

