Posted to solr-user@lucene.apache.org by ariya bala <ar...@gmail.com> on 2015/05/20 12:13:37 UTC

Term Frequency Calculation - Clarification

Hi,
I have written a custom similarity class for scoring
(TermFrequencyBiasedSimilarity).
The score is derived from the TF part only (achieved by setting
IDF=1).

Question is:
-----------------
*Document content:* Foo Foo is in bar
*Search query:* Foo bar
*slop:* 3

With slop 3, there are two matches to the query:
 Foo is in bar
 Foo Foo is in bar

*Should the term frequency be 1 or 2? Please also point me to an explanation
of the logic implemented in Lucene/Solr.*

--
Cheers
*Ariya *
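
To make the two matches in the question concrete, here is a minimal Java sketch that enumerates the candidate alignments by brute force. It is not Lucene code: the class and method names are hypothetical, tokens are split on whitespace only, and the distance is simply the number of positions between the two terms when they appear in query order (reordered matches are ignored).

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: brute-force enumeration of alignments, not Lucene's scorer. */
final class SloppyAlignments {

    /** 0-based token positions at which the term occurs, ignoring case. */
    static List<Integer> positions(String[] tokens, String term) {
        List<Integer> out = new ArrayList<>();
        for (int i = 0; i < tokens.length; i++) {
            if (tokens[i].equalsIgnoreCase(term)) {
                out.add(i);
            }
        }
        return out;
    }

    /** Distances of every in-order (first, second) pairing that fits within the slop. */
    static List<Integer> distancesWithinSlop(String doc, String first, String second, int slop) {
        String[] tokens = doc.trim().split("\\s+");
        List<Integer> distances = new ArrayList<>();
        for (int p1 : positions(tokens, first)) {
            for (int p2 : positions(tokens, second)) {
                int distance = p2 - p1 - 1; // 0 means the terms are adjacent
                if (distance >= 0 && distance <= slop) {
                    distances.add(distance);
                }
            }
        }
        return distances;
    }

    public static void main(String[] args) {
        // Document and query from the question above.
        System.out.println(distancesWithinSlop("Foo Foo is in bar", "Foo", "bar", 3));
        // prints [3, 2]
    }
}
```

Under these assumptions the first "Foo" pairs with "bar" at distance 3 and the second at distance 2, so both alignments fit within slop 3. Whether a scorer counts both of them is a separate question.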

Re: Term Frequency Calculation - Clarification

Posted by ariya bala <ar...@gmail.com>.
Please ignore.



-- 
*Ariya *

Re: Term Frequency Calculation - Clarification

Posted by ariya bala <ar...@gmail.com>.
Thanks Jack.
In my case there is only one document: Foo Foo is in bar
As per your comment, I should expect TF to be 2,
but I am getting 1.
Is there a check so that, when one match is a subset of another, it is
counted only once?
My class extends DefaultSimilarity.

Cheers
Ariya Bala S

-- 
*Ariya *
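
For comparison, the sketch below works out what the term frequency would be under two hypotheses: every alignment within the slop is counted, or only the tightest one is. It reuses the formulas documented for DefaultSimilarity (tf is the square root of the frequency, and a sloppy match at a given distance contributes 1 / (distance + 1)), but the class, the method names and the two counting strategies are assumptions made for illustration. The thread does not confirm which strategy Lucene's phrase scorer applies; the score explanation (for example, Solr's debugQuery=true output) would be the place to check.

```java
/** Illustrative only: compares two hypothetical ways of counting sloppy matches. */
final class SloppyTfSketch {

    /** Weight of one sloppy match, following the documented DefaultSimilarity.sloppyFreq. */
    static float sloppyFreq(int distance) {
        return 1.0f / (distance + 1);
    }

    /** Term-frequency factor, following the documented DefaultSimilarity.tf. */
    static float tf(float freq) {
        return (float) Math.sqrt(freq);
    }

    /** Hypothesis A: every alignment within the slop contributes its weight. */
    static float freqCountingAll(int[] distances) {
        float freq = 0f;
        for (int d : distances) {
            freq += sloppyFreq(d);
        }
        return freq;
    }

    /** Hypothesis B: overlapping alignments collapse to the tightest one. */
    static float freqTightestOnly(int[] distances) {
        if (distances.length == 0) {
            return 0f;
        }
        int min = distances[0];
        for (int d : distances) {
            min = Math.min(min, d);
        }
        return sloppyFreq(min);
    }

    public static void main(String[] args) {
        // Distances of the two alignments of "Foo bar" in "Foo Foo is in bar".
        int[] distances = {3, 2};
        System.out.println("all alignments: tf = " + tf(freqCountingAll(distances)));
        System.out.println("tightest only:  tf = " + tf(freqTightestOnly(distances)));
    }
}
```

Neither hypothesis yields an integer 1 or 2, because a sloppy match is weighted by its distance before the square root is taken. Comparing the two values against the explain output should show which behaviour is in effect.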

Re: Term Frequency Calculation - Clarification

Posted by Jack Krupansky <ja...@gmail.com>.
Yes.

tf is both 1 and 2 - tf is per document, which is 1 for the first document
and 2 for the second document.

See:
http://lucene.apache.org/core/5_1_0/core/org/apache/lucene/search/similarities/TFIDFSimilarity.html


-- Jack Krupansky
