Posted to solr-user@lucene.apache.org by Ron van der Vegt <ro...@openindex.io> on 2015/12/16 12:22:20 UTC

minimum should match, can't explain the number of hits

Hi,

I'm currently searching with the following query: q="sony+led+tv".
The minimum should match setting is: mm=2<65%.
So when there are more than two terms, at least 65% of the terms should
match.
I'm not using the StopFilterFactory.

With debug turned on, this is the parsedquery_toString:

+(((categoryName_snf:sony^5.0 | name:sony^6.0 | productTypeName:sony^8.0 
| breadcrumbpath_snf:sony^2.0 | text:sony | title:sony^14.0 | 
salestext:sony | brand:sony^8.0 | salesText_snf:sony)~0.15 
(categoryName_snf:led^5.0 | name:led^6.0 | productTypeName:led^8.0 | 
breadcrumbpath_snf:led^2.0 | text:led | title:led^14.0 | salestext:led | 
brand:led^8.0 | salesText_snf:led)~0.15 (categoryName_snf:tv^5.0 | 
name:tv^6.0 | productTypeName:tv^8.0 | breadcrumbpath_snf:tv^2.0 | 
text:tv | title:tv^14.0 | salestext:tv | brand:tv^8.0 | 
salesText_snf:tv)~0.15)~1) (title:"sony led tv"~10)~0.15

I expect that at least two terms should match because of the 65%, but
I'm also getting hits for documents that seem to match only one of
the terms. Below is the explain output for a hit that shouldn't be there:

2.6449876 = sum of:
   2.6449876 = sum of:
     2.6449876 = max plus 0.15 times others of:
       2.6449876 = weight(text:led in 10143) [BM25Similarity], result of:
         2.6449876 = score(doc=10143,freq=1.0 = termFreq=1.0
), product of:
           2.6449876 = idf(docFreq=3254, maxDocs=45833)
           1.0 = tfNorm, computed from:
             1.0 = termFreq=1.0
             1.0 = parameter k1
             0.0 = parameter b (norms omitted for field)

When I change the mm to 2<67%, I get the number of results that I
expected with 65%. But if I understand correctly, all the terms should
then have to match (33.33% + 33.33% = 66.66%, which is always less than
67%). Did I miss something, or is there something else that could affect
the minimum should match setting?

Thanks in advance!

Ron

Re: minimum should match, can't explain the number of hits

Posted by Ron van der Vegt <ro...@openindex.io>.
Thanks! This makes sense. I will change my configuration to 2<-35%.


Re: minimum should match, can't explain the number of hits

Posted by Binoy Dalal <bi...@gmail.com>.
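The edismax documentation confirms that when a positive % value is
provided, Solr rounds down. If you want Solr to round up, set your
parameter value to '-35%'. A negative percentage means that share of the
terms may be missing; that count is rounded down before being subtracted
from the total, so with 3 terms at least 2 must match.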

-- 
Regards,
Binoy Dalal
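
The difference between 2<65% and 2<-35% can be sketched in a few lines of
Python. This is only an approximation of the rules described in the
documentation, not Solr's actual code: the function name min_should_match
is made up for illustration, it handles a single conditional of the form
N<expr only, and it uses integer arithmetic instead of whatever Solr does
internally.

```python
def min_should_match(num_clauses: int, spec: str) -> int:
    """Approximate the documented mm rules for one spec such as '2<65%'.

    Hypothetical helper for illustration; not part of Solr.
    """
    spec = spec.strip()
    if "<" in spec:
        # Conditional form "N<expr": with N or fewer clauses, all are required.
        bound, expr = spec.split("<", 1)
        if num_clauses <= int(bound):
            return num_clauses
        return min_should_match(num_clauses, expr)
    if spec.endswith("%"):
        percent = int(spec[:-1])
        if percent < 0:
            # Negative percentage: this share of the clauses may be missing.
            # The "may be missing" count is rounded down, then subtracted.
            required = num_clauses - (num_clauses * -percent) // 100
        else:
            # Positive percentage: the required count itself is rounded down.
            required = (num_clauses * percent) // 100
    else:
        value = int(spec)
        required = num_clauses + value if value < 0 else value
    # Never require fewer than 0 or more than the number of clauses.
    return max(0, min(num_clauses, required))


for n in range(1, 6):
    print(n, min_should_match(n, "2<65%"), min_should_match(n, "2<-35%"))
```

For three terms this gives 1 required term with 2<65% and 2 required
terms with 2<-35%, which is the behaviour discussed in this thread.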

Re: minimum should match, can't explain the number of hits

Posted by Binoy Dalal <bi...@gmail.com>.
My guess is that Solr is rounding down while calculating the number of
mandatory terms.
In your case there are 3 terms: 65% of 3 is 1.95, which rounds down to 1,
but 67% of 3 is 2.01, which rounds down to 2. That matches the results
you're seeing.

Maybe someone else can confirm this.

-- 
Regards,
Binoy Dalal
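
The arithmetic in the guess above is easy to check. A minimal sketch,
assuming the required count is simply the percentage of the term count
rounded down (required_terms is a hypothetical name, not a Solr function):

```python
def required_terms(num_terms: int, percent: int) -> int:
    # Positive percentage: take percent of the term count and round down.
    # Integer arithmetic avoids floating-point surprises.
    return (num_terms * percent) // 100


# Three query terms, as in "sony led tv".
for percent in (65, 66, 67):
    print(f"{percent}% of 3 terms -> {required_terms(3, percent)} required")
```

Under that assumption, any positive percentage below 67 requires only one
of three terms, so 67% is the first value that requires two.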