Posted to solr-user@lucene.apache.org by Ron van der Vegt <ro...@openindex.io> on 2015/12/16 12:22:20 UTC
minimum should match, can't explain the amount of hits
Hi,
I'm currently searching with the following query: q="sony+led+tv".
The minimum should match setting is set to: mm=2<65%.
So when there are more than two terms, at least 65% of the terms should
match.
I'm not using the StopFilterFactory.
With debug turned on, this is the parsedquery_toString:
+(((categoryName_snf:sony^5.0 | name:sony^6.0 | productTypeName:sony^8.0
| breadcrumbpath_snf:sony^2.0 | text:sony | title:sony^14.0 |
salestext:sony | brand:sony^8.0 | salesText_snf:sony)~0.15
(categoryName_snf:led^5.0 | name:led^6.0 | productTypeName:led^8.0 |
breadcrumbpath_snf:led^2.0 | text:led | title:led^14.0 | salestext:led |
brand:led^8.0 | salesText_snf:led)~0.15 (categoryName_snf:tv^5.0 |
name:tv^6.0 | productTypeName:tv^8.0 | breadcrumbpath_snf:tv^2.0 |
text:tv | title:tv^14.0 | salestext:tv | brand:tv^8.0 |
salesText_snf:tv)~0.15)~1) (title:"sony led tv"~10)~0.15
I expect that at least two terms should match because of the 65%, but
I'm also getting hits for documents which seem to match only one of
the terms. Below is the explain output of a hit which shouldn't be there:
2.6449876 = sum of:
  2.6449876 = sum of:
    2.6449876 = max plus 0.15 times others of:
      2.6449876 = weight(text:led in 10143) [BM25Similarity], result of:
        2.6449876 = score(doc=10143,freq=1.0 = termFreq=1.0), product of:
          2.6449876 = idf(docFreq=3254, maxDocs=45833)
          1.0 = tfNorm, computed from:
            1.0 = termFreq=1.0
            1.0 = parameter k1
            0.0 = parameter b (norms omitted for field)
When I change the mm to 2<67%, I get the number of results that I
expected with 65%. But if I understand correctly, all the terms should
then have to match (33.33% + 33.33% = 66.66%, which is always less than
67%). Did I miss something, or is there something else that could affect
the minimum should match setting?
Thanks in advance!
Ron
Re: minimum should match, can't explain the amount of hits
Posted by Ron van der Vegt <ro...@openindex.io>.
Thanks! This makes sense. I will change my configuration to mm=2<-35%.
On 16-12-15 13:11, Binoy Dalal wrote:
> The edismax documentation confirms that when a positive % value is
> provided, Solr will round down. If you want Solr to round up, set your
> parameter value to '-35%'.
Re: minimum should match, can't explain the amount of hits
Posted by Binoy Dalal <bi...@gmail.com>.
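The edismax documentation confirms that when a positive % value is
provided, Solr will round down. If you want Solr to round up, set your
parameter value to '-35%'. A negative percentage is the share of terms
that may be missing, rounded down: 35% of 3 terms is 1.05, so one term
may be missing and two must match.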
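
As a rough illustration of the rounding described above, here is a minimal
Python sketch. It follows the documented mm rules as I understand them and
is not Solr's actual code; the function name min_should_match is made up
for this example, and it only handles a single condition such as "2<65%".

```python
def min_should_match(optional_clauses: int, spec: str) -> int:
    """Return how many optional clauses must match for a simple mm spec.

    Hypothetical helper: supports 'N', '-N', 'P%', '-P%' and one
    conditional of the form 'bound<value', e.g. '2<-35%'.
    """
    spec = spec.strip()
    if "<" in spec:
        bound, _, rest = spec.partition("<")
        if optional_clauses <= int(bound):
            # At or below the bound every clause is required.
            return optional_clauses
        return min_should_match(optional_clauses, rest)
    if spec.endswith("%"):
        percent = int(spec[:-1])
        if percent < 0:
            # Negative: share of clauses that may be missing, rounded down.
            return optional_clauses - (optional_clauses * -percent) // 100
        # Positive: share of clauses that must match, rounded down.
        return (optional_clauses * percent) // 100
    count = int(spec)
    required = optional_clauses + count if count < 0 else count
    return max(0, min(required, optional_clauses))


for spec in ("2<65%", "2<67%", "2<-35%"):
    print(spec, "->", min_should_match(3, spec))
# 2<65% -> 1
# 2<67% -> 2
# 2<-35% -> 2
```

For the three-term query in this thread, "2<-35%" therefore requires two
matching terms, while "2<65%" only requires one.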
On Wed, 16 Dec 2015, 17:28 Binoy Dalal <bi...@gmail.com> wrote:
> My guess is that Solr is rounding down when calculating the number of
> mandatory terms.
> In your case there are 3 terms: 65% of 3 is 1.95, which rounds down to 1,
> while 67% of 3 is 2.01, which rounds down to 2. That is consistent with the
> results you're seeing.
>
> Maybe someone else can confirm this.
--
Regards,
Binoy Dalal
Re: minimum should match, can't explain the amount of hits
Posted by Binoy Dalal <bi...@gmail.com>.
My guess is that Solr is rounding down when calculating the number of
mandatory terms.
In your case there are 3 terms: 65% of 3 is 1.95, which rounds down to 1,
while 67% of 3 is 2.01, which rounds down to 2. That is consistent with the
results you're seeing.
Maybe someone else can confirm this.
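
To make the arithmetic concrete, here is a small Python sketch.
required_clauses is a hypothetical helper, and the integer division only
mimics the round-down behaviour guessed at above; it is not taken from
Solr's source.

```python
def required_clauses(optional_clauses: int, percent: int) -> int:
    """Clauses that must match for a positive percentage, rounded down."""
    # Integer division drops the fractional part, e.g. 195 // 100 == 1.
    return (optional_clauses * percent) // 100


for percent in (65, 66, 67):
    print(f"{percent}% of 3 terms -> {required_clauses(3, percent)} required")
# 65% of 3 terms -> 1 required
# 66% of 3 terms -> 1 required
# 67% of 3 terms -> 2 required
```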
--
Regards,
Binoy Dalal