Posted to solr-user@lucene.apache.org by elisabeth benoit <el...@gmail.com> on 2016/04/28 17:31:47 UTC

deactivate coord scoring factor in pf2 pf3

Hello all,

I am using Solr 4.10.1 with edismax, and I use pf2 to boost documents whose
name starts with the query terms. To do this, a start-with token (bzzzz) is
automatically added at index time and added to the request at query time.

I have a problem at this point.

The request is *q=bzzzz saint denis rer*

The start-with field is name_sw

first document *name_sw: Saint-Denis-Université*
second document *name_sw: RER Saint-Denis*

So one document gets the pf2 starts-with boost and the other does not. The
problem is that this also affects the pf2 score of all the other words.

In other words, the proximity between "saint" and "denis" is not given the
same score in those two documents.

From what I understand, this is because of the coord scoring factor used for
pf2.

In the explain output for the first document:

0.52612317 Matches Punished by 0.6666667 (not all query terms matched)
   0.78918475 Sum of the following:
     0.39459237 names_sw:"bzzzz saint"^0.21

     0.39459237 Dismax (take winner of below)
       0.39459237 names_sw:"saint denis"^0.21

       0.37580228 catchall:"saint den"^0.2


*So here, matches are punished by 0.66*, which corresponds to coord(2/3),

and the final pf2 score for the proximity between saint and denis is

0.263061593153079 names_sw:"saint denis"^0.21
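To see where the 2/3 comes from, here is a minimal Python sketch (not Solr code) of how pf2 turns the query into two-word phrase clauses and how many of them each document can match. The tokenizer, the helper names, and the assumption that bzzzz is simply prepended to the indexed name are illustrative only.

```python
import re

def tokenize(text):
    # Lowercase and split on non-word characters (a stand-in for the real analyzer).
    return re.findall(r"\w+", text.lower())

def word_shingles(tokens, n):
    # All runs of n consecutive tokens; n=2 gives the pf2-style phrases.
    return [tokens[i:i + n] for i in range(len(tokens) - n + 1)]

def contains_phrase(doc_tokens, phrase):
    # Exact phrase match: the phrase tokens appear consecutively in the document.
    n = len(phrase)
    return any(doc_tokens[i:i + n] == phrase
               for i in range(len(doc_tokens) - n + 1))

query_tokens = tokenize("bzzzz saint denis rer")
pf2_phrases = word_shingles(query_tokens, 2)

# Assumption: the start-with token is prepended to the field value at index time.
docs = {
    "first": tokenize("bzzzz Saint-Denis-Université"),
    "second": tokenize("bzzzz RER Saint-Denis"),
}

# Number of pf2 phrases each document matches, out of len(pf2_phrases).
overlaps = {
    name: sum(contains_phrase(tokens, p) for p in pf2_phrases)
    for name, tokens in docs.items()
}

for name, overlap in overlaps.items():
    print(name, overlap, "of", len(pf2_phrases))
```

Four query words give three two-word phrases, which is why the denominator in the explain output is 3.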


In the explain output for the second document:


 0.13153079 Matches Punished by 0.33333334 (not all query terms matched)
   0.39459237 Dismax (take winner of below)
     0.39459237 names_sw:"saint denis"^0.21

     0.37580228 catchall:"saint den"^0.2


*So here, matches are punished by 0.33*, which corresponds to coord(1/3),

and the final pf2 score for the proximity between saint and denis is

0.1315307926306158 names_sw:"saint denis"^0.21
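The arithmetic behind both explain outputs can be checked with a short Python sketch. The clause scores are copied from the explain output; the coord function is a plain model of overlap / maxOverlap, not a call into Lucene.

```python
def coord(overlap, max_overlap):
    # Fraction of the optional phrase clauses that matched.
    return overlap / max_overlap

BZZZZ_SAINT = 0.39459237   # names_sw:"bzzzz saint"^0.21
SAINT_DENIS = 0.39459237   # names_sw:"saint denis"^0.21 (dismax winner)

# First document: two of three pf2 clauses match.
doc1_total = (BZZZZ_SAINT + SAINT_DENIS) * coord(2, 3)
doc1_saint_denis = SAINT_DENIS * coord(2, 3)

# Second document: one of three pf2 clauses matches.
doc2_total = SAINT_DENIS * coord(1, 3)
doc2_saint_denis = SAINT_DENIS * coord(1, 3)

print(round(doc1_total, 6), round(doc1_saint_denis, 6))
print(round(doc2_total, 6), round(doc2_saint_denis, 6))
```

The same "saint denis" clause score of 0.39459237 ends up contributing about twice as much in the first document, purely because of the coord multiplier.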


I would like to deactivate coord for pf2 pf3. Does anyone know how I
could do this?


Best regards,

Elisabeth

Re: deactivate coord scoring factor in pf2 pf3

Posted by Doug Turnbull <dt...@opensourceconnections.com>.
I was wrong, Elisabeth. I thought you could disable coord at query time in
Solr, but it turns out you can't (I was thinking of Lucene's BooleanQuery
disableCoord param).

https://issues.apache.org/jira/browse/SOLR-3931

I definitely know you can disable coord with a custom Similarity that just
returns 1.0 for the coordination factor.
http://grepcode.com/file/repo1.maven.org/maven2/org.apache.lucene/lucene-core/4.4.0/org/apache/lucene/search/similarities/TFIDFSimilarity.java#TFIDFSimilarity.coord%28int%2Cint%29
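As a rough illustration of the effect, here is a small Python model (not Lucene or Solr code) comparing the default coord with one that always returns 1.0. The function names are hypothetical, and the clause score is the "saint denis" value from Elisabeth's explain output. The real change would be a Java Similarity overriding coord(int, int) as in the linked source, which is not shown here.

```python
def default_coord(overlap, max_overlap):
    # Default behaviour: fraction of optional clauses that matched.
    return overlap / max_overlap

def constant_coord(overlap, max_overlap):
    # What a custom Similarity returning 1.0 from coord would amount to.
    return 1.0

def phrase_share(clause_score, matched, total, coord):
    # Contribution of one matching phrase clause after coord is applied.
    return clause_score * coord(matched, total)

SAINT_DENIS = 0.39459237

# (first document, second document): 2 of 3 and 1 of 3 pf2 clauses matched.
with_coord = (
    phrase_share(SAINT_DENIS, 2, 3, default_coord),
    phrase_share(SAINT_DENIS, 1, 3, default_coord),
)
without_coord = (
    phrase_share(SAINT_DENIS, 2, 3, constant_coord),
    phrase_share(SAINT_DENIS, 1, 3, constant_coord),
)

print("with coord:   ", with_coord)
print("without coord:", without_coord)
```

With the constant coord, the "saint denis" proximity counts the same in both documents. The first document still scores higher overall because it also matches "bzzzz saint".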

I was also trying to read the tea leaves of edismax's source to figure out
when coord is disabled and when it's not (and what the underlying
justification is). Sadly, I don't see an explicit rhyme or reason to its
coord behavior or to what you're seeing. Do you have the full query you're
sending?
https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/search/ExtendedDismaxQParser.java

-Doug


Re: deactivate coord scoring factor in pf2 pf3

Posted by Doug Turnbull <dt...@opensourceconnections.com>.
Glad to see you're using http://splainer.io! I recognize those explains!
(Let me know if you have any ideas/thoughts/questions/criticisms. I created
the thing.)

Some thoughts:
- You might consider using ps2 or ps3 to add slop to the two-word and
three-word phrase searches. Slop loosens the positional matching.
This would help get RER paired with Saint in your other document and
effectively eliminate the coord, though at a lower score (1 / position
difference IIRC).
- Have you tried sending "disableCoord" to Solr? I usually leave coord on,
as I consider it useful to bias towards more matches. But that option
exists.
- Using pf2 and pf3 together means that three-word phrase matches will get
counted twice: once as a three-word phrase match, and again as multiple
two-word phrase matches. I usually just stick with pf2.
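
On the "lower score" for sloppy matches in the first point: to my understanding, Lucene 4.x's DefaultSimilarity computes the sloppy phrase frequency as 1 / (distance + 1) and then feeds it through tf = sqrt(freq). The Python sketch below only models those two formulas; the distances are illustrative, and the rest of the score (idf, norms, boosts) is left out.

```python
import math

def sloppy_freq(distance):
    # Frequency credited to a phrase match that needed `distance` position moves.
    return 1.0 / (distance + 1)

def tf(freq):
    # Term-frequency factor applied to the (possibly fractional) phrase frequency.
    return math.sqrt(freq)

# distance 0 is an exact phrase match; larger distances are looser matches.
for distance in (0, 1, 2):
    freq = sloppy_freq(distance)
    print(distance, freq, round(tf(freq), 4))
```

For example, assuming "bzzzz saint" is matched against "bzzzz rer saint denis" with a slop of 1, the match distance would be 1, so the phrase would count with frequency 0.5 rather than 1.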

Best!
-Doug
