Posted to solr-user@lucene.apache.org by Samarendra Pratap <sa...@gmail.com> on 2011/11/07 09:25:39 UTC

to prevent number-of-matching-terms in contributing score

Hi everyone!
 We are working with Solr 3.4.

 In short: if my query term matches more than one word in a field, I want it
to be considered as one match (in that particular field).

 Details:
  Our index has a multi-valued field "category" which contains possible
category names of the company. It is entered by the company employees.
There are two companies in the index

  1. First company falls in category -> "wooden chairs"
  2. Second company falls in following categories -> "chairs", "plastic
chairs", "wooden chairs"

 Now when I search for "chair" in the "category" field (along with other fields
in the "qf" parameter), the second company gets the higher score because of
multiple matches against the word "chair". As per the business logic, the
"category" field should be a match or no-match for score calculation,
because this field is not filled in by the end user and the length or number
of matches in its text does not add to relevance.

 We are already using "omitNorms=true", so we have prevented the length of the
field from contributing to the score, but we have been unable to prevent the
number of matching terms from contributing.
 We cannot use filters (fq) because there are other fields I am matching
in. We want matching on the "category" field to work such that "n" matches in
"category" are equivalent to one match in the "title" field.

 Can someone give me some pointers on how to achieve this? or is there any
other better way of doing this?
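
To make the effect concrete, here is a minimal Python sketch of the term-frequency factor. It assumes the classic Lucene DefaultSimilarity formula tf = sqrt(freq), and it assumes that analysis reduces "chairs" to "chair", so the second company has three occurrences of the term in "category". The binary_tf function is a hypothetical "match or no match" replacement, not an existing Solr setting.

```python
import math


def default_tf(freq):
    """Term-frequency factor of the classic DefaultSimilarity: sqrt(freq)."""
    return math.sqrt(freq)


def binary_tf(freq):
    """Hypothetical flattened factor: 1 for any match, 0 for no match."""
    return 1.0 if freq > 0 else 0.0


# Assumed occurrences of "chair" in the "category" field of each company.
term_freq = {"company 1": 1, "company 2": 3}

for name, freq in term_freq.items():
    # With default_tf the second company gets a larger factor;
    # with binary_tf both companies get the same factor.
    print(name, round(default_tf(freq), 4), binary_tf(freq))
```

With the default factor the second company is favoured only because the term occurs more often; the flattened factor is what the business logic above asks for.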


-- 
Regards,
Samar

Re: to prevent number-of-matching-terms in contributing score

Posted by Samarendra Pratap <sa...@gmail.com>.

No solutions to the problem?
OK. I'll look into changing the source code, and if I succeed I'll share it
here for feedback.

Thanks


On Tue, Nov 8, 2011 at 5:06 PM, Samarendra Pratap <sa...@gmail.com> wrote:

> Hi Chris,
>  Thanks for the insight.
>
>  1. "omitTermFreqAndPositions" is very straightforward but if I avoid
> positions I'll refuse to serve phrase queries. I had searched for this in
> past as well but I finally reached to the conclusion that there is no thing
> like "omitTermFreq" (only). Perhaps because frequency is the count of
> positions of a term and we can not discard it if latter is present. :( .
> Please point me out If I am wrong. And if I really am, that would be
> exactly what I need.
>
>  2. Function query seemed nice (though strange because I never used it
> before) and I gave it a few hours but that too did not seem to solve my
> requirement. The "artificial" score we are generating is getting multiplied
> into rest of the score which includes score due to "cat" field as well. (I
> can not remove "cat" from "qf" as I have to search there). It is only that
> I don't want this field's score on the basis of matching "tf".
>
>
>  To explain second point here is what I did.
>  I indexed 4 documents
> doc 1
>
> tile:chair,
> cat:chair and chair
>
> doc 2
>
> tile:table,
> cat:chair and chair
>
> doc 3
>
> tile:chair,
> cat:chair and table
>
> doc 4
>
> tile:table,
> cat:chair and table
>
>
> searching for a simple query
> http://localhost:8983/solr/site1/select/?
> q=chair&
> qf=title&
> qf=cat&
> fl=title,cat,id,score&
> pf=ttile
>
> gives 4 results (1,3,2,4)
>
> I want document 1 and 3 with equal score and 2 and 4 with similar score.
> because the only difference within the pairs is only "cat" field's value
>
> After spending some hours on function queries I finally reached on
> following query
> http://localhost:8983/solr/site1/select/?
> q={!boost%20b=$cat_boost%20v=$main_query}&
> main_query={!dismax%20qf=%22title%20cat%22%20v=$qry}&
> cat_boost={!func}map(query({!field%20f=cat%20v=$qry},-1),0,1000,1,0)&
> qry=chair&
> qf=title&
> qf=cat&
> fl=title,cat,displayid,score&
> pf=ttile
>
>
> But debugging the query showed that the boost value ($cat_boost) is being
> multiplied into a value which is generated with the help of "cat" field
> thus resulting in different scores for 1 and 3 (similarly for 2 and 4).
>
> 1.2942866 = (MATCH) boost(+(title:chair | cat:chair)~0.01
> (),map(query(cat:chair,def=-1.0),0.0,1000.0,1.0)), product of:
>   1.2942866 = (MATCH) sum of:
>     1.2942866 = (MATCH) max plus 0.01 times others of:
>       1.2876587 = (MATCH) weight(title:chair in 0), product of:
>         0.9999818 = queryWeight(title:chair), product of:
>           1.287682 = idf(docFreq=2, maxDocs=4)
>           0.7765751 = queryNorm
>         1.287682 = (MATCH) fieldWeight(title:chair in 0), product of:
>           1.0 = tf(termFreq(title:chair)=1)
>           1.287682 = idf(docFreq=2, maxDocs=4)
>           1.0 = fieldNorm(field=title, doc=0)
>       0.66279614 = (MATCH) weight(cat:chair in 0), product of:
>         0.60328734 = queryWeight(cat:chair), product of:
>           0.7768564 = idf(docFreq=4, maxDocs=4)
>           0.7765751 = queryNorm
>         1.0986409 = (MATCH) fieldWeight(cat:chair in 0), product of:
>           1.4142135 = tf(termFreq(cat:chair)=2)
>           0.7768564 = idf(docFreq=4, maxDocs=4)
>           1.0 = fieldNorm(field=cat, doc=0)
>  * 1.0* =
> map(query(cat:chair,def=-1.0)=1.0986409,min=0.0,max=1000.0,target=1.0)
>
>
>
>
> Did I get you wrong?
> I'll appreciate if you could point out any mistake (or my
> misinterpretation) in the mail above.
>
>
> I was thinking there should be some hook or plugin (or anything) which
> could just change the score calculation formula *for a particular field*.
> There is a function in DefaultSimilarity class - *public float tf(float
> freq)* but that does not mention the field name. Is there a possibility
> to look into this direction?
>
>
> Thank you very much.
>
>
>
>
> On Tue, Nov 8, 2011 at 6:23 AM, Chris Hostetter <ho...@fucit.org> wrote:
>
>>
>> : You can write your custom similarity implementation, and override the
>> : /lengthNorm()/ method to return a constant value.
>>
>> The postered already said (twice!) that they have already set
>> omitNorms=true, so lengthNorm won't even be used
>>
>> omiting norms (or mucking with norms by modifying the lengthNorm function)
>> only affects the norms portion of the scoring -- the problem being
>> described here is when a document matches the input term more then once:
>> that is an issue of the "term freuency".
>>
>> Setting omitTermFreqAndPositions="true" on your field type will eliminate
>> the term frequency from the equation, and it will become a simple "match
>> or not" factor in your scoring.
>>
>> From the "more then one way to do it" standpoint, another option is to
>> wrap the query in a function that flattens the scores (more fine grained
>> control, and doesn't require re-indexing, but probably less efficient)
>>
>> q={!boost b=$cat_boost v=$main_query}
>> main_query=...
>> cat_boost={!func}map(map(query({!field f=cat v=$cat},-1),0,10000,5),-1,-1,1)
>> cat=...
>>
>> (note: used nested maps so that non-matches would result in a 1x
>> multipler, while matches result in a 5x multiplier)
>>
>> -Hoss
>>
>
>
>
> --
> Regards,
> Samar
>



-- 
Regards,
Samar

Re: to prevent number-of-matching-terms in contributing score

Posted by Chris Hostetter <ho...@fucit.org>.

:  1. "omitTermFreqAndPositions" is very straightforward but if I avoid
: positions I'll refuse to serve phrase queries. I had searched for this in

but do you really need phrase queries on your "cat" field?  i thought the 
point was to have simple matching on those terms?

:  2. Function query seemed nice (though strange because I never used it
: before) and I gave it a few hours but that too did not seem to solve my
: requirement. The "artificial" score we are generating is getting multiplied
: into rest of the score which includes score due to "cat" field as well. (I
: can not remove "cat" from "qf" as I have to search there). It is only that
: I don't want this field's score on the basis of matching "tf".

I don't think i realized you were using dismax ... if you just want a 
match on "cat" to help determine if the document is a match, but not have 
*any* impact on score, you could just set the qf boost to 0 (ie: 
qf=title^10 cat^0) but i'm not sure if that's really what you want.

: After spending some hours on function queries I finally reached on
: following query

Honestly: i'm not really following what you tried there because of the 
formatting applied by your email client ... it seemed to be making tons of 
hyperlinks out of pieces of the URL.

Looking at your query explanation however the problem seems to be that you 
are still using the relevancy score of the matches on the "cat" field, 
instead of *just* using the function boost...

: But debugging the query showed that the boost value ($cat_boost) is being
: multiplied into a value which is generated with the help of "cat" field
: thus resulting in different scores for 1 and 3 (similarly for 2 and 4).
: 
: 1.2942866 = (MATCH) boost(+(title:chair | cat:chair)~0.01
: (),map(query(cat:chair,def=-1.0),0.0,1000.0,1.0)), product of:

...my point before was to take "cat:chair" out of the "main" part of your 
query, and *only* put it in the boost function.  if you are using dismax, 
the "qf=cat^0" suggestion mentioned above *combined* with your boost 
function will probably get you what you want (i think)

: I was thinking there should be some hook or plugin (or anything) which
: could just change the score calculation formula *for a particular field*.
: There is a function in DefaultSimilarity class - *public float tf(float
: freq)* but that does not mention the field name. Is there a possibility to
: look into this direction?

on trunk, there is a distinct Similarity object per fieldtype, so you 
could certainly look at that -- but you are correct that in 3x there is no 
way to override the tf() function on a per field basis.


-Hoss

Re: to prevent number-of-matching-terms in contributing score

Posted by Samarendra Pratap <sa...@gmail.com>.

On Thu, Nov 17, 2011 at 6:59 AM, Chris Hostetter
<ho...@fucit.org> wrote:

>
> :  1. "omitTermFreqAndPositions" is very straightforward but if I avoid
> : positions I'll refuse to serve phrase queries. I had searched for this in
>
> but do you really need phrase queries on your "cat" field?  i thought the
> point was to have simple matching on those terms?
>

Yes, I need to match phrases. Consider the following documents:
Doc1 - categories: "teak wooden chair", "bamboo wooden chair"
Doc2 - categories: "wooden chair"
Doc3 - categories: "plastic chair", "wooden cupboard".

A query "wooden chair" should give doc1 and doc2 with equal score (provided
other fields generate same score) and doc3 should be excluded. Non-phrase
match would include doc3 as well.
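
The difference between the two kinds of match can be sketched in Python with the three documents above. This is a toy model, not Solr's analysis chain: values are split on whitespace only, and a phrase is assumed to match only within a single category value (which is what a position gap between the values of a multi-valued field would give).

```python
docs = {
    "Doc1": ["teak wooden chair", "bamboo wooden chair"],
    "Doc2": ["wooden chair"],
    "Doc3": ["plastic chair", "wooden cupboard"],
}


def term_match(values, query):
    """True if every query term occurs somewhere in the field (no positions)."""
    tokens = {tok for value in values for tok in value.split()}
    return all(term in tokens for term in query.split())


def phrase_match(values, query):
    """True if the query terms occur consecutively inside one value."""
    terms = query.split()
    for value in values:
        tokens = value.split()
        for i in range(len(tokens) - len(terms) + 1):
            if tokens[i:i + len(terms)] == terms:
                return True
    return False


query = "wooden chair"
print([name for name, values in docs.items() if term_match(values, query)])
print([name for name, values in docs.items() if phrase_match(values, query)])
```

The phrase check needs to know where each token sits, which is why dropping positions from the index rules it out.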


> :  2. Function query seemed nice (though strange because I never used it
> : before) and I gave it a few hours but that too did not seem to solve my
> : requirement. The "artificial" score we are generating is getting multiplied
> : into rest of the score which includes score due to "cat" field as well. (I
> : can not remove "cat" from "qf" as I have to search there). It is only that
> : I don't want this field's score on the basis of matching "tf".
>
> I don't think i realized you were using dismax ... if you just want a
> match on "cat" to help determine if the document is a match, but not have
> *any* impact on score, you could just set the qf boost to 0 (ie:
> qf=title^10 cat^0) but i'm not sure if that's really what you want.
>

Well, this is almost what I want. (Thanks for telling me about ^0; I
learned something new.)
I want a constant score for a match in "cat", and I do not want the
frequency of matches in "cat" to affect the score, which can be done this way.
But I definitely want to generate some score, equal to that of a single match
(tf = 1), so that less important fields like "description" do not end up
scoring higher than "cat". Writing ^0 produces a 0.00 score for a match in
"cat", while a match in "description" will generate some positive score.



> : After spending some hours on function queries I finally reached on
> : following query
>
> Honestly: i'm not really following what you tried there because of the
> formatting applied by your email client ... it seemed to be making tons of
> hyperlinks out of peices of the URL.
>
> Looking at your query explanation however the problem seems to be that you
> are still using the relevancy score of the matches on the "cat" field,
> instead of *just* using hte function boost...
>

I did try *just* using the function boost, i.e. I removed "cat" from
"qf", but then it did not return documents whose only match is in the
"cat" field. The query was something like the following (I hope it is
clear this time):

<url>?q={!boost b=$cat_boost v=$main_query}
&main_query={!dismax qf="title" v=$qry}
&cat_boost={!func}map(query({!field f=cat v=$qry},-1),0,1000,5,1)
&qry=chair
...

(note: i slightly modified the cat_boost parameter to use only single map()
function with 5 argument form)
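
For readers who have not used map() before, the following Python sketch models its behaviour as I understand it: values of x inside [min, max] become the target, and other values become the default when one is given, otherwise they pass through unchanged. The function names are illustrative only. It compares the nested 4-argument form suggested earlier in the thread with the single 5-argument form used here; the different upper bounds (10000 and 1000) are taken from the two queries as written.

```python
def solr_map(x, lo, hi, target, default=None):
    """Model of map(x,min,max,target[,default])."""
    if lo <= x <= hi:
        return target
    return x if default is None else default


def nested_boost(cat_score):
    # Models map(map(score,0,10000,5),-1,-1,1)
    return solr_map(solr_map(cat_score, 0, 10000, 5), -1, -1, 1)


def single_boost(cat_score):
    # Models map(score,0,1000,5,1)
    return solr_map(cat_score, 0, 1000, 5, 1)


NO_MATCH = -1.0  # the default passed to query(...,-1) when "cat" does not match

for score in (NO_MATCH, 0.66279614, 1.0986409):
    # Both forms give a 1x multiplier for a non-match and 5x for any match.
    print(score, nested_boost(score), single_boost(score))
```

Either way the multiplier depends only on whether "cat" matched, not on how strongly it matched.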

It gave me just the two docs where "title" contained the query word (chair).

I also tried changing main_query to
&main_query={!dismax qf="title cat" v=$qry}
which gave me all 4 required docs, but with scores that also vary with
"cat",

and to
&main_query={!dismax qf="title cat^0" v=$qry}
which gave me all required docs with a constant (0.0) "cat" score. But once
I add "description" to qf, even docs with the weakest match in
"description" will score higher than docs with a good match in "cat", which
is not what is required.



> : But debugging the query showed that the boost value ($cat_boost) is being
> : multiplied into a value which is generated with the help of "cat" field
> : thus resulting in different scores for 1 and 3 (similarly for 2 and 4).
> :
> : 1.2942866 = (MATCH) boost(+(title:chair | cat:chair)~0.01
> : (),map(query(cat:chair,def=-1.0),0.0,1000.0,1.0)), product of:
>
> ...my point before was to take "cat:chair" out of the "main" part of your
> query, and *only* put it in the boost function.  if you are using dismax,
> the "qf=cat^0" suggestion mentioned above *combined* with your boost
> function will probably get you what you want (i think)
>

Taking "cat:chair" out of main_query (the dismax equivalent is removing "cat"
from "qf") or using "cat^0" did not produce the desired effect, as I described
earlier.


> : I was thinking there should be some hook or plugin (or anything) which
> : could just change the score calculation formula *for a particular field*.
> : There is a function in DefaultSimilarity class - *public float tf(float
> : freq)* but that does not mention the field name. Is there a possibility to
> : look into this direction?
>
> on trunk, there is a distinct Similarity object per fieldtype, so you
> could certain look at that -- but you are correct that in 3x there is no
> way to override the tf() function on a per field basis.
>

I'll definitely look at the Similarity class. I hope there are no
performance degradation issues with it :)

>
> -Hoss
>

Thank you very much.

-- 
Regards,
Samar

Re: to prevent number-of-matching-terms in contributing score

Posted by Samarendra Pratap <sa...@gmail.com>.

Hi Chris,
 Thanks for the insight.

 1. "omitTermFreqAndPositions" is very straightforward, but if I omit
positions I will be unable to serve phrase queries. I had searched for this in
the past as well, but I finally came to the conclusion that there is no such
thing as "omitTermFreq" (only). Perhaps that is because frequency is the count
of positions of a term and we cannot discard it if the latter is present. :(
Please point it out if I am wrong. And if I really am, that would be
exactly what I need.

 2. The function query seemed nice (though strange, because I had never used
one before) and I gave it a few hours, but that too did not seem to solve my
requirement. The "artificial" score we are generating gets multiplied
into the rest of the score, which includes the score due to the "cat" field as
well. (I cannot remove "cat" from "qf" as I have to search there.) It is only
that I don't want this field's score to depend on the matching "tf".


 To explain the second point, here is what I did.
 I indexed 4 documents:
doc 1

title:chair,
cat:chair and chair

doc 2

title:table,
cat:chair and chair

doc 3

title:chair,
cat:chair and table

doc 4

title:table,
cat:chair and table


searching for a simple query
http://localhost:8983/solr/site1/select/?
q=chair&
qf=title&
qf=cat&
fl=title,cat,id,score&
pf=ttile

gives 4 results (1,3,2,4)

I want documents 1 and 3 to have equal scores, and likewise documents 2 and 4,
because the only difference within each pair is the "cat" field's value.

After spending some hours on function queries I finally arrived at the
following query:
http://localhost:8983/solr/site1/select/?
q={!boost%20b=$cat_boost%20v=$main_query}&
main_query={!dismax%20qf=%22title%20cat%22%20v=$qry}&
cat_boost={!func}map(query({!field%20f=cat%20v=$qry},-1),0,1000,1,0)&
qry=chair&
qf=title&
qf=cat&
fl=title,cat,displayid,score&
pf=ttile


But debugging the query showed that the boost value ($cat_boost) is being
multiplied into a value which is generated with the help of "cat" field
thus resulting in different scores for 1 and 3 (similarly for 2 and 4).

1.2942866 = (MATCH) boost(+(title:chair | cat:chair)~0.01 (),map(query(cat:chair,def=-1.0),0.0,1000.0,1.0)), product of:
  1.2942866 = (MATCH) sum of:
    1.2942866 = (MATCH) max plus 0.01 times others of:
      1.2876587 = (MATCH) weight(title:chair in 0), product of:
        0.9999818 = queryWeight(title:chair), product of:
          1.287682 = idf(docFreq=2, maxDocs=4)
          0.7765751 = queryNorm
        1.287682 = (MATCH) fieldWeight(title:chair in 0), product of:
          1.0 = tf(termFreq(title:chair)=1)
          1.287682 = idf(docFreq=2, maxDocs=4)
          1.0 = fieldNorm(field=title, doc=0)
      0.66279614 = (MATCH) weight(cat:chair in 0), product of:
        0.60328734 = queryWeight(cat:chair), product of:
          0.7768564 = idf(docFreq=4, maxDocs=4)
          0.7765751 = queryNorm
        1.0986409 = (MATCH) fieldWeight(cat:chair in 0), product of:
          1.4142135 = tf(termFreq(cat:chair)=2)
          0.7768564 = idf(docFreq=4, maxDocs=4)
          1.0 = fieldNorm(field=cat, doc=0)
  1.0 = map(query(cat:chair,def=-1.0)=1.0986409,min=0.0,max=1000.0,target=1.0)
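
The arithmetic in this explanation can be reproduced with a short Python sketch. It assumes the classic DefaultSimilarity formulas tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), takes queryNorm and the 0.01 tie-breaker from the output above, and uses function names of my own choosing. The doc 3 figure is computed from the same model (termFreq of "chair" in "cat" is 1 there) and is not taken from any output in this thread.

```python
import math

QUERY_NORM = 0.7765751  # copied from the explain output
TIE = 0.01              # the "~0.01" dismax tie-breaker in the explain output


def idf(doc_freq, max_docs):
    """Assumed classic idf: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def tf(freq):
    """Assumed classic tf: sqrt(freq)."""
    return math.sqrt(freq)


def clause_weight(freq, doc_freq, max_docs, field_norm=1.0):
    """weight = queryWeight * fieldWeight, as in the explain tree."""
    query_weight = idf(doc_freq, max_docs) * QUERY_NORM
    field_weight = tf(freq) * idf(doc_freq, max_docs) * field_norm
    return query_weight * field_weight


def dismax(clauses, tie=TIE):
    """'max plus tie times others' over the per-field clause weights."""
    best = max(clauses)
    return best + tie * (sum(clauses) - best)


title = clause_weight(freq=1, doc_freq=2, max_docs=4)
cat_doc1 = clause_weight(freq=2, doc_freq=4, max_docs=4)  # "chair and chair"
cat_doc3 = clause_weight(freq=1, doc_freq=4, max_docs=4)  # "chair and table"
boost = 1.0  # value of the map(...) function for a document matching "cat"

score_doc1 = dismax([title, cat_doc1]) * boost
score_doc3 = dismax([title, cat_doc3]) * boost
print(f"doc 1: {score_doc1:.5f}")
print(f"doc 3: {score_doc3:.5f}")
```

Under these assumptions doc 1 comes out at about 1.294, in line with the explanation, and doc 3 comes out slightly lower. The constant boost multiplies both scores equally, so the tf difference inside the dismax part survives.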




Did I get you wrong?
I would appreciate it if you could point out any mistake (or
misinterpretation on my part) in the mail above.


I was thinking there should be some hook or plugin (or anything) which
could just change the score calculation formula *for a particular field*.
There is a function in DefaultSimilarity class - *public float tf(float
freq)* but that does not mention the field name. Is there a possibility to
look into this direction?


Thank you very much.




On Tue, Nov 8, 2011 at 6:23 AM, Chris Hostetter <ho...@fucit.org> wrote:

>
> : You can write your custom similarity implementation, and override the
> : /lengthNorm()/ method to return a constant value.
>
> The postered already said (twice!) that they have already set
> omitNorms=true, so lengthNorm won't even be used
>
> omiting norms (or mucking with norms by modifying the lengthNorm function)
> only affects the norms portion of the scoring -- the problem being
> described here is when a document matches the input term more then once:
> that is an issue of the "term freuency".
>
> Setting omitTermFreqAndPositions="true" on your field type will eliminate
> the term frequency from the equation, and it will become a simple "match
> or not" factor in your scoring.
>
> From the "more then one way to do it" standpoint, another option is to
> wrap the query in a function that flattens the scores (more fine grained
> control, and doesn't require re-indexing, but probably less efficient)
>
> q={!boost b=$cat_boost v=$main_query}
> main_query=...
> cat_boost={!func}map(map(query({!field f=cat v=$cat},-1),0,10000,5),-1,-1,1)
> cat=...
>
> (note: used nested maps so that non-matches would result in a 1x
> multipler, while matches result in a 5x multiplier)
>
> -Hoss
>



-- 
Regards,
Samar

Re: to prevent number-of-matching-terms in contributing score

Posted by Chris Hostetter <ho...@fucit.org>.

: You can write your custom similarity implementation, and override the
: /lengthNorm()/ method to return a constant value.

The poster already said (twice!) that they have already set 
omitNorms=true, so lengthNorm won't even be used.

Omitting norms (or mucking with norms by modifying the lengthNorm function) 
only affects the norms portion of the scoring -- the problem being 
described here is when a document matches the input term more than once: 
that is an issue of the "term frequency".

Setting omitTermFreqAndPositions="true" on your field type will eliminate 
the term frequency from the equation, and it will become a simple "match 
or not" factor in your scoring.

From the "more than one way to do it" standpoint, another option is to 
wrap the query in a function that flattens the scores (more fine grained 
control, and doesn't require re-indexing, but probably less efficient)

q={!boost b=$cat_boost v=$main_query}
main_query=...
cat_boost={!func}map(map(query({!field f=cat v=$cat},-1),0,10000,5),-1,-1,1)
cat=...

(note: used nested maps so that non-matches would result in a 1x 
multiplier, while matches result in a 5x multiplier)

-Hoss

Re: to prevent number-of-matching-terms in contributing score

Posted by pravesh <su...@yahoo.com>.

Hi Samar,

You can write your custom similarity implementation, and override the
/lengthNorm()/ method to return a constant value.

Then in your /schema.xml/ specify your custom implementation as the default
similarity class.

But you need to rebuild your index from scratch for this to take
effect (also set /omitNorms="true"/ for the fields where you need this
feature).

Regds
Pravesh

--
View this message in context: http://lucene.472066.n3.nabble.com/to-prevent-number-of-matching-terms-in-contributing-score-tp3486373p3486512.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: to prevent number-of-matching-terms in contributing score

Posted by Samarendra Pratap <sa...@gmail.com>.

Hi Pravesh, thanks for your reply, but I am not asking about "omitNorms"
(an index-time parameter). I am asking how to treat multiple matches
of a term in a single field as "one" at query time.

Thanks

On Mon, Nov 7, 2011 at 2:48 PM, pravesh <su...@yahoo.com> wrote:

> Did you rebuild the index from scratch. Since this is index time factor, you
> need to build complete index from scratch.
>
> Regds
> Pravesh
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/to-prevent-number-of-matching-terms-in-contributing-score-tp3486373p3486447.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>

-- 
Regards,
Samar

Re: to prevent number-of-matching-terms in contributing score

Posted by pravesh <su...@yahoo.com>.

Did you rebuild the index from scratch? Since this is an index-time factor, you
need to rebuild the complete index from scratch.

Regds
Pravesh

--
View this message in context: http://lucene.472066.n3.nabble.com/to-prevent-number-of-matching-terms-in-contributing-score-tp3486373p3486447.html
Sent from the Solr - User mailing list archive at Nabble.com.