Posted to solr-user@lucene.apache.org by "Mark T. Trembley" <ma...@etrailer.com> on 2016/07/07 14:30:05 UTC

Boosting query results

I have a question about the best way to rank my results based on a score 
field that can have different values per document and where each 
document can have different scores based on which term is queried.

Essentially, I want each document to carry a list of terms, each with 
its own score, so that when a query matches one of those terms the 
corresponding score is used to boost that document. For example, if a 
document has a multi-valued field named B1_ss with the terms 
[Boost1|10], [Boost2|20], [Boost3|100] and my search query is "Boost2", 
I want that document's result to be boosted by 20. Note that "Boost2" 
can boost different documents by different amounts. The query that 
selects the actual documents will run against other fields in the 
document and could return documents with any combination of B1 terms.
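
As an illustration of the behaviour described above, here is a minimal Python sketch of the lookup, done entirely outside Solr. The function name boost_for and the choice to return 0.0 for entries without a score (such as "NoBoost") are assumptions made for the example; it only shows the intended mapping from (document, term) to boost, not how Solr would compute it.

```python
def boost_for(b1_values, term, default=0.0):
    """Return the score paired with `term` in a list of "term|score" strings.

    Entries without a "|score" part (for example "NoBoost") fall back to
    `default`; that fallback is an assumption for this sketch.
    """
    for value in b1_values:
        name, sep, score = value.partition("|")
        if name == term and sep:
            return float(score)
    return default


# B1_ss values taken from the sample documents in this thread.
doc2 = ["Boost2|20"]
doc6 = ["Boost2|15", "Boost3|30"]

print(boost_for(doc2, "Boost2"))  # 20.0
print(boost_for(doc6, "Boost2"))  # 15.0
```

The same term ("Boost2") yields a different boost for each document, which is the part a static query-time boost cannot express.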

I'm still trying to figure out how best to model this in my index, 
either as child documents, or in another collection, or if it would make 
more sense to figure out how to make it work via payloads or by boosting 
the terms at index time.

I'm running Solr 5.5.1 in cloud mode. Each server has a complete replica 
of all collections.

The document structure I've been toying with the most is to put the 
boosts into a separate index and join them using !join syntax and 
returning the scores, but I've not had any luck getting quality results 
from those tests. The extra "scores" index is structured like this (I'll 
add the json for my test collections at the end of the email):
id:Document1_Boost1
   B1_s:Boost1
   B1_f:10
id:Document1_Boost3
   B1_s:Boost3
   B1_f:100
Using this structure, I get close, but the scores are not what I'm 
expecting. If I use the following query, the explain says it's using the 
score from Document6_Boost2 even though my query specifies B1_s:Boost3:
http://localhost:8983/solr/generic/select?q={!join from=id to=B1_name_ss fromIndex=scores score=max}B1_s:Boost3{!func}B1_f&fl=*,score&debugQuery=true

<lst name="explain">
<str name="Document6">
3.379996 = Score based on join value Document6_Boost2
</str>
<str name="Document1">
2.2533307 = Score based on join value Document1_Boost1
</str>
<str name="Document7">
0.24786638 = Score based on join value Document7_Boost333
</str>
<str name="Document3">
0.0 = Score based on join value Document3_NoBoost
</str>
</lst>
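
One way to read the explain output above is to divide each join score by the B1_f value of the scores document it names. The short Python check below does that with the numbers from this thread. The three ratios agree to within rounding (about 0.2253), which suggests each join score is B1_f times a common factor and that the B1_s:Boost3 restriction is not being applied. What that factor actually is cannot be determined from this output alone, so treat this as a diagnostic sketch rather than an explanation of the join parser.

```python
import math

# (join score from the explain output, B1_f of the scores document it names)
observed = {
    "Document6_Boost2": (3.379996, 15.0),
    "Document1_Boost1": (2.2533307, 10.0),
    "Document7_Boost333": (0.24786638, 1.1),
}

# Ratio of join score to the stored B1_f value for each joined document.
ratios = {name: score / b1_f for name, (score, b1_f) in observed.items()}
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.6f}")

# The ratios agree to within rounding, i.e. score ~= B1_f * constant.
values = list(ratios.values())
assert all(math.isclose(v, values[0], rel_tol=1e-4) for v in values)
```

Document3_NoBoost has no B1_f value and scores 0.0, which fits the same pattern.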

My guess is that it's now running a match-all query on the "scores" 
collection to return the scores in addition to the B1_s query I've 
passed in. I can't figure out where it's getting those scores from, as a 
simple query against the "scores" collection returns the scores I'd 
expect to see for a similar query:
http://192.168.1.194:8983/solr/scores/select?q=B1_s:Boost3 AND _val_:B1_f&fl=score,*&debugQuery=true

<lst name="explain">
<str name="Document1_Boost3">
46.834885 = sum of:
  1.7682717 = weight(B1_s:Boost3 in 1) [ClassicSimilarity], result of:
    1.7682717 = score(doc=1,freq=1.0), product of:
      0.8926926 = queryWeight, product of:
        1.9808292 = idf(docFreq=2, maxDocs=8)
        0.45066613 = queryNorm
      1.9808292 = fieldWeight in 1, product of:
        1.0 = tf(freq=1.0), with freq of:
          1.0 = termFreq=1.0
        1.9808292 = idf(docFreq=2, maxDocs=8)
        1.0 = fieldNorm(doc=1)
  45.066612 = FunctionQuery(float(B1_f)), product of:
    100.0 = float(B1_f)=100.0
    1.0 = boost
    0.45066613 = queryNorm
</str>
<str name="Document6_Boost3">
15.288256 = sum of:
  1.7682717 = weight(B1_s:Boost3 in 5) [ClassicSimilarity], result of:
    1.7682717 = score(doc=5,freq=1.0), product of:
      0.8926926 = queryWeight, product of:
        1.9808292 = idf(docFreq=2, maxDocs=8)
        0.45066613 = queryNorm
      1.9808292 = fieldWeight in 5, product of:
        1.0 = tf(freq=1.0), with freq of:
          1.0 = termFreq=1.0
        1.9808292 = idf(docFreq=2, maxDocs=8)
        1.0 = fieldNorm(doc=5)
  13.519984 = FunctionQuery(float(B1_f)), product of:
    30.0 = float(B1_f)=30.0
    1.0 = boost
    0.45066613 = queryNorm
</str>
</lst>
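
For reference, the arithmetic in the explain output above can be reproduced by hand. The Python sketch below recombines the reported factors (idf, queryNorm, tf, fieldNorm and the B1_f value) in the way the explain output lists them; the function name expected_score is made up for the example, and the constants are copied from the output rather than computed from the index.

```python
import math

QUERY_NORM = 0.45066613
IDF = 1.9808292  # idf(docFreq=2, maxDocs=8), as reported by explain


def expected_score(b1_f):
    """Recombine the explain factors for one document in the scores collection."""
    query_weight = IDF * QUERY_NORM        # queryWeight
    field_weight = 1.0 * IDF * 1.0         # tf * idf * fieldNorm
    term_score = query_weight * field_weight
    function_score = b1_f * 1.0 * QUERY_NORM  # float(B1_f) * boost * queryNorm
    return term_score + function_score


# Document1_Boost3 (B1_f=100) and Document6_Boost3 (B1_f=30)
assert math.isclose(expected_score(100.0), 46.834885, rel_tol=1e-6)
assert math.isclose(expected_score(30.0), 15.288256, rel_tol=1e-6)
print(expected_score(100.0), expected_score(30.0))
```

Note that the function query contribution is scaled by queryNorm, so even in this direct query the B1_f value is not added to the score unchanged.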

I feel like I'm getting close to what I need, but it's just not clear to 
me what I'm missing at this point.

The other option I've been toying with is using payloads, but actually 
utilizing the payloads as part of the scoring process is beyond me at 
this time.

Any thoughts or hints on the best way to boost the relevancy of these 
scores would be appreciated.
Thanks
Mark







GENERIC:
   {
     "id" : "Document1",
     "B1_ss" : ["Boost1|10","Boost3|100"],
     "title_s" : "Title1",
     "otherstuff_ss" : ["stuff1","suggestion"],
     "B1_name_ss" : ["Document1_Boost1","Document1_Boost3"]
   },
   {
     "id" : "Document2",
     "B1_ss" : ["Boost2|20"],
     "name_s" : "Product2",
     "title_s" : "Title2",
     "otherstuff_ss" : ["stuff2","recommendation"],
     "B1_name_ss" : ["Document2_Boost1"]
   },
   {
     "id" : "Document3",
     "name_s" : "Product3",
     "B1_ss" : ["NoBoost"],
     "title_s" : "Title3",
     "otherstuff_ss" : ["stuff3","new","suggestion"],
     "B1_name_ss" : ["Document3_NoBoost"]
   },
   {
     "id" : "Document4",
     "name_s" : "Product4",
     "title_s" : "Title4",
     "otherstuff_ss" : ["stuff4","old","suggestion"]
   },
   {
     "id" : "Document5",
     "name_s" : "Product5",
     "title_s" : "Title5",
     "otherstuff_ss" : ["stuff5","recommendation"]
   },
   {
     "id" : "Document6",
     "name_s" : "Product6",
     "B1_ss" : ["Boost2|15","Boost3|30"],
     "title_s" : "Title6",
     "B1_name_ss" : ["Document6_Boost2","Document6_Boost3"]
   },
   {
     "id" : "Document7",
     "name_s" : "Product7",
     "B1_ss" : ["NoBoost","Boost333|1.1"],
     "title_s" : "Title7",
     "B1_name_ss" : ["Document7_NoBoost","Document7_Boost333"]
   }

SCORES:
   {
     "id" : "Document1_Boost1",
     "B1_s" : "Boost1",
     "B1_f" : 10
   },
   {
     "id" : "Document1_Boost3",
     "B1_s" : "Boost3",
     "B1_f" : 100
   },
   {
     "id" : "Document2_Boost2",
     "B1_s" : "Boost2",
     "B1_f" : 20
   },
   {
     "id" : "Document3_NoBoost",
     "B1_s" : "NoBoost"
   },
   {
     "id" : "Document6_Boost2",
     "B1_s" : "Boost2",
     "B1_f" : 15
   },
   {
     "id" : "Document6_Boost3",
     "B1_s" : "Boost3",
     "B1_f" : 30
   },
   {
     "id" : "Document7_NoBoost",
     "B1_s" : "NoBoost"
   },
   {
     "id" : "Document7_Boost333",
     "B1_s" : "Boost333",
     "B1_f" : 1.1
   }

Re: Boosting query results

Posted by Walter Underwood <wu...@wunderwood.org>.

I think it works to join against the other collection to get scores. But I’m not sure. I think that was suggested for a fairly static collection of documents with rapidly changing scoring inputs.

Personally, I would try a straight popularity boost to see if it got you 80% of the way there.
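
A straight popularity boost could look roughly like the Python sketch below, which only builds the request parameters. It assumes the edismax query parser and a numeric per-document field, here called popularity_f, that does not exist in the sample documents in this thread; the field name, the helper name and the choice of log(sum(...,10)) as the damping function are all illustrative assumptions.

```python
from urllib.parse import urlencode


def popularity_boost_params(user_query, popularity_field="popularity_f"):
    """Build edismax parameters with a multiplicative popularity boost.

    `popularity_field` is a hypothetical numeric field. Adding 10 before
    taking log() keeps the multiplier at 1 or above for non-negative values.
    """
    return {
        "defType": "edismax",
        "q": user_query,
        "boost": f"log(sum({popularity_field},10))",
        "fl": "*,score",
    }


params = popularity_boost_params("Title2")
print("http://localhost:8983/solr/generic/select?" + urlencode(params))
```

Unlike the per-term scores discussed earlier, this boost is the same for a document regardless of which search term matched it.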

wunder
Walter Underwood
wunder@wunderwood.org
http://observer.wunderwood.org/  (my blog)


> On Jul 7, 2016, at 2:46 PM, Mark T. Trembley <ma...@etrailer.com> wrote:
> 
> Yes, the spam issue is something I'm aware of. I plan on having some sanity checks in place to make sure that the boosts are in line with expectations either at query time or while indexing the scores into Solr.
> 
> I just read through that document along with some of the more recent posts about signals, and it appears that I'm going down the same path as Lucidworks. I'm storing the aggregated search term and product id in an alternate index.  It seems that the piece that I'm missing is getting the boost per document. In the following post, it appears to me that Fusion is applying a boost to the main query by obtaining the scores from a set number of documents from the aggregate collection. I'm going to assume that part of its query processing pipeline is to run a query on the aggregation collection to obtain the scores from that query and return them for use on the main query.
> 
> https://lucidworks.com/blog/2015/09/01/better-search-fusion-signals/
> 
> I think I could possibly hack something together on my side that mimics what I think is happening in Fusion, but with my tinkering, it seems to me that using a !join query (with scoring) like I've been trying could handle the job if I could only understand how the query executes on the joined collection and how I can pass a calculated score back to the main query for use in calculating a final score on the main collection.
> 
> 

Re: Boosting query results

Posted by "Mark T. Trembley" <ma...@etrailer.com>.

Yes, the spam issue is something I'm aware of. I plan on having some 
sanity checks in place to make sure that the boosts are in line with 
expectations either at query time or while indexing the scores into Solr.

I just read through that document along with some of the more recent 
posts about signals, and it appears that I'm going down the same path as 
Lucidworks. I'm storing the aggregated search term and product id in an 
alternate index.  It seems that the piece that I'm missing is getting 
the boost per document. In the following post, it appears to me that 
Fusion is applying a boost to the main query by obtaining the scores 
from a set number of documents from the aggregate collection. I'm going 
to assume that part of its query processing pipeline is to run a query 
on the aggregation collection to obtain the scores from that query and 
return them for use on the main query.

https://lucidworks.com/blog/2015/09/01/better-search-fusion-signals/

I think I could possibly hack something together on my side that mimics 
what I think is happening in Fusion, but with my tinkering, it seems to 
me that using a !join query (with scoring) like I've been trying could 
handle the job if I could only understand how the query executes on the 
joined collection and how I can pass a calculated score back to the main 
query for use in calculating a final score on the main collection.
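
The two-step idea described above (query the aggregate collection first, then feed the scores into the main query) could be sketched in Python as follows. This only shows the second half: turning documents already fetched from the "scores" collection into a bq string for the main query. It assumes the edismax parser, relies on the "<main id>_<term>" id convention used in the sample data, and the helper name build_bq is made up for the example. It is not a description of what Fusion actually does.

```python
def build_bq(score_docs, term):
    """Turn scores-collection documents for `term` into a bq string.

    Each clause boosts one main-collection document by its stored B1_f.
    Documents for other terms, or without a B1_f value, are skipped.
    """
    suffix = "_" + term
    clauses = []
    for doc in score_docs:
        if doc.get("B1_s") != term or "B1_f" not in doc:
            continue
        if not doc["id"].endswith(suffix):
            continue
        main_id = doc["id"][: -len(suffix)]  # "Document2_Boost2" -> "Document2"
        clauses.append(f"id:{main_id}^{doc['B1_f']}")
    return " ".join(clauses)


# Sample documents copied from the SCORES data in this thread.
score_docs = [
    {"id": "Document2_Boost2", "B1_s": "Boost2", "B1_f": 20},
    {"id": "Document6_Boost2", "B1_s": "Boost2", "B1_f": 15},
    {"id": "Document6_Boost3", "B1_s": "Boost3", "B1_f": 30},
]

print(build_bq(score_docs, "Boost2"))  # id:Document2^20 id:Document6^15
```

The boosts are relative weights on the id clauses, so the final score contribution will not be exactly 20 or 15, and a long list of clauses would need to be capped to a set number of documents.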


On 7/7/2016 1:34 PM, Walter Underwood wrote:
> If it is running in an environment protected from spammers, you might want to start with the work that LucidWorks did on click scoring.
>
> https://lucidworks.com/blog/2015/03/23/mixed-signals-using-lucidworks-fusions-signals-api/
>
> Of course, there are no environments free of spammers. I’ve seen them in enterprise search, too. But they are easier to deal with there. Call them up and tell them they need to stop immediately or their pages disappear from the search engine.
>
> wunder
> Walter Underwood
> wunder@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
>
>> On Jul 7, 2016, at 11:29 AM, Walter Underwood <wu...@wunderwood.org> wrote:
>>
>> You understand that you are making your site extremely easy to spam, right? This is how Microsoft became the top hit for “evil empire” on Google.
>>
>> wunder
>> Walter Underwood
>> wunder@wunderwood.org
>> http://observer.wunderwood.org/  (my blog)
>>
>>
>>> On Jul 7, 2016, at 11:25 AM, Mark T. Trembley <ma...@etrailer.com> wrote:
>>>
>>> I've found that it is definitely complicated!
>>>
>>> Essentially what I am attempting to do is boost products based on the number of times that particular product has been selected via historical searches using the same search term or phrase.
>>>
>>>
>>> On 7/7/2016 11:55 AM, Walter Underwood wrote:
>>>> That is a very complicated design. What are you trying to achieve? Maybe there is a different approach that is simpler.
>>>>
>>>> wunder
>>>> Walter Underwood
>>>> wunder@wunderwood.org
>>>> http://observer.wunderwood.org/  (my blog)
>>>>
>>>>
>>>>> On Jul 7, 2016, at 9:26 AM, Mark T. Trembley <ma...@etrailer.com> wrote:
>>>>>
>>>>> That works with static boosts based on documents matching the query "Boost2". I want to apply a different boost to documents based on the value assigned to Boost2 within the document.
>>>>>
>>>>> From my sample documents, when running a query with "Boost2," I want Document2 boosted by 20.0 and Document6 boosted by 15.0:
>>>>>
>>>>> {
>>>>>   "id" : "Document2_Boost2",
>>>>>   "B1_s" : "Boost2",
>>>>>   "B1_f" : 20
>>>>> }
>>>>> {
>>>>>   "id" : "Document6_Boost2",
>>>>>   "B1_s" : "Boost2",
>>>>>   "B1_f" : 15
>>>>> }
>>>>>
>>>>>
>>>>> On 7/7/2016 10:21 AM, Walter Underwood wrote:
>>>>>> This looks like a job for “bq”, the boost query parameter. I used this to boost textbooks which were used at the student’s school. bq does not force documents to be included in the result set. It does affect the ranking of the included documents.
>>>>>>
>>>>>> bq=B1_ss:Boost2 will boost documents that match that. You can use weights, like bq=B1_ss:Boost2^10
>>>>>>
>>>>>> Here is the relationship between fq, q, and bq:
>>>>>>
>>>>>> fq: selection, does not affect ranking
>>>>>> q: selection and ranking
>>>>>> bq: does not affect selection, affects ranking
>>>>>>
>>>>>> wunder
>>>>>> Walter Underwood
>>>>>> wunder@wunderwood.org
>>>>>> http://observer.wunderwood.org/  (my blog)
>>>>>>
>>>>>>

Re: Boosting query results

Posted by Walter Underwood <wu...@wunderwood.org>.

If it is running in an environment protected from spammers, you might want to start with the work that LucidWorks did on click scoring.

https://lucidworks.com/blog/2015/03/23/mixed-signals-using-lucidworks-fusions-signals-api/

Of course, there are no environments free of spammers. I’ve seen them in enterprise search, too. But they are easier to deal with there. Call them up and tell them they need to stop immediately or their pages disappear from the search engine.

wunder
Walter Underwood
wunder@wunderwood.org
http://observer.wunderwood.org/  (my blog)


> On Jul 7, 2016, at 11:29 AM, Walter Underwood <wu...@wunderwood.org> wrote:
> 
> You understand that you are making your site extremely easy to spam, right? This is how Microsoft became the top hit for “evil empire” on Google.
> 
> wunder
> Walter Underwood
> wunder@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
> 
> 
>> On Jul 7, 2016, at 11:25 AM, Mark T. Trembley <ma...@etrailer.com> wrote:
>> 
>> I've found that it is definitely complicated!
>> 
>> Essentially what I am attempting to do is boost products based on the number of times that particular product has been selected via historical searches using the same search term or phrase.
>> 
>> 
>> On 7/7/2016 11:55 AM, Walter Underwood wrote:
>>> That is a very complicated design. What are you trying to achieve? Maybe there is a different approach that is simpler.
>>> 
>>> wunder
>>> Walter Underwood
>>> wunder@wunderwood.org
>>> http://observer.wunderwood.org/  (my blog)
>>> 
>>> 
>>>> On Jul 7, 2016, at 9:26 AM, Mark T. Trembley <ma...@etrailer.com> wrote:
>>>> 
>>>> That works with static boosts based on documents matching the query "Boost2". I want to apply a different boost to documents based on the value assigned to Boost2 within the document.
>>>> 
>>>> From my sample documents, when running a query with "Boost2," I want Document2 boosted by 20.0 and Document6 boosted by 15.0:
>>>> 
>>>> {
>>>>  "id" : "Document2_Boost2",
>>>>  "B1_s" : "Boost2",
>>>>  "B1_f" : 20
>>>> }
>>>> {
>>>>  "id" : "Document6_Boost2",
>>>>  "B1_s" : "Boost2",
>>>>  "B1_f" : 15
>>>> }
>>>> 
>>>> 
>>>> On 7/7/2016 10:21 AM, Walter Underwood wrote:
>>>>> This looks like a job for “bq”, the boost query parameter. I used this to boost textbooks which were used at the student’s school. bq does not force documents to be included in the result set. It does affect the ranking of the included documents.
>>>>> 
>>>>> bq=B1_ss:Boost2 will boost documents that match that. You can use weights, like bq=B1_ss:Boost2^10
>>>>> 
>>>>> Here is the relationship between fq, q, and bq:
>>>>> 
>>>>> fq: selection, does not affect ranking
>>>>> q: selection and ranking
>>>>> bq: does not affect selection, affects ranking
>>>>> 
>>>>> wunder
>>>>> Walter Underwood
>>>>> wunder@wunderwood.org
>>>>> http://observer.wunderwood.org/  (my blog)
>>>>> 
>>>>> 
>>>>>> On Jul 7, 2016, at 7:30 AM, Mark T. Trembley <ma...@etrailer.com> wrote:
>>>>>> 
>>>>>> I have a question about the best way to rank my results based on a score field that can have different values per document and where each document can have different scores based on which term is queried.
>>>>>> 
>>>>>> Essentially what I'm wanting to have happen is provide a list of terms that when matched via a query it returns a corresponding score to help boost the original document. So if I had a document with a multi-valued field named B1_ss with terms [Boost1|10], [Boost2|20], [Boost3|100] and my search query is "Boost2", I want that document's result to be boosted by 20. Also note that "Boost2" can boost different documents at different levels. The query to select the actual documents will select against other fields in the document and could possibly return documents with any combination of B1 terms.
>>>>>> 
>>>>>> I'm still trying to figure out how best to model this in my index, either as child documents, or in another collection, or if it would make more sense to figure out how to make it work via payloads or by boosting the terms at index time.
>>>>>> 
>>>>>> I'm running Solr 5.5.1 in cloud mode. Each server has a complete replica of all collections.
>>>>>> 
>>>>>> The document structure I've been toying with the most is to put the boosts into a separate index and join them using !join syntax and returning the scores, but I've not had any luck getting quality results from those tests. The extra "scores" index is structured like this (I'll add the json for my test collections at the end of the email):
>>>>>> id:Document1_Boost1
>>>>>> B1_s:Boost1
>>>>>> B1_f:10
>>>>>> id:Document1_Boost3
>>>>>> B1_s:Boost3
>>>>>> B1_f:100
>>>>>> Using this structure, I get close, but the scores are not what I'm expecting. If I use the following query, the explain says it's using the score from Document6_Boost2 even though my query is specifying B1_s:Boost3
>>>>>> http://localhost:8983/solr/generic/select?q={!join from=id to=B1_name_ss fromIndex=scores score=max}B1_s:Boost3{!func}B1_f&fl=*,score&debugQuery=true
>>>>>> 
>>>>>> <lst name="explain">
>>>>>> <str name="Document6">
>>>>>> *3.379996* = Score based on join value Document6_Boost2
>>>>>> </str>
>>>>>> <str name="Document1">
>>>>>> *2.2533307* = Score based on join value Document1_Boost1
>>>>>> </str>
>>>>>> <str name="Document7">
>>>>>> *0.24786638* = Score based on join value Document7_Boost333
>>>>>> </str>
>>>>>> <str name="Document3">*0.0* = Score based on join value Document3_NoBoost</str>
>>>>>> </lst>
>>>>>> 
>>>>>> My guess is that it's now doing an all document query on the "scores" collection to return the scores in addition to the B1_s query I've passed in. I can't figure out where it's getting those scores from as a simple query against the "scores" collection returns scores like I'd expect to see them based on a similar query:
>>>>>> http://192.168.1.194:8983/solr/scores/select?q=B1_s:Boost3 AND _val_:B1_f&fl=score,*&debugQuery=true
>>>>>> 
>>>>>> <lst name="explain">
>>>>>> <str name="Document1_Boost3">
>>>>>> *46.834885* = sum of: 1.7682717 = weight(B1_s:Boost3 in 1) [ClassicSimilarity], result of: 1.7682717 = score(doc=1,freq=1.0), product of: 0.8926926 = queryWeight, product of: 1.9808292 = idf(docFreq=2, maxDocs=8) 0.45066613 = queryNorm 1.9808292 = fieldWeight in 1, product of: 1.0 = tf(freq=1.0), with freq of: 1.0 = termFreq=1.0 1.9808292 = idf(docFreq=2, maxDocs=8) 1.0 = fieldNorm(doc=1) 45.066612 = FunctionQuery(float(B1_f)), product of: 100.0 = float(B1_f)=100.0 1.0 = boost 0.45066613 = queryNorm
>>>>>> </str>
>>>>>> <str name="Document6_Boost3">
>>>>>> *15.288256* = sum of: 1.7682717 = weight(B1_s:Boost3 in 5) [ClassicSimilarity], result of: 1.7682717 = score(doc=5,freq=1.0), product of: 0.8926926 = queryWeight, product of: 1.9808292 = idf(docFreq=2, maxDocs=8) 0.45066613 = queryNorm 1.9808292 = fieldWeight in 5, product of: 1.0 = tf(freq=1.0), with freq of: 1.0 = termFreq=1.0 1.9808292 = idf(docFreq=2, maxDocs=8) 1.0 = fieldNorm(doc=5) 13.519984 = FunctionQuery(float(B1_f)), product of: 30.0 = float(B1_f)=30.0 1.0 = boost 0.45066613 = queryNorm
>>>>>> </str>
>>>>>> </lst>
>>>>>> 
>>>>>> I feel like I'm getting close to what I need, but it's just not clear to me what I'm missing at this point.
>>>>>> 
>>>>>> The other option I've been toying with is using payloads, but actually utilizing the payloads as part of the scoring process is beyond me at this time.
>>>>>> 
>>>>>> Any thoughts or hints on the best way to boost the relevancy of these scores would be appreciated.
>>>>>> Thanks
>>>>>> Mark
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> GENERIC:
>>>>>> {
>>>>>>   "id" : "Document1",
>>>>>>   "B1_ss" : ["Boost1|10","Boost3|100"],
>>>>>>   "title_s" : "Title1"
>>>>>>   ,"otherstuff_ss" : ["stuff1","suggestion"]
>>>>>>   ,"B1_name_ss" : ["Document1_Boost1","Document1_Boost3"]
>>>>>> },
>>>>>> {
>>>>>>   "id" : "Document2",
>>>>>>   "B1_ss" : ["Boost2|20"],
>>>>>>   "name_s" : "Product2",
>>>>>>   "title_s" : "Title2"
>>>>>>   ,"otherstuff_ss" : ["stuff2","recommendation"]
>>>>>>   ,"B1_name_ss" : ["Document2_Boost1"]
>>>>>> },
>>>>>> {
>>>>>>   "id" : "Document3",
>>>>>>   "name_s" : "Product3",
>>>>>>   "B1_ss" : ["NoBoost"],
>>>>>>   "title_s" : "Title3"
>>>>>>   ,"otherstuff_ss" : ["stuff3","new","suggestion"]
>>>>>>   ,"B1_name_ss" : ["Document3_NoBoost"]
>>>>>> },
>>>>>>  {
>>>>>>  "id" : "Document4",
>>>>>>   "name_s" : "Product4",
>>>>>>   "title_s" : "Title4"
>>>>>>   ,"otherstuff_ss" : ["stuff4","old","suggestion"]
>>>>>> } ,
>>>>>>  {
>>>>>>  "id" : "Document5",
>>>>>>   "name_s" : "Product5",
>>>>>>   "title_s" : "Title5"
>>>>>>   ,"otherstuff_ss" : ["stuff5","recommendation"]
>>>>>> },
>>>>>>  {
>>>>>>   "id" : "Document6",
>>>>>>   "name_s" : "Product6",
>>>>>>   "B1_ss" : ["Boost2|15","Boost3|30"],
>>>>>>   "title_s" : "Title6"
>>>>>>   ,"B1_name_ss" : ["Document6_Boost2","Document6_Boost3"]
>>>>>> },
>>>>>>  {
>>>>>>    "id" : "Document7",
>>>>>>   "name_s" : "Product7",
>>>>>>   "B1_ss" : ["NoBoost","Boost333|1.1"],
>>>>>>   "title_s" : "Title7"
>>>>>>   ,"B1_name_ss" : ["Document7_NoBoost","Document7_Boost333"]
>>>>>> }
>>>>>> 
>>>>>> SCORES:
>>>>>> {
>>>>>>   "id" : "Document1_Boost1",
>>>>>>   "B1_s" : "Boost1",
>>>>>>   "B1_f" : 10
>>>>>> },
>>>>>>   {
>>>>>>   "id" : "Document1_Boost3",
>>>>>>   "B1_s" : "Boost3",
>>>>>>   "B1_f" : 100
>>>>>> },
>>>>>> {
>>>>>>   "id" : "Document2_Boost2",
>>>>>>   "B1_s" : "Boost2",
>>>>>>   "B1_f" : 20
>>>>>> },
>>>>>> {
>>>>>>   "id" : "Document3_NoBoost",
>>>>>>   "B1_s" : "NoBoost"
>>>>>> },
>>>>>> {
>>>>>>   "id" : "Document6_Boost2",
>>>>>>   "B1_s" : "Boost2",
>>>>>>   "B1_f" : 15
>>>>>> },
>>>>>> {
>>>>>>   "id" : "Document6_Boost3",
>>>>>>   "B1_s" : "Boost3",
>>>>>>   "B1_f" : 30
>>>>>> },
>>>>>> {
>>>>>>   "id" : "Document7_NoBoost",
>>>>>>   "B1_s" : "NoBoost"
>>>>>> },
>>>>>> {
>>>>>>   "id" : "Document7_Boost333",
>>>>>>   "B1_s" : "Boost333",
>>>>>>   "B1_f" : 1.1
>>>>>> }
>>>>>> 
>>> 
>> 
>

Re: Boosting query results

Posted by Walter Underwood <wu...@wunderwood.org>.

You understand that you are making your site extremely easy to spam, right? This is how Microsoft became the top hit for “evil empire” on Google.

wunder
Walter Underwood
wunder@wunderwood.org
http://observer.wunderwood.org/  (my blog)


> On Jul 7, 2016, at 11:25 AM, Mark T. Trembley <ma...@etrailer.com> wrote:
> 
> I've found that it is definitely complicated!
> 
> Essentially what I am attempting to do is boost products based on the number of times that particular product has been selected via historical searches using the same search term or phrase.
> 
> 
> On 7/7/2016 11:55 AM, Walter Underwood wrote:
>> That is a very complicated design. What are you trying to achieve? Maybe there is a different approach that is simpler.
>> 
>> wunder
>> Walter Underwood
>> wunder@wunderwood.org
>> http://observer.wunderwood.org/  (my blog)
>> 
>> 
>>> On Jul 7, 2016, at 9:26 AM, Mark T. Trembley <ma...@etrailer.com> wrote:
>>> 
>>> That works with static boosts based on documents matching the query "Boost2". I want to apply a different boost to documents based on the value assigned to Boost2 within the document.
>>> 
>>> From my sample documents, when running a query with "Boost2," I want Document2 boosted by 20.0 and Document6 boosted by 15.0:
>>> 
>>> {
>>>   "id" : "Document2_Boost2",
>>>   "B1_s" : "Boost2",
>>>   "B1_f" : 20
>>> }
>>> {
>>>   "id" : "Document6_Boost2",
>>>   "B1_s" : "Boost2",
>>>   "B1_f" : 15
>>> }
>>> 
>>> 
>>> On 7/7/2016 10:21 AM, Walter Underwood wrote:
>>>> This looks like a job for “bq”, the boost query parameter. I used this to boost textbooks which were used at the student’s school. bq does not force documents to be included in the result set. It does affect the ranking of the included documents.
>>>> 
>>>> bq=B1_ss:Boost2 will boost documents that match that. You can use weights, like bq=B1_ss:Boost2^10
>>>> 
>>>> Here is the relationship between fq, q, and bq:
>>>> 
>>>> fq: selection, does not affect ranking
>>>> q: selection and ranking
>>>> bq: does not affect selection, affects ranking
>>>> 
>>>> wunder
>>>> Walter Underwood
>>>> wunder@wunderwood.org
>>>> http://observer.wunderwood.org/  (my blog)
>>>> 
>>>> 
>>>>> On Jul 7, 2016, at 7:30 AM, Mark T. Trembley <ma...@etrailer.com> wrote:
>>>>> 
>>>>> I have a question about the best way to rank my results based on a score field that can have different values per document and where each document can have different scores based on which term is queried.
>>>>> 
>>>>> Essentially what I'm wanting to have happen is provide a list of terms that when matched via a query it returns a corresponding score to help boost the original document. So if I had a document with a multi-valued field named B1_ss with terms [Boost1|10], [Boost2|20], [Boost3|100] and my search query is "Boost2", I want that document's result to be boosted by 20. Also note that "Boost2" can boost different documents at different levels. The query to select the actual documents will select against other fields in the document and could possibly return documents with any combination of B1 terms.
>>>>> 
>>>>> I'm still trying to figure out how best to model this in my index, either as child documents, or in another collection, or if it would make more sense to figure out how to make it work via payloads or by boosting the terms at index time.
>>>>> 
>>>>> I'm running Solr 5.5.1 in cloud mode. Each server has a complete replica of all collections.
>>>>> 
>>>>> The document structure I've been toying with the most is to put the boosts into a separate index and join them using !join syntax and returning the scores, but I've not had any luck getting quality results from those tests. The extra "scores" index is structured like this (I'll add the json for my test collections at the end of the email):
>>>>> id:Document1_Boost1
>>>>>  B1_s:Boost1
>>>>>  B1_f:10
>>>>> id:Document1_Boost3
>>>>>  B1_s:Boost3
>>>>>  B1_f:100
>>>>> Using this structure, I get close, but the scores are not what I'm expecting. If I use the following query, the explain says it's using the score from Document6_Boost2 even though my query is specifying B1_s:Boost3
>>>>> http://localhost:8983/solr/generic/select?q={!join from=id to=B1_name_ss fromIndex=scores score=max}B1_s:Boost3{!func}B1_f&fl=*,score&debugQuery=true
>>>>> 
>>>>> <lst name="explain">
>>>>> <str name="Document6">
>>>>> *3.379996* = Score based on join value Document6_Boost2
>>>>> </str>
>>>>> <str name="Document1">
>>>>> *2.2533307* = Score based on join value Document1_Boost1
>>>>> </str>
>>>>> <str name="Document7">
>>>>> *0.24786638* = Score based on join value Document7_Boost333
>>>>> </str>
>>>>> <str name="Document3">*0.0* = Score based on join value Document3_NoBoost</str>
>>>>> </lst>
>>>>> 
>>>>> My guess is that it's now doing an all document query on the "scores" collection to return the scores in addition to the B1_s query I've passed in. I can't figure out where it's getting those scores from as a simple query against the "scores" collection returns scores like I'd expect to see them based on a similar query:
>>>>> http://192.168.1.194:8983/solr/scores/select?q=B1_s:Boost3 AND _val_:B1_f&fl=score,*&debugQuery=true
>>>>> 
>>>>> <lst name="explain">
>>>>> <str name="Document1_Boost3">
>>>>> *46.834885* = sum of: 1.7682717 = weight(B1_s:Boost3 in 1) [ClassicSimilarity], result of: 1.7682717 = score(doc=1,freq=1.0), product of: 0.8926926 = queryWeight, product of: 1.9808292 = idf(docFreq=2, maxDocs=8) 0.45066613 = queryNorm 1.9808292 = fieldWeight in 1, product of: 1.0 = tf(freq=1.0), with freq of: 1.0 = termFreq=1.0 1.9808292 = idf(docFreq=2, maxDocs=8) 1.0 = fieldNorm(doc=1) 45.066612 = FunctionQuery(float(B1_f)), product of: 100.0 = float(B1_f)=100.0 1.0 = boost 0.45066613 = queryNorm
>>>>> </str>
>>>>> <str name="Document6_Boost3">
>>>>> *15.288256* = sum of: 1.7682717 = weight(B1_s:Boost3 in 5) [ClassicSimilarity], result of: 1.7682717 = score(doc=5,freq=1.0), product of: 0.8926926 = queryWeight, product of: 1.9808292 = idf(docFreq=2, maxDocs=8) 0.45066613 = queryNorm 1.9808292 = fieldWeight in 5, product of: 1.0 = tf(freq=1.0), with freq of: 1.0 = termFreq=1.0 1.9808292 = idf(docFreq=2, maxDocs=8) 1.0 = fieldNorm(doc=5) 13.519984 = FunctionQuery(float(B1_f)), product of: 30.0 = float(B1_f)=30.0 1.0 = boost 0.45066613 = queryNorm
>>>>> </str>
>>>>> </lst>
>>>>> 
>>>>> I feel like I'm getting close to what I need, but it's just not clear to me what I'm missing at this point.
>>>>> 
>>>>> The other option I've been toying with is using payloads, but actually utilizing the payloads as part of the scoring process is beyond me at this time.
>>>>> 
>>>>> Any thoughts or hints on the best way to boost the relevancy of these scores would be appreciated.
>>>>> Thanks
>>>>> Mark
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> GENERIC:
>>>>> {
>>>>>    "id" : "Document1",
>>>>>    "B1_ss" : ["Boost1|10","Boost3|100"],
>>>>>    "title_s" : "Title1"
>>>>>    ,"otherstuff_ss" : ["stuff1","suggestion"]
>>>>>    ,"B1_name_ss" : ["Document1_Boost1","Document1_Boost3"]
>>>>>  },
>>>>>  {
>>>>>    "id" : "Document2",
>>>>>    "B1_ss" : ["Boost2|20"],
>>>>>    "name_s" : "Product2",
>>>>>    "title_s" : "Title2"
>>>>>    ,"otherstuff_ss" : ["stuff2","recommendation"]
>>>>>    ,"B1_name_ss" : ["Document2_Boost1"]
>>>>>  },
>>>>>  {
>>>>>    "id" : "Document3",
>>>>>    "name_s" : "Product3",
>>>>>    "B1_ss" : ["NoBoost"],
>>>>>    "title_s" : "Title3"
>>>>>    ,"otherstuff_ss" : ["stuff3","new","suggestion"]
>>>>>    ,"B1_name_ss" : ["Document3_NoBoost"]
>>>>>  },
>>>>>   {
>>>>>   "id" : "Document4",
>>>>>    "name_s" : "Product4",
>>>>>    "title_s" : "Title4"
>>>>>    ,"otherstuff_ss" : ["stuff4","old","suggestion"]
>>>>>  } ,
>>>>>   {
>>>>>   "id" : "Document5",
>>>>>    "name_s" : "Product5",
>>>>>    "title_s" : "Title5"
>>>>>    ,"otherstuff_ss" : ["stuff5","recommendation"]
>>>>>  },
>>>>>   {
>>>>>    "id" : "Document6",
>>>>>    "name_s" : "Product6",
>>>>>    "B1_ss" : ["Boost2|15","Boost3|30"],
>>>>>    "title_s" : "Title6"
>>>>>    ,"B1_name_ss" : ["Document6_Boost2","Document6_Boost3"]
>>>>>  },
>>>>>   {
>>>>>     "id" : "Document7",
>>>>>    "name_s" : "Product7",
>>>>>    "B1_ss" : ["NoBoost","Boost333|1.1"],
>>>>>    "title_s" : "Title7"
>>>>>    ,"B1_name_ss" : ["Document7_NoBoost","Document7_Boost333"]
>>>>>  }
>>>>> 
>>>>> SCORES:
>>>>>  {
>>>>>    "id" : "Document1_Boost1",
>>>>>    "B1_s" : "Boost1",
>>>>>    "B1_f" : 10
>>>>>  },
>>>>>    {
>>>>>    "id" : "Document1_Boost3",
>>>>>    "B1_s" : "Boost3",
>>>>>    "B1_f" : 100
>>>>>  },
>>>>>  {
>>>>>    "id" : "Document2_Boost2",
>>>>>    "B1_s" : "Boost2",
>>>>>    "B1_f" : 20
>>>>>  },
>>>>>  {
>>>>>    "id" : "Document3_NoBoost",
>>>>>    "B1_s" : "NoBoost"
>>>>>  },
>>>>>  {
>>>>>    "id" : "Document6_Boost2",
>>>>>    "B1_s" : "Boost2",
>>>>>    "B1_f" : 15
>>>>>  },
>>>>>  {
>>>>>    "id" : "Document6_Boost3",
>>>>>    "B1_s" : "Boost3",
>>>>>    "B1_f" : 30
>>>>>  },
>>>>>  {
>>>>>    "id" : "Document7_NoBoost",
>>>>>    "B1_s" : "NoBoost"
>>>>>  },
>>>>>  {
>>>>>    "id" : "Document7_Boost333",
>>>>>    "B1_s" : "Boost333",
>>>>>    "B1_f" : 1.1
>>>>>  }
>>>>> 
>> 
>

Re: Boosting query results

Posted by "Mark T. Trembley" <ma...@etrailer.com>.

I've found that it is definitely complicated!

Essentially what I am attempting to do is boost products based on the 
number of times that particular product has been selected via historical 
searches using the same search term or phrase.
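
A minimal sketch of how per-term selection counts like these could be turned into documents shaped like the "scores" collection from the original message. The click-log format, the function name build_score_docs, and the use of the raw selection count as B1_f are assumptions made for illustration; none of them are stated in this thread.

```python
from collections import Counter


def build_score_docs(click_log):
    """Aggregate (search_term, product_id) selections into "scores" documents.

    Hypothetical helper: the id / B1_s / B1_f layout mirrors the sample
    "scores" documents in the thread, but the input format is assumed.
    """
    counts = Counter(click_log)  # one count per (term, product) pair
    docs = []
    for (term, product_id), n in sorted(counts.items()):
        docs.append({
            "id": f"{product_id}_{term}",  # e.g. Document2_Boost2
            "B1_s": term,                  # the search term that was used
            "B1_f": float(n),              # assumed boost: raw selection count
        })
    return docs


# Assumed sample log: each entry is one product selection after a search.
clicks = [
    ("Boost2", "Document2"),
    ("Boost2", "Document2"),
    ("Boost2", "Document6"),
    ("Boost3", "Document6"),
]
score_docs = build_score_docs(clicks)
```

Each resulting document carries one boost value per (product, term) pair, which is the property the join-based approach relies on: the same term can boost different products by different amounts.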


On 7/7/2016 11:55 AM, Walter Underwood wrote:
> That is a very complicated design. What are you trying to achieve? Maybe there is a different approach that is simpler.
>
> wunder
> Walter Underwood
> wunder@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
>
>> On Jul 7, 2016, at 9:26 AM, Mark T. Trembley <ma...@etrailer.com> wrote:
>>
>> That works with static boosts based on documents matching the query "Boost2". I want to apply a different boost to documents based on the value assigned to Boost2 within the document.
>>
>>  From my sample documents, when running a query with "Boost2," I want Document2 boosted by 20.0 and Document6 boosted by 15.0:
>>
>> {
>>    "id" : "Document2_Boost2",
>>    "B1_s" : "Boost2",
>>    "B1_f" : 20
>> }
>> {
>>    "id" : "Document6_Boost2",
>>    "B1_s" : "Boost2",
>>    "B1_f" : 15
>> }
>>
>>
>> On 7/7/2016 10:21 AM, Walter Underwood wrote:
>>> This looks like a job for “bq”, the boost query parameter. I used this to boost textbooks which were used at the student’s school. bq does not force documents to be included in the result set. It does affect the ranking of the included documents.
>>>
>>> bq=B1_ss:Boost2 will boost documents that match that. You can use weights, like bq=B1_ss:Boost2^10
>>>
>>> Here is the relationship between fq, q, and bq:
>>>
>>> fq: selection, does not affect ranking
>>> q: selection and ranking
>>> bq: does not affect selection, affects ranking
>>>
>>> wunder
>>> Walter Underwood
>>> wunder@wunderwood.org
>>> http://observer.wunderwood.org/  (my blog)
>>>
>>>
>>>> On Jul 7, 2016, at 7:30 AM, Mark T. Trembley <ma...@etrailer.com> wrote:
>>>>
>>>> I have a question about the best way to rank my results based on a score field that can have different values per document and where each document can have different scores based on which term is queried.
>>>>
>>>> Essentially what I'm wanting to have happen is provide a list of terms that when matched via a query it returns a corresponding score to help boost the original document. So if I had a document with a multi-valued field named B1_ss with terms [Boost1|10], [Boost2|20], [Boost3|100] and my search query is "Boost2", I want that document's result to be boosted by 20. Also note that "Boost2" can boost different documents at different levels. The query to select the actual documents will select against other fields in the document and could possibly return documents with any combination of B1 terms.
>>>>
>>>> I'm still trying to figure out how best to model this in my index, either as child documents, or in another collection, or if it would make more sense to figure out how to make it work via payloads or by boosting the terms at index time.
>>>>
>>>> I'm running Solr 5.5.1 in cloud mode. Each server has a complete replica of all collections.
>>>>
>>>> The document structure I've been toying with the most is to put the boosts into a separate index and join them using !join syntax and returning the scores, but I've not had any luck getting quality results from those tests. The extra "scores" index is structured like this (I'll add the json for my test collections at the end of the email):
>>>> id:Document1_Boost1
>>>>   B1_s:Boost1
>>>>   B1_f:10
>>>> id:Document1_Boost3
>>>>   B1_s:Boost3
>>>>   B1_f:100
>>>> Using this structure, I get close, but the scores are not what I'm expecting. If I use the following query, the explain says it's using the score from Document6_Boost2 even though my query is specifying B1_s:Boost3
>>>> http://localhost:8983/solr/generic/select?q={!join from=id to=B1_name_ss fromIndex=scores score=max}B1_s:Boost3{!func}B1_f&fl=*,score&debugQuery=true
>>>>
>>>> <lst name="explain">
>>>> <str name="Document6">
>>>> *3.379996* = Score based on join value Document6_Boost2
>>>> </str>
>>>> <str name="Document1">
>>>> *2.2533307* = Score based on join value Document1_Boost1
>>>> </str>
>>>> <str name="Document7">
>>>> *0.24786638* = Score based on join value Document7_Boost333
>>>> </str>
>>>> <str name="Document3">*0.0* = Score based on join value Document3_NoBoost</str>
>>>> </lst>
>>>>
>>>> My guess is that it's now doing an all document query on the "scores" collection to return the scores in addition to the B1_s query I've passed in. I can't figure out where it's getting those scores from as a simple query against the "scores" collection returns scores like I'd expect to see them based on a similar query:
>>>> http://192.168.1.194:8983/solr/scores/select?q=B1_s:Boost3 AND _val_:B1_f&fl=score,*&debugQuery=true
>>>>
>>>> <lst name="explain">
>>>> <str name="Document1_Boost3">
>>>> *46.834885* = sum of: 1.7682717 = weight(B1_s:Boost3 in 1) [ClassicSimilarity], result of: 1.7682717 = score(doc=1,freq=1.0), product of: 0.8926926 = queryWeight, product of: 1.9808292 = idf(docFreq=2, maxDocs=8) 0.45066613 = queryNorm 1.9808292 = fieldWeight in 1, product of: 1.0 = tf(freq=1.0), with freq of: 1.0 = termFreq=1.0 1.9808292 = idf(docFreq=2, maxDocs=8) 1.0 = fieldNorm(doc=1) 45.066612 = FunctionQuery(float(B1_f)), product of: 100.0 = float(B1_f)=100.0 1.0 = boost 0.45066613 = queryNorm
>>>> </str>
>>>> <str name="Document6_Boost3">
>>>> *15.288256* = sum of: 1.7682717 = weight(B1_s:Boost3 in 5) [ClassicSimilarity], result of: 1.7682717 = score(doc=5,freq=1.0), product of: 0.8926926 = queryWeight, product of: 1.9808292 = idf(docFreq=2, maxDocs=8) 0.45066613 = queryNorm 1.9808292 = fieldWeight in 5, product of: 1.0 = tf(freq=1.0), with freq of: 1.0 = termFreq=1.0 1.9808292 = idf(docFreq=2, maxDocs=8) 1.0 = fieldNorm(doc=5) 13.519984 = FunctionQuery(float(B1_f)), product of: 30.0 = float(B1_f)=30.0 1.0 = boost 0.45066613 = queryNorm
>>>> </str>
>>>> </lst>
>>>>
>>>> I feel like I'm getting close to what I need, but it's just not clear to me what I'm missing at this point.
>>>>
>>>> The other option I've been toying with is using payloads, but actually utilizing the payloads as part of the scoring process is beyond me at this time.
>>>>
>>>> Any thoughts or hints on the best way to boost the relevancy of these scores would be appreciated.
>>>> Thanks
>>>> Mark
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> GENERIC:
>>>> {
>>>>     "id" : "Document1",
>>>>     "B1_ss" : ["Boost1|10","Boost3|100"],
>>>>     "title_s" : "Title1"
>>>>     ,"otherstuff_ss" : ["stuff1","suggestion"]
>>>>     ,"B1_name_ss" : ["Document1_Boost1","Document1_Boost3"]
>>>>   },
>>>>   {
>>>>     "id" : "Document2",
>>>>     "B1_ss" : ["Boost2|20"],
>>>>     "name_s" : "Product2",
>>>>     "title_s" : "Title2"
>>>>     ,"otherstuff_ss" : ["stuff2","recommendation"]
>>>>     ,"B1_name_ss" : ["Document2_Boost1"]
>>>>   },
>>>>   {
>>>>     "id" : "Document3",
>>>>     "name_s" : "Product3",
>>>>     "B1_ss" : ["NoBoost"],
>>>>     "title_s" : "Title3"
>>>>     ,"otherstuff_ss" : ["stuff3","new","suggestion"]
>>>>     ,"B1_name_ss" : ["Document3_NoBoost"]
>>>>   },
>>>>    {
>>>>    "id" : "Document4",
>>>>     "name_s" : "Product4",
>>>>     "title_s" : "Title4"
>>>>     ,"otherstuff_ss" : ["stuff4","old","suggestion"]
>>>>   } ,
>>>>    {
>>>>    "id" : "Document5",
>>>>     "name_s" : "Product5",
>>>>     "title_s" : "Title5"
>>>>     ,"otherstuff_ss" : ["stuff5","recommendation"]
>>>>   },
>>>>    {
>>>>     "id" : "Document6",
>>>>     "name_s" : "Product6",
>>>>     "B1_ss" : ["Boost2|15","Boost3|30"],
>>>>     "title_s" : "Title6"
>>>>     ,"B1_name_ss" : ["Document6_Boost2","Document6_Boost3"]
>>>>   },
>>>>    {
>>>>      "id" : "Document7",
>>>>     "name_s" : "Product7",
>>>>     "B1_ss" : ["NoBoost","Boost333|1.1"],
>>>>     "title_s" : "Title7"
>>>>     ,"B1_name_ss" : ["Document7_NoBoost","Document7_Boost333"]
>>>>   }
>>>>
>>>> SCORES:
>>>>   {
>>>>     "id" : "Document1_Boost1",
>>>>     "B1_s" : "Boost1",
>>>>     "B1_f" : 10
>>>>   },
>>>>     {
>>>>     "id" : "Document1_Boost3",
>>>>     "B1_s" : "Boost3",
>>>>     "B1_f" : 100
>>>>   },
>>>>   {
>>>>     "id" : "Document2_Boost2",
>>>>     "B1_s" : "Boost2",
>>>>     "B1_f" : 20
>>>>   },
>>>>   {
>>>>     "id" : "Document3_NoBoost",
>>>>     "B1_s" : "NoBoost"
>>>>   },
>>>>   {
>>>>     "id" : "Document6_Boost2",
>>>>     "B1_s" : "Boost2",
>>>>     "B1_f" : 15
>>>>   },
>>>>   {
>>>>     "id" : "Document6_Boost3",
>>>>     "B1_s" : "Boost3",
>>>>     "B1_f" : 30
>>>>   },
>>>>   {
>>>>     "id" : "Document7_NoBoost",
>>>>     "B1_s" : "NoBoost"
>>>>   },
>>>>   {
>>>>     "id" : "Document7_Boost333",
>>>>     "B1_s" : "Boost333",
>>>>     "B1_f" : 1.1
>>>>   }
>>>>
>

Re: Boosting query results

Posted by Walter Underwood <wu...@wunderwood.org>.

That is a very complicated design. What are you trying to achieve? Maybe there is a different approach that is simpler.

wunder
Walter Underwood
wunder@wunderwood.org
http://observer.wunderwood.org/  (my blog)


> On Jul 7, 2016, at 9:26 AM, Mark T. Trembley <ma...@etrailer.com> wrote:
> 
> That works with static boosts based on documents matching the query "Boost2". I want to apply a different boost to documents based on the value assigned to Boost2 within the document.
> 
> From my sample documents, when running a query with "Boost2," I want Document2 boosted by 20.0 and Document6 boosted by 15.0:
> 
> {
>   "id" : "Document2_Boost2",
>   "B1_s" : "Boost2",
>   "B1_f" : 20
> }
> {
>   "id" : "Document6_Boost2",
>   "B1_s" : "Boost2",
>   "B1_f" : 15
> }
> 
> 
> On 7/7/2016 10:21 AM, Walter Underwood wrote:
>> This looks like a job for “bq”, the boost query parameter. I used this to boost textbooks which were used at the student’s school. bq does not force documents to be included in the result set. It does affect the ranking of the included documents.
>> 
>> bq=B1_ss:Boost2 will boost documents that match that. You can use weights, like bq=B1_ss:Boost2^10
>> 
>> Here is the relationship between fq, q, and bq:
>> 
>> fq: selection, does not affect ranking
>> q: selection and ranking
>> bq: does not affect selection, affects ranking
>> 
>> wunder
>> Walter Underwood
>> wunder@wunderwood.org
>> http://observer.wunderwood.org/  (my blog)
>> 
>> 
>>> On Jul 7, 2016, at 7:30 AM, Mark T. Trembley <ma...@etrailer.com> wrote:
>>> 
>>> I have a question about the best way to rank my results based on a score field that can have different values per document and where each document can have different scores based on which term is queried.
>>> 
>>> Essentially what I'm wanting to have happen is provide a list of terms that when matched via a query it returns a corresponding score to help boost the original document. So if I had a document with a multi-valued field named B1_ss with terms [Boost1|10], [Boost2|20], [Boost3|100] and my search query is "Boost2", I want that document's result to be boosted by 20. Also note that "Boost2" can boost different documents at different levels. The query to select the actual documents will select against other fields in the document and could possibly return documents with any combination of B1 terms.
>>> 
>>> I'm still trying to figure out how best to model this in my index, either as child documents, or in another collection, or if it would make more sense to figure out how to make it work via payloads or by boosting the terms at index time.
>>> 
>>> I'm running Solr 5.5.1 in cloud mode. Each server has a complete replica of all collections.
>>> 
>>> The document structure I've been toying with the most is to put the boosts into a separate index and join them using !join syntax and returning the scores, but I've not had any luck getting quality results from those tests. The extra "scores" index is structured like this (I'll add the json for my test collections at the end of the email):
>>> id:Document1_Boost1
>>>  B1_s:Boost1
>>>  B1_f:10
>>> id:Document1_Boost3
>>>  B1_s:Boost3
>>>  B1_f:100
>>> Using this structure, I get close, but the scores are not what I'm expecting. If I use the following query, the explain says it's using the score from Document6_Boost2 even though my query is specifying B1_s:Boost3
>>> http://localhost:8983/solr/generic/select?q={!join from=id to=B1_name_ss fromIndex=scores score=max}B1_s:Boost3{!func}B1_f&fl=*,score&debugQuery=true
>>> 
>>> <lst name="explain">
>>> <str name="Document6">
>>> *3.379996* = Score based on join value Document6_Boost2
>>> </str>
>>> <str name="Document1">
>>> *2.2533307* = Score based on join value Document1_Boost1
>>> </str>
>>> <str name="Document7">
>>> *0.24786638* = Score based on join value Document7_Boost333
>>> </str>
>>> <str name="Document3">*0.0* = Score based on join value Document3_NoBoost</str>
>>> </lst>
>>> 
>>> My guess is that it's now doing an all document query on the "scores" collection to return the scores in addition to the B1_s query I've passed in. I can't figure out where it's getting those scores from as a simple query against the "scores" collection returns scores like I'd expect to see them based on a similar query:
>>> http://192.168.1.194:8983/solr/scores/select?q=B1_s:Boost3 AND _val_:B1_f&fl=score,*&debugQuery=true
>>> 
>>> <lst name="explain">
>>> <str name="Document1_Boost3">
>>> *46.834885* = sum of: 1.7682717 = weight(B1_s:Boost3 in 1) [ClassicSimilarity], result of: 1.7682717 = score(doc=1,freq=1.0), product of: 0.8926926 = queryWeight, product of: 1.9808292 = idf(docFreq=2, maxDocs=8) 0.45066613 = queryNorm 1.9808292 = fieldWeight in 1, product of: 1.0 = tf(freq=1.0), with freq of: 1.0 = termFreq=1.0 1.9808292 = idf(docFreq=2, maxDocs=8) 1.0 = fieldNorm(doc=1) 45.066612 = FunctionQuery(float(B1_f)), product of: 100.0 = float(B1_f)=100.0 1.0 = boost 0.45066613 = queryNorm
>>> </str>
>>> <strname="Document6_Boost3">
>>> *15.288256* = sum of: 1.7682717 = weight(B1_s:Boost3 in 5) [ClassicSimilarity], result of: 1.7682717 = score(doc=5,freq=1.0), product of: 0.8926926 = queryWeight, product of: 1.9808292 = idf(docFreq=2, maxDocs=8) 0.45066613 = queryNorm 1.9808292 = fieldWeight in 5, product of: 1.0 = tf(freq=1.0), with freq of: 1.0 = termFreq=1.0 1.9808292 = idf(docFreq=2, maxDocs=8) 1.0 = fieldNorm(doc=5) 13.519984 = FunctionQuery(float(B1_f)), product of: 30.0 = float(B1_f)=30.0 1.0 = boost 0.45066613 = queryNorm
>>> </str>
>>> </lst>
>>> 
>>> I feel like I'm getting close to what I need, but it's just not clear to me what I'm missing at this point.
>>> 
>>> The other option I've been toying with is using payloads, but actually utilizing the payloads as part of the scoring process is beyond me at this time.
>>> 
>>> Any thoughts or hints on the best way to boost the relevancy of these scores would be appreciated.
>>> Thanks
>>> Mark
>>> 
>>> GENERIC:
>>> {
>>>    "id" : "Document1",
>>>    "B1_ss" : ["Boost1|10","Boost3|100"],
>>>    "title_s" : "Title1"
>>>    ,"otherstuff_ss" : ["stuff1","suggestion"]
>>>    ,"B1_name_ss" : ["Document1_Boost1","Document1_Boost3"]
>>>  },
>>>  {
>>>    "id" : "Document2",
>>>    "B1_ss" : ["Boost2|20"],
>>>    "name_s" : "Product2",
>>>    "title_s" : "Title2"
>>>    ,"otherstuff_ss" : ["stuff2","recommendation"]
>>>    ,"B1_name_ss" : ["Document2_Boost1"]
>>>  },
>>>  {
>>>    "id" : "Document3",
>>>    "name_s" : "Product3",
>>>    "B1_ss" : ["NoBoost"],
>>>    "title_s" : "Title3"
>>>    ,"otherstuff_ss" : ["stuff3","new","suggestion"]
>>>    ,"B1_name_ss" : ["Document3_NoBoost"]
>>>  },
>>>   {
>>>   "id" : "Document4",
>>>    "name_s" : "Product4",
>>>    "title_s" : "Title4"
>>>    ,"otherstuff_ss" : ["stuff4","old","suggestion"]
>>>  } ,
>>>   {
>>>   "id" : "Document5",
>>>    "name_s" : "Product5",
>>>    "title_s" : "Title5"
>>>    ,"otherstuff_ss" : ["stuff5","recommendation"]
>>>  },
>>>   {
>>>    "id" : "Document6",
>>>    "name_s" : "Product6",
>>>    "B1_ss" : ["Boost2|15","Boost3|30"],
>>>    "title_s" : "Title6"
>>>    ,"B1_name_ss" : ["Document6_Boost2","Document6_Boost3"]
>>>  },
>>>   {
>>>     "id" : "Document7",
>>>    "name_s" : "Product7",
>>>    "B1_ss" : ["NoBoost","Boost333|1.1"],
>>>    "title_s" : "Title7"
>>>    ,"B1_name_ss" : ["Document7_NoBoost","Document7_Boost333"]
>>>  }
>>> 
>>> SCORES:
>>>  {
>>>    "id" : "Document1_Boost1",
>>>    "B1_s" : "Boost1",
>>>    "B1_f" : 10
>>>  },
>>>    {
>>>    "id" : "Document1_Boost3",
>>>    "B1_s" : "Boost3",
>>>    "B1_f" : 100
>>>  },
>>>  {
>>>    "id" : "Document2_Boost2",
>>>    "B1_s" : "Boost2",
>>>    "B1_f" : 20
>>>  },
>>>  {
>>>    "id" : "Document3_NoBoost",
>>>    "B1_s" : "NoBoost"
>>>  },
>>>  {
>>>    "id" : "Document6_Boost2",
>>>    "B1_s" : "Boost2",
>>>    "B1_f" : 15
>>>  },
>>>  {
>>>    "id" : "Document6_Boost3",
>>>    "B1_s" : "Boost3",
>>>    "B1_f" : 30
>>>  },
>>>  {
>>>    "id" : "Document7_NoBoost",
>>>    "B1_s" : "NoBoost"
>>>  },
>>>  {
>>>    "id" : "Document7_Boost333",
>>>    "B1_s" : "Boost333",
>>>    "B1_f" : 1.1
>>>  }
>>> 
>> 
>

Re: Boosting query results

Posted by "Mark T. Trembley" <ma...@etrailer.com>.

That works with static boosts based on documents matching the query 
"Boost2". I want to apply a different boost to documents based on the 
value assigned to Boost2 within the document.

From my sample documents, when running a query with "Boost2", I want 
Document2 boosted by 20.0 and Document6 boosted by 15.0:

  {
    "id" : "Document2_Boost2",
    "B1_s" : "Boost2",
    "B1_f" : 20
  }
  {
    "id" : "Document6_Boost2",
    "B1_s" : "Boost2",
    "B1_f" : 15
  }
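
To make the desired outcome concrete, here is a minimal Python sketch that
computes, outside of Solr, the per-document boost each sample document should
receive for a given term. It only parses the "Term|weight" values from the
B1_ss field of the GENERIC sample documents; the helper names boost_for and
expected_boosts are hypothetical, and this is not a Solr query or plugin.

```python
# Sample documents copied from the GENERIC collection in this thread
# (only the fields needed for the illustration).
docs = [
    {"id": "Document1", "B1_ss": ["Boost1|10", "Boost3|100"]},
    {"id": "Document2", "B1_ss": ["Boost2|20"]},
    {"id": "Document3", "B1_ss": ["NoBoost"]},
    {"id": "Document6", "B1_ss": ["Boost2|15", "Boost3|30"]},
    {"id": "Document7", "B1_ss": ["NoBoost", "Boost333|1.1"]},
]


def boost_for(doc, term, default=0.0):
    """Return the weight stored for `term` in the document's B1_ss values.

    Hypothetical helper: values look like "Boost2|20"; a value without
    a "|weight" part (e.g. "NoBoost") contributes no boost.
    """
    for value in doc.get("B1_ss", []):
        name, _, weight = value.partition("|")
        if name == term and weight:
            return float(weight)
    return default


def expected_boosts(docs, term):
    """Map document id -> boost for every document that carries `term`."""
    boosts = {}
    for doc in docs:
        boost = boost_for(doc, term)
        if boost > 0:
            boosts[doc["id"]] = boost
    return boosts


print(expected_boosts(docs, "Boost2"))
# {'Document2': 20.0, 'Document6': 15.0}
```

The point of the sketch is that the boost depends on both the term and the
document, which a single static weight on the query side cannot express.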


On 7/7/2016 10:21 AM, Walter Underwood wrote:
> This looks like a job for “bq”, the boost query parameter. I used this to boost textbooks which were used at the student’s school. bq does not force documents to be included in the result set. It does affect the ranking of the included documents.
>
> bq=B1_ss:Boost2 will boost documents that match that. You can use weights, like bq=B1_ss:Boost2^10
>
> Here is the relationship between fq, q, and bq:
>
> fq: selection, does not affect ranking
> q: selection and ranking
> bq: does not affect selection, affects ranking
>
> wunder
> Walter Underwood
> wunder@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
>

Re: Boosting query results

Posted by Walter Underwood <wu...@wunderwood.org>.

This looks like a job for “bq”, the boost query parameter. I used this to boost textbooks which were used at the student’s school. bq does not force documents to be included in the result set. It does affect the ranking of the included documents.

bq=B1_ss:Boost2 will boost documents that match that. You can use weights, like bq=B1_ss:Boost2^10

Here is the relationship between fq, q, and bq:

fq: selection, does not affect ranking
q: selection and ranking
bq: does not affect selection, affects ranking

wunder
Walter Underwood
wunder@wunderwood.org
http://observer.wunderwood.org/  (my blog)
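
As a rough illustration of the fq / q / bq split described above, the sketch
below builds a select URL in Python. It assumes the eDisMax query parser
(defType=edismax), since bq is a DisMax/eDisMax parameter. The main query
otherstuff_ss:suggestion and the helper name build_select_url are made up for
the example; the host, collection and bq value are taken from this thread.

```python
from urllib.parse import urlencode


def build_select_url(base_url, q, fq=None, bq=None):
    """Build a Solr /select URL (hypothetical helper for illustration).

    q  : selects documents and contributes to ranking
    fq : selects documents only, no effect on ranking
    bq : affects ranking only, does not change which documents match
    """
    params = [("defType", "edismax"), ("q", q)]
    if fq:
        params.append(("fq", fq))
    if bq:
        params.append(("bq", bq))
    # Return the score and the explain output so the boost can be inspected.
    params += [("fl", "*,score"), ("debugQuery", "true")]
    return base_url + "/select?" + urlencode(params)


url = build_select_url(
    "http://localhost:8983/solr/generic",
    q="otherstuff_ss:suggestion",   # assumed main query, not from the thread
    bq="B1_ss:Boost2^10",           # static weight suggested above
)
print(url)
```

Note that the ^10 weight here is the same for every document matching the
boost query, so this covers static boosts rather than a per-document value.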

