Posted to solr-user@lucene.apache.org by Derek Poh <dp...@globalsources.com.INVALID> on 2020/12/07 08:51:57 UTC

optimize boosting parameters

Hi

I have added the following boosting requirements to the search query of
a page. Feedback from the monitoring team is that the overall response
time of the page has increased since then.
I am trying to find out whether the added boosting parameters (below)
could have contributed to the increase.

The boosting is working as per requirements.

May I know whether the implemented boosting parameters can be enhanced
or optimized further, ideally to improve the response time of the query
and the page?

Requirements:
1. If P_SupplierResponseRate is:
    a. 3, boost by 0.4
    b. 2, boost by 0.2

2. If P_SupplierResponseTime is:
    a. 4, boost by 0.4
    b. 3, boost by 0.2

3. If P_MWSScore is:
    a. between 80-100, boost by 1.6
    b. between 60-79, boost by 0.8

4. If P_SupplierRanking is:
    a. 3, boost by 0.3
    b. 4, boost by 0.6
    c. 5, boost by 0.9
    d. 6, boost by 1.2

Boosting parameters implemented:
bf=map(P_SupplierResponseRate,3,3,0.4,0)
bf=map(P_SupplierResponseRate,2,2,0.2,0)

bf=map(P_SupplierResponseTime,4,4,0.4,0)
bf=map(P_SupplierResponseTime,3,3,0.2,0)

bf=map(P_MWSScore,80,100,1.6,0)
bf=map(P_MWSScore,60,79,0.8,0)

bf=if(termfreq(P_SupplierRanking,3),0.3,if(termfreq(P_SupplierRanking,4),0.6,if(termfreq(P_SupplierRanking,5),0.9,if(termfreq(P_SupplierRanking,6),1.2,0))))
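
The requirements above can be restated as a small reference model, which may help when checking that the bf parameters return the intended values. This is only a sketch in Python, not Solr code: solr_map mimics the documented behaviour of Solr's map(x,min,max,target,default) with an inclusive range, while expected_boost and the dict-based document are hypothetical names used for illustration. It assumes the separate bf values are simply summed and that a missing field counts as 0.

```python
def solr_map(x, lo, hi, target, default):
    """Mimic Solr's map(x,min,max,target,default): inclusive range check."""
    return target if lo <= x <= hi else default


def expected_boost(doc):
    """Sum of the additive boosts described in requirements 1-4.

    `doc` is a plain dict keyed by the field names from the post;
    a missing field is treated as 0 (an assumption).
    """
    rate = doc.get("P_SupplierResponseRate", 0)
    time = doc.get("P_SupplierResponseTime", 0)
    mws = doc.get("P_MWSScore", 0)
    rank = doc.get("P_SupplierRanking", 0)

    total = 0.0
    total += solr_map(rate, 3, 3, 0.4, 0) + solr_map(rate, 2, 2, 0.2, 0)
    total += solr_map(time, 4, 4, 0.4, 0) + solr_map(time, 3, 3, 0.2, 0)
    total += solr_map(mws, 80, 100, 1.6, 0) + solr_map(mws, 60, 79, 0.8, 0)
    # Requirement 4: ranking 3..6 maps to 0.3, 0.6, 0.9, 1.2.
    total += {3: 0.3, 4: 0.6, 5: 0.9, 6: 1.2}.get(rank, 0)
    return round(total, 2)
```

One thing the model makes visible: if P_MWSScore can hold non-integer values, a score such as 79.5 falls between the two ranges and receives no boost.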


I am using Solr 7.7.2

----------------------
CONFIDENTIALITY NOTICE 

This e-mail (including any attachments) may contain confidential and/or privileged information. If you are not the intended recipient or have received this e-mail in error, please inform the sender immediately and delete this e-mail (including any attachments) from your computer, and you must not use, disclose to anyone else or copy this e-mail (including any attachments), whether in whole or in part. 

This e-mail and any reply to it may be monitored for security, legal, regulatory compliance and/or other appropriate reasons.


Re: optimize boosting parameters

Posted by Derek Poh <dp...@globalsources.com.INVALID>.
We monitor the response time of the page that uses these boosting
parameters with Pingdom. Since the addition of these boosting parameters
and an additional field to search on (for which I will start a separate
thread on the mailing list), the average page response time has
increased by 1-2 seconds.
Management has given feedback on this.

> If it does turn out to be the boosting (and IIRC the
> map function can be expensive), can you pre-compute some
> number of the boosts? Your requirements look
> like they can be computed at index time, then boost
> by just the value of the pre-computed field.
I have gone through the list of functions, and the map function is the
only one I found that can meet the requirements.
Or is there a less expensive function that I missed?

By pre-computing the boosts, do you mean that at the preparation stage,
before indexing, I check the value of P_SupplierResponseRate and, if the
value is 3, specify 'boost="0.4"' for that field of the document?
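
One possible reading of the pre-compute suggestion, sketched in Python: work out the total boost while preparing each document and store it in an ordinary numeric field, so the query needs only one cheap function. The field name P_PrecomputedBoost and the helper add_precomputed_boost are hypothetical, and the query parameter shown in the closing comment assumes that field is a single-valued numeric field. As far as I know, the per-field boost="..." attribute (index-time boost) is no longer honoured in Solr 7, so a stored value is likely the safer route.

```python
RATE_BOOST = {3: 0.4, 2: 0.2}
TIME_BOOST = {4: 0.4, 3: 0.2}
RANK_BOOST = {3: 0.3, 4: 0.6, 5: 0.9, 6: 1.2}


def mws_boost(score):
    """Requirement 3: range-based boost for P_MWSScore."""
    if 80 <= score <= 100:
        return 1.6
    if 60 <= score <= 79:
        return 0.8
    return 0.0


def add_precomputed_boost(doc):
    """Return a copy of `doc` with the hypothetical P_PrecomputedBoost field."""
    boost = (
        RATE_BOOST.get(doc.get("P_SupplierResponseRate"), 0.0)
        + TIME_BOOST.get(doc.get("P_SupplierResponseTime"), 0.0)
        + mws_boost(doc.get("P_MWSScore", 0))
        + RANK_BOOST.get(doc.get("P_SupplierRanking"), 0.0)
    )
    enriched = dict(doc)  # leave the caller's document untouched
    enriched["P_PrecomputedBoost"] = round(boost, 2)
    return enriched

# At query time the seven bf parameters would then collapse to one, e.g.:
#   bf=field(P_PrecomputedBoost)
```

The trade-off is the one raised in the thread: changing any boost value means re-indexing the affected documents.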

> BTW, boosts < 1.0
> _reduce_ the score. I mention that just in case that’s a surprise ;)
Oh, it reduces the score?! It does not increase the score (by
multiplying or adding) by an amount less than 1?
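
The effect of a value below 1.0 depends on how it is combined with the relevance score. To my understanding, values passed through the (e)dismax bf parameter are added to the score, while a multiplicative boost (for example the edismax boost parameter) scales it. The arithmetic below is only an illustration in Python with a made-up base score of 5.0; it is not Solr's scoring code.

```python
def additive(base_score, bf_value):
    """bf-style boost: the function value is added to the score."""
    return base_score + bf_value


def multiplicative(base_score, factor):
    """Multiplicative boost: the score is scaled by the factor."""
    return base_score * factor


base = 5.0  # made-up relevance score for illustration
print(additive(base, 0.4))        # 5.4: still raises the score
print(multiplicative(base, 0.4))  # 2.0: lowers the score
```

So with bf, a value of 0.4 should still push a document up, just by a small amount relative to the base score.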

>   You use termfreq, which changes of course, but
> 1> if your corpus is updated often enough, the termfreqs will be relatively stable.
>        in that case you can pre-compute them too.
We do incremental indexing every half an hour on this collection, with
an average of 50K-100K documents in each run. The collection has 7+
million documents.
So the entire corpus does not get updated in every indexing run.

> 2> your problem statement has nothing to do with termfreq so why are you
>       using it in the first place?
I read up on the termfreq function again. It returns the number of times
the term appears in the field for that document, so it does not really
fit the requirements. Thank you for pointing it out.
Should I use map instead?
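
If map is used for P_SupplierRanking as well, the four ranking rules could be written the same way as the other fields, with one bf per value. The Python helper below (map_bf_params is a hypothetical name) just builds those parameter strings from a rules table so they stay consistent with the requirements. It assumes P_SupplierRanking is a single-valued numeric field; whether this is actually faster than the nested if/termfreq version would need to be measured.

```python
def map_bf_params(field, rules):
    """Build one bf=map(...) string per (low, high, boost) rule."""
    return [
        f"bf=map({field},{low},{high},{boost},0)"
        for low, high, boost in rules
    ]


ranking_rules = [(3, 3, 0.3), (4, 4, 0.6), (5, 5, 0.9), (6, 6, 1.2)]
for param in map_bf_params("P_SupplierRanking", ranking_rules):
    print(param)
# bf=map(P_SupplierRanking,3,3,0.3,0)
# bf=map(P_SupplierRanking,4,4,0.6,0)
# bf=map(P_SupplierRanking,5,5,0.9,0)
# bf=map(P_SupplierRanking,6,6,1.2,0)
```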

Derek

On 8/12/2020 9:48 pm, Erick Erickson wrote:
> Before worrying about it too much, exactly _how_ much has
> the performance changed?
>
> I’ve just been in too many situations where there’s
> no objective measure of performance before and after, just
> someone saying “it seems slower” and had those performance
> changes disappear when a rigorous test is done. Then spent
> a lot of time figuring out that the person reporting the
> problem hadn’t had coffee yet. Or the network was slow.
> Or….
>
> If it does turn out to be the boosting (and IIRC the
> map function can be expensive), can you pre-compute some
> number of the boosts? Your requirements look
> like they can be computed at index time, then boost
> by just the value of the pre-computed field. BTW, boosts < 1.0
> _reduce_ the score. I mention that just in case that’s a surprise ;)
> Of course that means that to change the boosting you need
> to re-index.
>
>   You use termfreq, which changes of course, but
> 1> if your corpus is updated often enough, the termfreqs will be relatively stable.
>        in that case you can pre-compute them too.
>
>
> 2> your problem statement has nothing to do with termfreq so why are you
>       using it in the first place?
>
> Best,
> Erick

Re: optimize boosting parameters

Posted by Erick Erickson <er...@gmail.com>.
Before worrying about it too much, exactly _how_ much has
the performance changed?

I’ve just been in too many situations where there’s
no objective measure of performance before and after, just
someone saying “it seems slower” and had those performance
changes disappear when a rigorous test is done. Then spent
a lot of time figuring out that the person reporting the 
problem hadn’t had coffee yet. Or the network was slow.
Or….
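
A before/after comparison along these lines could be scripted. The sketch below is in Python and makes several assumptions: run_query is a hypothetical callable supplied by the caller that executes one search with the given parameters and returns its time in milliseconds (for example the QTime value from the Solr response header), and the median is used so that a few slow outliers do not dominate.

```python
import statistics


def compare_latency(run_query, base_params, boosted_params, runs=20):
    """Return (median_base_ms, median_boosted_ms) over `runs` repetitions.

    The two variants are interleaved so that cache warm-up and
    background load affect both roughly equally.
    """
    base_times, boosted_times = [], []
    for _ in range(runs):
        base_times.append(run_query(base_params))
        boosted_times.append(run_query(boosted_params))
    return statistics.median(base_times), statistics.median(boosted_times)
```

Note that a Solr-side timing covers only the search itself; a page-level monitor also includes rendering and network time, so the two numbers are not directly comparable.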

If it does turn out to be the boosting (and IIRC the
map function can be expensive), can you pre-compute some
number of the boosts? Your requirements look
like they can be computed at index time, then boost
by just the value of the pre-computed field. BTW, boosts < 1.0
_reduce_ the score. I mention that just in case that’s a surprise ;)
Of course that means that to change the boosting you need
to re-index.

 You use termfreq, which changes of course, but
1> if your corpus is updated often enough, the termfreqs will be relatively stable.
      in that case you can pre-compute them too.


2> your problem statement has nothing to do with termfreq so why are you
     using it in the first place?

Best,
Erick

> On Dec 8, 2020, at 12:46 AM, Radu Gheorghe <ra...@sematext.com> wrote:
> 
> Hi Derek,
> 
> Ah, then my reply was completely off :)
> 
> I don’t really see a better way. Maybe other than changing termfreq to field, if the numeric field has docValues? That may be faster, but I don’t know for sure.
> 
> Best regards,
> Radu
> --
> Sematext Cloud - Full Stack Observability - https://sematext.com
> Solr and Elasticsearch Consulting, Training and Production Support


Re: optimize boosting parameters

Posted by Radu Gheorghe <ra...@sematext.com>.
Hi Derek,

Ah, then my reply was completely off :)

I don’t really see a better way. Maybe other than changing termfreq to field, if the numeric field has docValues? That may be faster, but I don’t know for sure.

Best regards,
Radu
--
Sematext Cloud - Full Stack Observability - https://sematext.com
Solr and Elasticsearch Consulting, Training and Production Support

> On 8 Dec 2020, at 06:17, Derek Poh <dp...@globalsources.com> wrote:
> 
> Hi Radu
> 
> Apologies for not making myself clear.
> 
> I would like to know if there is a simpler or more efficient way to craft the boosting parameters based on the requirements.
> 
> For example, I am using 'if', 'map' and 'termfreq' functions in the bf parameters.
> 
> Is there a simpler or more efficient function that can be used instead? Or a more efficient way to craft the 'formula'?


Re: optimize boosting parameters

Posted by Derek Poh <dp...@globalsources.com.INVALID>.
Hi Radu

Apologies for not making myself clear.

I would like to know if there is a simpler or more efficient way to
craft the boosting parameters based on the requirements.

For example, I am using 'if', 'map' and 'termfreq' functions in the bf
parameters.

Is there a simpler or more efficient function that can be used instead?
Or a more efficient way to craft the 'formula'?

On 7/12/2020 10:05 pm, Radu Gheorghe wrote:
> Hi Derek,
>
> It’s hard to tell whether your boosts can be made better without knowing your data and what users expect of it. Which is a problem in itself.
>
> I would suggest gathering judgements, like if a user queries for X, what doc IDs do you expect to get back?
>
> Once you have enough of these judgements, you can experiment with boosts and see how the query results change. There are measures such as nDCG (https://en.wikipedia.org/wiki/Discounted_cumulative_gain#Normalized_DCG) that can help you measure that per query, and you can average this score across all your judgements to get an overall measure of how well you’re doing.
>
> Or even better, you can have something like Quaerite play with boost values for you:
> https://github.com/tballison/quaerite/blob/main/quaerite-examples/README.md#genetic-algorithms-ga-runga
>
> Best regards,
> Radu
> --
> Sematext Cloud - Full Stack Observability - https://sematext.com
> Solr and Elasticsearch Consulting, Training and Production Support

Re: optimize boosting parameters

Posted by Radu Gheorghe <ra...@sematext.com>.
Hi Derek,

It’s hard to tell whether your boosts can be made better without knowing your data and what users expect of it. Which is a problem in itself.

I would suggest gathering judgements, like if a user queries for X, what doc IDs do you expect to get back?

Once you have enough of these judgements, you can experiment with boosts and see how the query results change. There are measures such as nDCG (https://en.wikipedia.org/wiki/Discounted_cumulative_gain#Normalized_DCG) that can help you measure that per query, and you can average this score across all your judgements to get an overall measure of how well you’re doing.
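
For reference, nDCG for a single query can be computed in a few lines. This Python sketch uses the common formulation DCG = sum(rel_i / log2(i + 1)) with positions starting at 1. The helpers dcg and ndcg are illustrative names, not part of any tool mentioned here, and the input is a list of graded relevance judgements in the order the results were returned.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain for results in ranked order."""
    return sum(
        rel / math.log2(pos + 1)
        for pos, rel in enumerate(relevances, start=1)
    )


def ndcg(relevances):
    """DCG divided by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0
```

This version normalises against the best ordering of the returned results only; a stricter variant would build the ideal ranking from every judged document for the query.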

Or even better, you can have something like Quaerite play with boost values for you:
https://github.com/tballison/quaerite/blob/main/quaerite-examples/README.md#genetic-algorithms-ga-runga

Best regards,
Radu
--
Sematext Cloud - Full Stack Observability - https://sematext.com
Solr and Elasticsearch Consulting, Training and Production Support
