Posted to users@solr.apache.org by Taisuke Miyazaki <mi...@lifull.com> on 2021/04/30 10:53:26 UTC

What is the most effective way to boost according to a distribution?

What is the most efficient way to boost a field with possible values
ranging from 0 to 5000, scoring it according to its distribution?

Hi,

For example, suppose the values have the following distribution:
25th percentile: 100
50th percentile: 1000
75th percentile: 2000
100th percentile (max): 5000

Then I want to score them as follows:
0 ~ 100: 1 point
100 ~ 1000: 2 points
1000 ~ 2000: 3 points
2000 ~ 5000: 4 points

In this example I've divided the range into 4 parts, but in practice I want
to divide it into 100 parts and score documents on a 100-point scale.
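A minimal client-side sketch of the bucketing described above, assuming the field values can be exported into a Python list. The helper names `bucket_cutoffs` and `bucket_score` are hypothetical, not Solr APIs. Cutoff k is taken as the value at sorted position floor(k*n/parts), and a value equal to a cutoff falls into the higher bucket, which matches reading "100 ~ 1000: 2 points" as lower-bound inclusive.

```python
import bisect
from typing import List, Sequence


def bucket_cutoffs(values: Sequence[float], parts: int) -> List[float]:
    """Return parts-1 cutoffs; cutoff k is the value at sorted position
    floor(k * n / parts), i.e. the first value of bucket k+1."""
    if not values or parts < 2:
        raise ValueError("need at least one value and at least two parts")
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[k * n // parts] for k in range(1, parts)]


def bucket_score(value: float, cutoffs: Sequence[float]) -> int:
    """1-based bucket number: 1 below the first cutoff, len(cutoffs) + 1 at
    or above the last. A value equal to a cutoff goes to the higher bucket."""
    return bisect.bisect_right(cutoffs, value) + 1


if __name__ == "__main__":
    # Cutoffs from the 4-bucket example (25th/50th/75th percentiles).
    example = [100, 1000, 2000]
    print([bucket_score(v, example) for v in (0, 99, 100, 1500, 2000, 5000)])
    # -> [1, 1, 2, 3, 4, 4]
```

For the 100-part case, `bucket_cutoffs(values, 100)` returns 99 cutoffs. If many documents share the same value, some cutoffs repeat and the scores between them are simply never assigned.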

My current idea is to force the score using either eDisMax's bf (boost
function) parameter or its bq (boost query) parameter.
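One way the bq variant could look, sketched under assumptions: each bucket becomes a range clause with a constant score (`^=` is the standard query parser's constant-score syntax), and each clause is sent as its own bq parameter, since eDisMax accepts bq more than once. The field name `my_field` and both helper functions are hypothetical, and the rest of the request (qf, etc.) is omitted.

```python
from typing import List, Sequence, Tuple
from urllib.parse import urlencode


def range_boost_queries(field: str, cutoffs: Sequence[float]) -> List[str]:
    """One constant-score range clause per bucket: bucket i (1-based)
    covers [cutoff_{i-1}, cutoff_i) and contributes a score of i."""
    bounds = ["*"] + [str(c) for c in cutoffs] + ["*"]
    clauses = []
    for i in range(len(bounds) - 1):
        lo, hi = bounds[i], bounds[i + 1]
        # "[" = inclusive lower bound, "}" = exclusive upper bound;
        # the open-ended last bucket uses "]".
        close = "]" if hi == "*" else "}"
        clauses.append(f"{field}:[{lo} TO {hi}{close}^={i + 1}")
    return clauses


def edismax_params(user_query: str, field: str,
                   cutoffs: Sequence[float]) -> List[Tuple[str, str]]:
    """Request parameters with one bq per bucket (qf etc. omitted)."""
    params = [("defType", "edismax"), ("q", user_query)]
    params += [("bq", clause) for clause in range_boost_queries(field, cutoffs)]
    return params


if __name__ == "__main__":
    # "my_field" is a hypothetical numeric field.
    for clause in range_boost_queries("my_field", [100, 1000, 2000]):
        print(clause)
    print(urlencode(edismax_params("apartment", "my_field", [100, 1000, 2000])))
```

Because the ranges are contiguous and non-overlapping, a document with a value matches exactly one clause, so the boost added to its score equals its bucket number. With 100 buckets this means 100 range clauses per request.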

Also, although I haven't tried it yet, I think it would be faster to
implement and use something like a staircase function, since it would
reduce the number of function calls and be easier to cache.


I am also trying to find out whether it is possible to perform the above
calculation on multiple fields and add the results together, so that
different individuals get different search results.
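A sketch of the staircase idea combined across fields, assuming Solr's `map(x,min,max,target,default)` function: each `map()` term is a 0/1 step at one cutoff, so summing them gives 1 plus the number of cutoffs at or below the value. Per-user weights are applied with `product()` and the fields are added with `sum()`, producing a single string that could be passed as an eDisMax bf value. The field names, weights, and the `upper` argument (any value at least as large as the field's maximum) are hypothetical.

```python
from typing import Dict, Sequence, Tuple


def staircase(field: str, cutoffs: Sequence[float], upper: float) -> str:
    """Solr function string evaluating to 1 + (number of cutoffs <= value).
    map(x,min,max,target,default) returns target when min <= x <= max and
    default otherwise, so each map() term is a 0/1 step at one cutoff."""
    steps = [f"map({field},{c},{upper},1,0)" for c in cutoffs]
    return "sum(" + ",".join(["1"] + steps) + ")"


def weighted_sum(specs: Dict[str, Tuple[float, Sequence[float], float]]) -> str:
    """specs maps field -> (weight, cutoffs, upper); returns one bf value
    adding the weighted staircase of every field."""
    if not specs:
        raise ValueError("need at least one field")
    terms = [
        f"product({weight},{staircase(field, cutoffs, upper)})"
        for field, (weight, cutoffs, upper) in specs.items()
    ]
    return terms[0] if len(terms) == 1 else "sum(" + ",".join(terms) + ")"


if __name__ == "__main__":
    # Hypothetical fields and per-user weights; "upper" must be >= field max,
    # otherwise values above it would fall off the staircase.
    print(weighted_sum({
        "price_i": (2.0, [100, 1000, 2000], 5000),
        "area_i": (0.5, [20, 40, 80], 500),
    }))
```

With 100 buckets this is 99 `map()` calls per field per document, so it would need benchmarking against the bq approach.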

Thanks.

Translated with www.DeepL.com/Translator (free version)

Re: What is the most effective way to boost according to a distribution?

Posted by Alessandro Benedetti <a....@sease.io>.
Hi Taisuke,
tuning the score is definitely an interesting topic.

A first approach could be to use function queries (such as
https://solr.apache.org/guide/8_8/function-queries.html#scale-function )
together with a query parser that assigns each doc its score from the
function query (
https://solr.apache.org/guide/8_8/other-parsers.html#function-query-parser ),
or to sort by a function (sort=div(popularity,price) desc, score desc).
Your function query could be just scale(), or any feasible combination
of the other functions.
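A minimal sketch of the two request shapes mentioned above, built as Python parameter lists. The helper names and `my_field` are hypothetical; `popularity` and `price` come from the sort example.

```python
from typing import List, Tuple
from urllib.parse import urlencode


def func_score_params(field: str) -> List[Tuple[str, str]]:
    """{!func} makes the function's value the document score;
    scale() maps the field linearly onto [1, 100]."""
    return [("q", f"{{!func}}scale({field},1,100)")]


def func_sort_params(user_query: str) -> List[Tuple[str, str]]:
    """Keep the normal query, sort by a function, then by relevance."""
    return [("q", user_query), ("sort", "div(popularity,price) desc, score desc")]


if __name__ == "__main__":
    print(urlencode(func_score_params("my_field")))
    print(urlencode(func_sort_params("apartment")))
```

Note that scale() is linear between the field's minimum and maximum, so on a skewed 0-5000 field it does not reproduce percentile buckets. According to the reference guide it also traverses all function values to find that min and max.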
Bear in mind that this may be expensive and must be benchmarked to see if
it aligns with your performance requirements.

An alternative may be constant score queries (
https://solr.apache.org/guide/8_8/the-standard-query-parser.html#constant-score-with ),
but it would be interesting to see whether they match your requirements and
whether they offer a performance gain compared to function queries.
Cheers

--------------------------
Alessandro Benedetti
Apache Lucene/Solr Committer
Director, R&D Software Engineer, Search Consultant

www.sease.io

