Posted to users@solr.apache.org by Vincenzo D'Amore <v....@gmail.com> on 2022/06/01 08:33:09 UTC

Re: Solr scale function

Hi Mikhail,

sorry for not being clear, I'll try again.
As I understand it, the Solr scale function, once applied to a field,
needs the min and max for that field.
By default those min and max values are calculated over all the existing
documents; I don't know exactly how this is implemented internally in Solr.
I assume that, in the worst case, all the documents have to be
traversed, reading every value of the given field and keeping track of
the min/max.
The Solr scale function documentation also says:
> The current implementation cannot distinguish when documents have been
> deleted or documents that have no value. It uses 0.0 values for these cases.
This means that the min value can often be 0 even if you have only positive
values.
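
To illustrate the documentation note quoted above, here is a minimal Python
sketch. It is a toy model, not Solr's actual implementation: the price list
and the field_min_max helper are hypothetical, and None stands for a document
that has no value in the field.

```python
# Toy model: None stands for a document with no value in the field.
prices = [74.99, None, 19.95, 11.5]


def field_min_max(values, missing=0.0):
    """Return (min, max), reading missing values as `missing`."""
    seen = [missing if v is None else v for v in values]
    return min(seen), max(seen)


lo, hi = field_min_max(prices)
print(lo, hi)
```

Although every real value is positive, the min found this way is 0.0, which
is the effect the documentation warns about.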

But what happens if I need to scale the values of a field only within the
documents returned by a query, say only a few hundred or a few thousand
documents?
First of all, min and max have to be calculated only on the result set of
the query.
That is what I meant when I wrote "apply the scale function
only to the result set (and not to the entire collection)".

For example, if you apply the scale function to the price field in the Solr
techproducts example, the "min" and "max" used are 0.0 and 2199.0:

http://localhost:8983/solr/techproducts/select?q=*:*&rows=0&stats=true&stats.field=price

So even if a filter query is added - fq=popularity:(1 OR 7) - the values
are scaled between 0.0 and 2199.0.

http://localhost:8983/solr/techproducts/select?q=*:*&fq=popularity:(1%20OR%207)&rows=100&fl=price,scale(price,%200,%201)

{
  "responseHeader":{
    "status":0,
    "QTime":30,
    "params":{
      "q":"*:*",
      "fl":"price,scale(price, 0, 1)",
      "fq":"popularity:(1 OR 7)",
      "rows":"100"}},
  "response":{"numFound":6,"start":0,"numFoundExact":true,"docs":[
      {
        "price":74.99,
        "scale(price, 0, 1)":0.034101862},
      {
        "price":19.95,
        "scale(price, 0, 1)":0.009072306},
      {
        "price":11.5,
        "scale(price, 0, 1)":0.0052296496},
      {
        "price":329.95,
        "scale(price, 0, 1)":0.15004548},
      {
        "price":479.95,
        "scale(price, 0, 1)":0.2182583},
      {
        "price":649.99,
        "scale(price, 0, 1)":0.29558435}]
  }}

As you can see in the results of this query, prices are between 11.5 and
649.99.
What if I want to scale the prices using 11.5 and 649.99 as min and max?
Or, in other words, what is the easiest way to scale all the values of a
field with the min and max of the current query results?
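
The arithmetic behind the two cases can be checked with a short Python
sketch. It assumes plain linear min-max scaling, and the scale helper is a
hypothetical stand-in for the Solr function, not its implementation. The
prices and the 0.0 to 2199.0 range are the ones reported in this message.

```python
def scale(value, lo, hi, target_min=0.0, target_max=1.0):
    """Linearly map value from [lo, hi] onto [target_min, target_max]."""
    return (value - lo) / (hi - lo) * (target_max - target_min) + target_min


prices = [74.99, 19.95, 11.5, 329.95, 479.95, 649.99]

# Range of the whole collection, as reported in this message.
whole = [scale(p, 0.0, 2199.0) for p in prices]

# Range of the result set only.
lo, hi = min(prices), max(prices)
result_set = [scale(p, lo, hi) for p in prices]

for p, w, r in zip(prices, whole, result_set):
    print(f"{p:>7} collection={w:.6f} result_set={r:.6f}")
```

The "collection" column comes out close to the values in the response above,
and the "result_set" column is what I would like to get instead. Small
differences in the last digits are presumably due to Solr working with
32-bit floats.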

Right now I'm investigating what's the best way to scale the values of one
or more fields within Solr, but only within the documents that are in the
current result set.

Hope this helps to make things clearer.

Best regards,
Vincenzo




On Tue, May 31, 2022 at 9:27 PM Mikhail Khludnev <mk...@apache.org> wrote:

> Vincenzo,
> Can you elaborate on what it means to 'apply the scale function only to the
> result set (and not to the entire collection)'?
>
> On Tue, May 31, 2022 at 4:33 PM Vincenzo D'Amore <v....@gmail.com>
> wrote:
>
> > Hi Mikhail,
> >
> > I'm trying to apply the scale function only to the result set (and not to
> > the entire collection).
> > And I discovered that adding "query($q)" to the scale function does the
> > trick.
> > In other words, adding "query($q)" forces solr to restrict the scale
> > function only to the result set.
> >
> > But if I add an fq to the query parameters, the scale function applies
> > only to the q param.
> > For example:
> >
> >
> >
> > http://localhost:8983/solr/techproducts/select?q=manu_id_s:(corsair%20belkin%20canon%20viewsonic)&fq=price:[0%20TO%20200]&rows=100&fl=price,scale(sum(price,query($q)),%200,%201),manu_id_s
> >
> > {
> >   "responseHeader":{
> >     "status":0,
> >     "QTime":8,
> >     "params":{
> >       "q":"*:*",
> >       "fl":"price,scale(sum(price,query($q)), 0, 1)",
> >       "fq":"popularity:(1 OR 7)",
> >       "rows":"100"}},
> >   "response":{"numFound":6,"start":0,"numFoundExact":true,"docs":[
> >       {
> >         "price":74.99,
> >         "scale(sum(price,query($q)), 0, 1)":0.034101862},
> >       {
> >         "price":19.95,
> >         "scale(sum(price,query($q)), 0, 1)":0.009072306},
> >       {
> >         "price":11.5,
> >         "scale(sum(price,query($q)), 0, 1)":0.0052296496},
> >       {
> >         "price":329.95,
> >         "scale(sum(price,query($q)), 0, 1)":0.15004548},
> >       {
> >         "price":479.95,
> >         "scale(sum(price,query($q)), 0, 1)":0.2182583},
> >       {
> >         "price":649.99,
> >         "scale(sum(price,query($q)), 0, 1)":0.29558435}]
> >   }}
> >
> > I can avoid this problem by adding a new parameter query($fq) to the
> > scale function, but this solution is cumbersome and not maintainable.
> > For example:
> >
> >
> >
> > http://localhost:8983/solr/techproducts/select?q=manu_id_s:(corsair%20belkin%20canon%20viewsonic)&fq=price:[0%20TO%20200]&rows=100&fl=price,scale(sum(sum(price,query($q)),query($fq)),%200,%201),manu_id_s
> >
> > {
> >   "responseHeader":{
> >     "status":0,
> >     "QTime":1,
> >     "params":{
> >       "q":"manu_id_s:(corsair belkin canon viewsonic)",
> >       "fl":"price,scale(sum(sum(price,query($q)),query($fq)), 0, 1),manu_id_s",
> >       "fq":"price:[0 TO 200]",
> >       "rows":"100"}},
> >   "response":{"numFound":5,"start":0,"numFoundExact":true,"docs":[
> >       {
> >         "manu_id_s":"belkin",
> >         "price":19.95,
> >         "scale(sum(sum(price,query($q)),query($fq)), 0, 1)":0.048746154},
> >       {
> >         "manu_id_s":"belkin",
> >         "price":11.5,
> >         "scale(sum(sum(price,query($q)),query($fq)), 0, 1)":0.0},
> >       {
> >         "manu_id_s":"canon",
> >         "price":179.99,
> >         "scale(sum(sum(price,query($q)),query($fq)), 0, 1)":0.97198087},
> >       {
> >         "manu_id_s":"corsair",
> >         "price":185.0,
> >         "scale(sum(sum(price,query($q)),query($fq)), 0, 1)":1.0},
> >       {
> >         "manu_id_s":"corsair",
> >         "price":74.99,
> >         "scale(sum(sum(price,query($q)),query($fq)), 0, 1)":0.3653772}]
> >   }}
> >
> >
> >
> >
> > On Tue, May 31, 2022 at 2:48 PM Mikhail Khludnev <mk...@apache.org> wrote:
> >
> > > Hello Vincenzo,
> > >
> > > I'm not getting your point:
> > >
> > > > if I add an fq parameter the scale function still continues to work
> > > > only on the q param.
> > >
> > > Well, but the function actually refers to the q param:
> > > scale(sum(price,query($q)), 0, 1).
> > >
> > > What values do you expect from query($q) with "q":"popularity:(1 OR 7)"?
> > > I suggest checking it with fl=score.
> > >
> > >
> > > On Tue, May 31, 2022 at 2:05 PM Vincenzo D'Amore <v....@gmail.com>
> > > wrote:
> > >
> > > > Hi all,
> > > >
> > > > playing with the Solr scale function I found a few corner cases
> > > > where I need to scale only the result set.
> > > >
> > > > I found a workaround that works, but it does not seem to be viable
> > > > because if I add an fq parameter the scale function still works
> > > > only on the q param.
> > > >
> > > > For example with q=popularity:(1 OR 7):
> > > >
> > > > http://localhost:8983/solr/techproducts/select?q=popularity:(1%20OR%207)&rows=100&fl=price,scale(sum(price,query($q)),%200,%201)
> > > >
> > > > {
> > > >   "responseHeader":{
> > > >     "status":0,
> > > >     "QTime":1,
> > > >     "params":{
> > > >       "q":"popularity:(1 OR 7)",
> > > >       "fl":"price,scale(sum(price,query($q)), 0, 1)",
> > > >       "rows":"100"}},
> > > >   "response":{"numFound":6,"start":0,"numFoundExact":true,"docs":[
> > > >       {
> > > >         "price":74.99,
> > > >         "scale(sum(price,query($q)), 0, 1)":0.099437736},
> > > >       {
> > > >         "price":19.95,
> > > >         "scale(sum(price,query($q)), 0, 1)":0.013234352},
> > > >       {
> > > >         "price":11.5,
> > > >         "scale(sum(price,query($q)), 0, 1)":0.0},
> > > >       {
> > > >         "price":329.95,
> > > >         "scale(sum(price,query($q)), 0, 1)":0.49875492},
> > > >       {
> > > >         "price":479.95,
> > > >         "scale(sum(price,query($q)), 0, 1)":0.7336842},
> > > >       {
> > > >         "price":649.99,
> > > >         "scale(sum(price,query($q)), 0, 1)":1.0}]
> > > >   }}
> > > >
> > > > but moving the filter into fq:
> > > >
> > > > http://localhost:8983/solr/techproducts/select?q=*:*&fq=popularity:(1%20OR%207)&rows=100&fl=price,scale(sum(price,query($q)),%200,%201)
> > > >
> > > > {
> > > >   "responseHeader":{
> > > >     "status":0,
> > > >     "QTime":8,
> > > >     "params":{
> > > >       "q":"*:*",
> > > >       "fl":"price,scale(sum(price,query($q)), 0, 1)",
> > > >       "fq":"popularity:(1 OR 7)",
> > > >       "rows":"100"}},
> > > >   "response":{"numFound":6,"start":0,"numFoundExact":true,"docs":[
> > > >       {
> > > >         "price":74.99,
> > > >         "scale(sum(price,query($q)), 0, 1)":0.034101862},
> > > >       {
> > > >         "price":19.95,
> > > >         "scale(sum(price,query($q)), 0, 1)":0.009072306},
> > > >       {
> > > >         "price":11.5,
> > > >         "scale(sum(price,query($q)), 0, 1)":0.0052296496},
> > > >       {
> > > >         "price":329.95,
> > > >         "scale(sum(price,query($q)), 0, 1)":0.15004548},
> > > >       {
> > > >         "price":479.95,
> > > >         "scale(sum(price,query($q)), 0, 1)":0.2182583},
> > > >       {
> > > >         "price":649.99,
> > > >         "scale(sum(price,query($q)), 0, 1)":0.29558435}]
> > > >   }}
> > > >
> > > >
> > > > On the other hand, I was thinking of implementing a custom scale
> > > > function that by default works only on the current result set and
> > > > not on the entire collection.
> > > >
> > > > Any suggestions on how to solve this problem?
> > > >
> > > > Best regards,
> > > > Vincenzo
> > > >
> > > >
> > > > --
> > > > Vincenzo D'Amore
> > > >
> > >
> > >
> > > --
> > > Sincerely yours
> > > Mikhail Khludnev
> > >
> >
> >
> > --
> > Vincenzo D'Amore
> >
>
>
> --
> Sincerely yours
> Mikhail Khludnev
>


-- 
Vincenzo D'Amore

Re: Solr scale function

Posted by Mikhail Khludnev <mk...@apache.org>.
OK. What if you try something like

q=*:*&fq=popularity:(1 OR 7)&rows=100&fl=price,scale(query($scopeq),0,1)&scopeq={!filters param=$fq}{!func}price

It passes the price field values to the scale function, limiting the scope of
the min/max calculation to the documents matching fq.
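
Since these parameters contain spaces, braces and dollar signs, they need
URL encoding before being sent. A minimal Python sketch that only builds the
request URL (it does not call Solr, and the base URL is assumed from the
earlier examples in this thread):

```python
from urllib.parse import urlencode, parse_qs

# Base URL assumed from the earlier examples in this thread.
base = "http://localhost:8983/solr/techproducts/select"

params = {
    "q": "*:*",
    "fq": "popularity:(1 OR 7)",
    "rows": "100",
    "fl": "price,scale(query($scopeq),0,1)",
    # Sub-query referenced by query($scopeq) in fl, taken from the message.
    "scopeq": "{!filters param=$fq}{!func}price",
}

url = base + "?" + urlencode(params)
print(url)

# Round trip: decoding the query string gives back the original values.
decoded = parse_qs(url.split("?", 1)[1])
print(decoded["scopeq"][0])
```

The parameter name scopeq is arbitrary; any name works as long as fl refers
to the same one.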

On Wed, Jun 1, 2022 at 4:11 PM Mikhail Khludnev <mk...@apache.org> wrote:

> From looking at
> https://github.com/apache/lucene/blob/main/lucene/queries/src/java/org/apache/lucene/queries/function/valuesource/ScaleFloatFunction.java#L70
> I conclude that min and max are obtained from all docs in the index.
> But if you specify query() as an argument for scale(), it takes only the
> matching docs into account when evaluating min and max. So, as far as I
> understand, you are looking for a query which matches the intersection of
> $q AND $fq but yields the price field value as its score.
> It seems I've got the problem definition. I'll come up with a proposal a
> little bit later.


-- 
Sincerely yours
Mikhail Khludnev

Re: Solr scale function

Posted by Mikhail Khludnev <mk...@apache.org>.
From looking at
https://github.com/apache/lucene/blob/main/lucene/queries/src/java/org/apache/lucene/queries/function/valuesource/ScaleFloatFunction.java#L70
I conclude that min and max are obtained from all docs in the index.
But if you specify query() as an argument for scale(), it takes only the
matching docs into account when evaluating min and max. So, as far as I
understand, you are looking for a query which matches the intersection of
$q AND $fq but yields the price field value as its score.
It seems I've got the problem definition. I'll come up with a proposal a
little bit later.
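
The problem definition can be modelled in a few lines of Python. This is only
a sketch of the idea, not of Lucene's code: the documents, the matches_q and
matches_fq predicates, and the prices are all made up for illustration.

```python
# Hypothetical documents; ids, manufacturers and prices are made up.
docs = [
    {"id": "a", "manu": "belkin", "price": 11.5},
    {"id": "b", "manu": "corsair", "price": 185.0},
    {"id": "c", "manu": "canon", "price": 329.95},   # matches q, not fq
    {"id": "d", "manu": "other", "price": 92.0},     # matches fq, not q
    {"id": "e", "manu": "corsair", "price": 74.99},
]


def matches_q(doc):
    """Stand-in for $q: restrict to a few manufacturers."""
    return doc["manu"] in {"belkin", "corsair", "canon"}


def matches_fq(doc):
    """Stand-in for $fq: price within [0, 200]."""
    return 0 <= doc["price"] <= 200


# Scope: intersection of q and fq; each document's "score" is its price.
in_scope = [d for d in docs if matches_q(d) and matches_fq(d)]
lo = min(d["price"] for d in in_scope)
hi = max(d["price"] for d in in_scope)

# Min-max scale only the documents in scope, using the scoped min and max.
scaled = {d["id"]: (d["price"] - lo) / (hi - lo) for d in in_scope}
print(lo, hi, scaled)
```

Documents c and d fall outside the intersection, so they affect neither min
nor max.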



-- 
Sincerely yours
Mikhail Khludnev