Posted to users@solr.apache.org by Noah Torp-Smith <no...@dbc.dk.INVALID> on 2022/08/26 09:22:30 UTC

Ranking based on number of OR clauses matched

We have a search engine with books that have topics and other features. We do faceting on these features, and allow users to check off topics they are interested in, in the UI. If a user checks off more than one topic, we retrieve books that have any of the topics checked off, so the search becomes disjunctive.

My question is, is there a way to rank books that match more than one selected topic higher than books that match only one topic?

Topic is only an example here. Ideally, we would like to do this on features that we only have as docValues in Solr, so reordering the results after Solr returns them would require changes we would like to avoid if possible.

I realize this might be a "classical" question but I have not been able to formulate a query in google/stackexchange that gave me an answer.

Thanks!

/Noah


--

Noah Torp-Smith (nots@dbc.dk)

Re: Ranking based on number of OR clauses matched

Posted by Noah Torp-Smith <no...@dbc.dk.INVALID>.
I was simply trying to convey the simplest possible example that shows the issue. The real query is more complicated than that.

Anyway, I think the point about `fq` not influencing the score explains it. Thanks for the help.


--

Noah Torp-Smith (nots@dbc.dk)


Re: Ranking based on number of OR clauses matched

Posted by Dave <ha...@gmail.com>.
Why is your qf set to only those two fields and not the subject? You can also boost fields in the qf. The filter query has no effect on the score; it just eliminates documents that don't match it.
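
One way to act on this, as a minimal sketch rather than a confirmed fix: keep the topic clause as a filter so it still restricts the result set, and repeat it in the edismax `bq` parameter, which adds an optional clause that influences the score. The helper `topic_clause`, the `^2` weight on `work.title`, and the choice to reuse the same clause in `bq` are assumptions made for illustration; the field names come from the request in this thread.

```python
import json

def topic_clause(field, topics):
    """Build field:("a" OR "b") with an explicit OR between quoted phrases."""
    quoted = " OR ".join('"{}"'.format(t.replace('"', '\\"')) for t in topics)
    return f"{field}:({quoted})"

topics = ["uddøde dyr", "forhistoriske dyr"]
clause = topic_clause("work.subject_docval", topics)

request = {
    "query": "dyr",
    # Filters only restrict the result set; they do not contribute to the score.
    "filter": [clause, "doc_type:work"],
    "params": {
        "defType": "edismax",
        # Per-field boosts in qf; the ^2 weight is an arbitrary example value.
        "qf": "work.title^2 work.creator",
        # Same clause again as a boost query, so matching topics add to the score.
        "bq": clause,
    },
}

print(json.dumps(request, indent=4, ensure_ascii=False))
```

Whether the extra score from `bq` is large enough to reorder the results depends on the other boosts in play, so the `debug` output is worth checking.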


Re: Ranking based on number of OR clauses matched

Posted by Noah Torp-Smith <no...@dbc.dk.INVALID>.
OK, I've narrowed it down a bit. I can recreate the behaviour with this (I am sending to the /query endpoint, not /select). We are sending the selected checkboxes as filters (fq in /select lingo, I guess).

===
{
    "query": "dyr",
    "filter": [
        "work.subject_docval:(\"uddøde dyr\" \"forhistoriske dyr\")",
        "doc_type:work"
    ],
    "fields": "work.workid work.title work.creator work.subject_dbc score",
    "offset": 0,
    "limit": 100,
    "params": {
        "defType": "edismax",
        "qf": [
            "work.creator",
            "work.title"
        ],
        "sort": "score desc",
        "debug": true,
        "indent": true
    }
}
===

With this query, the first book that has both of those subjects is only the 10th result in the list. If I move that clause from "filter" to "q", books with both subjects come out at the top of the list.

===

{
    "query": "dyr AND work.subject_docval:(\"uddøde dyr\" \"forhistoriske dyr\")",
    "filter": [
        "doc_type:work"
    ],
    "fields": "work.workid work.title work.creator work.subject_dbc score",
    "offset": 0,
    "limit": 100,
    "params": {
        "defType": "edismax",
        "qf": [
            "work.creator",
            "work.title"
        ],
        "sort": "score desc"
    }
}

===

Is that intended behaviour?



--

Noah Torp-Smith (nots@dbc.dk)
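
The difference between the two requests above can be expressed as a small transformation on the request body. This is only a sketch: `move_filter_to_query` is a hypothetical helper, not part of Solr or any client library, and joining with `AND` simply mirrors the second request in this message.

```python
import copy
import json

def move_filter_to_query(request, field_prefix):
    """Return a copy of a JSON request body in which every filter that
    starts with field_prefix is ANDed onto the main query instead."""
    result = copy.deepcopy(request)
    kept, moved = [], []
    for clause in result.get("filter", []):
        (moved if clause.startswith(field_prefix) else kept).append(clause)
    result["filter"] = kept
    for clause in moved:
        result["query"] = f'{result["query"]} AND {clause}'
    return result

filtered = {
    "query": "dyr",
    "filter": [
        'work.subject_docval:("uddøde dyr" "forhistoriske dyr")',
        "doc_type:work",
    ],
    "params": {"defType": "edismax", "qf": ["work.creator", "work.title"]},
}

# The topic clause now takes part in scoring; doc_type stays a pure filter.
scored = move_filter_to_query(filtered, "work.subject_docval:")
print(json.dumps(scored, indent=4, ensure_ascii=False))
```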


Re: Ranking based on number of OR clauses matched

Posted by Noah Torp-Smith <no...@dbc.dk.INVALID>.
Hi Alex, thanks for responding so quickly.

I guess (but I'll need to verify) the issue is that we boost on some other things by default, such as how often a book has been loaned and how many copies the libraries hold. That seems to (but again, I'll need to run some experiments to verify) take priority over how many of the OR clauses in one of the fqs are matched. That makes sense, but I wonder if there is a way to work around it?


--

Noah Torp-Smith (nots@dbc.dk)
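
As a purely arithmetic illustration of the workaround being asked about: if each matched topic adds a fixed weight to the score, a weight larger than the spread of the other boosts lets the number of matched topics decide the order. All numbers below are invented, and mapping `weight` onto a Solr-side boost on the topic clause is an assumption, not something confirmed in this thread.

```python
# Toy arithmetic only: all scores below are invented for illustration.

def combined_score(base_score, matched_topics, weight):
    """Add a fixed weight per matched topic to an existing relevance score."""
    return base_score + weight * matched_topics

# (title, score from text match plus popularity boosts, selected topics matched)
candidates = [
    ("popular, one topic", 9.5, 1),
    ("obscure, two topics", 2.0, 2),
]

def rank(weight):
    """Order the candidates by combined score, highest first."""
    ordered = sorted(
        candidates,
        key=lambda c: combined_score(c[1], c[2], weight),
        reverse=True,
    )
    return [title for title, _, _ in ordered]

print(rank(weight=1.0))   # the popularity gap of 7.5 still dominates
print(rank(weight=10.0))  # one extra matched topic now outweighs that gap
```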


Re: Ranking based on number of OR clauses matched

Posted by Alessandro Benedetti <be...@gmail.com>.
Hi Noah,
That's pretty much the default if you go with a pure boolean query!
Do you see a different behaviour?
What is your query?

Cheers
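
The point about a pure boolean query can be pictured with a toy model: in a disjunction, every matching clause adds its score, so a document that matches more clauses ends up with a higher total. The sketch below assumes a constant score of 1.0 per clause, which is a simplification; real Lucene scores depend on the similarity in use. The book identifiers and the topic `fugle` are made up for the example.

```python
# Toy model of disjunctive (OR) scoring: each matching clause adds its score.

def or_score(doc_topics, selected_topics, clause_score=1.0):
    """Sum a constant score for every selected topic the document has."""
    return sum(clause_score for topic in selected_topics if topic in doc_topics)

books = {
    "book-a": {"uddøde dyr", "forhistoriske dyr"},
    "book-b": {"forhistoriske dyr"},
    "book-c": {"uddøde dyr", "fugle"},
}
selected = ["uddøde dyr", "forhistoriske dyr"]

# book-a matches both selected topics and therefore sorts first.
ranked = sorted(books, key=lambda b: or_score(books[b], selected), reverse=True)
print(ranked)
```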

