Posted to solr-user@lucene.apache.org by "Robert K." <wk...@mail.ru.INVALID> on 2018/06/07 09:53:35 UTC

Sort hits in the order of subqueries

Hello,

I am investigating the following use case.

Suppose I have a list of queries q_0, q_1, ..., q_n which I combine into a BooleanQuery using SHOULD clauses.
The requirement for sorting the hits is that the results of q_0 precede the results of q_1, the results of q_1 precede the
results of q_2, and so on. If a hit occurs in the results of more than one query, it should appear only once, in the results
of the query with the smallest index.
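The required order can be pinned down as a merge of already-ranked hit lists. A minimal Python sketch, assuming each q_i's hits are available as a list of document ids in q_i's own ranking order (the function name stratified_merge and the sample ids are hypothetical). It specifies the expected result, not how Solr would compute it:

```python
from typing import Hashable, List, Sequence


def stratified_merge(ranked_lists: Sequence[Sequence[Hashable]]) -> List[Hashable]:
    """Concatenate the hits of q_0, q_1, ..., q_n in that order.

    ranked_lists[i] holds the hits of q_i in q_i's own ranking order.
    A hit matched by several subqueries is kept only at its first
    occurrence, i.e. in the results of the subquery with the smallest index.
    """
    seen = set()
    merged = []
    for hits in ranked_lists:
        for doc in hits:
            if doc not in seen:
                seen.add(doc)
                merged.append(doc)
    return merged


# Hypothetical hits of q_0, q_1 and q_2; "a" and "d" match two subqueries each.
print(stratified_merge([["a", "b"], ["c", "a", "d"], ["d", "e"]]))
# prints ['a', 'b', 'c', 'd', 'e']
```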

I have searched for solutions but haven't found anything useful so far.

I have considered the following approaches:

1. Reformulate: q_0 | (q_1 & !q_0) | (q_2 & !q_0 & !q_1) | ...

While possible, this may hurt performance because the same subqueries are evaluated multiple times.
I haven't done any measurements, though. It is technically possible to optimize the execution of this query so that each subquery
q_i is evaluated only once, but I don't know whether this kind of optimization is implemented in current Lucene/Solr. (?)

2. Implement a CustomScoreQuery. General idea: take a list of queries and execute them in the context of a BooleanQuery, mapping
the scores of the corresponding subqueries to disjoint score ranges, like q_n -> [0,1), q_(n-1) -> [1,2), and so on.

Problem: CustomScoreQuery is deprecated, and FunctionQuery is the recommended approach. Still, I don't see an obvious way
to implement the idea with FunctionQuery. Is it possible? Should I dive in and try to do it with FunctionQuery?

3. Assuming the task can be solved with FunctionQuery (or anything else in out-of-the-box Solr), my questions
are: Is there a solution that doesn't require writing our own Solr extension? One that uses only what ships in the standard Solr distribution?
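To make the cost concern in approach 1 concrete, here is a small Python sketch that builds such an overlap-free rewrite as a query string in the standard Lucene/Solr syntax (AND, OR, NOT) and counts how often subqueries appear in it. The function name disjoint_rewrite and the sample field queries are hypothetical. Clause i repeats all i earlier subqueries, so n + 1 subqueries produce (n + 1)(n + 2) / 2 occurrences in total; whether Lucene evaluates the repeats only once is exactly the open question above. The rewrite only makes the clauses mutually exclusive; it does not by itself impose the order.

```python
from typing import List, Tuple


def disjoint_rewrite(subqueries: List[str]) -> Tuple[str, int]:
    """Rewrite q_0 ... q_n so that clause i matches q_i but none of q_0 .. q_(i-1).

    Returns the combined query string and the total number of subquery
    occurrences in it (clause i mentions i + 1 subqueries).
    """
    clauses = []
    occurrences = 0
    for i, q in enumerate(subqueries):
        # Positive part first, then exclude every earlier (higher-priority) subquery.
        parts = [f"({q})"] + [f"NOT ({p})" for p in subqueries[:i]]
        occurrences += len(parts)
        clauses.append(parts[0] if i == 0 else "(" + " AND ".join(parts) + ")")
    return " OR ".join(clauses), occurrences


query, count = disjoint_rewrite(["title:solr", "body:solr", "tags:search"])
print(query)
# (title:solr) OR ((body:solr) AND NOT (title:solr)) OR ((tags:search) AND NOT (title:solr) AND NOT (body:solr))
print(count)  # 6
```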


Note: In the past we solved the problem in our legacy application with a modified BooleanQuery/BooleanScorer. We could migrate
(i.e., rewrite) this extension for the current Solr/Lucene, but it may not be the best option, so I am exploring all other possibilities.
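The score-range idea from approach 2 can be sketched in a few lines of Python. This is a simulation of the intended scores, not Lucene code; the function names, the squashing function raw / (1 + raw) and the sample scores are assumptions. Each raw score of q_i is squashed into [0, 1) and shifted into its own band, so q_n lands in [0, 1), q_(n-1) in [1, 2), and so on. The combining rule matters: a BooleanQuery with SHOULD clauses sums the clause scores, which would push a document matching several subqueries out of its band, whereas taking the maximum (as Lucene's DisjunctionMaxQuery does with a tie-breaker of 0) keeps it in the band of its lowest-index subquery.

```python
from typing import Dict


def banded_score(index: int, raw: float, n: int) -> float:
    """Map a raw score of q_index (0 <= index <= n) into [n - index, n - index + 1).

    raw / (1 + raw) maps a non-negative score into [0, 1) and keeps the
    relative order of scores within one subquery.
    """
    if raw < 0:
        raise ValueError("expected a non-negative score")
    return (n - index) + raw / (1.0 + raw)


def combined_score(matches: Dict[int, float], n: int) -> float:
    """Score a document from {subquery index: raw score} over the subqueries it matches.

    max() keeps the document in the band of its lowest-index match;
    summing the bands instead would let multi-matches jump ahead.
    """
    return max(banded_score(i, raw, n) for i, raw in matches.items())


# Hypothetical raw scores for n = 2 (subqueries q_0, q_1, q_2).
docs = {
    "a": {0: 1.5},
    "b": {0: 7.0},
    "c": {1: 30.0},
    "d": {1: 2.0, 2: 9.0},  # matches q_1 and q_2, so it belongs to the q_1 band
    "e": {2: 0.5},
}
ranking = sorted(docs, key=lambda doc: combined_score(docs[doc], n=2), reverse=True)
print(ranking)  # ['b', 'a', 'c', 'd', 'e']
```

Replacing max with sum in combined_score would score d at about 2.57, ahead of c (about 1.97), even though c belongs to the same q_1 band and has the higher q_1 score.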

Thank you all & Best regards,

Robert

Re[2]: Sort hits in the order of subqueries

Posted by "Robert K." <wk...@mail.ru.INVALID>.
Hello,

I had a look at the constant-score approach suggested by Emir: (q0)^=100 OR (q1)^=90 ...

As Alexandre observed, it seems to introduce stratification at the cost of intra-query ranking,
which is not satisfactory.

So if I think of Constant Score as a function f(x) = C that operates on a document score and is restricted
to one subquery, then what I would like to have is a sigmoid function F(x, C) = C + 1 / (1 + exp(-x)) applied to
the document scores within each subquery.

Instead of:

ConstantScore(q0, 100) OR ConstantScore(q1, 90) ...

I would like:

SigmoidScore(q0, 100) OR SigmoidScore(q1, 90) ...
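A quick Python check of the proposed F(x, C) = C + 1 / (1 + exp(-x)), assuming non-negative raw scores and one subquery per document (the hit data are hypothetical). Because the logistic term mathematically lies strictly between 0 and 1 and increases with x, the hits of a subquery stay inside (C, C + 1) in their original order, so constants at least 1 apart keep the strata apart. Two caveats the sketch makes visible: in an OR, the clause scores would still be summed for documents matching several subqueries, and the curve saturates, so sufficiently large distinct scores become indistinguishable (Lucene scores are 32-bit floats, which makes this happen sooner than in the double-precision arithmetic used here).

```python
import math
from typing import List, Tuple


def sigmoid_score(x: float, c: float) -> float:
    """F(x, C) = C + 1 / (1 + exp(-x)); assumes x >= 0, as document scores are."""
    return c + 1.0 / (1.0 + math.exp(-x))


# Hypothetical hits: (doc id, subquery index, raw score within that subquery).
hits: List[Tuple[str, int, float]] = [
    ("a", 0, 0.5),
    ("b", 0, 2.0),
    ("c", 1, 12.0),
    ("d", 1, 7.0),
]
constants = [100.0, 90.0]  # C for q_0 and q_1, as in SigmoidScore(q0, 100) OR SigmoidScore(q1, 90)

ranked = sorted(hits, key=lambda h: sigmoid_score(h[2], constants[h[1]]), reverse=True)
print([doc for doc, _, _ in ranked])  # ['b', 'a', 'c', 'd']

# Saturation: for large x the logistic term rounds to exactly 1.0,
# so these two different raw scores produce the same final score.
print(sigmoid_score(40.0, 100.0) == sigmoid_score(50.0, 100.0))  # True
```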

I'm pretty sure it is possible to start from the ConstantScoreQuery class and end up with a sigmoid variant as a custom extension.
Still, I'm hoping for a hint on the simplest way to achieve the stratification.


A further question in this context: in some cases we sort individual subqueries by different fields.
It looks like:

(q0 sorted by date) OR (q1 sorted by relevancy)

Do you have any idea how this could be formulated in Solr?
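The intended order for mixed sort criteria can be expressed as a composite sort key: first the index of the subquery a hit belongs to, then that subquery's own criterion. A Python sketch, assuming every hit has already been assigned to its lowest-index matching subquery and that fields named published and score exist (both hypothetical). It describes the target order, not a Solr request:

```python
from datetime import date
from typing import Callable, Dict, Tuple

# Per-subquery sort criteria; a smaller key means an earlier position.
STRATUM_KEYS: Dict[int, Callable[[dict], float]] = {
    0: lambda hit: -hit["published"].toordinal(),  # q0 sorted by date, newest first
    1: lambda hit: -hit["score"],                  # q1 sorted by relevancy, best first
}


def sort_key(hit: dict) -> Tuple[int, float]:
    """Order by stratum first, then by that stratum's own criterion."""
    stratum = hit["stratum"]
    return stratum, STRATUM_KEYS[stratum](hit)


# Hypothetical hits, each already assigned to one stratum.
hits = [
    {"id": "a", "stratum": 0, "published": date(2018, 5, 1), "score": 3.0},
    {"id": "b", "stratum": 0, "published": date(2018, 6, 1), "score": 1.0},
    {"id": "c", "stratum": 1, "published": date(2017, 1, 1), "score": 9.0},
    {"id": "d", "stratum": 1, "published": date(2018, 6, 5), "score": 4.0},
]
print([hit["id"] for hit in sorted(hits, key=sort_key)])  # ['b', 'a', 'c', 'd']
```

Hit d is the newest document overall but still ranks after both q0 hits, because the stratum comes first in the key.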

Regards,

Robert


>Thursday, 7 June 2018, 15:20 +02:00, from Alexandre Rafalovitch <ar...@gmail.com>:
>
>I think this solution will destroy intra-query ranking: all results of
>q0 come before those of q1, but within the q0 results the order would be essentially random, since they all get the same score.
>
>Would a set of boost queries with different weights
>(probably additive) be a better way to introduce stratification?
>
>Regards,
>   Alex




Re: Sort hits in the order of subqueries

Posted by Alexandre Rafalovitch <ar...@gmail.com>.
I think this solution will destroy intra-query ranking: all results of
q0 come before those of q1, but within the q0 results the order would be essentially random, since they all get the same score.

Would a set of boost queries with different weights
(probably additive) be a better way to introduce stratification?
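A toy Python model of additive boosts, to show when they stratify. Assumptions (all hypothetical): each document has one base relevance score from the main query, and each boost query adds a fixed weight w_i when q_i matches. Unlike constant scores, the base relevance keeps its influence inside each stratum, but the strata stay separated only while the gap between consecutive weights is larger than the spread of the base scores.

```python
from typing import Dict, List, Set, Tuple


def additive_boost_rank(docs: Dict[str, Tuple[float, Set[int]]], weights: List[float]) -> List[str]:
    """Rank docs by base relevance plus the weight of every subquery they match."""
    scores = {
        doc: relevance + sum(weights[i] for i in matched)
        for doc, (relevance, matched) in docs.items()
    }
    return sorted(scores, key=lambda doc: scores[doc], reverse=True)


# Hypothetical base relevance scores and subquery matches.
docs = {
    "a": (3.2, {0}),
    "b": (5.1, {0}),
    "c": (9.0, {1}),
    "d": (1.0, {1}),
}
print(additive_boost_rank(docs, [20.0, 10.0]))  # ['b', 'a', 'c', 'd']: strata kept, relevance order inside
print(additive_boost_rank(docs, [20.0, 15.0]))  # ['b', 'c', 'a', 'd']: gap too small, c jumps ahead of a
```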

Regards,
   Alex

On Thu, Jun 7, 2018, 13:19 Emir Arnautović, <em...@sematext.com>
wrote:

> Hi Robert,
> If I understand your requirement correctly, you can solve it with the following:
> (q0)^=100 OR (q1)^=90 …
>
> This assumes there are no overlaps; otherwise, a document matching multiple
> conditions can change the ordering.
>
> HTH,
> Emir

Re: Sort hits in the order of subqueries

Posted by Emir Arnautović <em...@sematext.com>.
Hi Robert,
If I understand your requirement correctly, you can solve it with the following:
(q0)^=100 OR (q1)^=90 …

This assumes there are no overlaps; otherwise, a document matching multiple conditions can change the ordering.
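The overlap caveat can be reproduced with a small Python model of (q0)^=100 OR (q1)^=90 OR (q2)^=80, assuming, as with SHOULD clauses in a BooleanQuery, that the scores of all matching clauses are summed, and that ^= gives every match of a clause the same constant score. The document ids and matches are hypothetical.

```python
from typing import Dict, List, Set


def constant_score_rank(matches: Dict[str, Set[int]], constants: List[float]) -> List[str]:
    """Rank docs by the sum of the constant scores of the subqueries they match."""
    scores = {doc: sum(constants[i] for i in matched) for doc, matched in matches.items()}
    # sorted() is stable, so tied docs keep their input order.
    return sorted(scores, key=lambda doc: scores[doc], reverse=True)


constants = [100.0, 90.0, 80.0]
no_overlap = {"a": {0}, "b": {0}, "c": {1}, "d": {2}}
with_overlap = {"a": {0}, "b": {0}, "c": {1}, "d": {1, 2}}

print(constant_score_rank(no_overlap, constants))    # ['a', 'b', 'c', 'd']
print(constant_score_rank(with_overlap, constants))  # ['d', 'a', 'b', 'c']: 90 + 80 beats 100
```

The tie between a and b also shows that every q0 hit receives the same score, so the order within a stratum comes from tie-breaking rather than relevance.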

HTH,
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/


