Posted to dev@lucene.apache.org by "Alexey Kozhemiakin (JIRA)" <ji...@apache.org> on 2015/02/26 16:05:08 UTC
[jira] [Commented] (SOLR-7167) ANY operator synax - score only top matching term

    [ https://issues.apache.org/jira/browse/SOLR-7167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14338498#comment-14338498 ] 

Alexey Kozhemiakin commented on SOLR-7167:
------------------------------------------


So we called it edismaxplus:
 
1. This is an initial version which implements the ANY operator logic and does not break other query parsers.
2. This is an implementation of the approach described in the previous email. We have chosen approach number 3:
 
Let’s still parse queries from left to right, but when we encounter the ANY operator, remove the BooleanQuery and introduce a DisjunctionMaxQuery in its place.
The query is parsed from left to right:
• NOT sets the Occurs flag of the clause to its right to MUST_NOT
• AND will change the Occurs flag of the clause to its left to MUST unless it has already been set to MUST_NOT
• AND sets the Occurs flag of the clause to its right to MUST
• If the default operator of the query parser has been set to “And”: OR will change the Occurs flag of the clause to its left to SHOULD unless it has already been set to MUST_NOT
• OR sets the Occurs flag of the clause to its right to SHOULD
• ANY will not change the Occurs flag of the clause to its left, but it needs to remove the Boolean query and create a DisjunctionMaxQuery in its place.
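The rules above can be sketched as a toy model in Python. This is not the Solr or Lucene code: the names Occur, parse and add_clause, the ("dismax", [...]) tuple and the whitespace-split tokens are all assumptions made for illustration, and the combination of NOT with ANY is rejected because the rules do not cover it.

```python
from enum import Enum


class Occur(Enum):
    SHOULD = "SHOULD"
    MUST = "MUST"
    MUST_NOT = "MUST_NOT"


def parse(tokens, default_op="OR"):
    """Toy left-to-right pass; returns a list of (Occur, clause) tuples."""
    clauses = []      # mutable [occur, clause] pairs, in query order
    conj = None       # AND / OR / ANY seen since the previous term
    negate = False    # NOT seen since the previous term
    for tok in tokens:
        if tok == "NOT":
            negate = True
        elif tok in ("AND", "OR", "ANY"):
            conj = tok
        else:
            add_clause(clauses, tok, conj, negate, default_op)
            conj, negate = None, False
    return [(occur, clause) for occur, clause in clauses]


def add_clause(clauses, term, conj, negate, default_op):
    if conj == "ANY" and clauses:
        if negate:
            # Assumption: the rules do not say how NOT combines with ANY.
            raise ValueError("NOT after ANY is not modelled in this sketch")
        # ANY keeps the left clause's flag but wraps the left clause and the
        # new term in a dismax node, so chains of ANY nest to the left.
        occur, left = clauses[-1]
        clauses[-1] = [occur, ("dismax", [left, term])]
        return
    # AND / OR may rewrite the flag of the clause to the left,
    # unless that clause is already MUST_NOT.
    if clauses and clauses[-1][0] is not Occur.MUST_NOT:
        if conj == "AND":
            clauses[-1][0] = Occur.MUST
        elif conj == "OR" and default_op == "AND":
            clauses[-1][0] = Occur.SHOULD
    # Flag of the new clause (the clause to the right of the operator).
    if negate:
        occur = Occur.MUST_NOT
    elif conj == "AND":
        occur = Occur.MUST
    elif conj == "OR":
        occur = Occur.SHOULD
    else:
        occur = Occur.MUST if default_op == "AND" else Occur.SHOULD
    clauses.append([occur, term])
```

In this model, parse("a OR b AND c".split()) leaves a as SHOULD and marks both b and c as MUST, which is the non-Boolean behaviour the rules imply, while parse("disk ANY cd ANY dvd".split()) yields a single clause holding a left-nested dismax node.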
In the previous approach, things quickly got too complicated. The current grammar does not in fact represent Boolean logic; there is no Boolean grammar tree. It is read more like a stream of tokens, left to right: when you encounter AND, you change the Occurs flag of the clause to its left to MUST unless it has already been set to MUST_NOT, and you set the Occurs flag of the clause to its right to MUST.
You can read more about it here: https://lucidworks.com/blog/why-not-and-or-and-not/
To make it all work, we would need to define the grammar so that OR takes two operands, AND takes two operands, and there is a real tree structure in it. Then we could introduce another operator, ANY:
<AnyOp> ::=
<Clause(<field>)> <ANY> <Clause(<field>)> (<ANY> <Clause(<field>)>)*
This might introduce quite a few surprises, though. We would need to make sure that, even though the parsing is different, the end result stays the same for the AND and OR operators. It could also simply take too much time to implement correctly.
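As a rough illustration of the <AnyOp> production alone, the following Python sketch parses a chain of clauses separated by ANY into one flat node. The function name parse_any_op, the ("any", [...]) node shape and the pre-split token list are assumptions for this example; it does not attempt the AND/OR tree grammar, whose precedence rules are not specified here.

```python
def parse_any_op(tokens):
    """Parse `clause ANY clause (ANY clause)*` into a flat ("any", [...]) node.

    Tokens are assumed to be pre-split; a clause is any token other than ANY.
    """
    if len(tokens) < 3 or len(tokens) % 2 == 0:
        raise ValueError("expected: clause ANY clause (ANY clause)*")
    operands = tokens[0::2]    # positions 0, 2, 4, ... must be clauses
    operators = tokens[1::2]   # positions 1, 3, 5, ... must be ANY
    if "ANY" in operands or any(op != "ANY" for op in operators):
        raise ValueError("expected: clause ANY clause (ANY clause)*")
    return ("any", operands)
```

With a tree grammar like this, a chain such as text:disk ANY text:cd ANY text:dvd becomes one n-ary node rather than a nested pair of binary nodes.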
The current patch seems to solve all the issues raised, such as different values of the mm parameter, multiple query fields in edismax's qf, and incorrect query syntax. We also don't have to deal with different coord factors, as we will be extending the edismax query parser.
 
With this JAR we have addressed all the issues mentioned above, and no other parsers are broken. The behaviour of the ANY operator is consistent with that of the AND and OR operators in the existing parser: it is parsed from left to right and has possessive behaviour similar to that of the AND operator. The left value is captured and packed into a DisjunctionMaxQuery, as in the following example:
 
{!edismaxplus}disk ANY cd ANY dvd
 
becomes
 
(+(DisjunctionMaxQuery((((text:disk) | (text:cd)) | (text:dvd)))))/no_coord
 
 
Note that when there are multiple ANY operators in a chain, the lvalue containing the DisjunctionMaxQuery is treated as a subquery. This behaviour is desired for compatibility with the various subqueries that can occur as an R-value or L-value, and the scoring still works as designed because
 
max(max(a,b),c) = max(a,b,c)
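A small numeric check of this identity, as a sketch only: dismax_score is a hypothetical stand-in that models a DisjunctionMaxQuery score as the maximum sub-score plus a tie-breaker multiplied by the sum of the remaining sub-scores. The identity relies on the tie-breaker being 0, as in the parsed query above; the sample scores are made up.

```python
def dismax_score(sub_scores, tie=0.0):
    """Model of dismax scoring: best sub-score plus tie * (sum of the others)."""
    top = max(sub_scores)
    return top + tie * (sum(sub_scores) - top)


# Hypothetical per-term scores for disk, cd, dvd in one document.
a, b, c = 1.0, 2.0, 4.0

# With tie = 0 the nested form produced by chained ANY equals the flat form.
nested = dismax_score([dismax_score([a, b]), c])
flat = dismax_score([a, b, c])

# With a non-zero tie-breaker the two forms are no longer guaranteed to agree.
nested_tie = dismax_score([dismax_score([a, b], tie=0.5), c], tie=0.5)
flat_tie = dismax_score([a, b, c], tie=0.5)
```

Here nested and flat are both 4.0, whereas nested_tie is 5.25 and flat_tie is 5.5, so the nesting is only score-neutral while no tie-breaker is applied to the ANY groups.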
 
3. To make future maintenance easier (e.g. Solr version upgrades), the parser plugin needs some additional work. For now it is based directly on the existing edismax parser codebase, with minimal modifications to make it work with our code. As a result, a lot of functionality has been extracted from mainline and injected into the plugin (the base edismax implementation, the whole query parser, etc.). To improve this, we need to create extension points in the existing edismax parser (there are none at present) and submit them to the community as a patch. The whole implementation of the ANY operator should then be based solely on those extension points, with only a stub parser plugin on top to distinguish between base edismax and edismaxplus.


> ANY operator synax - score only top matching term
> -------------------------------------------------
>
>                 Key: SOLR-7167
>                 URL: https://issues.apache.org/jira/browse/SOLR-7167
>             Project: Solr
>          Issue Type: Improvement
>          Components: query parsers
>    Affects Versions: 5.0
>            Reporter: Alexey Kozhemiakin
>
> When we query
> (<term A> OR <term B> OR <term C> OR <term D>)
> and a document contains two or more of these terms, only the highest-scoring term should contribute to the final relevancy score; any lower-scoring terms should be discarded from the scoring algorithm.
> Ideally I'd like an operator like ANY:
> (<term A> ANY <term B> ANY <term C> ANY <term D>)
> whose purpose is to return documents sorted by the score of the highest-scoring term.


