Posted to oak-issues@jackrabbit.apache.org by "Vikas Saurabh (JIRA)" <ji...@apache.org> on 2018/11/13 13:45:00 UTC

[jira] [Comment Edited] (OAK-7606) Doing Faceting only on the resultset of one constraints when query contain multiple constraint with OR condition

    [ https://issues.apache.org/jira/browse/OAK-7606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16685157#comment-16685157 ] 

Vikas Saurabh edited comment on OAK-7606 at 11/13/18 1:44 PM:
--------------------------------------------------------------

Had a discussion with [~tmueller] off list; here is a synopsis of the conclusions:
* we should break the general {{OR}} use case into two cases: conditions that get simplified into an {{IN}} clause v/s conditions that get optimized using {{UNION}}
* the ones that get simplified to {{IN}} already work correctly (since the clause gets sent to the index and the same lucene query answers the whole query)
* {{UNION}}, whether due to an optimized {{OR}} or because it was written by the user, is hard to get correct in a performant way (basically, tricks like {{ n(A U B) = n(A) + n(B) - n(A ^ B) }} scale up as 2^N where N is the number of sub-queries)
* a trivial approach for {{UNION}} would be to simply merge facets across sub-queries, adding up the facet counts for identical labels. This is obviously incorrect, but only in cases where there's significant intersection between sub-queries (so, common path restriction or nodetype restrictions resulting in {{UNION}} would be correct). That said, this limitation needs to be clearly documented.
* a complicated (but correct) approach for fixing facets for {{OR}} would be to disable optimizing to {{UNION}} AND change oak-query to pass {{OR}} clauses down to lucene (of course, the assumption that the query is being answered by an index is implicit)
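
To illustrate the trade-off in the {{UNION}} points above, here is a minimal sketch of the naive merge, not the actual Oak implementation. The class name, method names, and facet data are hypothetical; the data assumes one node matches both sub-queries, which is exactly the case where adding counts over-reports.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: none of these names exist in Oak.
final class FacetMergeSketch {

    // Naive merge: add up the counts of identical labels across all sub-queries.
    // A node matched by more than one sub-query is counted once per match.
    static Map<String, Integer> mergeFacets(List<Map<String, Integer>> perSubQuery) {
        Map<String, Integer> merged = new LinkedHashMap<>();
        for (Map<String, Integer> facets : perSubQuery) {
            for (Map.Entry<String, Integer> e : facets.entrySet()) {
                merged.merge(e.getKey(), e.getValue(), Integer::sum);
            }
        }
        return merged;
    }

    // Number of terms in a full inclusion-exclusion expansion over n sub-queries:
    // one term per non-empty subset, i.e. 2^n - 1 (valid for 1 <= n <= 62 here).
    static long inclusionExclusionTerms(int n) {
        return (1L << n) - 1;
    }

    public static void main(String[] args) {
        // Assumed facet counts on "name" for two sub-queries of a UNION.
        // The node labelled "Node2" is assumed to match both of them.
        Map<String, Integer> first = Map.of("Node1", 1, "Node2", 1);
        Map<String, Integer> second = Map.of("Node2", 1);

        Map<String, Integer> merged = mergeFacets(List.of(first, second));
        System.out.println("Node1 = " + merged.get("Node1")); // 1, correct
        System.out.println("Node2 = " + merged.get("Node2")); // 2, but only one such node exists

        System.out.println("terms for 3 sub-queries = " + inclusionExclusionTerms(3)); // 7
    }
}
```

When the sub-queries are disjoint the merged counts are exact, which is why the simple merge is expected to cover most cases.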

With all that said, I'd create sub-tasks/issues for:
# Ensure that {{OR}} simplified to {{IN}} doesn't generate an alternative query with {{UNION}} (it currently does for SQL2, although the alternative loses the cost war) - OAK-7897
# Implement simple merge of facets across {{UNION}} - OAK-7898
# Implement capability to pass on {{OR}} clauses to index - OAK-7899

1. and 2. would take care of the majority of the cases. 3. would take significant effort and comes with huge implicit risks, as it'd be an entirely new code flow and might break the earlier assumption that {{OR}} never reaches indexes (e.g. cost estimate, planning, etc.)


was (Author: catholicon):
Had a discussion with [~tmueller] off list; here is a synopsis of the conclusions:
* we should break the general {{OR}} use case into two cases: conditions that get simplified into an {{IN}} clause v/s conditions that get optimized using {{UNION}}
* the ones that get simplified to {{IN}} already work correctly (since the clause gets sent to the index and the same lucene query answers the whole query)
* {{UNION}}, whether due to an optimized {{OR}} or because it was written by the user, is hard to get correct in a performant way (basically, tricks like {{ n(A U B) = n(A) + n(B) - n(A ^ B) }} scale up as 2^N where N is the number of sub-queries)
* a trivial approach for {{UNION}} would be to simply merge facets across sub-queries, adding up the facet counts for identical labels. This is obviously incorrect, but only in cases where there's significant intersection between sub-queries (so, common path restriction or nodetype restrictions resulting in {{UNION}} would be correct). That said, this limitation needs to be clearly documented.
* a complicated (but correct) approach for fixing facets for {{OR}} would be to disable optimizing to {{UNION}} AND change oak-query to pass {{OR}} clauses down to lucene (of course, the assumption that the query is being answered by an index is implicit)

With all that said, I'd create sub-tasks of this issue for:
# Ensure that {{OR}} simplified to {{IN}} doesn't generate an alternative query with {{UNION}} (it currently does for SQL2, although the alternative loses the cost war)
# Implement simple merge of facets across {{UNION}}
# Implement capability to pass on {{OR}} clauses to index

1. and 2. would take care of the majority of the cases. 3. would take significant effort and comes with huge implicit risks, as it'd be an entirely new code flow and might break the earlier assumption that {{OR}} never reaches indexes (e.g. cost estimate, planning, etc.)

> Doing Faceting only on the resultset of one constraints when query contain multiple constraint with OR condition 
> -----------------------------------------------------------------------------------------------------------------
>
>                 Key: OAK-7606
>                 URL: https://issues.apache.org/jira/browse/OAK-7606
>             Project: Jackrabbit Oak
>          Issue Type: Bug
>          Components: lucene, query
>            Reporter: Ayush Garg
>            Assignee: Vikas Saurabh
>            Priority: Major
>         Attachments: FacetOnORTest.java
>
>
> Xpath query is  *"//*[(@test = 't1' or @name = 'Node2' )]/(rep:facet(text))/(rep:facet(name)) order by jcr:path*"
> To reproduce this error, run the test method "testFacetOnOR()" in the attached Java file.
> The method creates 3 nodes and sets properties on them.
> The FacetResult should be computed on the final result set, but it is instead based on the result set of the first constraint "@test='t1'".


