Posted to issues@lucene.apache.org by "Chris M. Hostetter (Jira)" <ji...@apache.org> on 2020/07/29 01:03:00 UTC

[jira] [Commented] (SOLR-14687) Make child/parent query parsers natively aware of _nest_path_

    [ https://issues.apache.org/jira/browse/SOLR-14687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17166783#comment-17166783 ] 

Chris M. Hostetter commented on SOLR-14687:
-------------------------------------------


Here's my straw man proposal -- i _think_ this covers all the "common" needs/usages of the parent & child QParsers, and would satisfy the needs of the underlying "allParents" constraints...

* add a new {{parentPath}} localparam to both the {{parent}} and {{child}} qparsers
** this new param would be mutually exclusive with the existing {{which}} / {{of}} params: exactly one of the two would be mandatory
* {{parentPath}} values *MUST* start with a "/" and consist of "/" delimited path segments.
** if the {{parentPath}} value consists of at least one segment, but also ends with a "/", the trailing "/" should be automatically stripped off during parsing (ie: "/a/b/c/" should be treated exactly the same as "/a/b/c")
** the parsers would use {{parentPath}} to compute a list of all "prefix subpaths" to build an "OR" query (against the {{_nest_path_}} field) for use as the "allParents" filter in the resulting BlockJoin query
*** the base prefix subpath of "/" would be special case converted to {{(*:* -_nest_path_:*)}} _as a clause in all "allParents" queries computed this way_
*** this will ensure that "all parents" includes non-hierarchical documents, and documents at the same level as, or at a higher level than, the documents we want to consider.
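
To make the "prefix subpaths" idea concrete, here is a rough Python sketch of how the "allParents" filter string could be derived from a {{parentPath}}. The function names are hypothetical (nothing like them exists in Solr today), the output just mirrors the notation used in this comment, and query-syntax escaping of the path values is ignored entirely.

```python
# Hypothetical sketch only: these helpers are not part of Solr.
NO_NEST_PATH = "(*:* -_nest_path_:*)"  # matches docs that have no _nest_path_ at all


def normalize_parent_path(parent_path):
    """Require a leading "/" and strip one trailing "/" (the root "/" is left alone)."""
    if not parent_path.startswith("/"):
        raise ValueError('parentPath must start with "/"')
    if len(parent_path) > 1 and parent_path.endswith("/"):
        parent_path = parent_path[:-1]
    return parent_path


def prefix_subpaths(parent_path):
    """Return every prefix subpath, e.g. "/a/b/c" -> ["/a", "/a/b", "/a/b/c"]."""
    path = normalize_parent_path(parent_path)
    # Assumption: empty segments are simply skipped rather than rejected.
    segments = [s for s in path.split("/") if s]
    return ["/" + "/".join(segments[: i + 1]) for i in range(len(segments))]


def all_parents_filter(parent_path):
    """Build the "allParents" filter: the root clause OR'ed with all prefix subpaths."""
    subpaths = prefix_subpaths(parent_path)
    if not subpaths:
        # parentPath="/" has no subpaths, so only the root clause remains.
        return NO_NEST_PATH
    return "(%s OR _nest_path_:(%s))" % (NO_NEST_PATH, " ".join(subpaths))


print(all_parents_filter("/a/b/c/"))
print(all_parents_filter("/"))
```

The root clause is always present, so documents without any {{_nest_path_}} stay in the parent set no matter how deep the requested path is.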

* In the {{parent}} parser, the full {{parentPath}} would _also_ be used in two additional ways:
** as a field query against the {{_nest_path_}} field combined with the final {{ToParentBlockJoinQuery}} in a BooleanQuery where both clauses are {{MUST}}
*** a {{parentPath}} of "/" would again be special case converted to {{(*:* -_nest_path_:*)}} when building this BooleanQuery
*** this will ensure that only documents at the requested "level" of the hierarchy are returned
** as a prefix query against the {{_nest_path_}} field (with "/" appended) that would be combined with the (wrapped query) {{v}} param in a new BooleanQuery where both clauses are {{MUST}}
*** once again a {{parentPath}} of "/" would be special case converted, but in this case to {{_nest_path_:*}}, when building this BooleanQuery
*** this will ensure that only documents with a {{_nest_path_}} _below_ the specified {{parentPath}} will be matched by the wrapped query -- preventing the existing "Child query must not match same docs with parent filter ..." errors
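
As a sanity check on the {{parent}} rules above, here is a rough Python sketch that expands a {{parentPath}} and a wrapped query into the equivalent existing syntax. {{expand_parent}} and its helper are hypothetical names, the {{$vv}} param dereferencing is just one way to write the expansion, and escaping of path values is not handled.

```python
# Hypothetical sketch only: these helpers are not part of Solr.
NO_NEST_PATH = "(*:* -_nest_path_:*)"  # matches docs that have no _nest_path_ at all


def all_parents_filter(path):
    """Root clause OR'ed with every prefix subpath of an already-normalized path."""
    segments = [s for s in path.split("/") if s]
    if not segments:
        return NO_NEST_PATH
    subpaths = ["/" + "/".join(segments[: i + 1]) for i in range(len(segments))]
    return "(%s OR _nest_path_:(%s))" % (NO_NEST_PATH, " ".join(subpaths))


def expand_parent(parent_path, wrapped_query):
    """Return the 'q' and 'vv' params that the new parent syntax would be sugar for."""
    if not parent_path.startswith("/"):
        raise ValueError('parentPath must start with "/"')
    path = parent_path
    if len(path) > 1 and path.endswith("/"):
        path = path[:-1]  # "/a/b/c/" is treated exactly like "/a/b/c"

    if path == "/":
        # Root special cases: returned parents have no _nest_path_,
        # while wrapped-query matches must have some _nest_path_.
        level_clause = NO_NEST_PATH
        below_clause = "_nest_path_:*"
    else:
        # Returned parents sit exactly at the path; wrapped-query matches sit below it.
        level_clause = '{!field f="_nest_path_" v="%s"}' % path
        below_clause = '{!prefix f="_nest_path_" v="%s/"}' % path

    q = '(+%s +{!parent which="%s" v=$vv})' % (level_clause, all_parents_filter(path))
    vv = "(+%s +%s)" % (wrapped_query, below_clause)
    return {"q": q, "vv": vv}


for name, value in expand_parent("/a/b/c", "c_title:son").items():
    print(name, "=", value)
```

Keeping the prefix clause inside {{vv}} rather than in the outer query is what stops the wrapped query from ever matching a document that is also in the parent filter.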


* In the {{child}} parser, things would be slightly simpler:
** the full {{parentPath}} would be used as a field query against the {{_nest_path_}} field combined with the (wrapped query) {{v}} param in a new BooleanQuery where both clauses are {{MUST}}
*** once again a {{parentPath}} of "/" would be special case converted, in this case to {{_nest_path_:*}}, when building this BooleanQuery
*** this will ensure that the eventual {{ToChildBlockJoinQuery}} will only consider descendants of parents at the actual level specified, even if the wrapped query matches against documents "higher" in the hierarchy -- and preventing the existing "Parent query must not match any docs besides parent filter. ..." errors
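
The {{child}} rules can be sketched the same way. This is a rough Python illustration, not an implementation: {{expand_child}} and its helper are hypothetical names, the root special case is taken literally from the bullet above, and escaping of path values is not handled.

```python
# Hypothetical sketch only: these helpers are not part of Solr.
NO_NEST_PATH = "(*:* -_nest_path_:*)"  # matches docs that have no _nest_path_ at all


def all_parents_filter(path):
    """Root clause OR'ed with every prefix subpath of an already-normalized path."""
    segments = [s for s in path.split("/") if s]
    if not segments:
        return NO_NEST_PATH
    subpaths = ["/" + "/".join(segments[: i + 1]) for i in range(len(segments))]
    return "(%s OR _nest_path_:(%s))" % (NO_NEST_PATH, " ".join(subpaths))


def expand_child(parent_path, wrapped_query):
    """Return the 'q' and 'vv' params that the new child syntax would be sugar for."""
    if not parent_path.startswith("/"):
        raise ValueError('parentPath must start with "/"')
    path = parent_path
    if len(path) > 1 and path.endswith("/"):
        path = path[:-1]  # "/a/b/c/" is treated exactly like "/a/b/c"

    if path == "/":
        # Special case exactly as stated in the proposal text.
        level_clause = "_nest_path_:*"
    else:
        # Restrict the wrapped (parent) query to docs exactly at the requested path.
        level_clause = '{!field f="_nest_path_" v="%s"}' % path

    q = '{!child of="%s" v=$vv}' % all_parents_filter(path)
    vv = "(+%s +%s)" % (wrapped_query, level_clause)
    return {"q": q, "vv": vv}


for name, value in expand_child("/a/b/c", "p_title:dad").items():
    print(name, "=", value)
```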


Here are some examples of this hypothetical new syntax, showing what each would be syntactic sugar for using the existing syntax...


{noformat}
NEW:  q={!parent parentPath="/a/b/c"}c_title:son

OLD:  q=(+{!field f="_nest_path_" v="/a/b/c"} +{!parent which="((*:* -_nest_path_:*) OR _nest_path_:(/a /a/b /a/b/c))" v=$vv})
     vv=(+c_title:son +{!prefix f="_nest_path_" v="/a/b/c/"})
{noformat}

{noformat}
NEW:  q={!parent parentPath="/"}c_title:son

OLD:  q=(+(*:* -_nest_path_:*) +{!parent which="(*:* -_nest_path_:*)" v=$vv})
     vv=(+c_title:son +_nest_path_:*)
{noformat}


{noformat}
NEW:  q={!child parentPath="/a/b/c"}p_title:dad

OLD:  q={!child of="((*:* -_nest_path_:*) OR _nest_path_:(/a /a/b /a/b/c))" v=$vv}
     vv=(+p_title:dad +{!field f="_nest_path_" v="/a/b/c"})
{noformat}

{noformat}
NEW:  q={!child parentPath="/"}p_title:dad

OLD:  q={!child of="(*:* -_nest_path_:*)" v=$vv}
     vv=(+p_title:dad +_nest_path_:*)
{noformat}


What do folks think?


----

(There may also be an opportunity for a {{childPath}} param, which would be required to be a sub-string of the {{parentPath}}, and could replace the internal usage of the {{parentPath}} in some cases when we don't necessarily want to consider _all_ children, just children at a specific depth ... but it may be best not to get bogged down in this extension of the idea just yet ... the "simplest" form that we should support would be just a {{parentPath}})



> Make child/parent query parsers natively aware of _nest_path_
> -------------------------------------------------------------
>
>                 Key: SOLR-14687
>                 URL: https://issues.apache.org/jira/browse/SOLR-14687
>             Project: Solr
>          Issue Type: Sub-task
>            Reporter: Chris M. Hostetter
>            Priority: Major
>
> A long standing pain point of the parent/child QParsers is the "all parents" bitmask/filter specified via the "which" and "of" params (respectively).
> This is particularly tricky/painful to "get right" when dealing with multi-level nested documents...
>  * https://issues.apache.org/jira/browse/SOLR-14383?focusedCommentId=17166339&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17166339
>  * [https://lists.apache.org/thread.html/r7633a366dd76e7ce9d98e6b9f2a65da8af8240e846f789d938c8113f%40%3Csolr-user.lucene.apache.org%3E]
> ...and it's *really* hard to get right when the nested structure isn't 100% consistent among all docs:
>  * collections that mix docs w/o children and docs that have children.
>  ** Ex: blog posts, some of which have child docs that are "comments", but some don't
>  * when some "types" of documents can exist at multiple levels:
>  ** Ex: top level "product" documents, which may have 2 types of children: "skus" and "manuals", but "skus" may also have their own sku-specific child "manuals"
> BUT! ... now that we have some semi-native support for the {{_nest_path_}} field, i think it may be possible to offer an "easier to use" variant syntax of the parent/child QParsers that directly depends on these fields. This new syntax should be optional – and purely syntactic sugar. "expert" users should be able to do all the same things using the existing syntax (possibly more efficiently depending on what invariants exist in their data model)


