Posted to issues@lucene.apache.org by "Chris M. Hostetter (Jira)" <ji...@apache.org> on 2020/08/12 00:33:00 UTC

[jira] [Comment Edited] (SOLR-14687) Make child/parent query parsers natively aware of _nest_path_

    [ https://issues.apache.org/jira/browse/SOLR-14687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17175086#comment-17175086 ] 

Chris M. Hostetter edited comment on SOLR-14687 at 8/12/20, 12:32 AM:
----------------------------------------------------------------------

Besides the fact that Jira's WYSIWYG editor lied to me and munged up some of the formatting of "STAR:STAR" and "UNDERSCORE nest UNDERSCORE path UNDERSCORE" in many places, something else has been nagging at me that I felt like I was overlooking, and I finally figured out what it is: I hadn't really accounted for docs that _have_ a "nest path" but whose path doesn't have any common ancestors with the {{parentPath}} specified - ie: how would {{/a/b/c}} hierarchy docs, mixed in an index with docs having a hierarchy of {{/x/y/z}}, wind up affecting each other?

I *think* that what I described above would still mostly work for the "parent" parser: even if the "parent filter" generated by a {{parentPath="/a/b/c"}} as I described above didn't really "rule out" the other docs, those docs still wouldn't match the "nest path with a prefix of /a/b/c" rule for the "children". It still wouldn't really be a "correct" "parents bit set filter" as the underlying code expects it to be, in terms of identifying all "non children" documents ... but I'm _pretty sure_ it would be broken for the "child" parser case, because some doc with an "/x" or "/x/y" path isn't going to be matched by the "parents filter bitset", so it might get swallowed up in the list of children.
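
A rough way to see the problem, as a minimal sketch in plain Java (not Solr code): each doc is modelled as just its nest path string ({{null}} for docs without one), and the "ancestor list" style filter is checked against every doc that is not a descendant of the {{parentPath}}. The class and method names are hypothetical, and the paths are simplified to the {{/a/b/c}} form used in this comment.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class AncestorListFilterDemo {

    /** Models the "ancestor list" filter: docs with no nest path, or whose
     *  path is exactly parentPath or one of its ancestors (/a, /a/b, /a/b/c). */
    static boolean ancestorListFilter(String nestPath, String parentPath) {
        if (nestPath == null) {
            return true; // root docs have no nest path
        }
        return parentPath.equals(nestPath) || parentPath.startsWith(nestPath + "/");
    }

    /** True for docs nested below parentPath, i.e. the only docs the
     *  parents filter is supposed to leave out. */
    static boolean isDescendant(String nestPath, String parentPath) {
        return nestPath != null && nestPath.startsWith(parentPath + "/");
    }

    /** Non-descendant docs that the ancestor-list filter fails to flag. */
    static List<String> wronglyUnflagged(List<String> nestPaths, String parentPath) {
        List<String> missed = new ArrayList<>();
        for (String path : nestPaths) {
            if (!isDescendant(path, parentPath) && !ancestorListFilter(path, parentPath)) {
                missed.add(path);
            }
        }
        return missed;
    }

    public static void main(String[] args) {
        List<String> index = Arrays.asList(
            null, "/a", "/a/b", "/a/b/c", "/a/b/c/d", "/x", "/x/y", "/x/y/z");
        // The unrelated /x hierarchy is never flagged as "non children".
        System.out.println(wronglyUnflagged(index, "/a/b/c"));
    }
}
```

In this toy model the docs under {{/x}} are the ones left out of the bitset, which is the situation where the "child" parser could treat them as children.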

The other thing that bugged me was the (mistaken & misguided) need to ' ... compute a list of all "prefix subpaths" ... ' - I'm not sure why I thought that was necessary, instead of just saying "must _NOT_ have a prefix of the specified path" - ie:
{code:java}
     GIVEN:    {!foo parentPath="/a/b/c"} ...

INSTEAD OF:    PARENT FILTER BITSET = ((*:* -_nest_path_:*) OR _nest_path_:(/a /a/b /a/b/c))

  JUST USE:    PARENT FILTER BITSET = (*:* -{!prefix f="_nest_path_" v="/a/b/c/"})
{code}
...which (IIUC) should solve both problems, by matching:
 * docs w/o any nest path
 * docs with a nest path that does NOT start with /a/b/c/
 ** which includes the immediate "/a/b/c" parents, as well as their ancestors, as well as any docs with completely orthogonal paths (like /x/y/z)

But of course: in the case of {{parentPath="/"}} this would still simply be "docs w/o a nest path"

That should work, right?
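
As a minimal sketch of the rule above, assuming a hypothetical helper that turns a {{parentPath}} into the filter string (this is not an existing Solr API, and it does no escaping of quotes or special characters in the path):

```java
final class NestPathParentFilter {

    /** Builds the "all parents" filter string for the given parentPath,
     *  using the {!prefix ...} local-params form of the prefix parser. */
    static String parentFilter(String parentPath) {
        if (parentPath.equals("/")) {
            // Root level: the parents are simply the docs without a nest path.
            return "(*:* -_nest_path_:*)";
        }
        // The excluded prefix must end in "/" so that /a/b/c itself
        // (and a sibling such as /a/b/cc) is NOT excluded.
        String prefix = parentPath.endsWith("/") ? parentPath : parentPath + "/";
        return "(*:* -{!prefix f=\"_nest_path_\" v=\"" + prefix + "\"})";
    }

    public static void main(String[] args) {
        System.out.println(parentFilter("/a/b/c"));
        System.out.println(parentFilter("/"));
    }
}
```

The trailing slash on the prefix is the important detail: without it, the "/a/b/c" docs themselves would be excluded from the parents bitset.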
----
I also think I made some mistakes/typos in my examples above in trying to articulate what the equivalent "old style" query would be, so let me restate all of the examples in full...
{noformat}
NEW:  q={!parent parentPath="/a/b/c"}c_title:son

OLD:  q=(+{!field f="_nest_path_" v="/a/b/c"} +{!parent which=$ff v=$vv})
     ff=(*:* -{!prefix f="_nest_path_" v="/a/b/c/"})
     vv=(+c_title:son +{!prefix f="_nest_path_" v="/a/b/c/"})
{noformat}
{noformat}
NEW:  q={!parent parentPath="/"}c_title:son

OLD:  q=(-_nest_path_:* +{!parent which=$ff v=$vv})
     ff=(*:* -_nest_path_:*) 
     vv=(+c_title:son +_nest_path_:*)
{noformat}
{noformat}
NEW:  q={!child parentPath="/a/b/c"}p_title:dad

OLD:  q={!child of=$ff v=$vv}
     ff=(*:* -{!prefix f="_nest_path_" v="/a/b/c/"})
     vv=(+p_title:dad +{!field f="_nest_path_" v="/a/b/c"})
{noformat}
{noformat}
NEW:  q={!child parentPath="/"}p_title:dad

OLD:  q={!child of=$ff v=$vv}
     ff=(*:* -_nest_path_:*) 
     vv=(+p_title:dad -_nest_path_:*)
{noformat}
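
The four NEW/OLD pairs above follow one pattern, which can be sketched as a small expansion routine. This is only an illustration under stated assumptions: the class and method names are hypothetical, the inner query is assumed to be a single clause, and {{parentPath}} is assumed to have no trailing slash (other than "/" itself) and to need no escaping.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class NestPathParamExpander {

    private static String exact(String path) {
        return "{!field f=\"_nest_path_\" v=\"" + path + "\"}";
    }

    private static String below(String path) {
        return "{!prefix f=\"_nest_path_\" v=\"" + path + "/\"}";
    }

    /** The shared "all parents" filter (the ff param in the examples). */
    static String parentFilter(String parentPath) {
        return parentPath.equals("/")
            ? "(*:* -_nest_path_:*)"
            : "(*:* -" + below(parentPath) + ")";
    }

    /** Old-style params for: {!parent parentPath=...}childQuery */
    static Map<String, String> expandParent(String parentPath, String childQuery) {
        boolean root = parentPath.equals("/");
        String join = "+{!parent which=$ff v=$vv}";
        Map<String, String> params = new LinkedHashMap<>();
        // Restrict the matched parents to the requested level.
        params.put("q", "(" + (root ? "-_nest_path_:*" : "+" + exact(parentPath)) + " " + join + ")");
        params.put("ff", parentFilter(parentPath));
        // Only docs nested below the requested level count as children.
        params.put("vv", "(+" + childQuery + " +" + (root ? "_nest_path_:*" : below(parentPath)) + ")");
        return params;
    }

    /** Old-style params for: {!child parentPath=...}parentQuery */
    static Map<String, String> expandChild(String parentPath, String parentQuery) {
        boolean root = parentPath.equals("/");
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", "{!child of=$ff v=$vv}");
        params.put("ff", parentFilter(parentPath));
        // Only docs at exactly the requested level count as parents.
        params.put("vv", "(+" + parentQuery + " " + (root ? "-_nest_path_:*" : "+" + exact(parentPath)) + ")");
        return params;
    }

    public static void main(String[] args) {
        expandParent("/a/b/c", "c_title:son").forEach((k, v) -> System.out.println(k + "=" + v));
        expandChild("/", "p_title:dad").forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

Note that {{ff}} is the same for both parsers at a given level; only {{q}} and {{vv}} differ.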
 

[~mkhl] - what do you think about this approach? do you see any flaws in the logic here? ... if the logic looks correct, I'd like to write it up as "how to create a *safe* of/which local param when using nest path" doc tip for SOLR-14383 and move forward there as a documentation improvement, even if there are still feature/implementation/syntax concerns/discussion to happen here as far as a "new feature"

 *EDIT*: fixed brain fart / typo of + vs - in last example



> Make child/parent query parsers natively aware of _nest_path_
> -------------------------------------------------------------
>
>                 Key: SOLR-14687
>                 URL: https://issues.apache.org/jira/browse/SOLR-14687
>             Project: Solr
>          Issue Type: Sub-task
>            Reporter: Chris M. Hostetter
>            Priority: Major
>
> A long standing pain point of the parent/child QParsers is the "all parents" bitmask/filter specified via the "which" and "of" params (respectively).
> This is particularly tricky/painful to "get right" when dealing with multi-level nested documents...
>  * https://issues.apache.org/jira/browse/SOLR-14383?focusedCommentId=17166339&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17166339
>  * [https://lists.apache.org/thread.html/r7633a366dd76e7ce9d98e6b9f2a65da8af8240e846f789d938c8113f%40%3Csolr-user.lucene.apache.org%3E]
> ...and it's *really* hard to get right when the nested structure isn't 100% consistent among all docs:
>  * collections that mix docs w/o children and docs that have children.
>  ** Ex: blog posts, some of which have child docs that are "comments", but some don't
>  * when some "types" of documents can exist at multiple levels:
>  ** Ex: top level "product" documents, which may have 2 types of children: "skus" and "manuals", but "skus" may also have their own sku-specific child "manuals"
> BUT! ... now that we have some semi-native support for the {{_nest_path_}} field, I think it may be possible to offer an "easier to use" variant syntax of the parent/child QParsers that directly depends on these fields. This new syntax should be optional, and purely syntactic sugar. "Expert" users should be able to do all the same things using the existing syntax (possibly more efficiently, depending on what invariants exist in their data model)


