Posted to commits@lucene.apache.org by ho...@apache.org on 2020/08/22 00:32:49 UTC

[lucene-solr] 02/02: make BJQP info in other-parsers a little more accurate while keeping it brief, including new section on 'Block Mask' concept

This is an automated email from the ASF dual-hosted git repository.

hossman pushed a commit to branch jira/SOLR-14383
in repository https://gitbox.apache.org/repos/asf/lucene-solr.git

commit b534336e4427d13d0b07aeee51ab1462c62e96db
Author: Chris Hostetter <ho...@apache.org>
AuthorDate: Fri Aug 21 17:26:54 2020 -0700

    make BJQP info in other-parsers a little more accurate while keeping it brief, including new section on 'Block Mask' concept
    
    add some nocommits to searching-nested-documents w/reminders of where/how to fill in details on non-trivial examples
---
 solr/solr-ref-guide/src/other-parsers.adoc         | 130 ++++++++++++---------
 .../src/searching-nested-documents.adoc            |  17 ++-
 2 files changed, 89 insertions(+), 58 deletions(-)
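
The `{!child}` / `{!parent}` semantics documented by this patch can be illustrated with a small toy model. The following Python sketch is an illustration only, not Solr or Lucene code. It makes these assumptions:

* the index is a single flat list in on-disk order, with every child stored immediately before its parent;
* plain substring predicates stand in for real analyzed queries;
* the helper names (`child_of`, `parent_which`, `is_parent`) are hypothetical.

```python
"""Toy model of the {!child} and {!parent} block-join parsers.

Assumption (a simplification, not Lucene's code): the index is one flat list
in on-disk order, and each child is stored immediately before its parent.
"""
from typing import Callable, Dict, List, Set

Doc = Dict[str, str]
Pred = Callable[[Doc], bool]


def child_of(index: List[Doc], block_mask: Pred, some_parents: Pred) -> Set[str]:
    """Sketch of q={!child of=<blockMask>}<someParents>."""
    result: Set[str] = set()
    pending: List[str] = []  # non-mask docs seen since the previous mask doc
    for doc in index:
        if block_mask(doc):
            if some_parents(doc):
                result.update(pending)  # the docs in this block are its children
            pending = []  # a mask doc closes the current block
        elif some_parents(doc):
            # Mirrors the error described in the CAUTION block for {!child}.
            raise ValueError("Parent query must not match any docs besides parent filter.")
        else:
            pending.append(doc["id"])
    return result


def parent_which(index: List[Doc], block_mask: Pred, some_children: Pred) -> Set[str]:
    """Sketch of q={!parent which=<blockMask>}<someChildren>."""
    result: Set[str] = set()
    hit = False  # did any non-mask doc in the current block match?
    for doc in index:
        if block_mask(doc):
            if some_children(doc):
                # Mirrors the error described in the CAUTION block for {!parent}.
                raise ValueError("Child query must not match same docs with parent filter.")
            if hit:
                result.add(doc["id"])
            hit = False  # a mask doc closes the current block
        elif some_children(doc):
            hit = True
    return result


# The four example documents from the patch, each child stored before its parent.
INDEX: List[Doc] = [
    {"id": "2", "content_type": "child", "comments": "SolrCloud supports it too!"},
    {"id": "1", "content_type": "parent", "title": "Solr has block join support"},
    {"id": "4", "content_type": "child", "comments": "Lots of new features"},
    {"id": "3", "content_type": "parent", "title": "New Lucene and Solr release"},
]


def is_parent(doc: Doc) -> bool:
    """Stand-in for the Block Mask content_type:parent."""
    return doc.get("content_type") == "parent"


# q={!child of="content_type:parent"}title:lucene
print(child_of(INDEX, is_parent, lambda d: "Lucene" in d.get("title", "")))  # {'4'}
# q={!parent which="content_type:parent"}comments:SolrCloud
print(parent_which(INDEX, is_parent, lambda d: "SolrCloud" in d.get("comments", "")))  # {'1'}
```

In this model, `parent_which` simply reports the next mask document after any matching child. A mask that leaves out real parents therefore silently re-attributes their children, which is the failure mode the new Block Mask section warns about.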

diff --git a/solr/solr-ref-guide/src/other-parsers.adoc b/solr/solr-ref-guide/src/other-parsers.adoc
index 788cdb3..5395938 100644
--- a/solr/solr-ref-guide/src/other-parsers.adoc
+++ b/solr/solr-ref-guide/src/other-parsers.adoc
@@ -24,40 +24,30 @@ Many of these parsers are expressed the same way as <<local-parameters-in-querie
 
 == Block Join Query Parsers
 
-// nocommit: of/which are a PITA to get right in deeply nested docs
-//
-// nocommit: https://issues.apache.org/jira/browse/SOLR-14383?focusedCommentId=17166339&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17166339
-//
-// nocommit: the detailed docs on child/parent parsers need to explain exactly how these should be used
-// nocommit: so people can understand them (and so other sections of the ref-guide can link here for
-// nocommit: actual explanaiton)
-
-// nocommit: find a way to write up a susincet & clear description of the process/hueristic demonstrated in
-// nocommit: TestNestedUpdateProcessor.testRandomNestPathQueryFiltering
-// nocommit: AKA: the "manual" form of the syntactic sugar proposed in SOLR-14687
-
 There are two query parsers that support block joins. These parsers allow indexing and searching for relational content that has been <<indexing-nested-documents.adoc#indexing-nested-documents, indexed as Nested Documents>>.
 
-The example usage of the query parsers below assumes these two documents and each of their child documents have been indexed:
+The example usage of the query parsers below assumes the following documents have been indexed:
 
 [source,xml]
 ----
 <add>
   <doc>
-  <field name="id">1</field>
-  <field name="title">Solr has block join support</field>
-  <field name="content_type">parentDocument</field>
+    <field name="id">1</field>
+    <field name="content_type">parent</field>
+    <field name="title">Solr has block join support</field>
     <doc>
       <field name="id">2</field>
+      <field name="content_type">child</field>
       <field name="comments">SolrCloud supports it too!</field>
     </doc>
   </doc>
   <doc>
     <field name="id">3</field>
+    <field name="content_type">parent</field>
     <field name="title">New Lucene and Solr release</field>
-    <field name="content_type">parentDocument</field>
     <doc>
       <field name="id">4</field>
+      <field name="content_type">child</field>
       <field name="comments">Lots of new features</field>
     </doc>
   </doc>
@@ -66,21 +56,23 @@ The example usage of the query parsers below assumes these two documents and eac
 
 === Block Join Children Query Parser
 
-This parser takes a query that matches some parent documents and returns their children.
+This parser wraps a query that matches some parent documents and returns the children of those documents.
 
-The syntax for this parser is: `q={!child of=<allParents>}<someParents>`.
+The syntax for this parser is: `q={!child of=<blockMask>}<someParents>`.
 
-The parameter `allParents` is a filter that matches *only parent documents*; here you would define the field and value that you used to identify *all parent documents*.
+* The inner subordinate query string (`someParents`) must be a query that will match some parent documents
+* The `of` parameter must be a query string to use as a <<#block-mask,Block Mask>> -- typically a query that matches the set of all possible parent documents
 
-The parameter `someParents` identifies a query that will match some of the parent documents. The output is the children.
+The resulting query will match all documents which do _not_ match the `<blockMask>` query and are children (or descendants) of the documents matched by `<someParents>`.
 
-Using the example documents above, we can construct a query such as `q={!child of="content_type:parentDocument"}title:lucene&wt=xml`. We only get one document in response:
+Using the example documents above, we can construct a query such as `q={!child of="content_type:parent"}title:lucene`. We only get one document in response:
 
 [source,xml]
 ----
 <result name="response" numFound="1" start="0">
   <doc>
     <str name="id">4</str>
+    <arr name="content_type"><str>child</str></arr>
     <str name="comments">Lots of new features</str>
   </doc>
 </result>
@@ -88,12 +80,12 @@ Using the example documents above, we can construct a query such as `q={!child o
 
 [CAUTION]
 ====
-The query for `someParents` should match only parent documents passed by `allParents` or you may get an exception:
+The query for `someParents` *MUST* only match documents that are also matched by the <<#block-mask,Block Mask>>, or your query may result in an error:
 
 [literal]
 Parent query must not match any docs besides parent filter. Combine them as must (+) and must-not (-) clauses to find a problem doc.
 
-You can search for `q=+(someParents) -(allParents)` to find a cause if you encounter this error.
+You can search for `q=+(someParents) -(blockMask)` to find a cause if you encounter this type of error.
 ====
 
 ==== Filtering and Tagging
@@ -101,78 +93,72 @@ You can search for `q=+(someParents) -(allParents)` to find a cause if you encou
 `{!child}` also supports `filters` and `excludeTags` local parameters like the following:
 
 [source,text]
-{!child of=<allParents> filters=$parentfq excludeTags=certain}<someParents>&parentfq=BRAND:Foo&parentfq=NAME:Bar&parentfq={!tag=certain}CATEGORY:Baz
+?q={!child of=<blockMask> filters=$parentfq excludeTags=certain}<someParents>
+&parentfq=BRAND:Foo
+&parentfq=NAME:Bar
+&parentfq={!tag=certain}CATEGORY:Baz
 
 This is equivalent to:
 
 [source,text]
-{!child of=<allParents>}+<someParents> +BRAND:Foo +NAME:Bar
+q={!child of=<blockMask>}+<someParents> +BRAND:Foo +NAME:Bar
 
 Notice "$" syntax in `filters` for referencing queries; comma-separated tags `excludeTags` allows to exclude certain queries by tagging. Overall the idea is similar to <<faceting.adoc#tagging-and-excluding-filters, excluding fq in facets>>. Note, that filtering is applied to the subordinate clause (`<someParents>`), and the intersection result is joined to the children.
 
 ==== All Children Syntax
 
-When subordinate clause (`<someParents>`) is omitted, it's parsed as a _segmented_ and _cached_ filter for children documents. More precisely, `q={!child of=<allParents>}` is equivalent to `q=\*:* -<allParents>`.
+When the subordinate clause (`<someParents>`) is omitted, it's parsed as a _segmented_ and _cached_ filter for child documents. More precisely, `q={!child of=<blockMask>}` is equivalent to `q=\*:* -<blockMask>`.
 
 === Block Join Parent Query Parser
 
 This parser takes a query that matches child documents and returns their parents.
 
-The syntax for this parser is similar: `q={!parent which=<allParents>}<someChildren>`.
-
-The parameter `allParents` is a filter that matches *only parent documents*; here you would define the field and value that you used to identify *all parent documents*.
+The syntax for this parser is similar to the `child` parser: `q={!parent which=<blockMask>}<someChildren>`.
 
-The parameter `someChildren` is a query that matches some or all of the child documents.
-
-[CAUTION]
-====
-The query for `someChildren` should match only child documents or you may get an exception:
-
-[literal]
-Child query must not match same docs with parent filter. Combine them as must clauses (+) to find a problem doc.
+* The inner subordinate query string (`someChildren`) must be a query that will match some child documents
+* The `which` parameter must be a query string to use as a <<#block-mask,Block Mask>> -- typically a query that matches the set of all possible parent documents
 
-You can search for `q=+(parentFilter) +(someChildren)` to find a cause.
-====
+The resulting query will match all documents which _do_ match the `<blockMask>` query and are parents (or ancestors) of the documents matched by `<someChildren>`.
 
-Again using the example documents above, we can construct a query such as `q={!parent which="content_type:parentDocument"}comments:SolrCloud&wt=xml`. We get this document in response:
+Again using the example documents above, we can construct a query such as `q={!parent which="content_type:parent"}comments:SolrCloud`. We get this document in response:
 
 [source,xml]
 ----
 <result name="response" numFound="1" start="0">
   <doc>
     <str name="id">1</str>
+    <arr name="content_type"><str>parent</str></arr>
     <arr name="title"><str>Solr has block join support</str></arr>
-    <arr name="content_type"><str>parentDocument</str></arr>
   </doc>
 </result>
 ----
 
-.Using which
-[WARNING]
-====
-A common mistake is to try to filter parents with a `which` filter, as in this bad example:
 
-`q={!parent which="*title:join*"}comments:SolrCloud`
+[CAUTION]
+====
+The query for `someChildren` *MUST NOT* match any documents matched by the <<#block-mask,Block Mask>> or your query may result in an Error:
 
-Instead, you should use a sibling mandatory clause as a filter:
+[literal]
+Child query must not match same docs with parent filter. Combine them as must clauses (+) to find a problem doc.
 
-`q= *+title:join* +{!parent which="*content_type:parentDocument*"}comments:SolrCloud`
+You can search for `q=+(blockMask) +(someChildren)` to find a cause.
 ====
 
+
 ==== Filtering and Tagging
 
 The `{!parent}` query supports `filters` and `excludeTags` local parameters like the following:
 
 [source,text]
-{!parent which=<allParents> filters=$childfq excludeTags=certain}<someChildren>&
-childfq=COLOR:Red&
-childfq=SIZE:XL&
-childfq={!tag=certain}PRINT:Hatched
+?q={!parent which=<blockMask> filters=$childfq excludeTags=certain}<someChildren>
+&childfq=COLOR:Red
+&childfq=SIZE:XL
+&childfq={!tag=certain}PRINT:Hatched
 
 This is equivalent to:
 
 [source,text]
-{!parent which=<allParents>}+<someChildren> +COLOR:Red +SIZE:XL
+q={!parent which=<blockMask>}+<someChildren> +COLOR:Red +SIZE:XL
 
 Notice the "$" syntax in `filters` for referencing queries. Comma-separated tags in `excludeTags` allow excluding certain queries by tagging. Overall the idea is similar to <<faceting.adoc#tagging-and-excluding-filters, excluding fq in facets>>. Note that filtering is applied to the subordinate clause (`<someChildren>`) first, and the intersection result is joined to the parents.
 
@@ -182,7 +168,41 @@ You can optionally use the `score` local parameter to return scores of the subor
 
 ==== All Parents Syntax
 
-When subordinate clause (`<someChildren>`) is omitted, it's parsed as a _segmented_ and _cached_ filter for all parent documents, or more precisely `q={!parent which=<allParents>}` is equivalent to `q=<allParents>`.
+When the subordinate clause (`<someChildren>`) is omitted, it's parsed as a _segmented_ and _cached_ filter for all parent documents; more precisely, `q={!parent which=<blockMask>}` is equivalent to `q=<blockMask>`.
+
+[#block-mask]
+=== Block Masks: The `of` and `which` local params
+
+The purpose of the "Block Mask" query specified as either an `of` or `which` param (depending on the parser used) is to identify the set of all documents in the index which should be treated as "parents" _(or their ancestors)_ and which documents should be treated as "children". This is important because in the "on disk" index, the relationships are flattened into "blocks" of documents, so the `of` / `which` params are needed to serve as a "mask" against the flat document blocks to identi [...]
+
+In the example queries above, we were able to use a very simple Block Mask of `content_type:parent` because our data is very simple: every document is either a `parent` or a `child`. So this query string easily distinguishes _all_ of our documents.
+
+A common mistake is to try to use a `which` parameter that is more restrictive than the set of all parent documents, in order to filter the parents that are matched, as in this bad example:
+
+----
+// BAD! DO NOT USE!
+q={!parent which="title:join"}comments:support
+----
+
+This type of query will frequently not work the way you might expect. Since the `which` param only identifies _some_ of the "parent" documents, the resulting query can match "parent" documents it should not. It will mistakenly identify every document that does _not_ match the `which="title:join"` Block Mask as a child of the next document in the index that _does_ match this Mask.
+
+A similar problematic situation can arise when mixing parent/child documents with "simple" documents that have no children _and do not match the query used to identify 'parent' documents_.  For example, if we add the following document to our existing parent/child example documents...
+
+[source,xml]
+----
+<add>
+  <doc>
+    <field name="id">0</field>
+    <field name="content_type">plain</field>
+    <field name="title">Lucene and Solr are cool</field>
+  </doc>
+</add>
+----
+
+...then our simple `content_type:parent` Block Mask would no longer be adequate. We would instead need to use `\*:* -content_type:child` or `content_type:(plain parent)` to prevent our "simple" document from mistakenly being treated as a "child" of an adjacent "parent" document.
+
+The <<searching-nested-documents#searching-nested-documents,Searching Nested Documents>> section contains more detailed examples of specifying Block Mask queries with non-trivial hierarchies of documents.
+
 
 == Boolean Query Parser
 
diff --git a/solr/solr-ref-guide/src/searching-nested-documents.adoc b/solr/solr-ref-guide/src/searching-nested-documents.adoc
index 057d6d4..09e16f8 100644
--- a/solr/solr-ref-guide/src/searching-nested-documents.adoc
+++ b/solr/solr-ref-guide/src/searching-nested-documents.adoc
@@ -108,7 +108,7 @@ Let's consider again the `description_t:staplers` query used above -- if we wrap
 
 [source,bash]
 ----
-$ curl --globoff 'http://localhost:8983/solr/gettingstarted/select?omitHeader=true&q={!child+of="*:*+-_nest_path_:*"}description_t:staplers'
+$ curl 'http://localhost:8983/solr/gettingstarted/select' -d omitHeader=true -d 'q={!child of="*:* -_nest_path_:*"}description_t:staplers'
 {
   "response":{"numFound":5,"start":0,"maxScore":0.30136836,"numFoundExact":true,"docs":[
       {
@@ -139,12 +139,18 @@ $ curl --globoff 'http://localhost:8983/solr/gettingstarted/select?omitHeader=tr
   }}
 ----
 
-NOTE: The `of` local param is neccessary to tell the `{!child}` parser the set of _all_ "ancestor" documents to consider when looking for matching children.  In this example we've used `\*:* -\_nest_path_:*` to indicate we want to consider all documents which don't have a nest path field -- ie: all "root" level document.  When dealing with multiple levels of nested documents, it can be very tricky to define a correct `of` param -- see the <<other-parsers.adoc#block-join-children-query-pa [...]
+In this example we've used `\*:* -\_nest_path_:*` as our <<other-parsers#block-mask,`of` parameter>> to indicate we want to consider all documents which don't have a nest path (i.e., all "root" level documents) as the set of possible parents.
+
+nocommit: example with more interesting `of` param - ie: only manuals that are attached to SKUs
+
+nocommit: show both the "inline" nest path (explain escaping) and the param deref using "prefix" parser
 
 === Parent Query Parser
 
 The inverse of the `{!child}` query parser is the `{!parent}` query parser, which let's you search for the _ancestor_ documents of some child documents matching a wrapped query.  For a detailed explanation of this parser, see the section <<other-parsers.adoc#block-join-parent-query-parser,Block Join Parent Query Parser>>.
 
+nocommit: change this example to a query that matches "manuals"...
+
 Let's first consider this example of searching for all "sku" type documents that have a color of "RED"...
 
 [source,bash]
@@ -165,8 +171,12 @@ $ curl 'http://localhost:8983/solr/gettingstarted/select?omitHeader=true&q=color
   }}
 ----
 
+nocommit: change this query to match all "ancestors" of the above query (products & skus)
+
 We can wrap that query in a `{!parent}` query to return the details of all products that have "RED" skus...
 
+nocommit: switch curl command to use `-d` for readability...
+
 [source,bash]
 ----
 $ curl --globoff 'http://localhost:8983/solr/gettingstarted/select?omitHeader=true&q={!parent+which="*:*+-_nest_path_:*"}color_s:RED'
@@ -185,8 +195,9 @@ $ curl --globoff 'http://localhost:8983/solr/gettingstarted/select?omitHeader=tr
   }}
 ----
 
+In this example we've used `\*:* -\_nest_path_:*` as our <<other-parsers#block-mask,`which` parameter>> to indicate we want to consider all documents which don't have a nest path (i.e., all "root" level documents) as the set of possible parents.
 
-NOTE: The `which` local param of the `{!parent}` parser serves the same purpse as the `{!child}` parser's `of` param: to define the set of _all_ "ancestor" documents to consider for matching based on their children.  In this example we've again used `\*:* -\_nest_path_:*` to indicate we want to consider all documents which don't have a nest path field -- ie: all "root" level document.  When dealing with multiple levels of nested documents, it can be very tricky to define a correct `which [...]
+nocommit: now give a more interesting example, using which to only match the "sku" parents
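
The new Block Mask section in `other-parsers.adoc` describes two pitfalls. The first is a `which` mask narrower than the set of all parents. The second is "plain" documents that match neither side of the mask. Both can be seen by grouping a flat index into blocks according to a mask. This Python sketch is a simplified model, not Lucene's implementation, and makes these assumptions:

* `split_blocks` is a hypothetical helper;
* substring and equality checks stand in for analyzed queries;
* the on-disk order, with document `0` stored just before the block for document `1`, is chosen to trigger the problem.

```python
"""Toy model of how a Block Mask partitions a flat index into blocks.

Assumptions (a simplification, not Lucene's code): the index is one flat list
in on-disk order, children precede their parent, and predicates stand in for
queries.
"""
from typing import Callable, Dict, List, Tuple

Doc = Dict[str, str]


def split_blocks(index: List[Doc], block_mask: Callable[[Doc], bool]) -> List[Tuple[str, List[str]]]:
    """Group docs into (parent_id, [child_ids]) pairs as implied by the mask.

    Each doc matching block_mask closes a block; every non-mask doc seen since
    the previous mask doc is treated as one of its children. Non-mask docs with
    no later mask doc belong to no block and are dropped.
    """
    blocks: List[Tuple[str, List[str]]] = []
    pending: List[str] = []
    for doc in index:
        if block_mask(doc):
            blocks.append((doc["id"], pending))
            pending = []  # start a fresh list for the next block
        else:
            pending.append(doc["id"])
    return blocks


# Hypothetical on-disk order: the "plain" doc 0 lands right before doc 1's block.
INDEX: List[Doc] = [
    {"id": "0", "content_type": "plain", "title": "Lucene and Solr are cool"},
    {"id": "2", "content_type": "child", "comments": "SolrCloud supports it too!"},
    {"id": "1", "content_type": "parent", "title": "Solr has block join support"},
    {"id": "4", "content_type": "child", "comments": "Lots of new features"},
    {"id": "3", "content_type": "parent", "title": "New Lucene and Solr release"},
]

# Mask content_type:parent: doc 0 is wrongly grouped as a child of doc 1.
print(split_blocks(INDEX, lambda d: d["content_type"] == "parent"))
# [('1', ['0', '2']), ('3', ['4'])]

# Mask *:* -content_type:child: every block is correct.
print(split_blocks(INDEX, lambda d: d["content_type"] != "child"))
# [('0', []), ('1', ['2']), ('3', ['4'])]

# Mask title:release misused as a filter: docs 0, 2, 1 and 4 all look like children of 3.
print(split_blocks(INDEX, lambda d: "release" in d.get("title", "")))
# [('3', ['0', '2', '1', '4'])]
```

A mask equivalent to `content_type:(plain parent)` groups this index the same way as `\*:* -content_type:child`. The second form is more robust if further document types without children are added later.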