Posted to oak-dev@jackrabbit.apache.org by Dirk Rudolph <di...@netcentric.biz> on 2017/12/27 09:17:52 UTC

OAK-7109: Facet counting for complex queries - example

Hi devs, Vikas Saurabh in particular, 

In OAK-7109 Vikas asked for some further information about the query requiring facet counting that we use in our project. So here we go:

select s.[jcr:score], s.[jcr:path],
       [rep:facet(jcr:content/editorial/cq:tags)],
       [rep:facet(jcr:content/maincategories/cq:tags)]
from [cq:Page] as s
where (contains(s.[*],'news') and (
    isdescendantnode(s,'/content/mam/web/de/en')
    or (
      s.[jcr:content/sling:resourceType] in('mam/web/pagetypes/newslibrary')
      and isdescendantnode(s,'/content/mam/web/gc/news/en')
      and (
        (s.[jcr:content/countries/selectedCountries] = 'true' and s.[jcr:content/countries/cq:tags] in('web:system/countries/de'))
        or (s.[jcr:content/countries/selectedCountries] = 'false' and not(s.[jcr:content/countries/cq:tags] in('web:system/countries/de')))
        or (s.[jcr:content/countries/selectedCountries] = 'true' and s.[jcr:content/countries/cq:tags] is null)
      )
    )
    or (
      isdescendantnode(s,'/content/mam/web/gc/help/en')
      and s.[jcr:content/sling:resourceType] in('mam/web/pagetypes/helplibrary')
    )
    or (
      (isdescendantnode(s,'/content/mam/web/gc/partners/air/en') or isdescendantnode(s,'/content/mam/web/gc/partners/non-air/en'))
      and s.[jcr:content/sling:resourceType] in('mam/web/pagetypes/partnerlibrary','mam/web/pagetypes/airline-partnerlibrary')
      and (
        (s.[jcr:content/countries/selectedCountries] = 'true' and s.[jcr:content/countries/cq:tags] in('web:system/countries/de'))
        or (s.[jcr:content/countries/selectedCountries] = 'false' and not(s.[jcr:content/countries/cq:tags] in('web:system/countries/de')))
        or (s.[jcr:content/countries/selectedCountries] = 'true' and s.[jcr:content/countries/cq:tags] is null)
      )
    )
))
order by s.[jcr:score] desc

Resulting in the following execution plan:

[cq:Page] as [s] /* lucene:mamcom_lucene(/oak:index/mamcom_lucene) :fulltext:news ordering:[{ propertyName : jcr:score, propertyType : UNDEFINED, order : DESCENDING }] ft:("news") where contains([s].[*], 'news') */

With the following stored index definition:

/{jcr:primaryType = oak:QueryIndexDefinition, compatVersion = 2, :version = 2, :source-path = /oak:index/Copy of cqPageLucene, costPerExecution = 0, type = lucene, async = [async, nrt], evaluatePathRestrictions = true, excludedPaths = [/var, /etc/replication, /etc/workflow/instances, /jcr:system], reindex = true, reindexCount = 13}
    aggregates{jcr:primaryType = nt:unstructured, :childOrder = [cq:Page, nt:file, cq:PageContent]}
      nt:file{jcr:primaryType = nt:unstructured, :childOrder = [include0]}
        include0{jcr:primaryType = nt:unstructured, path = jcr:content, :childOrder = []}
      cq:Page{jcr:primaryType = nt:unstructured, :childOrder = [include0]}
        include0{jcr:primaryType = nt:unstructured, relativeNode = true, path = jcr:content, :childOrder = []}
      cq:PageContent{jcr:primaryType = nt:unstructured, :childOrder = [include0, include1, include2, include3]}
        include3{jcr:primaryType = nt:unstructured, path = */*/*/*, :childOrder = []}
        include0{jcr:primaryType = nt:unstructured, path = *, :childOrder = []}
        include1{jcr:primaryType = nt:unstructured, path = */*, :childOrder = []}
        include2{jcr:primaryType = nt:unstructured, path = */*/*, :childOrder = []}
    facets{jcr:primaryType = nt:unstructured, topChildren = 1000, secure = false}
      jcr:content{jcr:primaryType = nt:unstructured, multivalued = true}
        editorial{jcr:primaryType = nt:unstructured, multivalued = true}
          cq:tags{jcr:primaryType = nt:unstructured, multivalued = true}
        maincategories{jcr:primaryType = nt:unstructured, multivalued = true}
          cq:tags{jcr:primaryType = nt:unstructured, multivalued = true}
    indexRules{jcr:primaryType = nt:unstructured, :childOrder = [cq:Page]}
      cq:Page{jcr:primaryType = nt:unstructured, :childOrder = [properties]}
        properties{jcr:primaryType = nt:unstructured, :childOrder = [slingResourceType, editorialTags, mainCategoriesTags, jcrTitle, jcrDescription, systemprops, props, selectedCountries, countryTags]}
          mainCategoriesTags{jcr:primaryType = nt:unstructured, facets = true, :source-path = /oak:index/mamcom_lucene/indexRules/cq:Page/properties/Copy of editorialTags, propertyIndex = true, stored = true, name = jcr:content/maincategories/cq:tags, :childOrder = []}
          systemprops{jcr:primaryType = nt:unstructured, :source-path = /oak:index/mamcom_lucene/indexRules/cq:Page/properties/Copy of props, isRegexp = true, name = ^(cq|jcr|sling):.+$, index = false, :childOrder = []}
          jcrDescription{jcr:primaryType = nt:unstructured, nodeScopeIndex = true, :source-path = /oak:index/mamcom_lucene/indexRules/cq:Page/properties/Copy of jcrTitle, name = jcr:content/jcr:title, type = String, :childOrder = []}
          jcrTitle{jcr:primaryType = nt:unstructured, nodeScopeIndex = true, name = jcr:content/jcr:title, type = String, :childOrder = []}
          countryTags{jcr:primaryType = nt:unstructured, propertyIndex = true, name = jcr:content/countries/cq:tags, :childOrder = []}
          props{jcr:primaryType = nt:unstructured, nodeScopeIndex = true, analyzed = true, isRegexp = true, name = ^[^\/]*$, :childOrder = []}
          selectedCountries{jcr:primaryType = nt:unstructured, propertyIndex = true, name = jcr:content/countries/selectedCountries, :childOrder = []}
          slingResourceType{jcr:primaryType = nt:unstructured, propertyIndex = true, name = jcr:content/sling:resourceType, :childOrder = []}
          editorialTags{jcr:primaryType = nt:unstructured, facets = true, propertyIndex = true, stored = true, name = jcr:content/editorial/cq:tags, :childOrder = []}

Thanks, Vikas, for your investigation so far. I agree with everything you wrote: post-filtering to count facets will probably be expensive. I don't know why, in this case, not all constraints are passed to the index. From what I have seen, the deep nesting of disjunctions, conjunctions and path constraints might be the cause. Unfortunately, the query expresses business logic we agreed on with the customer, so the constraints cannot be changed.
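
To make the shape of the constraint concrete: at the top level the where clause is the fulltext clause AND-ed with four alternatives that are OR-ed together. The only conjunct common to every alternative is the fulltext clause, which is consistent with the plan above showing only contains(...) being passed to the index. The following is a minimal Python sketch of that structure, not Oak code; the labels for the four alternatives are shorthand invented for illustration and the country rules are abbreviated.

```python
from itertools import product

# Shorthand labels (invented here, not part of the query) for the four
# top-level alternatives that are OR-ed together next to the fulltext clause.
FULLTEXT = "contains(s.[*],'news')"
ALTERNATIVES = {
    "de_site": "isdescendantnode(s,'/content/mam/web/de/en')",
    "news": "newslibrary below /content/mam/web/gc/news/en, plus country rules",
    "help": "helplibrary below /content/mam/web/gc/help/en",
    "partners": "partner libraries below /content/mam/web/gc/partners/{air,non-air}/en, plus country rules",
}


def distribute(fulltext, alternatives):
    """Turn  fulltext AND (a OR b OR ...)  into one (fulltext, a) pair per alternative."""
    return [(fulltext, alt) for alt in alternatives]


def original(ft, alts):
    # fulltext AND (a OR b OR c OR d)
    return ft and any(alts)


def split(ft, alts):
    # (fulltext AND a) OR (fulltext AND b) OR ...
    return any(ft and a for a in alts)


branches = distribute(FULLTEXT, list(ALTERNATIVES))

# Check that both forms agree for every combination of truth values.
equivalent = all(
    original(ft, alts) == split(ft, alts)
    for ft, *alts in product([False, True], repeat=1 + len(ALTERNATIVES))
)

for ft, alt in branches:
    print(f"({ft}) and <{alt}: {ALTERNATIVES[alt]}>")
print("equivalent:", equivalent)
```

Each of the four branches carries the fulltext clause plus a purely conjunctive set of path and property constraints at its top level (the country rules remain nested disjunctions inside two of them).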

My naive assumption is that, if the query is split into multiple queries, the fulltext constraint will be part of each of the disjunctive components (or unions), and that queryNorm(q), as defined in https://lucene.apache.org/core/4_7_0/core/org/apache/lucene/search/similarities/TFIDFSimilarity.html, will therefore be the same for each of the queries. Property constraints and even path constraints could potentially be given a boost of 0 so that they have no impact on the score. In any case, from what I could observe in our tests, scores coming from the same index are comparable across (similar) queries with the same fulltext constraint but different property constraints.
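
To illustrate the queryNorm argument: the TFIDFSimilarity documentation defines queryNorm(q) = 1 / sqrt(sumOfSquaredWeights), with sumOfSquaredWeights = q.getBoost()^2 * sum over the query terms t of (idf(t) * t.getBoost())^2. A clause with boost 0 therefore adds nothing to the sum. Below is a minimal Python sketch of that arithmetic. The function names and index statistics are made up for illustration and are not Lucene or Oak APIs, the idf formula is the DefaultSimilarity one, and whether Oak allows property clauses to be boosted this way is an assumption.

```python
import math


def idf(num_docs, doc_freq):
    # DefaultSimilarity-style idf: 1 + ln(numDocs / (docFreq + 1))
    return 1.0 + math.log(num_docs / (doc_freq + 1))


def query_norm(clauses, query_boost=1.0):
    # clauses: list of (idf, boost) pairs, one per term in the query
    sum_of_squared_weights = query_boost ** 2 * sum((i * b) ** 2 for i, b in clauses)
    return 1.0 / math.sqrt(sum_of_squared_weights)


# Made-up index statistics, purely for illustration.
NUM_DOCS = 10_000
fulltext = (idf(NUM_DOCS, 250), 1.0)       # 'news', default boost
resource_type = (idf(NUM_DOCS, 40), 0.0)   # property clause, boost 0
country_tag = (idf(NUM_DOCS, 900), 0.0)    # property clause, boost 0

norm_news_branch = query_norm([fulltext, resource_type, country_tag])
norm_help_branch = query_norm([fulltext, resource_type])
norm_fulltext_only = query_norm([fulltext])

# For comparison: the same property clause with the default boost of 1.
norm_unboosted = query_norm([fulltext, (idf(NUM_DOCS, 40), 1.0)])

print(norm_news_branch, norm_help_branch, norm_fulltext_only, norm_unboosted)
```

With the property clauses at boost 0, all three branch variants get the same queryNorm as the fulltext-only query. With the default boost, the extra clause lowers it. This covers queryNorm only; the other factors of the score are not modelled here.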

Cheers,
Dirk