Posted to oak-dev@jackrabbit.apache.org by Dirk Rudolph <di...@netcentric.biz> on 2017/12/27 09:17:52 UTC
OAK-7109: Facet counting for complex queries - example
Hi devs, Vikas Saurabh in particular,
In OAK-7109 Vikas asked for some further information about the queries requiring facet counting that we use in our project. So here we go:
select s.[jcr:score], s.[jcr:path], [rep:facet(jcr:content/editorial/cq:tags)], [rep:facet(jcr:content/maincategories/cq:tags)] from [cq:Page] as s where (contains(s.[*],'news') and (isdescendantnode(s,'/content/mam/web/de/en') or (s.[jcr:content/sling:resourceType] in('mam/web/pagetypes/newslibrary') and isdescendantnode(s,'/content/mam/web/gc/news/en') and ((s.[jcr:content/countries/selectedCountries] = 'true' and s.[jcr:content/countries/cq:tags] in('web:system/countries/de')) or (s.[jcr:content/countries/selectedCountries] = 'false' and not(s.[jcr:content/countries/cq:tags] in('web:system/countries/de'))) or (s.[jcr:content/countries/selectedCountries] = 'true' and s.[jcr:content/countries/cq:tags] is null))) or (isdescendantnode(s,'/content/mam/web/gc/help/en') and s.[jcr:content/sling:resourceType] in('mam/web/pagetypes/helplibrary')) or ((isdescendantnode(s,'/content/mam/web/gc/partners/air/en') or isdescendantnode(s,'/content/mam/web/gc/partners/non-air/en')) and s.[jcr:content/sling:resourceType] in('mam/web/pagetypes/partnerlibrary','mam/web/pagetypes/airline-partnerlibrary') and ((s.[jcr:content/countries/selectedCountries] = 'true' and s.[jcr:content/countries/cq:tags] in('web:system/countries/de')) or (s.[jcr:content/countries/selectedCountries] = 'false' and not(s.[jcr:content/countries/cq:tags] in('web:system/countries/de'))) or (s.[jcr:content/countries/selectedCountries] = 'true' and s.[jcr:content/countries/cq:tags] is null))))) order by s.[jcr:score] desc
Resulting in the following execution plan:
[cq:Page] as [s] /* lucene:mamcom_lucene(/oak:index/mamcom_lucene) :fulltext:news ordering:[{ propertyName : jcr:score, propertyType : UNDEFINED, order : DESCENDING }] ft:("news") where contains([s].[*], 'news') */
With the following stored index definition:
/{jcr:primaryType = oak:QueryIndexDefinition, compatVersion = 2, :version = 2, :source-path = /oak:index/Copy of cqPageLucene, costPerExecution = 0, type = lucene, async = [async, nrt], evaluatePathRestrictions = true, excludedPaths = [/var, /etc/replication, /etc/workflow/instances, /jcr:system], reindex = true, reindexCount = 13}
  aggregates{jcr:primaryType = nt:unstructured, :childOrder = [cq:Page, nt:file, cq:PageContent]}
    nt:file{jcr:primaryType = nt:unstructured, :childOrder = [include0]}
      include0{jcr:primaryType = nt:unstructured, path = jcr:content, :childOrder = []}
    cq:Page{jcr:primaryType = nt:unstructured, :childOrder = [include0]}
      include0{jcr:primaryType = nt:unstructured, relativeNode = true, path = jcr:content, :childOrder = []}
    cq:PageContent{jcr:primaryType = nt:unstructured, :childOrder = [include0, include1, include2, include3]}
      include3{jcr:primaryType = nt:unstructured, path = */*/*/*, :childOrder = []}
      include0{jcr:primaryType = nt:unstructured, path = *, :childOrder = []}
      include1{jcr:primaryType = nt:unstructured, path = */*, :childOrder = []}
      include2{jcr:primaryType = nt:unstructured, path = */*/*, :childOrder = []}
  facets{jcr:primaryType = nt:unstructured, topChildren = 1000, secure = false}
    jcr:content{jcr:primaryType = nt:unstructured, multivalued = true}
      editorial{jcr:primaryType = nt:unstructured, multivalued = true}
        cq:tags{jcr:primaryType = nt:unstructured, multivalued = true}
      maincategories{jcr:primaryType = nt:unstructured, multivalued = true}
        cq:tags{jcr:primaryType = nt:unstructured, multivalued = true}
  indexRules{jcr:primaryType = nt:unstructured, :childOrder = [cq:Page]}
    cq:Page{jcr:primaryType = nt:unstructured, :childOrder = [properties]}
      properties{jcr:primaryType = nt:unstructured, :childOrder = [slingResourceType, editorialTags, mainCategoriesTags, jcrTitle, jcrDescription, systemprops, props, selectedCountries, countryTags]}
        mainCategoriesTags{jcr:primaryType = nt:unstructured, facets = true, :source-path = /oak:index/mamcom_lucene/indexRules/cq:Page/properties/Copy of editorialTags, propertyIndex = true, stored = true, name = jcr:content/maincategories/cq:tags, :childOrder = []}
        systemprops{jcr:primaryType = nt:unstructured, :source-path = /oak:index/mamcom_lucene/indexRules/cq:Page/properties/Copy of props, isRegexp = true, name = ^(cq|jcr|sling):.+$, index = false, :childOrder = []}
        jcrDescription{jcr:primaryType = nt:unstructured, nodeScopeIndex = true, :source-path = /oak:index/mamcom_lucene/indexRules/cq:Page/properties/Copy of jcrTitle, name = jcr:content/jcr:title, type = String, :childOrder = []}
        jcrTitle{jcr:primaryType = nt:unstructured, nodeScopeIndex = true, name = jcr:content/jcr:title, type = String, :childOrder = []}
        countryTags{jcr:primaryType = nt:unstructured, propertyIndex = true, name = jcr:content/countries/cq:tags, :childOrder = []}
        props{jcr:primaryType = nt:unstructured, nodeScopeIndex = true, analyzed = true, isRegexp = true, name = ^[^\/]*$, :childOrder = []}
        selectedCountries{jcr:primaryType = nt:unstructured, propertyIndex = true, name = jcr:content/countries/selectedCountries, :childOrder = []}
        slingResourceType{jcr:primaryType = nt:unstructured, propertyIndex = true, name = jcr:content/sling:resourceType, :childOrder = []}
        editorialTags{jcr:primaryType = nt:unstructured, facets = true, propertyIndex = true, stored = true, name = jcr:content/editorial/cq:tags, :childOrder = []}
Thanks, Vikas, for your investigation so far. I agree with everything you wrote: post-filtering for counting facets will probably be expensive. I don't know why, in this case, not all constraints are passed to the index. From what I have seen, the deep combination of disjunctions, conjunctions and path constraints might be causing that. Unfortunately this query expresses business logic we agreed on with the customer, so it is not open to change.
In my naive assumption, if the query were split into multiple queries, the fulltext constraint would be part of each of the disjunctive components (or unions), and with that the queryNorm(q) according to https://lucene.apache.org/core/4_7_0/core/org/apache/lucene/search/similarities/TFIDFSimilarity.html would be the same for each of the queries. Property constraints and even path constraints could potentially be boosted to 0 so that they have no impact on the score. In any case, from what I could observe in our tests, scores coming from the same index are comparable across (similar) queries with the same fulltext constraint but different property constraints.
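To illustrate the queryNorm argument, here is a minimal sketch in Python rather than actual Lucene or Oak code. It follows the formula given in the TFIDFSimilarity documentation linked above, queryNorm(q) = 1 / sqrt(sumOfSquaredWeights) with sumOfSquaredWeights = q.getBoost()^2 * sum over terms of (idf(t) * t.getBoost())^2. The idf values and the two example branches are made up for illustration, and the helper name query_norm is hypothetical.

```python
import math


def query_norm(clauses, query_boost=1.0):
    """Return queryNorm for a list of (idf, boost) term clauses.

    Mirrors the TFIDFSimilarity documentation:
    sumOfSquaredWeights = query_boost^2 * sum((idf * boost)^2).
    """
    sum_of_squared_weights = query_boost ** 2 * sum(
        (idf * boost) ** 2 for idf, boost in clauses
    )
    return 1.0 / math.sqrt(sum_of_squared_weights)


# Hypothetical idf for the fulltext term 'news', boosted normally.
FULLTEXT_NEWS = (2.3, 1.0)

# Two union branches: the same fulltext clause plus different property
# or path clauses, each boosted to 0 (idf values are invented).
branch_a = [FULLTEXT_NEWS, (4.1, 0.0)]
branch_b = [FULLTEXT_NEWS, (1.7, 0.0), (3.2, 0.0)]

norm_a = query_norm(branch_a)
norm_b = query_norm(branch_b)

# For comparison: the same property clause left at boost 1.
norm_unboosted = query_norm([FULLTEXT_NEWS, (4.1, 1.0)])

print(norm_a, norm_b, norm_unboosted)
```

Under these assumptions both branches get the same queryNorm, because zero-boosted clauses contribute nothing to sumOfSquaredWeights, while the unboosted variant gets a different (smaller) norm. Whether Oak's Lucene integration would build the branch queries this way is something I have not verified.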
Cheers,
Dirk