Posted to oak-issues@jackrabbit.apache.org by "David Gonzalez (JIRA)" <ji...@apache.org> on 2017/02/17 21:17:41 UTC

[jira] [Comment Edited] (OAK-5707) [Oak lucene indexes] Clarify aggregates, nodeScopeIndex, propertyIndex, analyzed

    [ https://issues.apache.org/jira/browse/OAK-5707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15872536#comment-15872536 ] 

David Gonzalez edited comment on OAK-5707 at 2/17/17 9:16 PM:
--------------------------------------------------------------

Including helpful offline conversations w/ Vikas. 

The following notes still require review for correctness. They are added here for convenience and to help shape the discussion, and should NOT be considered correct until the review has been finalized.

* Aggregates instruct Oak to fulltext-index any property found under the provided path pattern (ex. */*/*) (setting aside the complication of how it recurses through types...)
** By default all String and String[] properties are candidates for aggregation; however, other property types can be specified at the aggregation level.
* Specific property index definitions defined under indexRules are about "how to index a specific property"
** The affected property is identified either a) by a relative property path or b) by a regex property path match, relative to the <nodeType> the indexRule applies to.
* A property that can be reached by an aggregate rule pattern is treated the same as a property having `indexRules/<nodeType>/properties/myProp@nodeScopeIndex=true`
** The advantages of ALSO defining an indexRule for a property covered by an aggregate are:
*** The property can also be marked as a propertyIndex, which allows for more performant property-based equality matches
*** The property can be marked for special use (ex. useInSuggest, useInSpellcheck, boost, etc.), which may (depending on the applied special-use properties) change how it behaves in the aggregated search.
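
To make the bullets above concrete, here is a minimal sketch of an index definition that combines an aggregate with an explicit indexRule, modelled as nested Python dicts. The node type `app:Page`, the property node name `title`, the aggregate paths and the boost value are assumptions chosen for illustration; the `flatten` helper is hypothetical and only prints the definition in the `path@property=value` notation used above. It is not part of Oak and, like the notes themselves, has not been reviewed.

```python
# Hypothetical Lucene index definition, modelled as nested dicts.
# Child nodes are dicts; everything else is a property on that node.
INDEX_DEF = {
    "jcr:primaryType": "oak:QueryIndexDefinition",
    "type": "lucene",
    "compatVersion": 2,
    "async": "async",
    "aggregates": {
        "app:Page": {
            # Assumed path patterns: aggregate jcr:content and its children.
            "include0": {"path": "jcr:content"},
            "include1": {"path": "jcr:content/*"},
        },
    },
    "indexRules": {
        "app:Page": {
            "properties": {
                # Explicit rule for a property the aggregate already reaches.
                "title": {
                    "name": "jcr:content/jcr:title",
                    "propertyIndex": True,
                    "nodeScopeIndex": True,
                    "useInSuggest": True,
                    "boost": 2.0,
                },
            },
        },
    },
}


def flatten(node, path=""):
    """List every property as 'path@name=value' (illustrative helper only)."""
    lines = []
    for key, value in node.items():
        if isinstance(value, dict):
            child = f"{path}/{key}" if path else key
            lines.extend(flatten(value, child))
        else:
            # Render booleans the way they appear in the notes (true/false).
            text = str(value).lower() if isinstance(value, bool) else str(value)
            lines.append(f"{path}@{key}={text}")
    return lines


for line in flatten(INDEX_DEF):
    print(line)
```

In this sketch `jcr:content/jcr:title` is covered twice: once implicitly by the aggregate and once by the explicit rule, which is what adds `propertyIndex`, `useInSuggest` and `boost`.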

* `analyzed=true` in an indexRules prop def (say for `jcr:content/jcr:title`) allows for `...FROM [app:Page] WHERE CONTAINS([jcr:content/jcr:title], 'foo')`
** TBD what special use props are applicable to this (if any?)
* `propertyIndex=true` is for the condition you were asking about at first: `WHERE [jcr:content/jcr:title]='foo'`
** If the property is a candidate for aggregates or nodeScopeIndex (which as noted are ~ equivalent), equality property conditions (`WHERE [jcr:content/jcr:title]='foo'`) may still appear fast and not report a traversal warning, as Oak is able to leverage the internal aggregate index to quickly isolate matches. That being said, for property equality checks, it is always as fast (if not faster) to define an indexRule for the property with `propertyIndex=true`
** TBD clearly describe the considerations of equality matches when only using the aggregate index.
* aggregate and nodeScopeIndex are intended to roll content up into the index's "nodeType" index, so that the content becomes a candidate for fulltext searches against that node (vs against a specific property), that is: `WHERE CONTAINS(*, 'foo')`
* The `excludeFromAggregation` prop "disables" aggregate indexing of a prop that matches a prop def having `excludeFromAggregation`
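
The mapping from flags to query conditions described in the bullets above can be summarised in a small sketch. This is only a restatement of these unreviewed notes in Python, not Oak's query planner: the function name `supported_conditions` and the `covered_by_aggregate` parameter are hypothetical, and how `excludeFromAggregation` interacts with an explicit `nodeScopeIndex=true` is an assumption.

```python
def supported_conditions(prop_def, covered_by_aggregate=False):
    """Return the condition shapes the notes associate with a property definition.

    prop_def is a dict of indexRule property flags; covered_by_aggregate says
    whether an aggregate path pattern also reaches the property.
    """
    name = prop_def["name"]
    conditions = []
    if prop_def.get("propertyIndex"):
        # Equality on the specific property.
        conditions.append(f"[{name}] = 'foo'")
    if prop_def.get("analyzed"):
        # Fulltext search against the specific property.
        conditions.append(f"CONTAINS([{name}], 'foo')")
    # Assumption: excludeFromAggregation only cancels the aggregate route.
    aggregated = covered_by_aggregate and not prop_def.get("excludeFromAggregation")
    if prop_def.get("nodeScopeIndex") or aggregated:
        # Fulltext search against the node as a whole.
        conditions.append("CONTAINS(*, 'foo')")
    return conditions


title = {"name": "jcr:content/jcr:title", "propertyIndex": True, "analyzed": True}
print(supported_conditions(title, covered_by_aggregate=True))
```

The sketch deliberately leaves out the open TBD item: equality matches that are served only by the aggregate index are not modelled.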



> [Oak lucene indexes] Clarify aggregates, nodeScopeIndex, propertyIndex, analyzed
> --------------------------------------------------------------------------------
>
>                 Key: OAK-5707
>                 URL: https://issues.apache.org/jira/browse/OAK-5707
>             Project: Jackrabbit Oak
>          Issue Type: Documentation
>            Reporter: David Gonzalez
>            Assignee: Vikas Saurabh
>
> Oak lucene documentation would benefit from clarifying the relationships and expected behaviors around aggregates, nodeScopeIndex, propertyIndex and analyzed.
> These features have some overlap in what they do and/or augment one another, but to the lay-developer it is unclear how they work in concert and/or what the implications of using the various features are.
> It's worth remembering that many developers are under the mindset (shifting from jackrabbit 2 -> oak) that oak indexing requires explicit inclusion of content into search results; thus implicit content inclusion into indexes via generalized aggregations (vs named properties) is unclear/unexpected to many.


