Posted to oak-issues@jackrabbit.apache.org by "Vikas Saurabh (JIRA)" <ji...@apache.org> on 2017/02/24 02:53:44 UTC

[jira] [Updated] (OAK-5707) [Oak lucene indexes] Clarify aggregates, nodeScopeIndex, propertyIndex, analyzed

     [ https://issues.apache.org/jira/browse/OAK-5707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vikas Saurabh updated OAK-5707:
-------------------------------
    Attachment: OAK-5707.patch

In the spirit of laziness, and rationalizing that I need to do this before planning how to document: attaching [^OAK-5707.patch]. It should have been a main class, but test cases just have better utility methods, so it's a test.

It prints 3 types of index definitions and how the data is stored in each index. The current output is at \[0]. The index dump is of the form:
{noformat}
<fieldName1>
  <term1> => [<list of paths>]
  <term2> => [<list of paths>]
  ...
<fieldName2>
  ....
....
{noformat}
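
For readers who want to reproduce the dump layout without applying the patch, here is a minimal sketch in plain Java. It is not the code from the patch: the class name {{IndexDumpFormat}} and the method {{render}} are hypothetical, and the nested map simply stands in for whatever the test reads out of the Lucene index.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of OAK-5707.patch: renders a
// field -> term -> paths structure in the dump layout described above.
class IndexDumpFormat {

    static String render(Map<String, Map<String, List<String>>> index) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Map<String, List<String>>> field : index.entrySet()) {
            // Field name on its own line, unindented.
            sb.append(field.getKey()).append('\n');
            for (Map.Entry<String, List<String>> term : field.getValue().entrySet()) {
                // Each term is indented by two spaces, followed by its paths.
                sb.append("  ").append(term.getKey())
                  .append(" => ").append(term.getValue()).append('\n');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Sample data copied from the propIdx section of the output at [0].
        Map<String, Map<String, List<String>>> index = new TreeMap<>();
        index.computeIfAbsent("foo", k -> new TreeMap<>())
             .put("fox jumping", Arrays.asList("/test"));
        index.computeIfAbsent("testChild/bar", k -> new TreeMap<>())
             .put("dog jumping", Arrays.asList("/test/test1", "/test"));
        System.out.print(render(index));
    }
}
```

A sorted map is used here only so that the output order is stable; the real dump may order fields and terms differently.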

It's just 3 new files, so the patch should cleanly apply. [~empire29], you might want to check it out and see if this shows what is getting stored.

My next step is to add queries and their plans to the output. That should make it a bit clearer how the index would be queried.

I hope that, with enough shuffling, I'll get to a point where the relevant points can be documented succinctly.

PS: Somehow the content tree dump doesn't follow the order in which the property definitions appear in the content tree :-/. The real order of prop defs is {{foo}}, {{bar}}, {{allBar}}.

\[0]:
{noformat}
----------------CONTENT-------------------
+/test
  -foo = fox jumping
  +test1
    +testChild
      -bar = dog jumping
  +test2
    +testChild
      -barX = dog jumping
  +testChild
    -bar = dog jumping

----------------propIdx--------------
Definition
----------
+/oak:index/propIdx
  -includedPaths = [/test]
  -reindexCount = 1
  -compatVersion = 2
  -reindex = false
  -type = lucene
  -jcr:primaryType = oak:QueryIndexDefinition
  +indexRules
    -jcr:primaryType = nt:unstructured
    +nt:base
      -jcr:primaryType = nt:unstructured
      +properties
        -jcr:primaryType = nt:unstructured
        +allBar
          -name = testChild/ba.*
          -propertyIndex = true
          -isRegexp = true
          -jcr:primaryType = nt:unstructured
        +foo
          -name = foo
          -propertyIndex = true
          -jcr:primaryType = nt:unstructured
        +bar
          -name = testChild/bar
          -propertyIndex = true
          -jcr:primaryType = nt:unstructured
Index
-----
foo
  fox jumping => [/test]
testChild/bar
  dog jumping => [/test/test1, /test]
testChild/barX
  dog jumping => [/test/test2]

----------------analyzedIdx--------------
Definition
----------
+/oak:index/analyzedIdx
  -includedPaths = [/test]
  -reindexCount = 1
  -compatVersion = 2
  -reindex = false
  -type = lucene
  -jcr:primaryType = oak:QueryIndexDefinition
  +indexRules
    -jcr:primaryType = nt:unstructured
    +nt:base
      -jcr:primaryType = nt:unstructured
      +properties
        -jcr:primaryType = nt:unstructured
        +allBar
          -analyzed = true
          -name = testChild/ba.*
          -isRegexp = true
          -jcr:primaryType = nt:unstructured
        +foo
          -analyzed = true
          -name = foo
          -jcr:primaryType = nt:unstructured
        +bar
          -analyzed = true
          -name = testChild/bar
          -jcr:primaryType = nt:unstructured
Index
-----
:fulltext
  test => [/test]
  test1 => [/test/test1]
  test2 => [/test/test2]
full:foo
  fox => [/test]
  jumping => [/test]
full:testChild/bar
  dog => [/test/test1, /test]
  jumping => [/test/test1, /test]
full:testChild/barX
  dog => [/test/test2]
  jumping => [/test/test2]

----------------nodeScopedIdx--------------
Definition
----------
+/oak:index/nodeScopedIdx
  -includedPaths = [/test]
  -reindexCount = 1
  -compatVersion = 2
  -reindex = false
  -type = lucene
  -jcr:primaryType = oak:QueryIndexDefinition
  +indexRules
    -jcr:primaryType = nt:unstructured
    +nt:base
      -jcr:primaryType = nt:unstructured
      +properties
        -jcr:primaryType = nt:unstructured
        +allBar
          -nodeScopeIndex = true
          -name = testChild/ba.*
          -isRegexp = true
          -jcr:primaryType = nt:unstructured
        +foo
          -nodeScopeIndex = true
          -name = foo
          -jcr:primaryType = nt:unstructured
        +bar
          -nodeScopeIndex = true
          -name = testChild/bar
          -jcr:primaryType = nt:unstructured
Index
-----
:fulltext
  dog => [/test/test1, /test/test2, /test]
  fox => [/test]
  jumping => [/test/test1, /test/test2, /test]
  test => [/test]
  test1 => [/test/test1]
  test2 => [/test/test2]
  testchild => [/test/test1/testChild, /test/test2/testChild, /test/testChild]
{noformat}
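
To summarize what the three dumps above suggest, here is a toy model in plain Java. It is a sketch of the observed output only, not Oak's indexing code: the class {{FlagModel}}, the enum {{Flag}} and the whitespace tokenizer are all assumptions made for illustration (the real analyzer is whatever the index is configured with), and the node-name terms that show up in {{:fulltext}} are left out.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Toy model (hypothetical, not Oak code) of where a property value
// ends up in the index depending on the flag set on its definition.
class FlagModel {
    enum Flag { PROPERTY_INDEX, ANALYZED, NODE_SCOPE_INDEX }

    // field -> term -> paths, same shape as the dumps above
    final Map<String, Map<String, Set<String>>> fields = new TreeMap<>();

    void add(Flag flag, String propName, String value, String path) {
        switch (flag) {
            case PROPERTY_INDEX:
                // Whole value stored as one term under the property name.
                put(propName, value, path);
                break;
            case ANALYZED:
                // Value split into tokens under a per-property "full:" field.
                for (String token : tokenize(value)) put("full:" + propName, token, path);
                break;
            case NODE_SCOPE_INDEX:
                // Tokens go into the shared ":fulltext" field; property name is lost.
                for (String token : tokenize(value)) put(":fulltext", token, path);
                break;
        }
    }

    // Assumption: a lowercasing whitespace split stands in for the real analyzer.
    static List<String> tokenize(String value) {
        return Arrays.asList(value.toLowerCase(Locale.ROOT).split("\\s+"));
    }

    private void put(String field, String term, String path) {
        fields.computeIfAbsent(field, k -> new TreeMap<>())
              .computeIfAbsent(term, k -> new LinkedHashSet<>())
              .add(path);
    }

    public static void main(String[] args) {
        for (Flag flag : Flag.values()) {
            FlagModel m = new FlagModel();
            m.add(flag, "foo", "fox jumping", "/test");
            m.add(flag, "testChild/bar", "dog jumping", "/test/test1");
            m.add(flag, "testChild/bar", "dog jumping", "/test");
            m.add(flag, "testChild/barX", "dog jumping", "/test/test2");
            System.out.println(flag + " -> " + m.fields);
        }
    }
}
```

The caller passes the path the entry is recorded against, which mirrors how {{testChild/bar}} is listed under the parent ({{/test/test1}}, {{/test}}) rather than under the {{testChild}} node in the dumps.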

> [Oak lucene indexes] Clarify aggregates, nodeScopeIndex, propertyIndex, analyzed
> --------------------------------------------------------------------------------
>
>                 Key: OAK-5707
>                 URL: https://issues.apache.org/jira/browse/OAK-5707
>             Project: Jackrabbit Oak
>          Issue Type: Documentation
>            Reporter: David Gonzalez
>            Assignee: Vikas Saurabh
>         Attachments: OAK-5707.patch
>
>
> Oak lucene documentation would benefit from clarifying the relationships and expected behaviors around aggregates, nodeScopeIndex, propertyIndex and analyzed.
> These features have some overlap in what they do and/or augment one another, but to the lay-developer it is unclear how these work in concert and/or what the implications of using the various features are.
> It's worth remembering that many developers are under the mindset (shifting from jackrabbit 2 -> oak) that oak indexing requires explicit inclusion of content into search results; thus implicit content inclusion into indexes via generalized aggregations (vs named properties) is unclear/unexpected to many.


