Posted to oak-issues@jackrabbit.apache.org by "Chetan Mehrotra (JIRA)" <ji...@apache.org> on 2015/04/27 06:55:38 UTC

[jira] [Updated] (OAK-2599) Allow excluding certain paths from getting indexed for particular index

     [ https://issues.apache.org/jira/browse/OAK-2599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chetan Mehrotra updated OAK-2599:
---------------------------------
    Attachment: OAK-2599-1.patch

[proposed patch|^OAK-2599-1.patch] based on the following approach:

Adds support for a {{PathFilter}} which decides what to do with a given path based on a set of includes and excludes. It returns one of the following results:
* INCLUDE - The given path must be included and hence indexed
* EXCLUDE - The given path and its subtree must be excluded from processing and hence not indexed
* TRAVERSE - The given path should be traversed but not indexed, because some path further down may be part of an include and would then be indexed. For example, if the includes consist of /a/b and /a/c, then for /a the editor must only traverse without indexing, while the actual indexing is done for /a/b and /a/c only
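A minimal Java sketch of the three-way decision described above. It assumes simple path-prefix matching (no wildcards) and that an exclude takes precedence over any include; the class {{PathFilterSketch}} and its {{filter}} method are hypothetical names for illustration, not the API of the attached patch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: names and matching rules are assumptions,
// not the PathFilter API introduced by OAK-2599-1.patch.
final class PathFilterSketch {

    /** The three possible outcomes described in the proposal. */
    enum Result { INCLUDE, EXCLUDE, TRAVERSE }

    private final List<String> includes;
    private final List<String> excludes;

    PathFilterSketch(List<String> includes, List<String> excludes) {
        this.includes = new ArrayList<>(includes);
        this.excludes = new ArrayList<>(excludes);
    }

    /** True if path equals ancestor or lies below it; "/" is an ancestor of every path. */
    static boolean isAncestorOrSelf(String ancestor, String path) {
        if (ancestor.equals("/")) {
            return true;
        }
        return path.equals(ancestor) || path.startsWith(ancestor + "/");
    }

    Result filter(String path) {
        // Assumption: an excluded path prunes its whole subtree, even below an include.
        for (String excluded : excludes) {
            if (isAncestorOrSelf(excluded, path)) {
                return Result.EXCLUDE;
            }
        }
        // Inside an included subtree: index this node.
        for (String included : includes) {
            if (isAncestorOrSelf(included, path)) {
                return Result.INCLUDE;
            }
        }
        // Ancestor of some include (e.g. /a for /a/b): walk down without indexing.
        for (String included : includes) {
            if (isAncestorOrSelf(path, included)) {
                return Result.TRAVERSE;
            }
        }
        // Unrelated to every include: nothing below can match, so skip the subtree.
        return Result.EXCLUDE;
    }
}
```

With includes /a/b and /a/c and no excludes, {{filter("/a")}} returns TRAVERSE, {{filter("/a/b/x")}} returns INCLUDE, and an unrelated path such as /d returns EXCLUDE. Mapping "unrelated to any include" to EXCLUDE rather than TRAVERSE is what lets an editor skip such subtrees entirely.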

It reads its configuration from the index definition:
{noformat}
/oak:index/foo
  - jcr:primaryType = "oak:QueryIndexDefinition"
  - includedPaths (string) multiple
  - excludedPaths (string) multiple
{noformat}

Where
* {{includedPaths}} - Multi-valued property listing the paths which should be included for indexing
* {{excludedPaths}} - Multi-valued property listing the paths which should NOT be included for indexing

* Both properties are optional. If a property is not provided, its default is used: [includedPaths: '/', excludedPaths: ""]
* The PathFilter has to be used explicitly by each index implementation. For now only LuceneIndexEditor makes use of it
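A sketch of how the optional properties might fall back to their defaults. It assumes the index definition has already been read into a plain {{Map}} from property name to values (real code would read the Oak node state instead), and it treats the documented empty {{excludedPaths}} default as "no exclusions". {{PathConfigSketch}} and its methods are hypothetical names.

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch: the index definition is modelled as a Map from property
// name to its multi-valued string values; this is not an Oak API.
final class PathConfigSketch {

    static final List<String> DEFAULT_INCLUDED = List.of("/");
    static final List<String> DEFAULT_EXCLUDED = List.of();

    /** Returns includedPaths, falling back to ["/"] when the property is absent. */
    static List<String> includedPaths(Map<String, List<String>> indexDefinition) {
        return indexDefinition.getOrDefault("includedPaths", DEFAULT_INCLUDED);
    }

    /** Returns excludedPaths, falling back to an empty list when the property is absent. */
    static List<String> excludedPaths(Map<String, List<String>> indexDefinition) {
        return indexDefinition.getOrDefault("excludedPaths", DEFAULT_EXCLUDED);
    }
}
```

Resolving each property independently means a definition that only sets {{excludedPaths}} still indexes everything under '/', minus the excluded subtrees.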

*Included Paths*
Currently the recommended way to control which paths are included is to place the index definition under the given path itself. For example, if only nodes under {{/content/en}} should be indexed, you can create the index definition under {{/content/en/oak:index/<index>}}. However, such an index is only picked up by queries that also carry a matching path restriction.

With this patch the set of paths to be included can be provided via configuration. For example, if you create an index definition under {{/oak:index}} but only want to index nodes under {{/lib}} and {{/apps}}, that is not possible with the previous approach. It can now be done by listing those paths in {{includedPaths}}, and then only nodes under those paths are indexed.

*Benefits*
* The editor avoids processing the diff for paths that are not of interest
* Users can exclude paths they know are irrelevant to a given index, which reduces the indexing work for writes happening under those paths
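To illustrate the first benefit, here is a sketch of a recursive walk that consults a three-way decision for each node: EXCLUDE returns immediately so the subtree is never visited, TRAVERSE descends without indexing, and INCLUDE indexes the node and descends. It assumes an in-memory tree instead of the node-state diff a real editor processes (so it is closer to a full reindex), and {{FilteredWalkSketch}}, {{Node}} and {{walk}} are hypothetical names, not Oak APIs.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

// Hypothetical sketch of a filtered editor walk; none of these names are Oak APIs.
// A real IndexEditor processes a NodeState diff, not a full in-memory tree.
final class FilteredWalkSketch {

    /** Per-node outcome, mirroring INCLUDE / EXCLUDE / TRAVERSE from the proposal. */
    enum Decision { INCLUDE, EXCLUDE, TRAVERSE }

    /** Minimal tree node whose children are kept sorted by name. */
    static final class Node {
        final Map<String, Node> children = new TreeMap<>();

        /** Returns the named child, creating it if absent. */
        Node child(String name) {
            return children.computeIfAbsent(name, n -> new Node());
        }
    }

    /**
     * Visits path and, unless it is excluded, its descendants.
     * Every visited path is recorded in visited; indexed paths also go to indexed.
     */
    static void walk(String path, Node node, Function<String, Decision> filter,
                     List<String> visited, List<String> indexed) {
        visited.add(path);
        Decision decision = filter.apply(path);
        if (decision == Decision.EXCLUDE) {
            return; // prune: nothing below this path is visited
        }
        if (decision == Decision.INCLUDE) {
            indexed.add(path);
        }
        // TRAVERSE reaches this loop without indexing the node itself.
        for (Map.Entry<String, Node> entry : node.children.entrySet()) {
            String childPath = path.equals("/") ? "/" + entry.getKey() : path + "/" + entry.getKey();
            walk(childPath, entry.getValue(), filter, visited, indexed);
        }
    }
}
```

For the /lib and /apps example, consider a tree containing /apps/x, /lib/y and /content/a/b/c, and a filter that returns TRAVERSE for /, INCLUDE under /lib and /apps, and EXCLUDE elsewhere. The walk indexes only /apps, /apps/x, /lib and /lib/y. It visits /content once and never descends into /content/a/b/c.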

[~alexparvulescu] [~tmueller] Can you review the patch?

[~mduerig] This patch is not based on the approach used for filtering in event processing, so it would be helpful if you could also have a look!

> Allow excluding certain paths from getting indexed for particular index
> -----------------------------------------------------------------------
>
>                 Key: OAK-2599
>                 URL: https://issues.apache.org/jira/browse/OAK-2599
>             Project: Jackrabbit Oak
>          Issue Type: New Feature
>          Components: core
>            Reporter: Chetan Mehrotra
>             Fix For: 1.3.0
>
>         Attachments: OAK-2599-1.patch
>
>
> Currently an {{IndexEditor}} indexes all nodes under the tree where it is defined (post OAK-1980). Due to this, an IndexEditor traverses the whole repository (or the subtree, if it is defined at a non-root path) to perform a reindex. Depending on the repository size this process can take quite some time. It would be faster if an IndexEditor could exclude certain paths from traversal.
> Consider an application like Adobe AEM and an index which only indexes dam:Asset, or the default fulltext index. For a fulltext index it might make sense to avoid indexing the version store. If the index editor skips such paths, a lot of redundant traversal can be avoided.
> Also see http://markmail.org/thread/4cuuicakagi6av4v



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)