Posted to oak-commits@jackrabbit.apache.org by ch...@apache.org on 2017/03/21 12:25:14 UTC
svn commit: r1787952 - in
/jackrabbit/oak/trunk/oak-doc/src/site/markdown/query: index-nrt.png
indexing.md lucene.md
Author: chetanm
Date: Tue Mar 21 12:25:14 2017
New Revision: 1787952
URL: http://svn.apache.org/viewvc?rev=1787952&view=rev
Log:
OAK-5917 - Document enhancements in indexing in 1.6
OAK-4412 - Lucene hybrid index
Added:
jackrabbit/oak/trunk/oak-doc/src/site/markdown/query/index-nrt.png (with props)
Modified:
jackrabbit/oak/trunk/oak-doc/src/site/markdown/query/indexing.md
jackrabbit/oak/trunk/oak-doc/src/site/markdown/query/lucene.md
Added: jackrabbit/oak/trunk/oak-doc/src/site/markdown/query/index-nrt.png
URL: http://svn.apache.org/viewvc/jackrabbit/oak/trunk/oak-doc/src/site/markdown/query/index-nrt.png?rev=1787952&view=auto
==============================================================================
Binary file - no diff available.
Propchange: jackrabbit/oak/trunk/oak-doc/src/site/markdown/query/index-nrt.png
------------------------------------------------------------------------------
svn:mime-type = image/png
Modified: jackrabbit/oak/trunk/oak-doc/src/site/markdown/query/indexing.md
URL: http://svn.apache.org/viewvc/jackrabbit/oak/trunk/oak-doc/src/site/markdown/query/indexing.md?rev=1787952&r1=1787951&r2=1787952&view=diff
==============================================================================
--- jackrabbit/oak/trunk/oak-doc/src/site/markdown/query/indexing.md (original)
+++ jackrabbit/oak/trunk/oak-doc/src/site/markdown/query/indexing.md Tue Mar 21 12:25:14 2017
@@ -33,7 +33,12 @@
* [Setup](#async-index-setup)
* [Async Indexing MBean](#async-index-mbean)
* [Isolating Corrupt Indexes](#corrupt-index-handling)
- * [Near Real Time Indexing](#nrt-indexing)
+ * [Near Real Time Indexing](#nrt-indexing)
+ * [NRT Indexing Modes](#nrt-indexing-modes)
+ * [nrt](#nrt-indexing-mode-nrt)
+ * [sync](#nrt-indexing-mode-sync)
+ * [Cluster Setup](#nrt-indexing-cluster-setup)
+ * [Configuration](#nrt-indexing-config)
## <a name="overview"></a> Overview
@@ -56,6 +61,7 @@ Oak has following in built `IndexEditor`
### <a name="new-1.6"></a> New in 1.6
+* [Near Real Time Indexing](#nrt-indexing)
* [Multiple Async indexers setup via OSGi config](#async-index-setup)
* [Isolating Corrupt Indexes](#corrupt-index-handling)
@@ -201,7 +207,7 @@ date
#### <a name="async-index-setup"></a> Setup
-`Since 1.6`
+`@since Oak 1.6`
Async indexers can be configure via OSGi config for `org.apache.jackrabbit.oak.plugins.index.AsyncIndexerService`
@@ -265,16 +271,104 @@ Later once the index is reindexed follow
This feature can be disabled by setting `failingIndexTimeoutSeconds` to 0 in AsyncIndexService config. Refer to
[OAK-4939][OAK-4939] for more details
-## <a name="nrt-indexing"></a> Near Real Time Indexing
+### <a name="nrt-indexing"></a> Near Real Time Indexing
+
+`@since Oak 1.6`
+
+_This mode is only supported for `lucene` indexes_
+
+Lucene indexes perform well for evaluating complex queries and also have the benefit of being evaluated locally with
+copy-on-read support. However, they are `async` indexes and, depending on system load, can lag behind the repository state.
+For cases where such lag (of the order of minutes) is not acceptable one has to use `property` indexes. For such cases
+Oak 1.6 has [added support for near real time indexing][OAK-4412]
+
+![NRT Index Flow](index-nrt.png)
+
+In this mode indexing happens in two places and a query consults multiple indexes. The diagram above shows the
+indexing flow over time. In the above flow
+
+* T1, T3 and T5 - Time instances at which a checkpoint is created
+* T2 and T4 - Time instances at which an async indexer run completed and indexes were updated
+* Persisted Index
+ * v2 - Index version v2 which has repository state up to time T1 indexed
+ * v3 - Index version v3 which has repository state up to time T3 indexed
+* Local Index
+ * NRT1 - Local index which has repository state between time T2 and T4 indexed
+ * NRT2 - Local index which has repository state between time T4 and T6 indexed
+
+As repository state changes with time, the async indexer runs and indexes the state between the last known checkpoint and
+the current state when that run started. So when async run 1 completed the persisted index has repository state indexed
+up to time T3.
+
+Now without NRT index support, if any query is performed between time T2 and T4 it would only see index results for
+repository state at time T1, as that is the state which the persisted indexes have data for. Any change after that would not be
+seen until the next async indexing cycle completes (by time T4).
+
+With NRT indexing support indexing would happen at 2 places
+
+* Persisted Index - This is the index which is updated via async indexer run. This flow would remain same i.e. it
+ would be periodically updated by the indexer run
+* Local Index - In addition to the persisted index each cluster node would also maintain a local index. This index would
+ only keep data between two async indexer runs. Post each run the previous index would be discarded and a new index would
+ be built (actually the previous index is retained for one cycle)
+
+Any query making use of such an index would make use of both indexes. With this, new content added to the repository
+after the last async indexer run would also show up quickly.
+
+#### <a name="nrt-indexing-modes"></a> NRT Indexing Modes
+
+NRT indexing can be enabled for any index by configuring the `async` property
+
+ /oak:index/assetIndex
+ - jcr:primaryType = "oak:QueryIndexDefinition"
+ - async = ['fulltext-async', 'nrt']
+
+Here the `async` property has been set to a multi-valued property whose values are
+
+* Indexing lane - Like `async` or `fulltext-async`
+* NRT Indexing Mode - `nrt` or `sync`
+
+##### <a name="nrt-indexing-mode-nrt"></a> nrt
+
+In this mode the local index would be updated asynchronously on that cluster node post commit and the index reader
+would be refreshed after 1 sec. So any change done should show up on that cluster node in 1-2 secs
+
+ /oak:index/userIndex
+ - jcr:primaryType = "oak:QueryIndexDefinition"
+ - async = ['async', 'nrt']
+
+##### <a name="nrt-indexing-mode-sync"></a> sync
+
+In this mode the local index would be updated synchronously on that cluster node post commit and the index reader
+would be refreshed immediately. This mode performs slowly compared to the "nrt" mode
+
+ /oak:index/userIndex
+ - jcr:primaryType = "oak:QueryIndexDefinition"
+ - async = ['async', 'sync']
+
+For a single node setup (like with SegmentNodeStore) this mode effectively makes an async lucene index perform the same as
+synchronous property indexes. However, 'nrt' mode performs better so using that would be preferable
+
+#### <a name="nrt-indexing-cluster-setup"></a> Cluster Setup
+
+In a cluster setup each cluster node would maintain its own local index for changes happening on that cluster node.
+In addition to that it would also index changes from other cluster nodes by relying on [Oak observation for external
+changes][OAK-4808]. This depends on how frequently external changes are delivered. Due to this, even with NRT indexing,
+changes from other cluster nodes would take somewhat longer to be reflected in query results compared to local changes.
-## Index Types
+#### <a name="nrt-indexing-config"></a> Configuration
-### Property Indexes
+NRT indexing exposes a few configuration options as part of [LuceneIndexProviderService](lucene.html#osgi-config)
-### Lucene Indexes
+* `enableHybridIndexing` - Boolean property, defaults to `true`. Can be set to `false` to disable the NRT indexing feature
+ completely
+* `hybridQueueSize` - Size of in memory queue used to hold Lucene documents for indexing in `nrt` mode. Default size is
+ 10000
[OAK-5159]: https://issues.apache.org/jira/browse/OAK-5159
[OAK-4939]: https://issues.apache.org/jira/browse/OAK-4939
+[OAK-4808]: https://issues.apache.org/jira/browse/OAK-4808
+[OAK-4412]: https://issues.apache.org/jira/browse/OAK-4412
\ No newline at end of file
Modified: jackrabbit/oak/trunk/oak-doc/src/site/markdown/query/lucene.md
URL: http://svn.apache.org/viewvc/jackrabbit/oak/trunk/oak-doc/src/site/markdown/query/lucene.md?rev=1787952&r1=1787951&r2=1787952&view=diff
==============================================================================
--- jackrabbit/oak/trunk/oak-doc/src/site/markdown/query/lucene.md (original)
+++ jackrabbit/oak/trunk/oak-doc/src/site/markdown/query/lucene.md Tue Mar 21 12:25:14 2017
@@ -32,6 +32,7 @@
* [Codec](#codec)
* [Boost and Search Relevancy](#boost)
* [Effective Index Definition](#stored-index-definition)
+* [Near Real Time Indexing](#nrt-indexing)
* [LuceneIndexProvider Configuration](#osgi-config)
* [Tika Config](#tika-config)
* [Mime type usage](#mime-type-usage)
@@ -119,6 +120,7 @@ The Lucene index needs to be configured
Following are the new features in 1.6 release
+* [Near Real Time Indexing](#nrt-indexing)
* [Effective Index Definition](#stored-index-definition)
### <a name="index-definition"></a> Index Definition
@@ -873,6 +875,12 @@ to true. Once disable any change in inde
Refer to [OAK-4400][OAK-4400] for more details.
+### <a name="nrt-indexing"></a> Near Real Time Indexing
+
+`@since Oak 1.6`
+
+Refer to [Near realtime indexing](indexing.html#nrt-indexing) for more details
+
### <a name="osgi-config"></a>LuceneIndexProvider Configuration
Some of the runtime aspects of the Oak Lucene support can be configured via OSGi
@@ -1378,7 +1386,9 @@ from property index in following aspects
are always synchronous and upto date.
So if in your usecase you need the latest result then prefer _Property Indexes_ over
- _Lucene Index_
+ _Lucene Index_. Oak 1.6 supports [Near Realtime Indexing](indexing.html#nrt-indexing)
+ which reduces the lag considerably. With this you should be able to use lucene indexing
+ for most cases
2. Lucene index cannot enforce uniqueness constraint - By virtue of it being asynchronous
it cannot enforce uniqueness constraint.