Posted to oak-commits@jackrabbit.apache.org by ch...@apache.org on 2017/07/17 09:12:59 UTC
svn commit: r1802112 -
/jackrabbit/oak/trunk/oak-doc/src/site/markdown/query/oak-run-indexing.md
Author: chetanm
Date: Mon Jul 17 09:12:58 2017
New Revision: 1802112
URL: http://svn.apache.org/viewvc?rev=1802112&view=rev
Log:
OAK-6081 - Indexing tooling via oak-run
Add toc
Modified:
jackrabbit/oak/trunk/oak-doc/src/site/markdown/query/oak-run-indexing.md
Modified: jackrabbit/oak/trunk/oak-doc/src/site/markdown/query/oak-run-indexing.md
URL: http://svn.apache.org/viewvc/jackrabbit/oak/trunk/oak-doc/src/site/markdown/query/oak-run-indexing.md?rev=1802112&r1=1802111&r2=1802112&view=diff
==============================================================================
--- jackrabbit/oak/trunk/oak-doc/src/site/markdown/query/oak-run-indexing.md (original)
+++ jackrabbit/oak/trunk/oak-doc/src/site/markdown/query/oak-run-indexing.md Mon Jul 17 09:12:58 2017
@@ -14,7 +14,27 @@
See the License for the specific language governing permissions and
limitations under the License.
-->
-# Oak Run Indexing
+# <a name="oak-run-indexing"></a> Oak Run Indexing
+
+* [Oak Run Indexing](#oak-run-indexing)
+ * [Common Options](#common-options)
+ * [Generate Index Info](#index-info)
+ * [Dump Index Definitions](#dump-index-defn)
+ * [Dump Index Data](#async-index-data)
+ * [Index Consistency Check](#check-index)
+ * [Reindex](#reindex)
+ * [A - out-of-band indexing](#out-of-band-indexing)
+ * [Step 1 - Text PreExtraction](#out-of-band-pre-extraction)
+ * [Step 2 - Create Checkpoint](#out-of-band-create-checkpoint)
+ * [Step 3 - Perform Reindex](#out-of-band-perform-reindex)
+ * [Step 4 - Import the index](#out-of-band-import-reindex)
+ * [4.1 - Via oak-run](#import-index-oak-run)
+ * [4.2 - Via IndexerMBean](#import-index-mbean)
+ * [4.3 - Via script](#import-index-script)
+ * [B - Online indexing](#online-indexing)
+ * [Step 1 - Text PreExtraction](#online-indexing-pre-extract)
+ * [Step 2 - Perform reindexing](#online-indexing-perform-reindex)
+ * [Tika Setup](#tika-setup)
`@since Oak 1.7.0`
@@ -31,7 +51,7 @@ By default the tool would generate outpu
Unless specified all operations connect to the repository in read only mode
-## Common Options
+## <a name="common-options"></a> Common Options
All the commands support following common options
@@ -40,7 +60,7 @@ All the commands support following commo
Also refer to help output via `-h` command for some other options
-## Generate Index Info
+## <a name="index-info"></a> Generate Index Info
java -jar oak-run*.jar index --fds-path=/path/to/datastore /path/to/segmentstore/ --index-info
@@ -49,7 +69,7 @@ report is stored by default in `<output
Supported for all index types
-## Dump Index Definitions
+## <a name="dump-index-defn"></a> Dump Index Definitions
java -jar oak-run*.jar index --fds-path=/path/to/datastore /path/to/segmentstore/ --index-definitions
@@ -58,7 +78,7 @@ file contains index definitions keyed ag
Supported for all index types
-## Dump Index Data
+## <a name="async-index-data"></a> Dump Index Data
java -jar oak-run*.jar index --fds-path=/path/to/datastore /path/to/segmentstore/ --index-dump
@@ -67,7 +87,7 @@ each index. Each folder would have a pro
Supported for only Lucene indexes.
-## Index Consistency Check
+## <a name="check-index"></a> Index Consistency Check
java -jar oak-run*.jar index --fds-path=/path/to/datastore /path/to/segmentstore/ --index-consistency-check
@@ -82,7 +102,7 @@ It would generate a report in `<output d
Supported for only Lucene indexes.
-## Reindex
+## <a name="reindex"></a> Reindex
The reindex operation supports 2 modes of indexing
@@ -94,7 +114,7 @@ Supported for only Lucene indexes.
If the indexes being reindexed have fulltext indexing enabled then refer to [Tika Setup](#tika-setup) for steps
on how to adapt the command to include Tika support for text extraction
-### A - out-of-band indexing
+### <a name="out-of-band-indexing"></a> A - out-of-band indexing
Out of band indexing has following phases
@@ -104,17 +124,17 @@ Out of band indexing has following phase
4. Complete the incremental indexing from checkpoint state to current head
-#### Step 1 - Text PreExtraction
+#### <a name="out-of-band-pre-extraction"></a> Step 1 - Text PreExtraction
If the index being reindexed involves fulltext index and the repository has binary content then it is recommended
that first [text pre-extraction](pre-extract-text.html) is performed. This ensures that costly operation around text
extraction is done prior to actual indexing so that actual indexing does not do text extraction in critical path
-#### Step 2 - Create Checkpoint
+#### <a name="out-of-band-create-checkpoint"></a>Step 2 - Create Checkpoint
Go to `CheckpointMBean` and create a checkpoint with lifetime of 1 month. <<TBD>>
-#### Step 3 - Perform Reindex
+#### <a name="out-of-band-perform-reindex"></a> Step 3 - Perform Reindex
In this step we perform the actual indexing via oak-run where it connects to repository in read only mode.
@@ -127,12 +147,12 @@ Here following options can be used
* `--checkpoint` - The checkpoint up to which the index is updated, when indexing in read only mode. For
testing purpose, it can be set to 'head' to indicate that the head state should be used.
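The read-only reindex run described in this step might be invoked roughly as sketched below. This is a dry-run helper that only prints the command, so it can be inspected before use. `--checkpoint` and `--fds-path` appear in this document; the `--reindex` and `--index-paths` flags, the index path `/oak:index/indexName`, and the helper name `build_reindex_cmd` are assumptions and should be checked against the `-h` output.

```shell
# Dry-run helper (hypothetical name): prints the command instead of running it.
# Assumed, not confirmed by this commit: --reindex, --index-paths.
build_reindex_cmd() {
  checkpoint="$1"   # checkpoint id from Step 2, or 'head' for testing
  index_path="$2"   # illustrative index path, e.g. /oak:index/indexName
  echo "java -jar oak-run*.jar index --reindex" \
       "--index-paths=${index_path}" \
       "--checkpoint=${checkpoint}" \
       "--fds-path=/path/to/datastore /path/to/segmentstore/"
}

build_reindex_cmd head /oak:index/indexName
```

Passing `head` here mirrors the testing shortcut mentioned for `--checkpoint`; a real run would pass the checkpoint id created in Step 2.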
-#### Step 4 - Import the index
+#### <a name="out-of-band-import-reindex"></a>Step 4 - Import the index
As a last step we need to import the index back in the repository. This can be done in one of the
following ways
-##### 4.1 - Via oak-run
+##### <a name="import-index-oak-run"></a>4.1 - Via oak-run
In this mode we import the index using oak-run
@@ -144,28 +164,28 @@ command for the directory path.
This mode should only be used when repository is from Oak version 1.7+ as oak-run connects to the repository in
read-write mode.
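As a rough illustration of the oak-run import, the dry-run sketch below prints a candidate command without executing it. The flags `--index-import`, `--index-import-dir` and `--read-write`, the example directory, and the helper name `build_import_cmd` are assumptions not taken from this commit; verify them with `-h` before running anything against a repository.

```shell
# Dry-run helper (hypothetical name): prints the import command instead of running it.
# Assumed, not confirmed by this commit: --index-import, --index-import-dir, --read-write.
build_import_cmd() {
  index_dir="$1"    # directory written by the reindex step (illustrative path below)
  echo "java -jar oak-run*.jar index --index-import --read-write" \
       "--index-import-dir=${index_dir}" \
       "--fds-path=/path/to/datastore /path/to/segmentstore/"
}

build_import_cmd /path/to/index/output
```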
-##### 4.2 - Via IndexerMBean
+##### <a name="import-index-mbean"></a>4.2 - Via IndexerMBean
In this mode we import the index using JMX. Look for `IndexerMBean` and then import the index directory using the
`importIndex` operation
-##### 4.3 - Via script
+##### <a name="import-index-script"></a>4.3 - Via script
TODO - Provide a way to import the data on older setup using some script
-### B - Online indexing
+### <a name="online-indexing"></a>B - Online indexing
Online indexing automates some of the manual steps which are required for out-of-band indexing.
This mode should only be used when repository is from Oak version 1.7+ as oak-run connects to the repository in
read-write mode.
-#### Step 1 - Text PreExtraction
+#### <a name="online-indexing-pre-extract"></a>Step 1 - Text PreExtraction
This is same as in out-of-band indexing
-#### Step 2 - Perform reindexing
+#### <a name="online-indexing-perform-reindex"></a>Step 2 - Perform reindexing
In this step we configure oak-run to connect to repository in read-write mode and let it perform all other steps, i.e.
checkpoint creation, indexing and import
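A minimal dry-run sketch of such an online run follows. Compared with the out-of-band case no checkpoint is passed, since oak-run is expected to create it. The `--reindex`, `--index-paths` and `--read-write` flags, the index path, and the helper name `build_online_reindex_cmd` are assumptions rather than something this commit confirms.

```shell
# Dry-run helper (hypothetical name): prints the online reindex command only.
# Assumed, not confirmed by this commit: --reindex, --index-paths, --read-write.
build_online_reindex_cmd() {
  index_path="$1"   # illustrative index path, e.g. /oak:index/indexName
  echo "java -jar oak-run*.jar index --reindex --read-write" \
       "--index-paths=${index_path}" \
       "--fds-path=/path/to/datastore /path/to/segmentstore/"
}

build_online_reindex_cmd /oak:index/indexName
```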