Posted to commits@jackrabbit.apache.org by ch...@apache.org on 2017/07/17 09:13:16 UTC
svn commit: r1802113 -
/jackrabbit/site/live/oak/docs/query/oak-run-indexing.html
Author: chetanm
Date: Mon Jul 17 09:13:16 2017
New Revision: 1802113
URL: http://svn.apache.org/viewvc?rev=1802113&view=rev
Log:
Added toc
Modified:
jackrabbit/site/live/oak/docs/query/oak-run-indexing.html
Modified: jackrabbit/site/live/oak/docs/query/oak-run-indexing.html
URL: http://svn.apache.org/viewvc/jackrabbit/site/live/oak/docs/query/oak-run-indexing.html?rev=1802113&r1=1802112&r2=1802113&view=diff
==============================================================================
--- jackrabbit/site/live/oak/docs/query/oak-run-indexing.html (original)
+++ jackrabbit/site/live/oak/docs/query/oak-run-indexing.html Mon Jul 17 09:13:16 2017
@@ -9,7 +9,7 @@
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<meta name="Date-Revision-yyyymmdd" content="20170717" />
<meta http-equiv="Content-Language" content="en" />
- <title>Jackrabbit Oak – Oak Run Indexing</title>
+ <title>Jackrabbit Oak – <a name="oak-run-indexing"></a> Oak Run Indexing</title>
<link rel="stylesheet" href="../css/apache-maven-fluido-1.6.min.css" />
<link rel="stylesheet" href="../css/site.css" />
<link rel="stylesheet" href="../css/print.css" media="print" />
@@ -229,7 +229,63 @@
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
- --><h1>Oak Run Indexing</h1>
+ --><h1><a name="oak-run-indexing"></a> Oak Run Indexing</h1>
+
+<ul>
+
+<li><a href="#oak-run-indexing">Oak Run Indexing</a>
+
+<ul>
+
+<li><a href="#common-options">Common Options</a></li>
+
+<li><a href="#index-info">Generate Index Info</a></li>
+
+<li><a href="#dump-index-defn">Dump Index Definitions</a></li>
+
+<li><a href="#async-index-data">Dump Index Data</a></li>
+
+<li><a href="#check-index">Index Consistency Check</a></li>
+
+<li><a href="#reindex">Reindex</a>
+
+<ul>
+
+<li><a href="#out-of-band-indexing">A - out-of-band indexing</a>
+
+<ul>
+
+<li><a href="#out-of-band-pre-extraction">Step 1 - Text PreExtraction</a></li>
+
+<li><a href="#out-of-band-create-checkpoint">Step 2 - Create Checkpoint</a></li>
+
+<li><a href="#out-of-band-perform-reindex">Step 3 - Perform Reindex</a></li>
+
+<li><a href="#out-of-band-import-reindex">Step 4 - Import the index</a>
+
+<ul>
+
+<li><a href="#import-index-oak-run">4.1 - Via oak-run</a></li>
+
+<li><a href="#import-index-mbean">4.2 - Via IndexerMBean</a></li>
+
+<li><a href="#import-index-script">4.3 - Via script</a></li>
+ </ul></li>
+ </ul></li>
+
+<li><a href="#online-indexing">B - Online indexing</a>
+
+<ul>
+
+<li><a href="#online-indexing-pre-extract">Step 1 - Text PreExtraction</a></li>
+
+<li><a href="#online-indexing-perform-reindex">Step 2 - Perform reindexing</a></li>
+ </ul></li>
+
+<li><a href="#tika-setup">Tika Setup</a></li>
+ </ul></li>
+ </ul></li>
+</ul>
<p><tt>@since Oak 1.7.0</tt></p>
<p><b>Work in progress. Not to be used on production setups</b></p>
<p>With Oak 1.7 we have added some tooling as part of oak-run <tt>index</tt> command. Below are details around various operations supported by this command.</p>
@@ -237,7 +293,7 @@
<p>By default the tool would generate output file in directory <tt>indexing-result</tt> which is referred to as output directory.</p>
<p>Unless specified all operations connect to the repository in read only mode</p>
<div class="section">
-<h2><a name="Common_Options"></a>Common Options</h2>
+<h2><a name="Common_Options"></a><a name="common-options"></a> Common Options</h2>
<p>All the commands support following common options</p>
<ol style="list-style-type: decimal">
@@ -246,7 +302,7 @@
</ol>
<p>Also refer to help output via <tt>-h</tt> command for some other options</p></div>
<div class="section">
-<h2><a name="Generate_Index_Info"></a>Generate Index Info</h2>
+<h2><a name="Generate_Index_Info"></a><a name="index-info"></a> Generate Index Info</h2>
<div class="source">
<div class="source"><pre class="prettyprint">java -jar oak-run*.jar index --fds-path=/path/to/datastore /path/to/segmentstore/ --index-info
@@ -254,7 +310,7 @@
<p>Generates a report consisting of various stats related to indexes present in the given repository. The generated report is stored by default in <tt>&lt;output dir&gt;/index-info.txt</tt></p>
<p>Supported for all index types</p></div>
<div class="section">
-<h2><a name="Dump_Index_Definitions"></a>Dump Index Definitions</h2>
+<h2><a name="Dump_Index_Definitions"></a><a name="dump-index-defn"></a> Dump Index Definitions</h2>
<div class="source">
<div class="source"><pre class="prettyprint">java -jar oak-run*.jar index --fds-path=/path/to/datastore /path/to/segmentstore/ --index-definitions
@@ -262,7 +318,7 @@
<p><tt>--index-definitions</tt> operation dumps the index definition in json format to a file <tt>&lt;output dir&gt;/index-definitions.json</tt>. The json file contains index definitions keyed against the index paths</p>
<p>Supported for all index types</p></div>
<div class="section">
-<h2><a name="Dump_Index_Data"></a>Dump Index Data</h2>
+<h2><a name="Dump_Index_Data"></a><a name="async-index-data"></a> Dump Index Data</h2>
<div class="source">
<div class="source"><pre class="prettyprint">java -jar oak-run*.jar index --fds-path=/path/to/datastore /path/to/segmentstore/ --index-dump
@@ -270,7 +326,7 @@
<p><tt>--index-dump</tt> operation dumps the index content in output directory. The output directory would contain one folder for each index. Each folder would have a property file <tt>index-details.txt</tt> which contains <tt>indexPath</tt></p>
<p>Supported for only Lucene indexes.</p></div>
<div class="section">
-<h2><a name="Index_Consistency_Check"></a>Index Consistency Check</h2>
+<h2><a name="Index_Consistency_Check"></a><a name="check-index"></a> Index Consistency Check</h2>
<div class="source">
<div class="source"><pre class="prettyprint">java -jar oak-run*.jar index --fds-path=/path/to/datastore /path/to/segmentstore/ --index-consistency-check
@@ -286,7 +342,7 @@
<p>It would generate a report in <tt>&lt;output dir&gt;/index-consistency-check-report.txt</tt></p>
<p>Supported for only Lucene indexes.</p></div>
<div class="section">
-<h2><a name="Reindex"></a>Reindex</h2>
+<h2><a name="Reindex"></a><a name="reindex"></a> Reindex</h2>
<p>The reindex operation supports 2 modes of index</p>
<ul>
@@ -298,7 +354,7 @@
<p>Supported for only Lucene indexes.</p>
<p>If the indexes being reindex have fulltext indexing enabled then refer to <a href="#tika-setup">Tika Setup</a> for steps on how to adapt the command to include Tika support for text extraction</p>
<div class="section">
-<h3><a name="A_-_out-of-band_indexing"></a>A - out-of-band indexing</h3>
+<h3><a name="A_-_out-of-band_indexing"></a><a name="out-of-band-indexing"></a> A - out-of-band indexing</h3>
<p>Out of band indexing has following phases</p>
<ol style="list-style-type: decimal">
@@ -312,13 +368,13 @@
<li>Complete the increment indexing from checkpoint state to current head</li>
</ol>
<div class="section">
-<h4><a name="Step_1_-_Text_PreExtraction"></a>Step 1 - Text PreExtraction</h4>
+<h4><a name="Step_1_-_Text_PreExtraction"></a><a name="out-of-band-pre-extraction"></a> Step 1 - Text PreExtraction</h4>
<p>If the index being reindexed involves fulltext index and the repository has binary content then its recommended that first <a href="pre-extract-text.html">text pre-extraction</a> is performed. This ensures that costly operation around text extraction is done prior to actual indexing so that actual indexing does not do text extraction in critical path</p></div>
<div class="section">
-<h4><a name="Step_2_-_Create_Checkpoint"></a>Step 2 - Create Checkpoint</h4>
+<h4><a name="Step_2_-_Create_Checkpoint"></a><a name="out-of-band-create-checkpoint"></a>Step 2 - Create Checkpoint</h4>
<p>Go to <tt>CheckpointMBean</tt> and create a checkpoint with lifetime of 1 month. «TBD»</p></div>
<div class="section">
-<h4><a name="Step_3_-_Perform_Reindex"></a>Step 3 - Perform Reindex</h4>
+<h4><a name="Step_3_-_Perform_Reindex"></a><a name="out-of-band-perform-reindex"></a> Step 3 - Perform Reindex</h4>
<p>In this step we perform the actual indexing via oak-run where it connects to repository in read only mode. </p>
<div class="source">
@@ -335,10 +391,10 @@
<li><tt>--checkpoint</tt> - The checkpoint up to which the index is updated, when indexing in read only mode. For testing purpose, it can be set to ‘head’ to indicate that the head state should be used.</li>
</ul></div>
<div class="section">
-<h4><a name="Step_4_-_Import_the_index"></a>Step 4 - Import the index</h4>
+<h4><a name="Step_4_-_Import_the_index"></a><a name="out-of-band-import-reindex"></a>Step 4 - Import the index</h4>
<p>As a last step we need to import the index back in the repository. This can be done in one of the following ways</p>
<div class="section">
-<h5><a name="a4.1_-_Via_oak-run"></a>4.1 - Via oak-run</h5>
+<h5><a name="a4.1_-_Via_oak-run"></a><a name="import-index-oak-run"></a>4.1 - Via oak-run</h5>
<p>In this mode we import the index using oak-run</p>
<div class="source">
@@ -347,20 +403,20 @@
<p>Here “index dir” is the directory which contains the index files created in step #3. Check the logs from previous command for the directory path.</p>
<p>This mode should only be used when repository is from Oak version 1.7+ as oak-run connects to the repository in read-write mode.</p></div>
<div class="section">
-<h5><a name="a4.2_-_Via_IndexerMBean"></a>4.2 - Via IndexerMBean</h5>
+<h5><a name="a4.2_-_Via_IndexerMBean"></a><a name="import-index-mbean"></a>4.2 - Via IndexerMBean</h5>
<p>In this mode we import the index using JMX. Looks for <tt>IndexerMBean</tt> and then import the index directory using the <tt>importIndex</tt> operation</p></div>
<div class="section">
-<h5><a name="a4.3_-_Via_script"></a>4.3 - Via script</h5>
+<h5><a name="a4.3_-_Via_script"></a><a name="import-index-script"></a>4.3 - Via script</h5>
<p>TODO - Provide a way to import the data on older setup using some script</p></div></div></div>
<div class="section">
-<h3><a name="B_-_Online_indexing"></a>B - Online indexing</h3>
+<h3><a name="B_-_Online_indexing"></a><a name="online-indexing"></a>B - Online indexing</h3>
<p>Online indexing automates some of the manual steps which are required for out-of-band indexing. </p>
<p>This mode should only be used when repository is from Oak version 1.7+ as oak-run connects to the repository in read-write mode.</p>
<div class="section">
-<h4><a name="Step_1_-_Text_PreExtraction"></a>Step 1 - Text PreExtraction</h4>
+<h4><a name="Step_1_-_Text_PreExtraction"></a><a name="online-indexing-pre-extract"></a>Step 1 - Text PreExtraction</h4>
<p>This is same as in out-of-band indexing</p></div>
<div class="section">
-<h4><a name="Step_2_-_Perform_reindexing"></a>Step 2 - Perform reindexing</h4>
+<h4><a name="Step_2_-_Perform_reindexing"></a><a name="online-indexing-perform-reindex"></a>Step 2 - Perform reindexing</h4>
<p>In this step we configure oak-run to connect to repository in read-write mode and let it perform all other steps i.e checkpoint creation, indexing and import</p>
<div class="source">