Posted to commits@jackrabbit.apache.org by mr...@apache.org on 2018/07/09 08:53:19 UTC
svn commit: r1835390 [13/23] - in /jackrabbit/site/live/oak/docs: ./ architecture/ coldstandby/ features/ nodestore/ nodestore/document/ nodestore/segment/ oak-mongo-js/ oak_api/ plugins/ query/ security/ security/accesscontrol/ security/authentication...

Modified: jackrabbit/site/live/oak/docs/query/oak-run-indexing.html
URL: http://svn.apache.org/viewvc/jackrabbit/site/live/oak/docs/query/oak-run-indexing.html?rev=1835390&r1=1835389&r2=1835390&view=diff
==============================================================================
--- jackrabbit/site/live/oak/docs/query/oak-run-indexing.html (original)
+++ jackrabbit/site/live/oak/docs/query/oak-run-indexing.html Mon Jul  9 08:53:17 2018
@@ -1,15 +1,15 @@
 <!DOCTYPE html>
 <!--
- | Generated by Apache Maven Doxia Site Renderer 1.7.4 at 2018-05-24 
+ | Generated by Apache Maven Doxia Site Renderer 1.8.1 at 2018-07-09 
  | Rendered using Apache Maven Fluido Skin 1.6
 -->
 <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
   <head>
     <meta charset="UTF-8" />
     <meta name="viewport" content="width=device-width, initial-scale=1.0" />
-    <meta name="Date-Revision-yyyymmdd" content="20180524" />
+    <meta name="Date-Revision-yyyymmdd" content="20180709" />
     <meta http-equiv="Content-Language" content="en" />
-    <title>Jackrabbit Oak &#x2013; <a name="oak-run-indexing"></a> Oak Run Indexing</title>
+    <title>Jackrabbit Oak &#x2013;  Oak Run Indexing</title>
     <link rel="stylesheet" href="../css/apache-maven-fluido-1.6.min.css" />
     <link rel="stylesheet" href="../css/site.css" />
     <link rel="stylesheet" href="../css/print.css" media="print" />
@@ -136,7 +136,7 @@
 
       <div id="breadcrumbs">
         <ul class="breadcrumb">
-        <li id="publishDate">Last Published: 2018-05-24<span class="divider">|</span>
+        <li id="publishDate">Last Published: 2018-07-09<span class="divider">|</span>
 </li>
           <li id="projectVersion">Version: 1.10-SNAPSHOT</li>
         </ul>
@@ -240,66 +240,51 @@
    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    See the License for the specific language governing permissions and
    limitations under the License.
-  --><h1><a name="oak-run-indexing"></a> Oak Run Indexing</h1>
-
+  -->
+<h1><a name="oak-run-indexing"></a> Oak Run Indexing</h1>
 <ul>
-  
+
 <li><a href="#oak-run-indexing">Oak Run Indexing</a>
-  
 <ul>
-    
+
 <li><a href="#common-options">Common Options</a></li>
-    
 <li><a href="#index-info">Generate Index Info</a></li>
-    
 <li><a href="#dump-index-defn">Dump Index Definitions</a></li>
-    
 <li><a href="#async-index-data">Dump Index Data</a></li>
-    
 <li><a href="#check-index">Index Consistency Check</a></li>
-    
 <li><a href="#reindex">Reindex</a>
-    
 <ul>
-      
+
 <li><a href="#out-of-band-indexing">A - out-of-band indexing</a>
-      
 <ul>
-        
+
 <li><a href="#out-of-band-pre-extraction">Step 1 - Text PreExtraction</a></li>
-        
 <li><a href="#out-of-band-create-checkpoint">Step 2 - Create Checkpoint</a></li>
-        
 <li><a href="#out-of-band-perform-reindex">Step 3 - Perform Reindex</a></li>
-        
 <li><a href="#out-of-band-import-reindex">Step 4 - Import the index</a>
-        
 <ul>
-          
+
 <li><a href="#import-index-oak-run">4.1 - Via oak-run</a></li>
-          
 <li><a href="#import-index-mbean">4.2 - Via IndexerMBean</a></li>
-          
 <li><a href="#import-index-script">4.3 - Via script</a></li>
-        </ul></li>
-      </ul></li>
-      
+</ul>
+</li>
+</ul>
+</li>
 <li><a href="#online-indexing">B - Online indexing</a>
-      
 <ul>
-        
+
 <li><a href="#online-indexing-pre-extract">Step 1 - Text PreExtraction</a></li>
-        
 <li><a href="#online-indexing-perform-reindex">Step 2 - Perform reindexing</a></li>
-      </ul></li>
-      
+</ul>
+</li>
 <li><a href="#index-definition-updates">Updating or Adding New Index Definitions</a></li>
-      
 <li><a href="#json-file-format">JSON File Format</a></li>
-      
 <li><a href="#tika-setup">Tika Setup</a></li>
-    </ul></li>
-  </ul></li>
+</ul>
+</li>
+</ul>
+</li>
 </ul>
 <p><tt>@since Oak 1.7.0</tt></p>
 <p><b>Work in progress. Not to be used on production setups</b></p>
@@ -310,60 +295,63 @@
 <div class="section">
 <h2><a name="Common_Options"></a><a name="common-options"></a> Common Options</h2>
 <p>All the commands support following common options</p>
-
 <ol style="list-style-type: decimal">
-  
-<li><tt>--index-paths</tt> - Comma separated list of index paths for which the selected operations need to be performed. If  not specified then the operation would be performed against all the indexes.</li>
+
+<li><tt>--index-paths</tt> - Comma separated list of index paths for which the selected operations need to be performed. If not specified then the operation would be performed against all the indexes.</li>
 </ol>
 <p>Also refer to help output via <tt>-h</tt> command for some other options</p></div>
 <div class="section">
 <h2><a name="Generate_Index_Info"></a><a name="index-info"></a> Generate Index Info</h2>
 
-<div class="source">
-<div class="source"><pre class="prettyprint">java -jar oak-run*.jar index --fds-path=/path/to/datastore  /path/to/segmentstore/ --index-info 
+<div>
+<div>
+<pre class="source">java -jar oak-run*.jar index --fds-path=/path/to/datastore  /path/to/segmentstore/ --index-info 
 </pre></div></div>
+
 <p>Generates a report consisting of various stats related to indexes present in the given repository. The generated report is stored by default in <tt>&lt;output dir&gt;/index-info.txt</tt></p>
 <p>Supported for all index types</p></div>
 <div class="section">
 <h2><a name="Dump_Index_Definitions"></a><a name="dump-index-defn"></a> Dump Index Definitions</h2>
 
-<div class="source">
-<div class="source"><pre class="prettyprint">java -jar oak-run*.jar index --fds-path=/path/to/datastore  /path/to/segmentstore/ --index-definitions
+<div>
+<div>
+<pre class="source">java -jar oak-run*.jar index --fds-path=/path/to/datastore  /path/to/segmentstore/ --index-definitions
 </pre></div></div>
+
 <p><tt>--index-definitions</tt> operation dumps the index definition in json format to a file <tt>&lt;output dir&gt;/index-definitions.json</tt>. The json file contains index definitions keyed against the index paths</p>
 <p>Supported for all index types</p></div>
 <div class="section">
 <h2><a name="Dump_Index_Data"></a><a name="async-index-data"></a> Dump Index Data</h2>
 
-<div class="source">
-<div class="source"><pre class="prettyprint">java -jar oak-run*.jar index --fds-path=/path/to/datastore  /path/to/segmentstore/ --index-dump
+<div>
+<div>
+<pre class="source">java -jar oak-run*.jar index --fds-path=/path/to/datastore  /path/to/segmentstore/ --index-dump
 </pre></div></div>
+
 <p><tt>--index-dump</tt> operation dumps the index content in output directory. The output directory would contain one folder for each index. Each folder would have a property file <tt>index-details.txt</tt> which contains <tt>indexPath</tt></p>
 <p>Supported for only Lucene indexes.</p></div>
 <div class="section">
 <h2><a name="Index_Consistency_Check"></a><a name="check-index"></a> Index Consistency Check</h2>
 
-<div class="source">
-<div class="source"><pre class="prettyprint">java -jar oak-run*.jar index --fds-path=/path/to/datastore  /path/to/segmentstore/ --index-consistency-check
+<div>
+<div>
+<pre class="source">java -jar oak-run*.jar index --fds-path=/path/to/datastore  /path/to/segmentstore/ --index-consistency-check
 </pre></div></div>
-<p><tt>--index-consistency-check</tt> operation performs index consistency check against various indexes. It supports 2 level</p>
 
+<p>The <tt>--index-consistency-check</tt> operation performs an index consistency check against various indexes. It supports 2 levels:</p>
 <ul>
-  
-<li>Level 1 - Specified as <tt>--index-consistency-check=1</tt>. Performs a basic check to determine if all blobs referred in index  are valid</li>
-  
-<li>Level 2 - Specified as <tt>--index-consistency-check=2</tt>. Performs a more through check to determine if all index files  are valid and no corruption has happened. This check is slower</li>
+
+<li>Level 1 - Specified as <tt>--index-consistency-check=1</tt>. Performs a basic check to determine if all blobs referred to in the index are valid</li>
+<li>Level 2 - Specified as <tt>--index-consistency-check=2</tt>. Performs a more thorough check to determine if all index files are valid and no corruption has happened. This check is slower</li>
 </ul>
 <p>It would generate a report in <tt>&lt;output dir&gt;/index-consistency-check-report.txt</tt></p>
 <p>Supported for only Lucene indexes.</p></div>
 <div class="section">
 <h2><a name="Reindex"></a><a name="reindex"></a> Reindex</h2>
 <p>The reindex operation supports 2 modes of index</p>
-
 <ul>
-  
+
 <li>Out-of-band indexing - Here oak-run would connect to repository in read only mode. It would require certain manual steps</li>
-  
 <li>Online Indexing - Here oak-run would connect to repository in <tt>--read-write</tt> mode</li>
 </ul>
 <p>Supported for only Lucene indexes.</p>
@@ -371,43 +359,37 @@
 <div class="section">
 <h3><a name="A_-_out-of-band_indexing"></a><a name="out-of-band-indexing"></a> A - out-of-band indexing</h3>
 <p>Out of band indexing has following phases</p>
-
 <ol style="list-style-type: decimal">
-  
+
 <li>Get checkpoint issued</li>
-  
 <li>Perform indexing with read only connection to NodeStore upto checkpoint state</li>
-  
 <li>Import the generated indexes</li>
-  
 <li>Complete the increment indexing from checkpoint state to current head</li>
 </ol>
 <div class="section">
 <h4><a name="Step_1_-_Text_PreExtraction"></a><a name="out-of-band-pre-extraction"></a> Step 1 - Text PreExtraction</h4>
-<p>If the index being reindexed involves fulltext index and the repository has binary content then its recommended that first <a href="pre-extract-text.html">text pre-extraction</a> is performed. This ensures that costly operation around text extraction is done prior to actual indexing so that actual indexing does not do text extraction in critical path</p></div>
+<p>If the index being reindexed involves a fulltext index and the repository has binary content then it is recommended that <a href="pre-extract-text.html">text pre-extraction</a> is performed first. This ensures that the costly text extraction is done prior to the actual indexing, so that indexing does not perform text extraction in the critical path</p></div>
 <div class="section">
 <h4><a name="Step_2_-_Create_Checkpoint"></a><a name="out-of-band-create-checkpoint"></a>Step 2 - Create Checkpoint</h4>
-<p>Go to <tt>CheckpointMBean</tt> and create a checkpoint with a long enough lifetime like 10 days. For this invoke  <tt>CheckpointMBean#createCheckpoint</tt> with 864000000 as argument for lifetime</p></div>
+<p>Go to <tt>CheckpointMBean</tt> and create a checkpoint with a long enough lifetime, for example 10 days. For this invoke <tt>CheckpointMBean#createCheckpoint</tt> with 864000000 (10 days in milliseconds) as the lifetime argument</p></div>
 <div class="section">
 <h4><a name="Step_3_-_Perform_Reindex"></a><a name="out-of-band-perform-reindex"></a> Step 3 - Perform Reindex</h4>
-<p>In this step we perform the actual indexing via oak-run where it connects to repository in read only mode. </p>
+<p>In this step we perform the actual indexing via oak-run where it connects to repository in read only mode.</p>
 
-<div class="source">
-<div class="source"><pre class="prettyprint"> java -jar oak-run*.jar index --reindex \
+<div>
+<div>
+<pre class="source"> java -jar oak-run*.jar index --reindex \
  --index-paths=/oak:index/indexName \
- --checkpoint=0fd2a388-de87-47d3-8f30-e86b1cf0a081 \    
+ --checkpoint=0fd2a388-de87-47d3-8f30-e86b1cf0a081 \
  --fds-path=/path/to/datastore  /path/to/segmentstore/ 
 </pre></div></div>
-<p>Here following options can be used</p>
 
+<p>The following options can be used here</p>
 <ul>
-  
+
 <li><tt>--pre-extracted-text-dir</tt> - Directory path containing pre extracted text generated via step #1 (optional)</li>
-  
 <li><tt>--index-paths</tt> - This command requires an explicit set of index paths which need to be indexed (required)</li>
-  
-<li><tt>--checkpoint</tt> - The checkpoint up to which the index is updated, when indexing in read only mode. For  testing purpose, it can be set to &#x2018;head&#x2019; to indicate that the head state should be used. (required)</li>
-  
+<li><tt>--checkpoint</tt> - The checkpoint up to which the index is updated, when indexing in read only mode. For testing purposes, it can be set to &#x2018;head&#x2019; to indicate that the head state should be used. (required)</li>
 <li><tt>--index-definitions-file</tt> - Path of the JSON file which contains updated index definitions</li>
 </ul>
 <p>If the index does not support fulltext indexing then you can omit providing BlobStore details</p></div>
@@ -418,11 +400,13 @@
 <h5><a name="a4.1_-_Via_oak-run"></a><a name="import-index-oak-run"></a>4.1 - Via oak-run</h5>
 <p>In this mode we import the index using oak-run</p>
 
-<div class="source">
-<div class="source"><pre class="prettyprint">java -jar oak-run*.jar index --index-import --read-write \
+<div>
+<div>
+<pre class="source">java -jar oak-run*.jar index --index-import --read-write \
     --index-import-dir=&lt;index dir&gt;  \
     --fds-path=/path/to/datastore /path/to/segmentstore
 </pre></div></div>
+
 <p>Here &#x201c;index dir&#x201d; is the directory which contains the index files created in step #3. Check the logs from previous command for the directory path.</p>
 <p>This mode should only be used when repository is from Oak version 1.7+ as oak-run connects to the repository in read-write mode.</p></div>
 <div class="section">
@@ -433,7 +417,7 @@
 <p>TODO - Provide a way to import the data on older setup using some script</p></div></div></div>
 <div class="section">
 <h3><a name="B_-_Online_indexing"></a><a name="online-indexing"></a>B - Online indexing</h3>
-<p>Online indexing automates some of the manual steps which are required for out-of-band indexing. </p>
+<p>Online indexing automates some of the manual steps which are required for out-of-band indexing.</p>
 <p>This mode should only be used when repository is from Oak version 1.7+ as oak-run connects to the repository in read-write mode.</p>
 <div class="section">
 <h4><a name="Step_1_-_Text_PreExtraction"></a><a name="online-indexing-pre-extract"></a>Step 1 - Text PreExtraction</h4>
@@ -442,23 +426,28 @@
 <h4><a name="Step_2_-_Perform_reindexing"></a><a name="online-indexing-perform-reindex"></a>Step 2 - Perform reindexing</h4>
 <p>In this step we configure oak-run to connect to repository in read-write mode and let it perform all other steps i.e checkpoint creation, indexing and import</p>
 
-<div class="source">
-<div class="source"><pre class="prettyprint">java -jar oak-run*.jar index --reindex --index-paths=/oak:index/lucene --read-write --fds-path=/path/to/datastore /path/to/segmentstore
-</pre></div></div></div></div>
+<div>
+<div>
+<pre class="source">java -jar oak-run*.jar index --reindex --index-paths=/oak:index/lucene --read-write --fds-path=/path/to/datastore /path/to/segmentstore
+</pre></div></div>
+</div></div>
 <div class="section">
 <h3><a name="Updating_or_Adding_New_Index_Definitions"></a><a name="index-definition-updates"></a> Updating or Adding New Index Definitions</h3>
 <p><tt>@since Oak 1.7.5</tt></p>
 <p>Index tooling support updating and adding new index definitions to existing setups. This can be done by passing in path of a json file which contains index definitions</p>
 
-<div class="source">
-<div class="source"><pre class="prettyprint">java -jar oak-run*.jar index --reindex --index-paths=/oak:index/newAssetIndex \
+<div>
+<div>
+<pre class="source">java -jar oak-run*.jar index --reindex --index-paths=/oak:index/newAssetIndex \
 --index-definitions-file=index-definitions.json \
 --fds-path=/path/to/datastore /path/to/segmentstore  
 </pre></div></div>
+
 <p>Where index-definitions.json has following structure</p>
 
-<div class="source">
-<div class="source"><pre class="prettyprint">{
+<div>
+<div>
+<pre class="source">{
   &quot;/oak:index/newAssetIndex&quot;: {
     &quot;evaluatePathRestrictions&quot;: true,
     &quot;compatVersion&quot;: 2,
@@ -488,17 +477,18 @@
   }
 }
 </pre></div></div>
-<p>Some points to note about this json file * Each key of top level object refers to the index path * The value of each such key refers to complete index definition * If the index path is not present in existing repository then it would result in a new index being created * In case of new index it must be ensured that parent path structure must already exist in repository.  So if a new index is being created at <tt>/content/en/oak:index/contentIndex</tt> then path upto <tt>/content/en/oak:index</tt>  should already exist in repository * If this option is used with online indexing then do ensure that oak-run version matches with the Oak version  used by target repository</p>
+
+<p>Some points to note about this JSON file:</p><ul><li>Each key of the top-level object refers to the index path</li><li>The value of each such key refers to the complete index definition</li><li>If the index path is not present in the existing repository then a new index would be created</li><li>For a new index, the parent path structure must already exist in the repository. So if a new index is being created at <tt>/content/en/oak:index/contentIndex</tt> then the path up to <tt>/content/en/oak:index</tt> should already exist in the repository</li><li>If this option is used with online indexing then ensure that the oak-run version matches the Oak version used by the target repository</li></ul>
 <p>You can also use the JSON file generated from <a class="externalLink" href="http://oakutils.appspot.com/generate/index">Oakutils</a>. It needs to be modified to conform to the above structure i.e. enclose the whole definition under the intended index path key.</p>
 <p>In general the index definitions do not need any special encoding of values, as index definitions in Oak mostly use only String, Long and Double types. However if the index refers to binary config like Tika config then the binary data would need to be encoded. Refer to the next section for more details.</p>
-<p>This option is supported in both online and out-of-band indexing. </p>
+<p>This option is supported in both online and out-of-band indexing.</p>
 <p>For more details refer to <a class="externalLink" href="https://issues.apache.org/jira/browse/OAK-6471">OAK-6471</a></p></div>
 <div class="section">
 <h3><a name="JSON_File_Format"></a><a name="json-file-format"></a> JSON File Format</h3>
 <p>Some of the standard types used in Oak are not supported directly by JSON like names, blobs etc. Those would need to be encoded in a specific format.</p>
 <p>Below are the encoding rules</p>
-
 <dl>
+
 <dt>LONG</dt>
 <dd>No encoding required</dd>
 <dd><i>&#x201c;compatVersion&#x201d;: 2</i></dd>
@@ -517,7 +507,7 @@
 <dd><i>&#x201c;created&#x201d;: &#x201c;dat:2017-07-20T13:23:21.196+05:30&#x201d;</i></dd>
 <dt>NAME</dt>
 <dd>Prefix the value with <tt>nam:</tt>.</dd>
-<dd>For <tt>jcr:primaryType</tt> and <tt>jcr:mixins</tt> no encoding is required. Any property with these names would be converted to  NAME type</dd>
+<dd>For <tt>jcr:primaryType</tt> and <tt>jcr:mixins</tt> no encoding is required. Any property with these names would be converted to NAME type</dd>
 <dd><i>&#x201c;nodetype&#x201d;: &#x201c;nam:nt:base&#x201d;</i></dd>
 <dt>PATH</dt>
 <dd>Prefix the value with <tt>pat:</tt></dd>
@@ -526,7 +516,7 @@
 <dd>Prefix the value with <tt>uri:</tt></dd>
 <dd><i>&#x201c;serverURI&#x201d;: &#x201c;uri:http://foo.example.com&#x201d;</i></dd>
 <dt>BINARY</dt>
-<dd>By default the binary values are encoded as Base64 string if the binary is less than 1 MB size. The encoded value is  prefixed with <tt>:blobId:</tt></dd>
+<dd>By default the binary values are encoded as Base64 string if the binary is less than 1 MB size. The encoded value is prefixed with <tt>:blobId:</tt></dd>
 <dd><i>&#x201c;jcr:data&#x201d;: &#x201c;:blobId:axygz&#x201d;</i></dd>
 </dl></div>
 <div class="section">
@@ -535,8 +525,9 @@
 <p>First download the <a class="externalLink" href="https://tika.apache.org/download.html">tika-app</a> jar from Tika downloads. You should be able to use 1.15 version with Oak 1.7.4 jar.</p>
 <p>Then modify the index command like below. The rest of arguments remain same as documented before.</p>
 
-<div class="source">
-<div class="source"><pre class="prettyprint">java -cp oak-run.jar:tika-app-1.15.jar org.apache.jackrabbit.oak.run.Main index
+<div>
+<div>
+<pre class="source">java -cp oak-run.jar:tika-app-1.15.jar org.apache.jackrabbit.oak.run.Main index
 </pre></div></div></div></div>
         </div>
       </div>

Modified: jackrabbit/site/live/oak/docs/query/ootb-index-change.html
URL: http://svn.apache.org/viewvc/jackrabbit/site/live/oak/docs/query/ootb-index-change.html?rev=1835390&r1=1835389&r2=1835390&view=diff
==============================================================================
--- jackrabbit/site/live/oak/docs/query/ootb-index-change.html (original)
+++ jackrabbit/site/live/oak/docs/query/ootb-index-change.html Mon Jul  9 08:53:17 2018
@@ -1,13 +1,13 @@
 <!DOCTYPE html>
 <!--
- | Generated by Apache Maven Doxia Site Renderer 1.7.4 at 2018-05-24 
+ | Generated by Apache Maven Doxia Site Renderer 1.8.1 at 2018-07-09 
  | Rendered using Apache Maven Fluido Skin 1.6
 -->
 <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
   <head>
     <meta charset="UTF-8" />
     <meta name="viewport" content="width=device-width, initial-scale=1.0" />
-    <meta name="Date-Revision-yyyymmdd" content="20180524" />
+    <meta name="Date-Revision-yyyymmdd" content="20180709" />
     <meta http-equiv="Content-Language" content="en" />
     <title>Jackrabbit Oak &#x2013; Changing Out-Of-The-Box Index Definitions</title>
     <link rel="stylesheet" href="../css/apache-maven-fluido-1.6.min.css" />
@@ -136,7 +136,7 @@
 
       <div id="breadcrumbs">
         <ul class="breadcrumb">
-        <li id="publishDate">Last Published: 2018-05-24<span class="divider">|</span>
+        <li id="publishDate">Last Published: 2018-07-09<span class="divider">|</span>
 </li>
           <li id="projectVersion">Version: 1.10-SNAPSHOT</li>
         </ul>
@@ -240,14 +240,16 @@
    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    See the License for the specific language governing permissions and
    limitations under the License.
-  --><div class="section">
+  -->
+<div class="section">
 <h2><a name="Changing_Out-Of-The-Box_Index_Definitions"></a>Changing Out-Of-The-Box Index Definitions</h2>
 <p>You may have the need to change an out-of-the-box index definition that is shipped either with oak or any other products built on top of it.</p>
 <p>To better deal with upgrades and changes in provided index definitions it would be better to follow the following practice.</p>
 <p>Let&#x2019;s say for example that you have the following index definition as <tt>NodeTypeIndex</tt> and you&#x2019;d like to add your custom node to the list: <tt>cust:unstructured</tt>.</p>
 
-<div class="source">
-<div class="source"><pre class="prettyprint">&quot;oak:index/nodetype&quot; : {
+<div>
+<div>
+<pre class="source">&quot;oak:index/nodetype&quot; : {
   &quot;jcr:primaryType&quot;: &quot;oak:QueryIndexDefinition&quot;,
   &quot;declaringNodeTypes&quot;: [
     &quot;sling:MessageEntry&quot;,
@@ -270,24 +272,21 @@
   &quot;reindexCount&quot;: 1
 }
 </pre></div></div>
-<p>to customise it you would do the following:</p>
 
+<p>to customise it you would do the following:</p>
 <ol style="list-style-type: decimal">
-  
-<li>Copy the current index definition with a new name. Let&#x2019;s say  <tt>oak:index/custNodeType</tt></li>
-  
+
+<li>Copy the current index definition with a new name. Let&#x2019;s say <tt>oak:index/custNodeType</tt></li>
 <li>Add the custom nodetype to the <tt>declaringNodeTypes</tt></li>
-  
 <li>Issue a re-index by setting <tt>reindex=true</tt></li>
-  
 <li>wait for it to finish</li>
-  
-<li>either  <a href="./query-engine.html#Temporarily_Disabling_an_Index">disable</a> the  old index definition or delete it.</li>
+<li>either <a href="./query-engine.html#Temporarily_Disabling_an_Index">disable</a> the old index definition or delete it.</li>
 </ol>
 <p>The new index definition in our example, once completed would look like the following:</p>
 
-<div class="source">
-<div class="source"><pre class="prettyprint">&quot;oak:index/custNodetype&quot; : {
+<div>
+<div>
+<pre class="source">&quot;oak:index/custNodetype&quot; : {
   &quot;jcr:primaryType&quot;: &quot;oak:QueryIndexDefinition&quot;,
   &quot;declaringNodeTypes&quot;: [
     &quot;sling:MessageEntry&quot;,

Modified: jackrabbit/site/live/oak/docs/query/ordered-index-migrate.html
URL: http://svn.apache.org/viewvc/jackrabbit/site/live/oak/docs/query/ordered-index-migrate.html?rev=1835390&r1=1835389&r2=1835390&view=diff
==============================================================================
--- jackrabbit/site/live/oak/docs/query/ordered-index-migrate.html (original)
+++ jackrabbit/site/live/oak/docs/query/ordered-index-migrate.html Mon Jul  9 08:53:17 2018
@@ -1,13 +1,13 @@
 <!DOCTYPE html>
 <!--
- | Generated by Apache Maven Doxia Site Renderer 1.7.4 at 2018-05-24 
+ | Generated by Apache Maven Doxia Site Renderer 1.8.1 at 2018-07-09 
  | Rendered using Apache Maven Fluido Skin 1.6
 -->
 <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
   <head>
     <meta charset="UTF-8" />
     <meta name="viewport" content="width=device-width, initial-scale=1.0" />
-    <meta name="Date-Revision-yyyymmdd" content="20180524" />
+    <meta name="Date-Revision-yyyymmdd" content="20180709" />
     <meta http-equiv="Content-Language" content="en" />
     <title>Jackrabbit Oak &#x2013; Migrating Ordered Index to Lucene Property</title>
     <link rel="stylesheet" href="../css/apache-maven-fluido-1.6.min.css" />
@@ -136,7 +136,7 @@
 
       <div id="breadcrumbs">
         <ul class="breadcrumb">
-        <li id="publishDate">Last Published: 2018-05-24<span class="divider">|</span>
+        <li id="publishDate">Last Published: 2018-07-09<span class="divider">|</span>
 </li>
           <li id="projectVersion">Version: 1.10-SNAPSHOT</li>
         </ul>
@@ -240,12 +240,14 @@
    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    See the License for the specific language governing permissions and
    limitations under the License.
-  --><h1>Migrating Ordered Index to Lucene Property</h1>
+  -->
+<h1>Migrating Ordered Index to Lucene Property</h1>
 <p>A quick step-by-step on how to migrate from the ordered index to lucene.</p>
 <p>Assuming you have the following ordered index configuration</p>
 
-<div class="source">
-<div class="source"><pre class="prettyprint">{
+<div>
+<div>
+<pre class="source">{
     ...
     &quot;declaringNodeTypes&quot; : &quot;nt:unstructured&quot;,
     &quot;direction&quot; : &quot;ascending&quot;,
@@ -254,10 +256,12 @@
     ...
 }
 </pre></div></div>
+
 <p>the related lucene configuration will be</p>
 
-<div class="source">
-<div class="source"><pre class="prettyprint">{
+<div>
+<div>
+<pre class="source">{
     &quot;jcr:primaryType&quot; : &quot;oak:QueryIndexDefinition&quot;,
     &quot;compatVersion&quot; : 2,
     &quot;type&quot; : &quot;lucene&quot;,
@@ -277,6 +281,7 @@
     }
 }
 </pre></div></div>
+
 <p>for all the details around the configuration of Lucene index and additional flags, please refer to the <a href="lucene.html">index documentation</a>.</p>
         </div>
       </div>

Modified: jackrabbit/site/live/oak/docs/query/ordered-index.html
URL: http://svn.apache.org/viewvc/jackrabbit/site/live/oak/docs/query/ordered-index.html?rev=1835390&r1=1835389&r2=1835390&view=diff
==============================================================================
--- jackrabbit/site/live/oak/docs/query/ordered-index.html (original)
+++ jackrabbit/site/live/oak/docs/query/ordered-index.html Mon Jul  9 08:53:17 2018
@@ -1,13 +1,13 @@
 <!DOCTYPE html>
 <!--
- | Generated by Apache Maven Doxia Site Renderer 1.7.4 at 2018-05-24 
+ | Generated by Apache Maven Doxia Site Renderer 1.8.1 at 2018-07-09 
  | Rendered using Apache Maven Fluido Skin 1.6
 -->
 <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
   <head>
     <meta charset="UTF-8" />
     <meta name="viewport" content="width=device-width, initial-scale=1.0" />
-    <meta name="Date-Revision-yyyymmdd" content="20180524" />
+    <meta name="Date-Revision-yyyymmdd" content="20180709" />
     <meta http-equiv="Content-Language" content="en" />
     <title>Jackrabbit Oak &#x2013; Ordered Index (deprecated since 1.1.8)</title>
     <link rel="stylesheet" href="../css/apache-maven-fluido-1.6.min.css" />
@@ -136,7 +136,7 @@
 
       <div id="breadcrumbs">
         <ul class="breadcrumb">
-        <li id="publishDate">Last Published: 2018-05-24<span class="divider">|</span>
+        <li id="publishDate">Last Published: 2018-07-09<span class="divider">|</span>
 </li>
           <li id="projectVersion">Version: 1.10-SNAPSHOT</li>
         </ul>
@@ -240,12 +240,14 @@
    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    See the License for the specific language governing permissions and
    limitations under the License.
-  --><h1>Ordered Index (deprecated since 1.1.8)</h1>
+  -->
+<h1>Ordered Index (deprecated since 1.1.8)</h1>
 <p>Extension of the Property index will keep the order of the indexed property persisted in the repository.</p>
 <p>Used to speed-up queries with <tt>ORDER BY</tt> clause, <i>equality</i> and <i>range</i> ones.</p>
 
-<div class="source">
-<div class="source"><pre class="prettyprint">SELECT * FROM [nt:base] ORDER BY jcr:lastModified
+<div>
+<div>
+<pre class="source">SELECT * FROM [nt:base] ORDER BY jcr:lastModified
 
 SELECT * FROM [nt:base] WHERE jcr:lastModified &gt; $date
 
@@ -256,31 +258,25 @@ WHERE jcr:lastModified &gt; $date1 AND j
 
 SELECT * FROM [nt:base] WHERE [jcr:uuid] = $id
 </pre></div></div>
-<p>To define a property index on a subtree you have to add an index definition node that:</p>
 
+<p>To define a property index on a subtree you have to add an index definition node that:</p>
 <ul>
-  
+
 <li>must be of type <tt>oak:QueryIndexDefinition</tt></li>
-  
 <li>must have the <tt>type</tt> property set to <b><tt>ordered</tt></b></li>
-  
-<li>contains the <tt>propertyNames</tt> property that indicates what properties  will be stored in the index. <tt>propertyNames</tt> has to be a single  value list of type <tt>Name[]</tt></li>
+<li>contains the <tt>propertyNames</tt> property that indicates what properties will be stored in the index.  <tt>propertyNames</tt> has to be a single value list of type <tt>Name[]</tt></li>
 </ul>
 <p><i>Optionally</i> you can specify</p>
-
 <ul>
-  
-<li>the <tt>reindex</tt> flag which when set to <tt>true</tt>, triggers a full content  re-index.</li>
-  
-<li>The direction of the sorting by specifying a <tt>direction</tt> property of  type <tt>String</tt> of value <tt>ascending</tt> or <tt>descending</tt>. If not provided  <tt>ascending</tt> is the default.</li>
-  
-<li>The index can be defined as asynchronous by providing the property  <tt>async=async</tt></li>
+
+<li>the <tt>reindex</tt> flag which when set to <tt>true</tt>, triggers a full content re-index.</li>
+<li>The direction of the sorting by specifying a <tt>direction</tt> property of type <tt>String</tt> of value <tt>ascending</tt> or <tt>descending</tt>. If not provided <tt>ascending</tt> is the default.</li>
+<li>The index can be defined as asynchronous by providing the property <tt>async=async</tt></li>
 </ul>
 <p><i>Caveats</i></p>
-
 <ul>
-  
-<li>When deploying the index on a clustered MongoDB, you have to  define it as asynchronous by providing <tt>async=async</tt> in the index  definition. This is to avoid cluster merges.</li>
+
+<li>When deploying the index on a clustered MongoDB, you have to define it as asynchronous by providing <tt>async=async</tt> in the index definition. This is to avoid cluster merges.</li>
 </ul>
         </div>
       </div>

Modified: jackrabbit/site/live/oak/docs/query/pre-extract-text.html
URL: http://svn.apache.org/viewvc/jackrabbit/site/live/oak/docs/query/pre-extract-text.html?rev=1835390&r1=1835389&r2=1835390&view=diff
==============================================================================
--- jackrabbit/site/live/oak/docs/query/pre-extract-text.html (original)
+++ jackrabbit/site/live/oak/docs/query/pre-extract-text.html Mon Jul  9 08:53:17 2018
@@ -1,15 +1,15 @@
 <!DOCTYPE html>
 <!--
- | Generated by Apache Maven Doxia Site Renderer 1.7.4 at 2018-05-25 
+ | Generated by Apache Maven Doxia Site Renderer 1.8.1 at 2018-07-09 
  | Rendered using Apache Maven Fluido Skin 1.6
 -->
 <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
   <head>
     <meta charset="UTF-8" />
     <meta name="viewport" content="width=device-width, initial-scale=1.0" />
-    <meta name="Date-Revision-yyyymmdd" content="20180525" />
+    <meta name="Date-Revision-yyyymmdd" content="20180709" />
     <meta http-equiv="Content-Language" content="en" />
-    <title>Jackrabbit Oak &#x2013; <a name="pre-extract-text"></a>Pre-Extracting Text from Binaries</title>
+    <title>Jackrabbit Oak &#x2013; Pre-Extracting Text from Binaries</title>
     <link rel="stylesheet" href="../css/apache-maven-fluido-1.6.min.css" />
     <link rel="stylesheet" href="../css/site.css" />
     <link rel="stylesheet" href="../css/print.css" media="print" />
@@ -136,7 +136,7 @@
 
       <div id="breadcrumbs">
         <ul class="breadcrumb">
-        <li id="publishDate">Last Published: 2018-05-25<span class="divider">|</span>
+        <li id="publishDate">Last Published: 2018-07-09<span class="divider">|</span>
 </li>
           <li id="projectVersion">Version: 1.10-SNAPSHOT</li>
         </ul>
@@ -240,61 +240,53 @@
    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    See the License for the specific language governing permissions and
    limitations under the License.
-  --><h1><a name="pre-extract-text"></a>Pre-Extracting Text from Binaries</h1>
-
+  -->
+<h1><a name="pre-extract-text"></a>Pre-Extracting Text from Binaries</h1>
 <ul>
-  
+
 <li><a href="#pre-extract-text">Pre-Extracting Text from Binaries</a>
-  
 <ul>
-    
+
 <li><a href="#a-oak-run-command">A - Oak Run Pre-Extraction Command</a>
-    
 <ul>
-      
+
 <li><a href="#a-setup">Step 1 - oak-run Setup</a></li>
-      
 <li><a href="#a-generate-csv">Step 2 - Generate the csv file</a></li>
-      
 <li><a href="#a-perform-text-extraction">Step 3 - Perform the text extraction</a>
-      
 <ul>
-        
+
 <li><a href="#a-tika-text-extraction">1. using tika</a></li>
-        
 <li><a href="#a-index-text-extraction">2. using dumped indexed data</a></li>
-      </ul></li>
-    </ul></li>
-    
+</ul>
+</li>
+</ul>
+</li>
 <li><a href="#b-pre-extracted-text-provider">B - PreExtractedTextProvider</a>
-    
 <ul>
-      
+
 <li><a href="#b-oak-app">Oak application</a></li>
-      
 <li><a href="#b-oak-run">Oak Run Indexing</a></li>
-    </ul></li>
-  </ul></li>
+</ul>
+</li>
+</ul>
+</li>
 </ul>
 <p><tt>@since Oak 1.0.18, 1.2.3</tt></p>
-<p>Lucene indexing is performed in single-threaded mode. Extracting text from binaries is an expensive operation and slows down the indexing rate considerably. For incremental indexing this mostly works fine, but when performing a reindex or creating the index for the first time after a migration it increases the indexing time considerably. To speed up such cases, Oak supports pre-extracting text from binaries to avoid extracting text at indexing time. This feature consists of two broad steps: </p>
-
+<p>Lucene indexing is performed in single-threaded mode. Extracting text from binaries is an expensive operation and slows down the indexing rate considerably. For incremental indexing this mostly works fine, but when performing a reindex or creating the index for the first time after a migration it increases the indexing time considerably. To speed up such cases, Oak supports pre-extracting text from binaries to avoid extracting text at indexing time. This feature consists of two broad steps:</p>
 <ol style="list-style-type: decimal">
-  
+
 <li>Extract and store the extracted text from binaries using oak-run tooling.</li>
-  
 <li>Configure Oak runtime to use the extracted text at time of indexing via <tt>PreExtractedTextProvider</tt></li>
 </ol>
 <p>For more details on this feature refer to <a class="externalLink" href="https://issues.apache.org/jira/browse/OAK-2892">OAK-2892</a></p>
 <div class="section">
 <h2><a name="A_-_Oak_Run_Pre-Extraction_Command"></a><a name="a-oak-run-command"></a>A - Oak Run Pre-Extraction Command</h2>
-<p>Oak run tool provides a <tt>tika</tt> command which supports traversing the repository and then extracting text from the binary properties. </p>
+<p>Oak run tool provides a <tt>tika</tt> command which supports traversing the repository and then extracting text from the binary properties.</p>
 <div class="section">
 <h3><a name="Step_1_-_oak-run_Setup"></a><a name="a-setup"></a>Step 1 - oak-run Setup</h3>
 <p>Download following jars</p>
-
 <ul>
-  
+
 <li>oak-run 1.7.4 <a class="externalLink" href="https://repo1.maven.org/maven2/org/apache/jackrabbit/oak-run/1.7.4/oak-run-1.7.4.jar">link</a></li>
 </ul>
 <p>Refer to <a href="../features/oak-run-nodestore-connection-options.html">oak-run setup</a> for details about connecting to different types of NodeStore. The examples below assume a setup consisting of SegmentNodeStore and FileDataStore; use the connection options appropriate for your setup.</p>
@@ -303,47 +295,49 @@
 <p>Of the following steps, Step 2 (generation of the csv file) scans the whole repository, so it should be run when the system is not in active use. Step 3 only requires access to the BlobStore and can therefore be run while the Oak application is in use.</p></div>
 <div class="section">
 <h3><a name="Step_2_-_Generate_the_csv_file"></a><a name="a-generate-csv"></a>Step 2 - Generate the csv file</h3>
-<p>As the first step you would need to generate a csv file which would contain details about the binary property. This file would be generated by using the <tt>tika</tt> command from oak-run. In this step oak-run would connect to repository in read only mode. </p>
+<p>As the first step you would need to generate a csv file which would contain details about the binary property. This file would be generated by using the <tt>tika</tt> command from oak-run. In this step oak-run would connect to repository in read only mode.</p>
 <p>To generate the csv file use the <tt>--generate</tt> action</p>
 
-<div class="source">
-<div class="source"><pre class="prettyprint">    java -jar oak-run.jar tika \
+<div>
+<div>
+<pre class="source">    java -jar oak-run.jar tika \
     --fds-path /path/to/datastore \
     /path/to/segmentstore --data-file oak-binary-stats.csv --generate
 </pre></div></div>
+
 <p>If connecting to S3, this command can take a long time because checking the binary id currently triggers a download of the actual binary content, which is not required. To speed this up, use the Fake DataStore support of oak-run:</p>
 
-<div class="source">
-<div class="source"><pre class="prettyprint">    java -jar oak-run.jar tika \
+<div>
+<div>
+<pre class="source">    java -jar oak-run.jar tika \
     --fake-ds-path=temp \
     /path/to/segmentstore --data-file oak-binary-stats.csv --generate
 </pre></div></div>
+
 <p>This would generate a csv file with content like below</p>
 
-<div class="source">
-<div class="source"><pre class="prettyprint">43844ed22d640a114134e5a25550244e8836c00c#28705,28705,&quot;application/octet-stream&quot;,,&quot;/content/activities/jcr:content/folderThumbnail/jcr:content&quot;
+<div>
+<div>
+<pre class="source">43844ed22d640a114134e5a25550244e8836c00c#28705,28705,&quot;application/octet-stream&quot;,,&quot;/content/activities/jcr:content/folderThumbnail/jcr:content&quot;
 43844ed22d640a114134e5a25550244e8836c00c#28705,28705,&quot;application/octet-stream&quot;,,&quot;/content/snowboarding/jcr:content/folderThumbnail/jcr:content&quot;
 ...
 </pre></div></div>
+
 <p>By default it scans the whole repository. To restrict the scan to a certain path, specify that path via the <tt>--path</tt> option.</p></div>
 <div class="section">
 <h3><a name="Step_3_-_Perform_the_text_extraction"></a><a name="a-perform-text-extraction"></a>Step 3 - Perform the text extraction</h3>
 <p>Once the csv file is generated we need to perform the text extraction.</p>
 <p>Currently the extracted text is stored as one file per blob, in the same format as the one used by <tt>FileDataStore</tt>. In addition, two files are created:</p>
-
 <ul>
-  
+
 <li>blobs_error.txt - File containing blobIds for which text extraction ended in error</li>
-  
 <li>blobs_empty.txt - File containing blobIds for which no text was extracted</li>
 </ul>
 <p>This phase is incremental i.e. if run multiple times and same <tt>--store-path</tt> is specified then it would avoid extracting text from previously processed binaries.</p>
 <p>There are 2 ways of doing this:</p>
-
 <ol style="list-style-type: decimal">
-  
+
 <li>Do text extraction using tika</li>
-  
 <li>Use a suitable lucene index to get text extraction data from index itself which would have been generated earlier</li>
 </ol>
 <div class="section">
@@ -351,16 +345,18 @@
 <p>To do that we would need to download the <a class="externalLink" href="https://tika.apache.org/download.html">tika-app</a> jar from Tika downloads. You should be able to use 1.15 version with Oak 1.7.4 jar.</p>
 <p>To perform the text extraction use the <tt>--extract</tt> action</p>
 
-<div class="source">
-<div class="source"><pre class="prettyprint">    java -cp oak-run.jar:tika-app-1.15.jar \
+<div>
+<div>
+<pre class="source">    java -cp oak-run.jar:tika-app-1.15.jar \
     org.apache.jackrabbit.oak.run.Main tika \
     --data-file binary-stats.csv \
     --store-path ./store  \
     --fds-path /path/to/datastore  extract
 </pre></div></div>
-<p>This command does not require access to NodeStore and only requires access to the BlobStore. So configure the BlobStore which is in use like FileDataStore or S3DataStore. Above command would do text extraction using multiple threads and store the extracted text in directory specified by <tt>--store-path</tt>. </p>
+
+<p>This command does not require access to NodeStore and only requires access to the BlobStore. So configure the BlobStore which is in use like FileDataStore or S3DataStore. Above command would do text extraction using multiple threads and store the extracted text in directory specified by <tt>--store-path</tt>.</p>
 <p>Consequently, this can be run from a different machine (possibly a more powerful one, to allow use of multiple cores) to speed up text extraction. One can also split the csv into multiple chunks, process them on different machines and merge the stores later. Just ensure that the blobs*.txt files are also merged at merge time.</p>
-<p>Note that the command must be launched with <tt>-cp</tt> instead of <tt>-jar</tt> because classes outside the oak-run jar, such as those in tika-app, need to be included. Also ensure that oak-run comes before tika-app in the classpath. This is required because tika-app packages some old classes. </p></div>
+<p>Note that the command must be launched with <tt>-cp</tt> instead of <tt>-jar</tt> because classes outside the oak-run jar, such as those in tika-app, need to be included. Also ensure that oak-run comes before tika-app in the classpath. This is required because tika-app packages some old classes.</p></div>
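Splitting the csv into chunks, as described in Step 3.1 above, could be done with a small standalone program such as the one sketched here. This is not part of oak-run: the class name <tt>CsvChunker</tt> and the output naming scheme are hypothetical, and the sketch assumes one csv record per line, no header line, and that the first field is the blob id (as in the sample output shown earlier). Lines are grouped by blob id so that all entries for the same binary land in the same chunk.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of oak-run. */
public final class CsvChunker {

    /**
     * Distributes csv lines over the given number of buckets.
     * The bucket is chosen from the first field of the line, which is
     * ASSUMED to be the blob id, so equal blob ids share a bucket.
     */
    static List<List<String>> partition(List<String> lines, int chunks) {
        if (chunks < 1) {
            throw new IllegalArgumentException("chunks must be >= 1");
        }
        List<List<String>> buckets = new ArrayList<>();
        for (int i = 0; i < chunks; i++) {
            buckets.add(new ArrayList<>());
        }
        for (String line : lines) {
            if (line.isEmpty()) {
                continue; // skip blank lines
            }
            int comma = line.indexOf(',');
            String key = comma < 0 ? line : line.substring(0, comma);
            buckets.get(Math.floorMod(key.hashCode(), chunks)).add(line);
        }
        return buckets;
    }

    /** Usage: java CsvChunker oak-binary-stats.csv 4 */
    public static void main(String[] args) throws IOException {
        Path source = Paths.get(args[0]);
        int chunks = Integer.parseInt(args[1]);
        List<String> lines = Files.readAllLines(source, StandardCharsets.UTF_8);
        List<List<String>> buckets = partition(lines, chunks);
        for (int i = 0; i < buckets.size(); i++) {
            // Writes <source>.0, <source>.1, ... next to the input file
            Path target = Paths.get(source + "." + i);
            Files.write(target, buckets.get(i), StandardCharsets.UTF_8);
        }
    }
}
```

Each resulting file can then be passed as <tt>--data-file</tt> to a separate <tt>extract</tt> run. Hashing on the blob id rather than distributing lines round-robin is a design choice of this sketch: it avoids two machines extracting text for the same binary.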
 <div class="section">
 <h4><a name="a3.2_-_Populate_text_extraction_store_using_already_indexed_data"></a><a name="a-index-text-extraction"></a> 3.2 - Populate text extraction store using already indexed data</h4>
 <p><tt>@since Oak 1.9.3</tt></p>
@@ -372,22 +368,21 @@
 <div class="section">
 <h5><a name="Suitability_of_index_used_for_populating_extracted_text_store"></a>Suitability of index used for populating extracted text store</h5>
 <p>Indexes which index binaries are obvious candidates to be consumed in this way. But there are few more constraints that the definition needs to adhere to:</p>
-
 <ul>
-  
+
 <li>it should index binary on the same path where binary exists (binary must not be on a relative path)</li>
-  
 <li>it should not index multiple binaries on the indexed path
-  
 <ul>
-    
+
 <li>IOW, multiple non-relative property definitions don&#x2019;t match and index binaries</li>
-  </ul></li>
+</ul>
+</li>
 </ul>
 <p>Example of usable index definitions</p>
 
-<div class="source">
-<div class="source"><pre class="prettyprint">    + /oak:index/usableIndex1
+<div>
+<div>
+<pre class="source">    + /oak:index/usableIndex1
       ...
       + indexRules
         ...
@@ -410,10 +405,12 @@
               - isRegexp=true
               - nodeScopeIndex=true
 </pre></div></div>
+
 <p>Examples of unusable index definitions</p>
 
-<div class="source">
-<div class="source"><pre class="prettyprint">    + /oak:index/unUsableIndex1
+<div>
+<div>
+<pre class="source">    + /oak:index/unUsableIndex1
       ...
       + indexRules
         ...
@@ -433,16 +430,19 @@
           + include0
             - path=&quot;jcr:content&quot;
 </pre></div></div>
+
 <p>With those pre-requisites mentioned, let&#x2019;s dive into how to use this.</p>
 <p>We&#x2019;d first need to dump index data from a suitable index (say <tt>/oak:index/suitableIndexDef</tt>) using <a href="oak-run-indexing.html#async-index-data">dump index</a> method at say <tt>/path/to/index/dump</tt></p>
 <p>Then use <tt>--populate</tt> action to populate extracted text store using a dump of usable indexed data. The command would look something like:</p>
 
-<div class="source">
-<div class="source"><pre class="prettyprint">    java -jar oak-run.jar tika \
+<div>
+<div>
+<pre class="source">    java -jar oak-run.jar tika \
     --data-file binary-stats.csv \
     --store-path ./store  \
     --index-dir /path/to/index/dump/index-dumps/suitableIndexDef/data populate
 </pre></div></div>
+
 <p>This command doesn&#x2019;t need to connect to either node store or blob store, so we don&#x2019;t need to configure it in the execution.</p>
 <p>This command would update <tt>blobs_empty.txt</tt> if indexed data for a given path is empty.</p>
 <p>It would also update <tt>blobs_error.txt</tt> if indexed data for a given path has indexed special value <tt>TextExtractionError</tt>.</p>
@@ -454,7 +454,7 @@
 <h3><a name="Oak_application"></a><a name="b-oak-app"></a>Oak application</h3>
 <p><tt>@since Oak 1.0.18, 1.2.3</tt></p>
 <p>For this look for OSGi config for <tt>Apache Jackrabbit Oak DataStore PreExtractedTextProvider</tt></p>
-<p><img src="pre-extracted-text-osgi.png" alt="OSGi Configuration" /> </p>
+<p><img src="pre-extracted-text-osgi.png" alt="OSGi Configuration" /></p>
 <p>Once <tt>PreExtractedTextProvider</tt> is configured then upon reindexing Lucene indexer would make use of it to check if text needs to be extracted or not. Check <tt>TextExtractionStatsMBean</tt> for various statistics around text extraction and also to validate if <tt>PreExtractedTextProvider</tt> is being used.</p></div>
 <div class="section">
 <h3><a name="Oak_Run_Indexing"></a><a name="b-oak-run"></a>Oak Run Indexing</h3>

Modified: jackrabbit/site/live/oak/docs/query/property-index.html
URL: http://svn.apache.org/viewvc/jackrabbit/site/live/oak/docs/query/property-index.html?rev=1835390&r1=1835389&r2=1835390&view=diff
==============================================================================
--- jackrabbit/site/live/oak/docs/query/property-index.html (original)
+++ jackrabbit/site/live/oak/docs/query/property-index.html Mon Jul  9 08:53:17 2018
@@ -1,13 +1,13 @@
 <!DOCTYPE html>
 <!--
- | Generated by Apache Maven Doxia Site Renderer 1.7.4 at 2018-05-24 
+ | Generated by Apache Maven Doxia Site Renderer 1.8.1 at 2018-07-09 
  | Rendered using Apache Maven Fluido Skin 1.6
 -->
 <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
   <head>
     <meta charset="UTF-8" />
     <meta name="viewport" content="width=device-width, initial-scale=1.0" />
-    <meta name="Date-Revision-yyyymmdd" content="20180524" />
+    <meta name="Date-Revision-yyyymmdd" content="20180709" />
     <meta http-equiv="Content-Language" content="en" />
     <title>Jackrabbit Oak &#x2013; The Property Index</title>
     <link rel="stylesheet" href="../css/apache-maven-fluido-1.6.min.css" />
@@ -136,7 +136,7 @@
 
       <div id="breadcrumbs">
         <ul class="breadcrumb">
-        <li id="publishDate">Last Published: 2018-05-24<span class="divider">|</span>
+        <li id="publishDate">Last Published: 2018-07-09<span class="divider">|</span>
 </li>
           <li id="projectVersion">Version: 1.10-SNAPSHOT</li>
         </ul>
@@ -241,54 +241,44 @@
    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    See the License for the specific language governing permissions and
    limitations under the License.
-  --><div class="section">
+  -->
+<div class="section">
 <h2><a name="The_Property_Index"></a>The Property Index</h2>
 <p>Is useful whenever there is a query with a property constraint that is not full-text:</p>
 
-<div class="source">
-<div class="source"><pre class="prettyprint">SELECT * FROM [nt:base] WHERE [jcr:uuid] = $id
+<div>
+<div>
+<pre class="source">SELECT * FROM [nt:base] WHERE [jcr:uuid] = $id
 </pre></div></div>
-<p>To define a property index, you have to add an index definition node that:</p>
 
+<p>To define a property index, you have to add an index definition node that:</p>
 <ul>
-  
+
 <li>Must be a child node of <tt>/oak:index</tt>.</li>
-  
 <li>Must be of type <tt>oak:QueryIndexDefinition</tt>.</li>
-  
-<li><tt>type</tt> (String) must have the property set to &#x201c;property&#x201d;.</li>
-  
-<li><tt>propertyNames</tt> (Name, multi-valued): the property to be indexed. This is a multi-valued property, and must not be empty. It usually contains only <i>one</i> property name. All nodes that have <i>any</i> of those properties are stored in this index.</li>
+<li><tt>type</tt> (String) must have the  property set to &#x201c;property&#x201d;.</li>
+<li><tt>propertyNames</tt> (Name, multi-valued): the  property to be indexed. This is a multi-valued property, and must not be empty. It usually contains only <i>one</i> property name. All nodes that have <i>any</i> of those properties are stored in this index.</li>
 </ul>
 <p>It is recommended to index one property per index. (If multiple properties are indexed within one index, then the index contains all nodes that have any one of the properties, which can make the query less efficient, and can make the query pick the wrong index.)</p>
 <p>Optionally you can specify:</p>
-
 <ul>
-  
+
 <li><tt>declaringNodeTypes</tt> (Name, multi-valued): the index only applies to a certain node type.</li>
-  
-<li><tt>unique</tt> (Boolean): if set to <tt>true</tt>, a uniqueness constraint on this  property is added. Ensure you set declaringNodeTypes,  otherwise all nodes of the repository are affected (which is most likely not what you want),  and you are not able to version the node.</li>
-  
+<li><tt>unique</tt> (Boolean): if set to <tt>true</tt>, a uniqueness constraint on this property is added. Ensure you set declaringNodeTypes, otherwise all nodes of the repository are affected (which is most likely not what you want), and you are not able to version the node.</li>
 <li><tt>includedPaths</tt> (String, multi-valued): the paths that are included (&#x2018;/&#x2019; if not set). Since Oak version 1.4 (OAK-3263). The index is only used if the query has a path restriction that is not excluded, and part of the included paths.</li>
-  
 <li><tt>excludedPaths</tt> (String, multi-valued): the paths where this index is excluded (none if not set). Since Oak version 1.4 (OAK-3263). The index is only used if the query has a path restriction that is not excluded, and part of the included paths.</li>
-  
 <li><tt>valuePattern</tt> (String) A regular expression of all indexed values. The index is used for equality conditions where the value matches the pattern, and for &#x201c;in(&#x2026;)&#x201d; queries where all values match the pattern. The index is not used for &#x201c;like&#x201d; conditions. Since Oak version 1.7.2 (OAK-4637).</li>
-  
 <li><tt>valueExcludedPrefixes</tt> The index is used for equality conditions where the value does not start with the given prefix, and the prefix does not start with the value, similarly for &#x201c;in(&#x2026;)&#x201d; conditions, and similarly for &#x201c;like&#x201d; conditions. Since Oak version 1.7.2 (OAK-4637).</li>
-  
 <li><tt>valueIncludedPrefixes</tt> The index is used for equality conditions where the value starts with the given prefix, similarly for &#x201c;in(&#x2026;)&#x201d; conditions, and similarly for &#x201c;like&#x201d; conditions. Since Oak version 1.7.2 (OAK-4637).</li>
-  
-<li><tt>entryCount</tt> (Long): the estimated number of path entries in the index,  to override the cost estimation (a high entry count means a high cost).</li>
-  
-<li><tt>keyCount</tt> (Long), the estimated number of keys in the index,  to override the cost estimation (a high key count means a lower cost and  a low key count means a high cost  when searching for specific keys; has no effect when searching for &#x201c;is not null&#x201d;).</li>
-  
-<li><tt>reindex</tt> (Boolean): if set to <tt>true</tt>, the full content is re-indexed.  This can take a long time, and is run synchronously with storing the index  (except with an async index). See &#x201c;Reindexing&#x201d; below for details.</li>
+<li><tt>entryCount</tt> (Long): the estimated number of path entries in the index, to override the cost estimation (a high entry count means a high cost).</li>
+<li><tt>keyCount</tt> (Long), the estimated number of keys in the index, to override the cost estimation (a high key count means a lower cost and a low key count means a high cost when searching for specific keys; has no effect when searching for &#x201c;is not null&#x201d;).</li>
+<li><tt>reindex</tt> (Boolean): if set to <tt>true</tt>, the full content is re-indexed. This can take a long time, and is run synchronously with storing the index (except with an async index). See &#x201c;Reindexing&#x201d; below for details.</li>
 </ul>
 <p>Example:</p>
 
-<div class="source">
-<div class="source"><pre class="prettyprint">{
+<div>
+<div>
+<pre class="source">{
   NodeBuilder index = root.child(&quot;oak:index&quot;);
   index.child(&quot;uuid&quot;)
     .setProperty(&quot;jcr:primaryType&quot;, &quot;oak:QueryIndexDefinition&quot;, Type.NAME)
@@ -299,14 +289,17 @@
     .setProperty(&quot;reindex&quot;, true);
 }
 </pre></div></div>
+
 <p>or to simplify you can use one of the existing <tt>IndexUtils#createIndexDefinition</tt> helper methods:</p>
 
-<div class="source">
-<div class="source"><pre class="prettyprint">{
+<div>
+<div>
+<pre class="source">{
   NodeBuilder index = IndexUtils.getOrCreateOakIndex(root);
   IndexUtils.createIndexDefinition(index, &quot;myProp&quot;, true, false, ImmutableList.of(&quot;myProp&quot;), null);
 }
 </pre></div></div>
+
 <div class="section">
 <div class="section">
 <h4><a name="Reindexing"></a><a name="reindexing"></a> Reindexing</h4>
@@ -315,32 +308,28 @@
 <p>Asynchronous reindexing of a property index is available as of OAK-1456. It works by pushing the property index updates to a background job; when the indexing process is done, the property definition is switched back to synchronous update mode. To enable this async reindex behaviour you first need to set the <b><tt>reindex-async</tt></b> and <b><tt>reindex</tt></b> flags to <b><tt>true</tt></b> (call #save). You can verify that the initial setup worked by refreshing the index definition node and looking for the <b><tt>async</tt></b> = <b><tt>async-reindex</tt></b> property. Next you need to start the dedicated background job via a JMX call to the <b><tt>PropertyIndexAsyncReindex#startPropertyIndexAsyncReindex</tt></b> MBean.</p>
 <p>Example:</p>
 
-<div class="source">
-<div class="source"><pre class="prettyprint">{
+<div>
+<div>
+<pre class="source">{
   NodeBuilder index = root.child(&quot;oak:index&quot;);
   index.child(&quot;property&quot;)
     .setProperty(&quot;reindex-async&quot;, true)
     .setProperty(&quot;reindex&quot;, true);
 }
 </pre></div></div>
+
 <p>When recovering from a failed async reindex, special care needs to be taken with respect to the created checkpoint and the <b><tt>async</tt></b> property. The checkpoint should be released via the <b><tt>CheckpointManager</tt></b> MBean, and the <b><tt>async</tt></b> property needs to be manually deleted while also setting the <b><tt>reindex</tt></b> flag to <b><tt>true</tt></b> to make sure the index returns to a consistent state, in sync with the head revision.</p></div>
 <div class="section">
 <h4><a name="Cost_Estimation"></a>Cost Estimation</h4>
 <p>When running a query, the property index reports its estimated cost to the query engine, and then the query engine picks the index with the lowest cost (cost-based query optimization). The algorithm to calculate the estimated cost is roughly as follows (a bit simplified):</p>
-
 <ul>
-  
-<li>The cost is infinity (so the index is never used)  if the condition contains a fulltext constraint,  no applicable restriction,  the wrong nodetype, or  if the path filtering (<tt>includedPaths</tt> / <tt>excludedPaths</tt>) does not match the query.</li>
-  
-<li>For the nodetype index, the cost is the sum of the cost for the <tt>jcr:primaryType</tt> lookup  (if the primary type is known),  plus the cost for the <tt>jcr:mixinTypes</tt> lookup (if that is known).</li>
-  
-<li>Otherwise, the cost is based on the overhead (which is 2),  plus the estimated number of entries.</li>
-  
-<li>For an &#x201c;x is not null&#x201d; condition,  the estimated number of entries is  either the configured <tt>entryCount</tt> or, if not set, the  approximate number of entries in the index.  The approximation is an &#x201c;order of magnitude&#x201d; estimation (Morris&#x2019; algorithm).</li>
-  
-<li>For a unique index and an &#x201c;x = 1&#x201d; condition,  the estimated number of entries is either 0 or 1  (depending on whether the key is found).</li>
-  
-<li>For a non-unique index and an &#x201c;x = 1&#x201d; condition,  if the <tt>entryCount</tt> and <tt>keyCount</tt> are set, those settings are used to estimate  the number of entries. If not, the  approximate number of entries for the key is read (maintained using Morris&#x2019; algorithm).  In addition to that, the path condition is used to scale down  the estimated count depending on the approximate number of nodes  in that subtree versus the approximate number of entries  in the repository, using the approximation available via the <tt>counter</tt> index.</li>
+
+<li>The cost is infinity (so the index is never used) if the condition contains a fulltext constraint, no applicable restriction, the wrong nodetype, or if the path filtering (<tt>includedPaths</tt> / <tt>excludedPaths</tt>) does not match the query.</li>
+<li>For the nodetype index, the cost is the sum of the cost for the <tt>jcr:primaryType</tt> lookup (if the primary type is known), plus the cost for the <tt>jcr:mixinTypes</tt> lookup (if that is known).</li>
+<li>Otherwise, the cost is based on the overhead (which is 2), plus the estimated number of entries.</li>
+<li>For an &#x201c;x is not null&#x201d; condition, the estimated number of entries is either the configured <tt>entryCount</tt> or, if not set, the approximate number of entries in the index. The approximation is an &#x201c;order of magnitude&#x201d; estimation (Morris&#x2019; algorithm).</li>
+<li>For a unique index and an &#x201c;x = 1&#x201d; condition, the estimated number of entries is either 0 or 1 (depending on whether the key is found).</li>
+<li>For a non-unique index and an &#x201c;x = 1&#x201d; condition, if the <tt>entryCount</tt> and <tt>keyCount</tt> are set, those settings are used to estimate the number of entries. If not, the approximate number of entries for the key is read (maintained using Morris&#x2019; algorithm). In addition to that, the path condition is used to scale down the estimated count depending on the approximate number of nodes in that subtree versus the approximate number of entries in the repository, using the approximation available via the <tt>counter</tt> index.</li>
 </ul>
 <p>For example, for a query with path restriction &#x201c;/content/products/t-shirts&#x201d; and property restriction &#x201c;color = &#x2018;red&#x2019;&#x201d;, if there is an index for the property &#x201c;color&#x201d;, then the entry count approximation is read from the index. Let&#x2019;s say it is 10&#x2019;000 for this value. Then the approximate number of nodes in the subtree &#x201c;/content/products/t-shirts&#x201d; is read (let&#x2019;s say it is 20&#x2019;000), and the approximate number of nodes in the repository (let&#x2019;s say it is 1 million). Therefore, the estimated number of entries is scaled down (divided by 50) from 10&#x2019;000 to 200. The estimated cost is therefore 202, due to the overhead of 2.</p></div></div></div>
         </div>
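The arithmetic of the worked example in the Cost Estimation section above can be reproduced with a few lines of Java. This is only a simplified model of the description given there, not Oak's actual implementation: the class and method names are hypothetical, and it covers only the non-unique, path-restricted case (overhead of 2 plus the entry count scaled by the subtree-to-repository ratio).

```java
/** Hypothetical model of the simplified cost formula; not Oak code. */
public final class PropertyIndexCostSketch {

    /** Fixed overhead mentioned in the description above. */
    static final double OVERHEAD = 2.0;

    /**
     * Estimated cost = overhead + entries scaled down by the share of
     * repository nodes that live in the restricted subtree.
     */
    static double estimateCost(long entriesForValue,
                               long nodesInSubtree,
                               long nodesInRepository) {
        if (nodesInRepository <= 0) {
            throw new IllegalArgumentException("nodesInRepository must be > 0");
        }
        double scaled = (double) entriesForValue * nodesInSubtree / nodesInRepository;
        // Never scale up: a subtree cannot match more entries than the index holds
        scaled = Math.min(scaled, entriesForValue);
        return OVERHEAD + scaled;
    }

    public static void main(String[] args) {
        // Values from the example: 10'000 entries for color = 'red',
        // 20'000 nodes under /content/products/t-shirts, 1 million nodes overall
        System.out.println(estimateCost(10_000, 20_000, 1_000_000)); // prints 202.0
    }
}
```

With the numbers from the example, the entry count is divided by 50 (1 million / 20'000), giving 200 entries and a cost of 202.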