Posted to commits@jackrabbit.apache.org by ch...@apache.org on 2017/03/16 11:49:23 UTC

svn commit: r1787161 - /jackrabbit/site/live/oak/docs/query/lucene.html

Author: chetanm
Date: Thu Mar 16 11:49:23 2017
New Revision: 1787161

URL: http://svn.apache.org/viewvc?rev=1787161&view=rev
Log:
OAK-5917 - Document enhancements in indexing in 1.6

Add table of content

Modified:
    jackrabbit/site/live/oak/docs/query/lucene.html

Modified: jackrabbit/site/live/oak/docs/query/lucene.html
URL: http://svn.apache.org/viewvc/jackrabbit/site/live/oak/docs/query/lucene.html?rev=1787161&r1=1787160&r2=1787161&view=diff
==============================================================================
--- jackrabbit/site/live/oak/docs/query/lucene.html (original)
+++ jackrabbit/site/live/oak/docs/query/lucene.html Thu Mar 16 11:49:23 2017
@@ -1,13 +1,13 @@
 <!DOCTYPE html>
 <!--
- | Generated by Apache Maven Doxia at 2017-03-14
+ | Generated by Apache Maven Doxia at 2017-03-16
  | Rendered using Apache Maven Fluido Skin 1.3.0
 -->
 <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
   <head>
     <meta charset="UTF-8" />
     <meta name="viewport" content="width=device-width, initial-scale=1.0" />
-    <meta name="Date-Revision-yyyymmdd" content="20170314" />
+    <meta name="Date-Revision-yyyymmdd" content="20170316" />
     <meta http-equiv="Content-Language" content="en" />
     <title>Jackrabbit Oak - Lucene Index</title>
     <link rel="stylesheet" href="../css/apache-maven-fluido-1.3.0.min.css" />
@@ -216,7 +216,7 @@
         <ul class="breadcrumb">
                 
                     
-                  <li id="publishDate">Last Published: 2017-03-14</li>
+                  <li id="publishDate">Last Published: 2017-03-16</li>
                   <li class="divider">|</li> <li id="projectVersion">Version: 1.8-SNAPSHOT</li>
                       
                 
@@ -518,6 +518,106 @@
    See the License for the specific language governing permissions and
    limitations under the License. --><div class="section">
 <h2>Lucene Index<a name="Lucene_Index"></a></h2>
+
+<ul>
+  
+<li><a href="#index-definition">Index Definition</a>
+  
+<ul>
+    
+<li><a href="#indexing-rules">Indexing Rules</a>
+    
+<ul>
+      
+<li><a href="#cost-overrides">Cost Overrides</a></li>
+      
+<li><a href="#indexing-rule-inheritence">Indexing Rule inheritance</a></li>
+      
+<li><a href="#property-definitions">Property Definitions</a></li>
+      
+<li><a href="#path-restrictions">Evaluate Path Restrictions</a></li>
+      
+<li><a href="#include-exclude">Include and Exclude paths from indexing</a></li>
+    </ul></li>
+    
+<li><a href="#aggregation">Aggregation</a></li>
+    
+<li><a href="#analyzers">Analyzers</a>
+    
+<ul>
+      
+<li><a href="#analyzer-classes">Specify analyzer class directly</a></li>
+      
+<li><a href="#analyzer-composition">Create analyzer via composition</a></li>
+    </ul></li>
+    
+<li><a href="#codec">Codec</a></li>
+    
+<li><a href="#boost">Boost and Search Relevancy</a></li>
+  </ul></li>
+  
+<li><a href="#osgi-config">LuceneIndexProvider Configuration</a></li>
+  
+<li><a href="#tika-config">Tika Config</a>
+  
+<ul>
+    
+<li><a href="#mime-type-usage">Mime type usage</a></li>
+  </ul></li>
+  
+<li><a href="#non-root-index">Non Root Index Definitions</a></li>
+  
+<li><a href="#native-query">Native Query and Index Selection</a></li>
+  
+<li><a href="#copy-on-read">CopyOnRead</a></li>
+  
+<li><a href="#copy-on-write">CopyOnWrite</a></li>
+  
+<li><a href="#mbeans">Lucene Index MBeans</a></li>
+  
+<li><a href="#luke">Analyzing created Lucene Index</a></li>
+  
+<li><a href="#text-extraction">Pre-Extracting Text from Binaries</a></li>
+  
+<li><a href="#advanced-search-features">Advanced search features</a>
+  
+<ul>
+    
+<li><a href="#suggestions">Suggestions</a></li>
+    
+<li><a href="#spellchecking">Spellchecking</a></li>
+    
+<li><a href="#facets">Facets</a></li>
+    
+<li><a href="#score-explanation">Score Explanation</a></li>
+    
+<li><a href="#custom-hooks">Custom hooks</a></li>
+  </ul></li>
+  
+<li><a href="#design-considerations">Design Considerations</a></li>
+  
+<li><a href="#lucene-vs-property">Lucene Index vs Property Index</a></li>
+  
+<li><a href="#examples">Examples</a>
+  
+<ul>
+    
+<li><a href="#simple-queries">A - Simple queries</a></li>
+    
+<li><a href="#queries-structured-content">B - Queries for structured content</a>
+    
+<ul>
+      
+<li><a href="#uc1">UC1 - Find all assets which are having <tt>status</tt> as <tt>published</tt></a></li>
+      
+<li><a href="#uc2">UC2 - Find all assets which are having <tt>status</tt> as <tt>published</tt> sorted by last modified date</a></li>
+      
+<li><a href="#uc3">UC3 - Find all assets where comment contains <i>december</i></a></li>
+      
+<li><a href="#uc4">UC4 - Find all assets which are created by David and refer to december</a></li>
+    </ul></li>
+  </ul></li>
+</ul>
 <p>Oak supports Lucene-based indexes for both property constraints and full text constraints. Depending on the configuration, a Lucene index can be used to evaluate property constraints, full text constraints, path restrictions and sorting.</p>
 
 <div class="source">
@@ -575,7 +675,7 @@
           - nodeScopeIndex = true
 </pre></div>
 <div class="section">
-<h3>Index Definition<a name="Index_Definition"></a></h3>
+<h3><a name="index-definition"></a> Index Definition<a name="Index_Definition"></a></h3>
 <p>A Lucene index definition consists of <tt>indexingRules</tt>, <tt>analyzers</tt>, <tt>aggregates</tt> etc., which determine which nodes and properties are indexed and how they are indexed.</p>
 <p>Below is the canonical index definition structure</p>
 
@@ -638,7 +738,7 @@
 <dd>Numbers of terms indexed per field. Defaults to 10000</dd>
 </dl>
 <div class="section">
-<h4>Indexing Rules<a name="Indexing_Rules"></a></h4>
+<h4><a name="indexing-rules"></a> Indexing Rules<a name="Indexing_Rules"></a></h4>
 <p>Indexing rules define which types of nodes and properties are indexed. An index configuration can define one or more <tt>indexingRules</tt> for different nodeTypes.</p>
 
 <div class="source">
@@ -701,7 +801,7 @@
   </ul></dd>
 </dl>
 <div class="section">
-<h5>Cost Overrides<a name="Cost_Overrides"></a></h5>
+<h5><a name="cost-overrides"></a> Cost Overrides<a name="Cost_Overrides"></a></h5>
 <p>By default, the cost of using this index is calculated as follows: For each query, the overhead is one operation. For each entry in the index, the cost is one. The following applies to <tt>compatVersion</tt> 2 only: To use a lower or higher cost, you can set the following optional properties in the index definition:</p>
 
 <div class="source">
@@ -710,11 +810,11 @@
 </pre></div>
 <p>Please note that typically, those settings don&#x2019;t need to be explicitly set. Cost per execution is the overhead of one query. Cost per entry is the cost per node in the index. Using 0.5 means the cost is half, which means the index would be used more often (that is, even if there is a different index with similar cost).</p></div>
 <div class="section">
-<h5>Indexing Rule inheritance<a name="Indexing_Rule_inheritance"></a></h5>
+<h5><a name="indexing-rule-inheritence"></a>Indexing Rule inheritance<a name="Indexing_Rule_inheritance"></a></h5>
 <p><tt>indexRules</tt> are defined per nodeType and support nodeType inheritance. For example, while indexing any node the indexer looks up the applicable indexRule for that node based on its <i>primaryType</i>. If a direct match is found then that rule is used; otherwise the indexer looks for a rule for any of the parent types. The rules are looked up in the order of their entries under the <tt>indexRules</tt> node (the indexRule node itself is of type <tt>nt:unstructured</tt>, which has <tt>orderable</tt> child nodes).</p>
 <p>If <tt>inherited</tt> is set to false on any rule then that rule is only applicable if an exact match is found.</p></div>
 <div class="section">
-<h5>Property Definitions<a name="Property_Definitions"></a></h5>
+<h5><a name="property-definitions"></a>Property Definitions<a name="Property_Definitions"></a></h5>
 <p>Each index rule consists of one or more property definitions defined under <tt>properties</tt>. The order of the property definition nodes is important as some properties are based on regular expressions. Below is the canonical property definition structure</p>
 
 <div class="source">
@@ -832,20 +932,18 @@
 <li>Regular Expression - Like <i>.*</i>. Used when only properties whose names match the given pattern are to be indexed.  They can also be used for relative properties like  <i>jcr:content/metadata/dc:.*$</i>  which indexes all property names starting with <i>dc</i> from the node with  relative path <i>jcr:content/metadata</i></li>
   
 <li>The string <tt>:nodeName</tt> - this special case indexes node name as if it&#x2019;s a  virtual property of the node being indexed. Setting this along with  <tt>nodeScopeIndex=true</tt> is akin to setting <tt>indexNodeName=true</tt> on indexing  rule. (<tt>@since Oak 1.3.15, 1.2.14</tt>)</li>
-</ol>
-<p><a name="path-restrictions"></a></p></div>
+</ol></div>
 <div class="section">
-<h5>Evaluate Path Restrictions<a name="Evaluate_Path_Restrictions"></a></h5>
+<h5><a name="path-restrictions"></a> Evaluate Path Restrictions<a name="Evaluate_Path_Restrictions"></a></h5>
 <p>Lucene index provides support for evaluating path restrictions natively. Consider a query like</p>
 
 <div class="source">
 <pre>select * from [app:Asset] as a where isdescendantnode(a, [/content/app/old]) AND contains(*, 'white')
 </pre></div>
 <p>By default the index would return all nodes which <i>contain white</i> and the query engine would filter out nodes which are not under <i>/content/app/old</i>. This can be slow if lots of nodes are not under that path. To speed up such queries one can enable <tt>evaluatePathRestrictions</tt> in the Lucene index, and the index would then only return nodes which are under <i>/content/app/old</i>.</p>
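 <p>As a minimal sketch, assuming a hypothetical index node named <tt>assetIndex</tt> under <tt>/oak:index</tt> (the node name and the rule contents are illustrative only), the property is set on the index definition node itself:</p>
 
 <div class="source">
 <pre>/oak:index/assetIndex
   - jcr:primaryType = &quot;oak:QueryIndexDefinition&quot;
   - compatVersion = 2
   - type = &quot;lucene&quot;
   - async = &quot;async&quot;
   - evaluatePathRestrictions = true
   + indexRules
     + app:Asset
       ...
 </pre></div>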
-<p>Enabling this feature would incur cost in terms of slight increase in index size. Refer to <a class="externalLink" href="https://issues.apache.org/jira/browse/OAK-2306">OAK-2306</a> for more details.</p>
-<p><a name="include-exclude"></a></p></div>
+<p>Enabling this feature would incur cost in terms of slight increase in index size. Refer to <a class="externalLink" href="https://issues.apache.org/jira/browse/OAK-2306">OAK-2306</a> for more details.</p></div>
 <div class="section">
-<h5>Include and Exclude paths from indexing<a name="Include_and_Exclude_paths_from_indexing"></a></h5>
+<h5><a name="include-exclude"></a> Include and Exclude paths from indexing<a name="Include_and_Exclude_paths_from_indexing"></a></h5>
 <p><tt>@since Oak 1.0.14, 1.2.3</tt></p>
 <p>By default the indexer would index all the nodes under the subtree where the index definition is defined as per the indexingRule. In some cases it is required to index only nodes under a certain path. For example, if the index is a global fulltext index which covers the complete repository, you might want to exclude certain paths which contain transient system data. </p>
 <p>For example, if your application stores certain logs under <tt>/var/log</tt> and they are not supposed to be indexed as part of the fulltext index then that path can be excluded</p>
@@ -880,10 +978,9 @@
 <p>Sub-root index definitions (e.g. <tt>/test/oak:index/index-def-node</tt>) -  <tt>excludedPaths</tt> and <tt>includedPaths</tt> need to be relative to the path that the index is defined for. For example, if the condition is supposed to apply to <tt>/test/a</tt> where the index definition is at <tt>/test/oak:index/index-def-node</tt> then <tt>/a</tt> needs to be put as the value of <tt>excludedPaths</tt> or <tt>includedPaths</tt>. On the other hand, <tt>queryPaths</tt> remains an absolute path. So, for the example above, <tt>queryPaths</tt> would get the value <tt>/test/a</tt>.</p></li>
 </ol>
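 <p>The sub-root case can be sketched as follows, reusing the paths from the example above. This assumes the goal is to index only <tt>/test/a</tt>; all other properties of the definition are omitted:</p>
 
 <div class="source">
 <pre>/test/oak:index/index-def-node
   - jcr:primaryType = &quot;oak:QueryIndexDefinition&quot;
   - type = &quot;lucene&quot;
   - includedPaths = [&quot;/a&quot;]
   - queryPaths = [&quot;/test/a&quot;]
 </pre></div>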
 <p>In most cases use of <tt>queryPaths</tt> would not be required as index definitions should not overlap. </p>
-<p>Refer to <a class="externalLink" href="https://issues.apache.org/jira/browse/OAK-2599">OAK-2599</a> for more details.</p>
-<p><a name="aggregation"></a></p></div></div>
+<p>Refer to <a class="externalLink" href="https://issues.apache.org/jira/browse/OAK-2599">OAK-2599</a> for more details.</p></div></div>
 <div class="section">
-<h4>Aggregation<a name="Aggregation"></a></h4>
+<h4><a name="aggregation"></a>Aggregation<a name="Aggregation"></a></h4>
 <p>Sometimes it is useful to include the contents of descendant nodes in a single node to make it easier to search content that is scattered across multiple nodes.</p>
 <p>Oak allows you to define index aggregates based on relative path patterns and primary node types. Changes to aggregated items cause the main item to be reindexed, even if it was not modified.</p>
 <p>Aggregation configuration is defined under the <tt>aggregates</tt> node under index configuration. The following example creates an index aggregate on nt:file that includes the content of the jcr:content node:</p>
@@ -983,7 +1080,7 @@
         - relativeNode = true
 </pre></div></div>
 <div class="section">
-<h4>Analyzers<a name="Analyzers"></a></h4>
+<h4><a name="analyzers"></a>Analyzers<a name="Analyzers"></a></h4>
 <p><tt>@since Oak 1.5.5, 1.4.7</tt> Unless a custom analyzer is configured (as documented below), the built-in analyzer can be configured to also index the original term. This is controlled by setting the boolean property <tt>indexOriginalTerm</tt> on the analyzers node.</p>
 
 <div class="source">
@@ -1007,7 +1104,7 @@
         ...
 </pre></div>
 <div class="section">
-<h5>Specify analyzer class directly<a name="Specify_analyzer_class_directly"></a></h5>
+<h5><a name="analyzer-classes"></a>Specify analyzer class directly<a name="Specify_analyzer_class_directly"></a></h5>
 <p>If any of the out-of-the-box analyzers is to be used then it can be configured directly</p>
 
 <div class="source">
@@ -1027,7 +1124,7 @@
             + stopwords (nt:file)
 </pre></div></div>
 <div class="section">
-<h5>Create analyzer via composition<a name="Create_analyzer_via_composition"></a></h5>
+<h5><a name="analyzer-composition"></a>Create analyzer via composition<a name="Create_analyzer_via_composition"></a></h5>
 <p>Analyzers can also be composed based on <tt>Tokenizers</tt>, <tt>TokenFilters</tt> and <tt>CharFilters</tt>. This is similar to the support provided in Solr where you can <a class="externalLink" href="https://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#Specifying_an_Analyzer_in_the_schema">configure analyzers in xml</a></p>
 
 <div class="source">
@@ -1099,10 +1196,9 @@
 <li><a class="externalLink" href="https://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#Specifying_an_Analyzer_in_the_schema">https://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#Specifying_an_Analyzer_in_the_schema</a></li>
   </ul></li>
 </ol>
-<p>Note that currently only one analyzer can be configured per index. It is not possible to specify separate analyzers for query time and index time. </p>
-<p><a name="codec"></a></p></div></div>
+<p>Note that currently only one analyzer can be configured per index. It is not possible to specify separate analyzers for query time and index time. </p></div></div>
 <div class="section">
-<h4>Codec<a name="Codec"></a></h4>
+<h4><a name="codec"></a>Codec<a name="Codec"></a></h4>
 <p>Name of <a class="externalLink" href="https://lucene.apache.org/core/4_7_1/core/org/apache/lucene/codecs/Codec.html">Lucene Codec</a> to use. By default if the index involves fulltext indexing then Oak Lucene uses <tt>OakCodec</tt> which disables compression. Due to this the index size may grow large. To enable compression you can set the codec to <tt>Lucene46</tt></p>
 
 <div class="source">
@@ -1112,10 +1208,9 @@
   - type = &quot;lucene&quot;
   - codec = &quot;Lucene46&quot;
 </pre></div>
-<p>Refer to <a class="externalLink" href="https://issues.apache.org/jira/browse/OAK-2853">OAK-2853</a> for details. Enabling the <tt>Lucene46</tt> codec would lead to smaller, more compact indexes.</p>
-<p><a name="boost"></a></p></div>
+<p>Refer to <a class="externalLink" href="https://issues.apache.org/jira/browse/OAK-2853">OAK-2853</a> for details. Enabling the <tt>Lucene46</tt> codec would lead to smaller, more compact indexes.</p></div>
 <div class="section">
-<h4>Boost and Search Relevancy<a name="Boost_and_Search_Relevancy"></a></h4>
+<h4><a name="boost"></a>Boost and Search Relevancy<a name="Boost_and_Search_Relevancy"></a></h4>
 <p><tt>@since Oak 1.2.5</tt></p>
 <p>When fulltext indexing is enabled then internally Oak would create a fulltext field which consists of text extracted from various other fields i.e. fields for which <tt>nodeScopeIndex</tt> is <tt>true</tt>. This allows search like <tt>//*[jcr:contains(., 'foo')]</tt> to perform search across any indexable field containing foo (See <a class="externalLink" href="http://www.day.com/specs/jcr/1.0/6.6.5.2_jcr_contains_Function.html">contains function</a> for details)</p>
 <p>In certain cases it is desirable that nodes where the searched term is present in a specific property are ranked higher (come earlier in the search result) than nodes where the searched term is found in some other property.</p>
@@ -1148,10 +1243,9 @@ FROM [app:Asset]
 WHERE 
   CONTAINS(., 'Batman')
 </pre></div>
-<p>Would have those nodes (of type app:Asset) come first where <i>Batman</i> is found in <i>jcr:title</i>, while nodes where the search text is found in another field, like aggregated content, would come later.</p>
-<p><a name="osgi-config"></a></p></div></div>
+<p>Would have those nodes (of type app:Asset) come first where <i>Batman</i> is found in <i>jcr:title</i>, while nodes where the search text is found in another field, like aggregated content, would come later.</p></div></div>
 <div class="section">
-<h3>LuceneIndexProvider Configuration<a name="LuceneIndexProvider_Configuration"></a></h3>
+<h3><a name="osgi-config"></a>LuceneIndexProvider Configuration<a name="LuceneIndexProvider_Configuration"></a></h3>
 <p>Some of the runtime aspects of the Oak Lucene support can be configured via OSGi configuration. The configuration needs to be done for PID <tt>org.apache.jackrabbit.oak.plugins.index.lucene.LuceneIndexProviderService</tt></p>
 <p><img src="lucene-osgi-config.png" alt="OSGi Configuration" /></p>
@@ -1170,7 +1264,7 @@ WHERE
 <dd>If enabled then Lucene logging would be integrated with Slf4j</dd>
 </dl></div>
 <div class="section">
-<h3>Tika Config<a name="Tika_Config"></a></h3>
+<h3><a name="tika-config"></a>Tika Config<a name="Tika_Config"></a></h3>
 <p><tt>@since Oak 1.0.12, 1.2.3</tt></p>
 <p>Oak Lucene uses <a class="externalLink" href="http://tika.apache.org/">Apache Tika</a> to extract the text from binary content</p>
 
@@ -1195,20 +1289,18 @@ WHERE
   </ul></dd>
 </dl>
 <div class="section">
-<h4>Mime type usage<a name="Mime_type_usage"></a></h4>
-<p>A binary would only be indexed if there is an associated property <tt>jcr:mimeType</tt> defined and that mime type is supported by Tika. By default the indexer uses <a class="externalLink" href="https://issues.apache.org/jira/browse/OAK-2895">TypeDetector</a>, which relies on <tt>jcr:mimeType</tt> to pick the right parser, instead of the default <tt>DefaultDetector</tt>. </p>
-<p><a name="non-root-index"></a></p></div></div>
+<h4><a name="mime-type-usage"></a>Mime type usage<a name="Mime_type_usage"></a></h4>
+<p>A binary would only be indexed if there is an associated property <tt>jcr:mimeType</tt> defined and that mime type is supported by Tika. By default the indexer uses <a class="externalLink" href="https://issues.apache.org/jira/browse/OAK-2895">TypeDetector</a>, which relies on <tt>jcr:mimeType</tt> to pick the right parser, instead of the default <tt>DefaultDetector</tt>. </p></div></div>
 <div class="section">
-<h3>Non Root Index Definitions<a name="Non_Root_Index_Definitions"></a></h3>
+<h3><a name="non-root-index"></a>Non Root Index Definitions<a name="Non_Root_Index_Definitions"></a></h3>
 <p>Lucene index definition can be defined at any location in repository and need not always be defined at root. For example if your query involves path restrictions like</p>
 
 <div class="source">
 <pre>select * from [app:Asset] as a where ISDESCENDANTNODE(a, '/content/companya') and [format] = 'image'
 </pre></div>
-<p>Then you can create the required index definition say <tt>assetIndex</tt> at <tt>/content/companya/oak:index/assetIndex</tt>. In such a case that index would contain data for the subtree under <tt>/content/companya</tt></p>
-<p><a name="native-query"></a></p></div>
+<p>Then you can create the required index definition say <tt>assetIndex</tt> at <tt>/content/companya/oak:index/assetIndex</tt>. In such a case that index would contain data for the subtree under <tt>/content/companya</tt></p></div>
 <div class="section">
-<h3>Native Query and Index Selection<a name="Native_Query_and_Index_Selection"></a></h3>
+<h3><a name="native-query"></a>Native Query and Index Selection<a name="Native_Query_and_Index_Selection"></a></h3>
 <p>Oak query engine supports native queries like</p>
 
 <div class="source">
@@ -1229,7 +1321,7 @@ WHERE
 <pre>//*[rep:native('lucene-assetIndex', 'name:(Hello OR World)')]
 </pre></div></div>
 <div class="section">
-<h3>Persisting indexes to FileSystem<a name="Persisting_indexes_to_FileSystem"></a></h3>
+<h3><a name="native-query"></a>Persisting indexes to FileSystem<a name="Persisting_indexes_to_FileSystem"></a></h3>
 <p>By default Lucene indexes are stored in the <tt>NodeStore</tt>. If required they can be stored on the file system directly</p>
 
 <div class="source">
@@ -1240,30 +1332,27 @@ WHERE
 - path = &quot;/path/to/store/index&quot;
 </pre></div>
 <p>To store the Lucene index in the file system, in the Lucene index definition node, set the property <tt>persistence</tt> to <tt>file</tt>, and set the property <tt>path</tt> to the directory where the index should be stored. Then start reindexing by setting <tt>reindex</tt> to <tt>true</tt>.</p>
-<p>Note that this setup only works for a non-clustered <tt>NodeStore</tt>. If the backend <tt>NodeStore</tt> supports clustering then the index data would not be accessible on other cluster nodes</p>
-<p><a name="copy-on-read"></a></p></div>
+<p>Note that this setup only works for a non-clustered <tt>NodeStore</tt>. If the backend <tt>NodeStore</tt> supports clustering then the index data would not be accessible on other cluster nodes</p></div>
 <div class="section">
-<h3>CopyOnRead<a name="CopyOnRead"></a></h3>
+<h3><a name="copy-on-read"></a>CopyOnRead<a name="CopyOnRead"></a></h3>
 <p>Lucene indexes are stored in <tt>NodeStore</tt>. Oak Lucene provides a custom directory implementation which enables Lucene to load the index from <tt>NodeStore</tt>. This might cause performance degradation if the <tt>NodeStore</tt> storage is remote. For such cases Oak Lucene provides a <tt>CopyOnReadDirectory</tt> which copies the index content to a local directory and enables Lucene to make use of local directory based indexes while performing queries.</p>
 <p>At runtime various details related to copy on read features are exposed via <tt>CopyOnReadStats</tt> MBean. Indexes at JCR path e.g. <tt>/oak:index/assetIndex</tt> would be copied to <tt>&lt;index dir&gt;/&lt;hash of jcr path&gt;</tt>. To determine mapping between local index directory and JCR path refer to the MBean details</p>
 <p><img src="lucene-index-copier-mbean.png" alt="CopyOnReadStats" /></p>
 <p>For more details refer to <a class="externalLink" href="https://issues.apache.org/jira/browse/OAK-1724">OAK-1724</a>. This feature can be enabled via <a href="#osgi-config">Lucene Index provider service configuration</a></p>
-<p><i>With Oak 1.0.13 this feature is now enabled by default.</i></p>
-<p><a name="copy-on-write"></a></p></div>
+<p><i>With Oak 1.0.13 this feature is now enabled by default.</i></p></div>
 <div class="section">
-<h3>CopyOnWrite<a name="CopyOnWrite"></a></h3>
+<h3><a name="copy-on-write"></a>CopyOnWrite<a name="CopyOnWrite"></a></h3>
 <p><tt>@since Oak 1.0.15, 1.2.3</tt></p>
 <p>Similar to the <i>CopyOnRead</i> feature, Oak Lucene also supports <i>CopyOnWrite</i> to enable faster indexing by first buffering the writes to the local filesystem and transferring them to remote storage asynchronously as the indexing proceeds. This should provide better performance and hence faster indexing times.</p>
 <p><b>indexPath</b></p>
 <p>To speed up the indexing with CopyOnWrite you would also need to set <tt>indexPath</tt> in the index definition to the path of the index in the repository. For example, if your index is defined at <tt>/oak:index/lucene</tt> then the value of <tt>indexPath</tt> should be set to <tt>/oak:index/lucene</tt>. This enables the indexer to perform any reads during the indexing process locally and thus avoid costly reads from remote storage.</p>
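 <p>A minimal sketch of such a definition, assuming the index lives at <tt>/oak:index/lucene</tt> as in the example above (the <tt>async</tt> value and the omission of the indexing rules are assumptions made for brevity):</p>
 
 <div class="source">
 <pre>/oak:index/lucene
   - jcr:primaryType = &quot;oak:QueryIndexDefinition&quot;
   - type = &quot;lucene&quot;
   - async = &quot;async&quot;
   - indexPath = &quot;/oak:index/lucene&quot;
 </pre></div>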
 <p>For more details refer to <a class="externalLink" href="https://issues.apache.org/jira/browse/OAK-2247">OAK-2247</a>. This feature can be enabled via <a href="#osgi-config">Lucene Index provider service configuration</a></p></div>
 <div class="section">
-<h3>Lucene Index MBeans<a name="Lucene_Index_MBeans"></a></h3>
+<h3><a name="mbeans"></a>Lucene Index MBeans<a name="Lucene_Index_MBeans"></a></h3>
 <p>Oak Lucene registers a JMX bean <tt>LuceneIndex</tt> which provides details about the index content, e.g. size of the index, number of documents present in the index etc.</p>
-<p><img src="lucene-index-mbean.png" alt="Lucene Index MBean" /></p>
-<p><a name="luke"></a></p></div>
+<p><img src="lucene-index-mbean.png" alt="Lucene Index MBean" /></p></div>
 <div class="section">
-<h3>Analyzing created Lucene Index<a name="Analyzing_created_Lucene_Index"></a></h3>
+<h3><a name="luke"></a>Analyzing created Lucene Index<a name="Analyzing_created_Lucene_Index"></a></h3>
 <p><a class="externalLink" href="https://code.google.com/p/luke/">Luke</a> is a handy development and diagnostic tool, which accesses already existing Lucene indexes and allows you to display index details. In Oak, Lucene index files are stored in <tt>NodeStore</tt> and hence are not directly accessible. To enable analyzing the index files via Luke, follow the steps below</p>
 
 <ol style="list-style-type: decimal">
@@ -1306,10 +1395,9 @@ Copied 8.5 MB in 218.7 ms
 <pre>$ java -XX:MaxPermSize=512m -cp luke-with-deps.jar:oak-lucene-1.0.8.jar org.getopt.luke.Luke
 </pre></div></li>
 </ol>
-<p>From the Luke UI shown you can access various details.</p>
-<p><a name="text-extraction"></a></p></div>
+<p>From the Luke UI shown you can access various details.</p></div>
 <div class="section">
-<h3>Pre-Extracting Text from Binaries<a name="Pre-Extracting_Text_from_Binaries"></a></h3>
+<h3><a name="text-extraction"></a>Pre-Extracting Text from Binaries<a name="Pre-Extracting_Text_from_Binaries"></a></h3>
 <p><tt>@since Oak 1.0.18, 1.2.3</tt></p>
 <p>Lucene indexing is performed in a single threaded mode. Extracting text from binaries is an expensive operation and slows down the indexing rate considerably. For incremental indexing this mostly works fine but if performing a reindex or creating the index for the first time after migration then it increases the indexing time considerably. </p>
 <p>To speed up the Lucene indexing for such cases i.e. reindexing, we can decouple the text extraction from actual indexing. </p>
@@ -1353,9 +1441,9 @@ org.apache.jackrabbit.oak.run.Main tika
 <p>Once <tt>PreExtractedTextProvider</tt> is configured then upon reindexing Lucene indexer would make use of it to check if text needs to be extracted or not. Check <tt>TextExtractionStatsMBean</tt> for various statistics around text extraction and also to validate if <tt>PreExtractedTextProvider</tt> is being used.</p>
 <p>For more details on this feature refer to <a class="externalLink" href="https://issues.apache.org/jira/browse/OAK-2892">OAK-2892</a></p></div>
 <div class="section">
-<h3>Advanced search features<a name="Advanced_search_features"></a></h3>
+<h3><a name="advanced-search-features"></a>Advanced search features<a name="Advanced_search_features"></a></h3>
 <div class="section">
-<h4>Suggestions<a name="Suggestions"></a></h4>
+<h4><a name="suggestions"></a>Suggestions<a name="Suggestions"></a></h4>
 <p><tt>@since Oak 1.1.17, 1.0.15</tt></p>
 <p>In order to use a Lucene index to perform search suggestions, the index definition node (the one of type <tt>oak:QueryIndexDefinition</tt>) needs to have <tt>compatVersion</tt> set to <tt>2</tt>. Then one or more property nodes, depending on the use case, need to have the property <tt>useInSuggest</tt> set to <tt>true</tt>; this setting controls which properties the terms used for suggestions are taken from.</p>
 <p>Once the above configuration has been done, by default, the Lucene suggester is updated every 10 minutes but that can be changed by setting the property <tt>suggestUpdateFrequencyMinutes</tt> in the <tt>suggestion</tt> node under the index definition node to a different value. <i>Note that up to Oak 1.3.14/1.2.14, <tt>suggestUpdateFrequencyMinutes</tt> had to be set on the index definition node itself. That is still supported for backward compatibility, but having a separate <tt>suggestion</tt> node is preferred.</i></p>
@@ -1399,7 +1487,7 @@ SELECT rep:suggest() FROM [nt:base] WHER
 /jcr:root/a/b//[rep:suggest('in 201')]/(rep:suggest())
 </tt> Note, the subset is done by filtering top 10 suggestions. So, it&#x2019;s possible to get no suggestions for a subtree query, if top 10 suggestions are not part of that subtree. For details look at <a class="externalLink" href="https://issues.apache.org/jira/browse/OAK-3994">OAK-3994</a> and related issues.</p></div>
 <div class="section">
-<h4>Spellchecking<a name="Spellchecking"></a></h4>
+<h4><a name="spellchecking"></a>Spellchecking<a name="Spellchecking"></a></h4>
 <p><tt>@since Oak 1.1.17, 1.0.13</tt></p>
 <p>In order to use a Lucene index to perform spellchecking, the index definition node (the one of type <tt>oak:QueryIndexDefinition</tt>) needs to have <tt>compatVersion</tt> set to <tt>2</tt>. Then one or more property nodes, depending on the use case, need to have the property <tt>useInSpellcheck</tt> set to <tt>true</tt>; this setting controls which properties the terms used for spellcheck corrections are taken from.</p>
 <p>Sample configuration for spellchecking based on terms contained in <tt>jcr:title</tt> property.</p>
@@ -1427,7 +1515,7 @@ SELECT rep:suggest() FROM [nt:base] WHER
 /jcr:root/a/b//[rep:suggest('in 201')]/(rep:suggest())
 </tt> Note, the subset is done by filtering top 10 spellchecks. So, it&#x2019;s possible to get no results for a subtree query, if top 10 spellchecks are not part of that subtree. For details look at <a class="externalLink" href="https://issues.apache.org/jira/browse/OAK-3994">OAK-3994</a> and related issues.</p></div>
 <div class="section">
-<h4>Facets<a name="Facets"></a></h4>
+<h4><a name="facets"></a>Facets<a name="Facets"></a></h4>
 <p><tt>@since Oak 1.3.14</tt></p>
 <p>Lucene property indexes can also be used for retrieving facets. To do so, the property <i>facets</i> must be set to  <i>true</i> on the property definition.</p>
 
@@ -1467,16 +1555,16 @@ SELECT rep:suggest() FROM [nt:base] WHER
           - propertyIndex = true
 </pre></div></div>
 <div class="section">
-<h4>Score Explanation<a name="Score_Explanation"></a></h4>
+<h4><a name="score-explanation"></a>Score Explanation<a name="Score_Explanation"></a></h4>
 <p><tt>@since Oak 1.3.12</tt></p>
 <p>Lucene supports <a class="externalLink" href="https://lucene.apache.org/core/4_6_0/core/org/apache/lucene/search/IndexSearcher.html#explain%28org.apache.lucene.search.Query,%20int%29">explanation of scores</a>, which can be selected in a query using the virtual column <tt>oak:scoreExplanation</tt>. For example: <tt>select [oak:scoreExplanation], * from [nt:base] where foo='bar'</tt></p>
 <p><i>Note that computing the score explanation is expensive, so this feature should be used for debugging purposes only</i>.</p></div>
 <div class="section">
-<h4>Custom hooks<a name="Custom_hooks"></a></h4>
+<h4><a name="custom-hooks"></a>Custom hooks<a name="Custom_hooks"></a></h4>
 <p><tt>@since Oak 1.3.14</tt></p>
 <p>In an OSGi environment, implementations of <tt>IndexFieldProvider</tt> and <tt>FulltextQueryTermsProvider</tt> under <tt>org.apache.jackrabbit.oak.plugins.index.lucene.spi</tt> (see javadoc <a class="externalLink" href="http://www.javadoc.io/doc/org.apache.jackrabbit/oak-lucene/">here</a>) are called during indexing and querying as documented in the javadocs.</p></div></div>
 <div class="section">
-<h3>Design Considerations<a name="Design_Considerations"></a></h3>
+<h3><a name="design-considerations"></a>Design Considerations<a name="Design_Considerations"></a></h3>
 <p>The Lucene index provides quite a few features to meet various query requirements. When defining the index definition, consider the following aspects:</p>
 
 <ol style="list-style-type: decimal">
@@ -1508,7 +1596,7 @@ SELECT rep:suggest() FROM [nt:base] WHER
 </ol>
 <p>The following analogy might be helpful to people coming from the RDBMS world. Treat your nodetype as a table in your DB and all the direct or relative properties as columns in that table. The various property definitions can then be considered as indexes for those columns.</p></div>
 <div class="section">
-<h3>Lucene Index vs Property Index<a name="Lucene_Index_vs_Property_Index"></a></h3>
+<h3><a name="lucene-vs-property"></a>Lucene Index vs Property Index<a name="Lucene_Index_vs_Property_Index"></a></h3>
 <p>A Lucene-based index can be restricted to index only specific properties, and in that case it is similar to a <a href="query.html#property-index">Property Index</a>. However, it differs from a property index in the following aspects:</p>
 
 <ol style="list-style-type: decimal">
@@ -1521,9 +1609,9 @@ SELECT rep:suggest() FROM [nt:base] WHER
 <p>Lucene index cannot enforce uniqueness constraints - Because it is updated asynchronously, it cannot enforce uniqueness constraints.</p></li>
 </ol></div>
 <div class="section">
-<h3>Examples<a name="Examples"></a></h3>
+<h3><a name="examples"></a>Examples<a name="Examples"></a></h3>
 <div class="section">
-<h4>A - Simple queries<a name="A_-_Simple_queries"></a></h4>
+<h4><a name="simple-queries"></a>A - Simple queries<a name="A_-_Simple_queries"></a></h4>
 <p>In many cases the query is purely based on some specific property and is not restricted to any specific nodeType</p>
 
 <div class="source">
@@ -1625,7 +1713,7 @@ AND [offTime] &gt; CAST('2015-04-06T02:2
           - name = &quot;offTime&quot;
 </pre></div></div>
 <div class="section">
-<h4>B - Queries for structured content<a name="B_-_Queries_for_structured_content"></a></h4>
+<h4><a name="queries-structured-content"></a>B - Queries for structured content<a name="B_-_Queries_for_structured_content"></a></h4>
 <p>Queries in the previous examples were based on mostly unstructured content where no nodeType restrictions were applied. However, in many cases the nodes being queried conform to a certain structure. For example, you have the following content</p>
 
 <div class="source">
@@ -1651,7 +1739,7 @@ AND [offTime] &gt; CAST('2015-04-06T02:2
           - jcr:data = ...
 </pre></div>
 <p>Content like the above is then queried in multiple ways. So let's take the first query</p>
-<p><b>UC1 - Find all assets which are having <tt>status</tt> as <tt>published</tt></b></p>
+<p><a name="uc1"></a> <b>UC1 - Find all assets which are having <tt>status</tt> as <tt>published</tt></b></p>
 
 <div class="source">
 <pre>SELECT
@@ -1685,7 +1773,7 @@ WHERE
   
 <li>Indexes relative property <tt>jcr:content/metadata/status</tt> for all such nodes</li>
 </ul>
-<p><b>UC2 - Find all assets which are having <tt>status</tt> as <tt>published</tt> sorted by last modified date</b></p>
+<p><a name="uc2"></a> <b>UC2 - Find all assets which are having <tt>status</tt> as <tt>published</tt> sorted by last modified date</b></p>
 
 <div class="source">
 <pre>SELECT
@@ -1721,7 +1809,7 @@ ORDER BY
   
 <li>Indexes both <tt>status</tt> and <tt>jcr:lastModified</tt></li>
 </ul>
-<p><b>UC3 - Find all assets where comment contains <i>december</i></b></p>
+<p><a name="uc3"></a> <b>UC3 - Find all assets where comment contains <i>december</i></b></p>
 
 <div class="source">
 <pre>SELECT
@@ -1748,7 +1836,7 @@ WHERE
   
 <li><tt>propertyIndex</tt> is not enabled, as this property is not going to be used to perform an equality check</li>
 </ul>
-<p><b>UC4 - Find all assets which are created by David and refer to december </b></p>
+<p><a name="uc4"></a> <b>UC4 - Find all assets which are created by David and refer to december </b></p>
 
 <div class="source">
 <pre>SELECT