Posted to oak-commits@jackrabbit.apache.org by th...@apache.org on 2017/05/02 12:56:57 UTC
svn commit: r1793484 - /jackrabbit/oak/trunk/oak-doc/src/site/markdown/query/query-troubleshooting.md

Author: thomasm
Date: Tue May  2 12:56:56 2017
New Revision: 1793484

URL: http://svn.apache.org/viewvc?rev=1793484&view=rev
Log:
OAK-5520 Improve index and query documentation

Modified:
    jackrabbit/oak/trunk/oak-doc/src/site/markdown/query/query-troubleshooting.md

Modified: jackrabbit/oak/trunk/oak-doc/src/site/markdown/query/query-troubleshooting.md
URL: http://svn.apache.org/viewvc/jackrabbit/oak/trunk/oak-doc/src/site/markdown/query/query-troubleshooting.md?rev=1793484&r1=1793483&r2=1793484&view=diff
==============================================================================
--- jackrabbit/oak/trunk/oak-doc/src/site/markdown/query/query-troubleshooting.md (original)
+++ jackrabbit/oak/trunk/oak-doc/src/site/markdown/query/query-troubleshooting.md Tue May  2 12:56:56 2017
@@ -28,6 +28,25 @@ as follows:
         select * from [nt:base] where isdescendantnode('/etc') and lower([jcr:title]) like '%coat%');
         consider creating an index or changing the query
 
+To get good performance, queries should not traverse more than about 1000 nodes
+(especially for queries that are run often).
+
+#### Potentially Slow Queries
+
+In addition to avoiding queries that traverse many nodes, 
+it makes sense to avoid queries that don't use an index.
+Such queries might be fast (and traverse only a few nodes) with a small repository,
+but with a large repository they are typically slow as well.
+Therefore, it makes sense to detect such queries as soon as possible
+(in a developer environment), 
+even before the code that runs those queries is tested with a larger repository.
+Oak will detect such queries and log them as follows 
+(with log level INFO for Oak 1.6.x, and WARN for Oak 1.8.x):
+
+    *INFO* org.apache.jackrabbit.oak.query.QueryImpl Traversal query (query without index): 
+        select * from [nt:base] where isdescendantnode('/etc') and lower([jcr:title]) like '%coat%'; 
+        consider creating an index
+
 #### Query Plan
 
 To understand why the query is slow, the first step is commonly to get the
@@ -60,7 +79,9 @@ But in this case, it already was enough
 
 #### Queries Without Index
 
-Still, there is a message in the log file that complains the query doesn't use an index:
+After changing the query, 
+there is still a message in the log file that complains the query doesn't use an index,
+as described above:
 
     *INFO* org.apache.jackrabbit.oak.query.QueryImpl 
         Traversal query (query without index): 
@@ -74,6 +95,23 @@ an almost empty development repository,
 But for production, there might be a lot more nodes under `/etc/commerce`, 
 so it makes sense to continue optimization.
 
+#### Where Traversal is OK
+
+If it is known from the data model that a query will never traverse many nodes,
+then no index is needed. This is a corner case, and only applies to queries that 
+traverse a fixed number of (for example) configuration nodes, or
+if the number of descendant nodes is guaranteed to be very low by using 
+a certain nodetype that only allows for a fixed number of child nodes.
+If this is the case, then the query can be changed to say traversal is fine.
+To mark such queries, append `option(traversal ok)` to the query.
+This feature should only be used for those rare corner cases.
+
+    select * from [nt:base] 
+    where isdescendantnode('/etc/commerce') 
+    and lower([jcr:title]) like '%coat%'
+    and [commerceType] = 'product'
+    option(traversal ok)
+
 #### Estimating Node Counts
 
 To find out how many nodes are in a certain path, you can use the JMX bean `NodeCounter`,
@@ -101,9 +139,14 @@ that queries that _must_ traverse over n
 
 #### Using a Different or New Index
 
-There are now multiple options:
+There are multiple options:
 
-* If there are very few nodes with that nodetype, 
+* Consider creating an index for `jcr:title`. But for `like '%..%'` conditions,
+  this is not of much help, because all nodes with that property will need to be read.
+  Also, using `lower` will make the index less effective.
+  So, this only makes sense if there are very few nodes with this property
+  expected to be in the system.
+* If there are very few nodes with that nodetype,
   consider adding `acme:Product` to the nodetype index. This requires reindexing.
   The query could then use the nodetype index, and within this nodetype,
   just traverse below `/etc/commerce`.
@@ -111,11 +154,12 @@ There are now multiple options:
   nodes are in the repository, if this nodetype is indexed.
   To find out, run `getEstimatedChildNodeCounts` with
   `p1=/oak:index/nodetype` and `p2=2`.
-* Consider creating an index for `jcr:title`. But for `like '%..%'` conditions,
-  this is not of much help, because all nodes with that property will need to be read.
-  Also, using `lower` will make the index less effective.
-  So, this only makes sense if there are very few nodes with this property
-  expected to be in the system.
+* If the query needs to return added nodes immediately (synchronously, that is, without delay),
+  consider creating a [property index](./property-index.html).
+  Note that Lucene indexes are asynchronous, and new nodes may not
+  appear in the result for a few seconds.
+* To ensure there is only one node matching the result in the repository,
+  consider creating a unique [property index](./property-index.html).
 * Consider using a fulltext index, that is: change the query from using 
   `lower([jcr:title]) like '%...%'` to using `contains([jcr:title], '...')`.
   Possibly combine this with adding the property
@@ -128,6 +172,16 @@ The last plan is possibly the best solut
 In case you need to modify or create a Lucene property index,
 you can use the [Oak Index Definition Generator](http://oakutils.appspot.com/generate/index) tool.
 
+As the tool doesn't know your index configuration, it will always suggest
+to create a new index; it might be better to extend an existing index.
+However, note that:
+
+* Changing an existing index requires reindexing that index.
+* If an out-of-the-box index is modified, you will need to merge those modifications 
+  when migrating to newer software.
+  It is best to add documentation to the index definition to simplify merging,
+  for example in the form of "info" properties.
+
 #### Verification
 
 After changing the query, and possibly the index, run the `explain select` again,