Posted to oak-commits@jackrabbit.apache.org by th...@apache.org on 2017/05/02 12:56:57 UTC
svn commit: r1793484 -
/jackrabbit/oak/trunk/oak-doc/src/site/markdown/query/query-troubleshooting.md
Author: thomasm
Date: Tue May 2 12:56:56 2017
New Revision: 1793484
URL: http://svn.apache.org/viewvc?rev=1793484&view=rev
Log:
OAK-5520 Improve index and query documentation
Modified:
jackrabbit/oak/trunk/oak-doc/src/site/markdown/query/query-troubleshooting.md
Modified: jackrabbit/oak/trunk/oak-doc/src/site/markdown/query/query-troubleshooting.md
URL: http://svn.apache.org/viewvc/jackrabbit/oak/trunk/oak-doc/src/site/markdown/query/query-troubleshooting.md?rev=1793484&r1=1793483&r2=1793484&view=diff
==============================================================================
--- jackrabbit/oak/trunk/oak-doc/src/site/markdown/query/query-troubleshooting.md (original)
+++ jackrabbit/oak/trunk/oak-doc/src/site/markdown/query/query-troubleshooting.md Tue May 2 12:56:56 2017
@@ -28,6 +28,25 @@ as follows:
select * from [nt:base] where isdescendantnode('/etc') and lower([jcr:title]) like '%coat%');
consider creating an index or changing the query
+To get good performance, queries should not traverse more than about 1000 nodes
+(especially for queries that are run often).
+
+#### Potentially Slow Queries
+
+In addition to avoiding queries that traverse many nodes,
+it makes sense to avoid queries that don't use an index.
+Such queries might be fast (and traverse only a few nodes) with a small repository,
+but with a large repository they are typically slow as well.
+Therefore, it makes sense to detect such queries as soon as possible
+(in a developer environment),
+even before the code that runs those queries is tested with a larger repository.
+Oak will detect such queries and log them as follows
+(with log level INFO for Oak 1.6.x, and WARN for Oak 1.8.x):
+
+ *INFO* org.apache.jackrabbit.oak.query.QueryImpl Traversal query (query without index):
+ select * from [nt:base] where isdescendantnode('/etc') and lower([jcr:title]) like '%coat%';
+ consider creating an index
+
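To catch such queries early in a developer environment, the log can be scanned for this message. The following is a minimal sketch, not part of Oak: the function name `find_traversal_queries` is hypothetical, and it assumes the message layout shown in the sample above (the marker text, then the query, then the `consider creating an index` hint), whether the entry is on one line or wrapped.

```python
import re

# Marker text and closing hint are taken from the log sample above; the exact
# layout of real log entries (single line vs. wrapped) is an assumption.
_TRAVERSAL = re.compile(
    r"Traversal query \(query without index\):\s*(.*?);?\s*consider creating an index",
    re.DOTALL,
)


def find_traversal_queries(log_text):
    """Return the distinct queries logged as traversal queries, in order of first appearance."""
    seen = []
    for match in _TRAVERSAL.finditer(log_text):
        # Collapse line breaks and indentation so wrapped entries compare equal.
        query = " ".join(match.group(1).split())
        if query not in seen:
            seen.append(query)
    return seen
```

Running this over the log of a test run gives a short list of queries to review; each one either needs an index or a deliberate decision that traversal is acceptable.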
#### Query Plan
To understand why the query is slow, the first step is commonly to get the
@@ -60,7 +79,9 @@ But in this case, it already was enough
#### Queries Without Index
-Still, there is a message in the log file that complains the query doesn't use an index:
+After changing the query,
+there is still a message in the log file complaining that the query doesn't use an index,
+as described above:
*INFO* org.apache.jackrabbit.oak.query.QueryImpl
Traversal query (query without index):
@@ -74,6 +95,23 @@ an almost empty development repository,
But for production, there might be a lot more nodes under `/etc/commerce`,
so it makes sense to continue optimization.
+#### Where Traversal is OK
+
+If it is known from the data model that a query will never traverse many nodes,
+then no index is needed. This is a corner case, and only applies to queries that
+traverse a fixed number of (for example) configuration nodes, or
+if the number of descendant nodes is guaranteed to be very low by using
+a certain nodetype that only allows for a fixed number of child nodes.
+If this is the case, then the query can be changed to say traversal is fine.
+To mark such queries, append `option(traversal ok)` to the query.
+This feature should only be used for those rare corner cases.
+
+ select * from [nt:base]
+ where isdescendantnode('/etc/commerce')
+ and lower([jcr:title]) like '%coat%'
+ and [commerceType] = 'product'
+ option(traversal ok)
+
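When queries are assembled in application code, the option can be appended in one place so that it is easy to find and review later. This is a rough sketch under stated assumptions: `mark_traversal_ok` is a hypothetical helper, it treats the query as a plain string, and it does not try to merge with an option clause that is already present.

```python
def mark_traversal_ok(query):
    """Append `option(traversal ok)` to a JCR-SQL2 query string."""
    # Drop surrounding whitespace and a trailing semicolon, if any.
    stripped = query.strip().rstrip(";").rstrip()
    if "option(" in stripped.lower():
        # Simplification: this also triggers if "option(" occurs inside a
        # string literal. Merging option clauses is out of scope here.
        raise ValueError("query already contains an option clause")
    return stripped + " option(traversal ok)"
```

Keeping this in a single helper makes it straightforward to list every query that has been exempted, which matters because the option should stay limited to the rare corner cases described above.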
#### Estimating Node Counts
To find out how many nodes are in a certain path, you can use the JMX bean `NodeCounter`,
@@ -101,9 +139,14 @@ that queries that _must_ traverse over n
#### Using a Different or New Index
-There are now multiple options:
+There are multiple options:
-* If there are very few nodes with that nodetype,
+* Consider creating an index for `jcr:title`. But for `like '%..%'` conditions,
+ this is not of much help, because all nodes with that property will need to be read.
+ Also, using `lower` will make the index less effective.
+ So, this only makes sense if there are very few nodes with this property
+ expected to be in the system.
+* If there are very few nodes with that nodetype,
consider adding `acme:Product` to the nodetype index. This requires reindexing.
The query could then use the nodetype index, and within this nodetype,
just traverse below `/etc/commerce`.
@@ -111,11 +154,12 @@ There are now multiple options:
nodes are in the repository, if this nodetype is indexed.
To find out, run `getEstimatedChildNodeCounts` with
`p1=/oak:index/nodetype` and `p2=2`.
-* Consider creating an index for `jcr:title`. But for `like '%..%'` conditions,
- this is not of much help, because all nodes with that property will need to be read.
- Also, using `lower` will make the index less effective.
- So, this only makes sense if there are very few nodes with this property
- expected to be in the system.
+* If the query needs to return added nodes immediately (synchronously, that is, without delay),
+ consider creating a [property index](./property-index.html).
+ Note that Lucene indexes are asynchronous, and new nodes may not
+ appear in the result for a few seconds.
+* To ensure that only one node in the repository matches the condition,
+ consider creating a unique [property index](./property-index.html).
* Consider using a fulltext index, that is: change the query from using
`lower([jcr:title]) like '%...%'` to using `contains([jcr:title], '...')`.
Possibly combine this with adding the property
@@ -128,6 +172,16 @@ The last plan is possibly the best solut
In case you need to modify or create a Lucene property index,
you can use the [Oak Index Definition Generator](http://oakutils.appspot.com/generate/index) tool.
+As the tool doesn't know your index configuration, it will always suggest
+creating a new index; it might be better to extend an existing index.
+However, note that:
+
+* Changing an existing index requires reindexing that index.
+* If an out-of-the-box index is modified, you will need to merge those modifications
+ when migrating to newer software.
+ It is best to add documentation to the index definition to simplify merging,
+ for example in the form of "info" properties.
+
#### Verification
After changing the query, and possibly the index, run the `explain select` again,