You are viewing a plain text version of this content. The canonical link for it is here.
Posted to oak-commits@jackrabbit.apache.org by th...@apache.org on 2013/09/23 09:53:31 UTC
svn commit: r1525522 - /jackrabbit/oak/trunk/oak-doc/src/site/markdown/query.md

Author: thomasm
Date: Mon Sep 23 07:53:31 2013
New Revision: 1525522

URL: http://svn.apache.org/r1525522
Log:
OAK-301 Document Oak - more query docs

Modified:
    jackrabbit/oak/trunk/oak-doc/src/site/markdown/query.md

Modified: jackrabbit/oak/trunk/oak-doc/src/site/markdown/query.md
URL: http://svn.apache.org/viewvc/jackrabbit/oak/trunk/oak-doc/src/site/markdown/query.md?rev=1525522&r1=1525521&r2=1525522&view=diff
==============================================================================
--- jackrabbit/oak/trunk/oak-doc/src/site/markdown/query.md (original)
+++ jackrabbit/oak/trunk/oak-doc/src/site/markdown/query.md Mon Sep 23 07:53:31 2013
@@ -15,50 +15,65 @@
    limitations under the License.
   -->
 
-## The query engine
+## The Query Engine
 
-Oak does not index content by default as does Jackrabbit 2. You need to create custom indexes when
-necessary, much like in traditional RDBMSs. If there is no index for a specific query then the
-repository will be traversed. That is, the query will still work but probably be very slow.
+Oak does not index content by default as does Jackrabbit 2. You need to create custom 
+indexes when necessary, much like in traditional RDBMSs. If there is no index for a 
+specific query, then the repository will be traversed. That is, the query will still 
+work but probably be very slow.
 
 Query Indices are defined under the `oak:index` node.
 
-### Xpath to SQL2 transformation
+### XPath to SQL2 Transformation
 
-In an effort to support the `XPath` language, there is a mechanism that handles the automatic conversion to `SQL2`. 
+To support the XPath query language, such queries are internally converted to SQL2. 
 
-Every conversion is logged as `debug` under the `org.apache.jackrabbit.oak.query.QueryEngineImpl` logger:
+Every conversion is logged in `debug` level under the 
+`org.apache.jackrabbit.oak.query.QueryEngineImpl` logger:
 
-    org.apache.jackrabbit.oak.query.QueryEngineImpl Parsing xpath statement: //element(*)[@sling:resourceType = 'slingevent:Lock')]
-    org.apache.jackrabbit.oak.query.QueryEngineImpl XPath > SQL2: select [jcr:path], [jcr:score], * from [nt:base] as a where [sling:resourceType] = 'slingevent:Lock' /* xpath: //element(*)[@sling:resourceType = 'slingevent:Lock' and @lock.created < xs:dateTime('2013-09-02T15:44:05.920+02:00')] */
+    org.apache.jackrabbit.oak.query.QueryEngineImpl Parsing xpath statement: 
+        //element(*)[@sling:resourceType = 'slingevent:Lock')]
+    org.apache.jackrabbit.oak.query.QueryEngineImpl XPath > SQL2: 
+        select [jcr:path], [jcr:score], * from [nt:base] as a 
+        where [sling:resourceType] = 'slingevent:Lock' 
+        /* xpath: //element(*)[@sling:resourceType = 'slingevent:Lock' 
+        and @lock.created < xs:dateTime('2013-09-02T15:44:05.920+02:00')] */
 
-_Note that each transformed `SQL2` query contains the original `XPath` query as a comment._
+_Each transformed SQL2 query contains the original XPath query as a comment._
 
-### Cost calculation
+### Query Processing
 
-Each query index is expected to estimate the worst-case cost to query with the given filter. 
-The returned value is between 1 (very fast; lookup of a unique node) and the estimated number of entries to traverse, if the cursor would be fully read, and if there could in theory be one network round-trip or disk read operation per node (this method may return a lower number if the data is known to be fully in memory).
+Internally, the query engine uses a cost based query optimizer that asks all the available
+query indexes for the estimated cost to process the query. It then uses the index with the 
+lowest cost.
 
-The returned value is supposed to be an estimate and doesn't have to be very accurate. Please note this method is called on each index whenever a query is run, so the method should be reasonably fast (not read any data itself, or at least not read too much data).
+By default, the following indexes are available:
 
-If an index implementation can not query the data, it has to return `Double.POSITIVE_INFINITY`.
+* A property index for each indexed property.
+* A full-text index which is based on Apache Lucene / Solr.
+* A node type index (which is based on an property index for the properties
+  jcr:primaryType and jcr:mixins).
+* A traversal index that iterates over a subtree.
 
-TODO Traversal warnings
+If no index can efficiently process the filter condition, the nodes in the repository are 
+traversed at the given subtree.
+
+Usually, data is read from the index and repository while traversing over the query 
+result. There are exceptions however, where all data is read in memory when the query
+is executed: when using a full-text index, and when using an "order by" clause.
 
-### The property index
+### The Property Index
 
 Is useful whenever there is a query with a property constraint that is not full-text:
 
     SELECT * FROM [nt:base] WHERE [jcr:uuid] = $id
 
-
 To define a property index on a subtree you have to add an index definition node that:
 
 * must be of type `oak:queryIndexDefinition`
 * must have the `type` property set to __`property`__
 * contains the `propertyNames` property that indicates what properties will be stored in the index.
-
-    `propertyNames` can be a list of properties, and it is optional.in case it is missing, the node name will be used as a property name reference value
+  `propertyNames` can be a list of properties, and it is optional.in case it is missing, the node name will be used as a property name reference value
 
 _Optionally_ you can specify
 
@@ -87,13 +102,7 @@ or to simplify you can use one of the ex
     }
 
 
-### The node type index
-
-The `NodeTypeIndex` implements a `QueryIndex` using `PropertyIndexLookup`s on `jcr:primaryType` `jcr:mixinTypes` to evaluate a node type restriction on the filter.
-The cost for this index is the sum of the costs of the `PropertyIndexLookup` for queries on `jcr:primaryType` and `jcr:mixinTypes`.
-
-
-### The Lucene full-text index
+### The Lucene Full-Text Index
 
 The full-text index handles only the 'contains' type of queries.
 
@@ -136,6 +145,23 @@ Example:
     }
 
 
-### The Solr full-text index
+### The Solr Full-Text Index
 
 `TODO`
+
+### The Node Type Index
+
+The `NodeTypeIndex` implements a `QueryIndex` using `PropertyIndexLookup`s on `jcr:primaryType` `jcr:mixinTypes` to evaluate a node type restriction on the filter.
+The cost for this index is the sum of the costs of the `PropertyIndexLookup` for queries on `jcr:primaryType` and `jcr:mixinTypes`.
+
+### Cost Calculation
+
+Each query index is expected to estimate the worst-case cost to query with the given filter. 
+The returned value is between 1 (very fast; lookup of a unique node) and the estimated number of entries to traverse, if the cursor would be fully read, and if there could in theory be one network round-trip or disk read operation per node (this method may return a lower number if the data is known to be fully in memory).
+
+The returned value is supposed to be an estimate and doesn't have to be very accurate. Please note this method is called on each index whenever a query is run, so the method should be reasonably fast (not read any data itself, or at least not read too much data).
+
+If an index implementation can not query the data, it has to return `Double.POSITIVE_INFINITY`.
+
+TODO Traversal warnings
+