Posted to oak-commits@jackrabbit.apache.org by to...@apache.org on 2015/06/12 17:03:03 UTC
svn commit: r1685098 -
/jackrabbit/oak/trunk/oak-doc/src/site/markdown/query/solr.md
Author: tommaso
Date: Fri Jun 12 15:03:03 2015
New Revision: 1685098
URL: http://svn.apache.org/r1685098
Log:
OAK-2175, OAK-2176, OAK-2958 - added Solr specific documentation for spellchecking and suggestions
Modified:
jackrabbit/oak/trunk/oak-doc/src/site/markdown/query/solr.md
Modified: jackrabbit/oak/trunk/oak-doc/src/site/markdown/query/solr.md
URL: http://svn.apache.org/viewvc/jackrabbit/oak/trunk/oak-doc/src/site/markdown/query/solr.md?rev=1685098&r1=1685097&r2=1685098&view=diff
==============================================================================
--- jackrabbit/oak/trunk/oak-doc/src/site/markdown/query/solr.md (original)
+++ jackrabbit/oak/trunk/oak-doc/src/site/markdown/query/solr.md Fri Jun 12 15:03:03 2015
@@ -19,13 +19,13 @@ The Solr index is mainly meant for full-
//*[jcr:contains(., 'text')]
-but is also able to search by path, property restrictions and primary type restrictions.
-This means the Solr index in Oak can be used for any type of JCR query.
+but is also able to search by path and property restrictions.
+Primary type restriction support is also provided, but it's not recommended, as it's usually much better to use the [node type
+index](../query.html#The_Node_Type_Index) for such queries.
Even if it's not just a full-text index, it's recommended to use it asynchronously (see `Oak#withAsyncIndexing`)
-because, in most production scenarios, it'll be a 'remote' index, and therefore network eventual latency / errors would
+because, in most production scenarios, it'll be a 'remote' index and therefore network latency / errors would
have less impact on the repository performance.
-To set up the Solr index to be asynchronous that has to be defined inside the index definition, see [OAK-980](https://issues.apache.org/jira/browse/OAK-980)
TODO Node aggregation.
@@ -35,9 +35,7 @@ The index definition node for a Solr-bas
* must be of type `oak:QueryIndexDefinition`
* must have the `type` property set to __`solr`__
- * must contain the `async` property set to the value `async`, this is what sends the
-
-index update process to a background thread.
+ * must contain the `async` property set to the value `async`; this is what sends the index update process to a background thread.
_Optionally_ one can add
@@ -54,6 +52,23 @@ Example:
.setProperty("reindex", true);
}
+#### Configuring the Solr index
+
+Besides the mandatory index definition parameters (and `reindex`), a number of additional parameters can be defined in
+ the Oak Solr index configuration.
+Such a configuration is composed of:
+
+ - the Solr server configuration (see [SolrServerConfiguration](http://jackrabbit.apache.org/oak/docs/apidocs/org/apache/jackrabbit/oak/plugins/index/solr/configuration/SolrServerConfiguration.html))
+ - the search / indexing configuration (see [OakSolrConfiguration](http://jackrabbit.apache.org/oak/docs/apidocs/org/apache/jackrabbit/oak/plugins/index/solr/configuration/OakSolrConfiguration.html))
+
+##### Solr server configuration options
+
+TBD
+
+##### Search / indexing configuration options
+
+TBD
+
#### Setting up the Solr server
For the Solr index to work Oak needs to be able to communicate with a Solr instance / cluster.
Apache Solr supports multiple deployment architectures:
@@ -67,20 +82,8 @@ Apache Solr supports multiple deployment
The Oak Solr index can be configured to either use an 'embedded Solr server' or a 'remote Solr server' (being able to
connect to a single remote instance or to a SolrCloud cluster via Zookeeper).
-##### OSGi environment
-All the Solr configuration parameters are described in the 'Solr Server Configuration' section on the
-[OSGi configuration](osgi_config.html) page.
-
-Create an index definition for the Solr index, as described [above](#solr-index-definition).
-Once the query index definition node has been created, access OSGi ConfigurationAdmin via e.g. Apache Felix WebConsole:
-
- 1. find the 'Oak Solr indexing / search configuration' item and eventually change configuration properties as needed
- 2. find either the 'Oak Solr embedded server configuration' or 'Oak Solr remote server configuration' items depending
- on the chosen Solr architecture and eventually change configuration properties as needed
- 3. find the 'Oak Solr server provider' item and select the chosen provider ('remote' or 'embedded')
-
-##### Solr server configurations
-Depending on the use case, different Solr server configurations are recommended.
+##### Supported Solr deployments
+Depending on the use case, different Solr server deployments are recommended.
###### Embedded Solr server
The embedded Solr server is recommended for developing and testing the Solr index for an Oak repository. With that an
@@ -119,11 +122,52 @@ SolrCloud also allows the hot deploy of
from a local directory, this is controlled by the _solr.conf.dir_ property of the 'Oak Solr remote server configuration'.
For a detailed description of how SolrCloud works see the [Solr reference guide](https://cwiki.apache.org/confluence/display/solr/SolrCloud).
-#### Differences with the Lucene index
+##### OSGi environment
+All the Solr configuration parameters are described in the 'Solr Server Configuration' section on the
+[OSGi configuration](osgi_config.html) page.
+
+Create an index definition for the Solr index, as described [above](#solr-index-definition).
+Once the query index definition node has been created, access OSGi ConfigurationAdmin via e.g. Apache Felix WebConsole:
+
+ 1. find the 'Oak Solr indexing / search configuration' item and change configuration properties as needed
+ 2. find either the 'Oak Solr embedded server configuration' or 'Oak Solr remote server configuration' items depending
+ on the chosen Solr architecture and change configuration properties as needed
+ 3. find the 'Oak Solr server provider' item and select the chosen provider ('remote' or 'embedded')
+
+#### Advanced search features
+
+##### Suggestions
+
+`@since Oak 1.1.17, 1.0.15`
+
+The default Solr configuration ([solrconfig.xml](https://github.com/apache/jackrabbit-oak/blob/trunk/oak-solr-core/src/main/resources/solr/oak/conf/solrconfig.xml#L1102)
+and [schema.xml](https://github.com/apache/jackrabbit-oak/blob/trunk/oak-solr-core/src/main/resources/solr/oak/conf/schema.xml#L119))
+comes with a preconfigured suggest component, which uses Lucene's [FuzzySuggester](https://lucene.apache.org/core/4_7_0/suggest/org/apache/lucene/search/suggest/analyzing/FuzzySuggester.html)
+under the hood. Updating the suggester in [default configuration](https://github.com/apache/jackrabbit-oak/blob/trunk/oak-solr-core/src/main/resources/solr/oak/conf/solrconfig.xml#L1110)
+is done every time a `commit` request is sent to Solr; however, it's recommended not to do that in production systems if possible,
+as it's much better to send an explicit request to Solr to rebuild the suggester dictionary periodically, e.g. once a day or once a week.
+
+More / different suggesters can be configured in Solr, as per [reference documentation](https://cwiki.apache.org/confluence/display/solr/Suggester).
+
+##### Spellchecking
+
+`@since Oak 1.1.17, 1.0.15`
+
+The default Solr configuration ([solrconfig.xml](https://github.com/apache/jackrabbit-oak/blob/trunk/oak-solr-core/src/main/resources/solr/oak/conf/solrconfig.xml#L1177))
+comes with a preconfigured spellchecking component, which uses Lucene's [DirectSpellChecker](http://lucene.apache.org/core/4_7_0/suggest/org/apache/lucene/search/spell/DirectSpellChecker.html)
+under the hood, as it doesn't require any additional data structure either in RAM or on disk.
+
+More / different spellcheckers can be configured in Solr, as per [reference documentation](https://cwiki.apache.org/confluence/display/solr/Spell+Checking).
+
+#### Notes
As of Oak version 1.0.0:
-* Solr index doesn't support search using relative properties, see [OAK-1835](https://issues.apache.org/jira/browse/OAK-1835).
-* Solr configuration is mostly done on the Solr side via schema.xml / solrconfig.xml files.
-* Lucene can only be used for full-text queries, Solr can be used for full-text search _and_ for JCR queries involving
+ * Solr index doesn't support search using relative properties, see [OAK-1835](https://issues.apache.org/jira/browse/OAK-1835).
+ * Lucene can only be used for full-text queries, Solr can be used for full-text search _and_ for JCR queries involving
path, property and primary type restrictions.
+As of Oak version 1.2.0:
+
+ * The Solr index doesn't support index-time aggregation, only query-time aggregation
+ * Lucene and Solr can both be used for full-text, property and path restrictions
+