Posted to oak-commits@jackrabbit.apache.org by to...@apache.org on 2015/06/26 17:15:05 UTC

svn commit: r1687784 - in /jackrabbit/oak/trunk/oak-doc/src/site/markdown: osgi_config.md query/solr.md

Author: tommaso
Date: Fri Jun 26 15:15:04 2015
New Revision: 1687784

URL: http://svn.apache.org/r1687784
Log:
OAK-1695 - updated Solr doc

Modified:
    jackrabbit/oak/trunk/oak-doc/src/site/markdown/osgi_config.md
    jackrabbit/oak/trunk/oak-doc/src/site/markdown/query/solr.md

Modified: jackrabbit/oak/trunk/oak-doc/src/site/markdown/osgi_config.md
URL: http://svn.apache.org/viewvc/jackrabbit/oak/trunk/oak-doc/src/site/markdown/osgi_config.md?rev=1687784&r1=1687783&r2=1687784&view=diff
==============================================================================
--- jackrabbit/oak/trunk/oak-doc/src/site/markdown/osgi_config.md (original)
+++ jackrabbit/oak/trunk/oak-doc/src/site/markdown/osgi_config.md Fri Jun 26 15:15:04 2015
@@ -269,80 +269,6 @@ in both config file and framework proper
 For example by default Sling sets **repository.home** to _${sling.home}/repository_. So this value
 need not be specified in config files
 
-### Solr Server Configuration
-Solr index requires some configuration to be properly used, in OSGi environments such configurations can be performed 
-via OSGi Configuration Admin.
-
-The following configuration items can be defined (e.g. through Apache Felix WebConsole).
-
-1. Oak Solr indexing / search configuration: Configuration for _OakSolrConfigurationProvider_ service with the following parameters:
-
-        #field for searching for nodes having a certain exact path
-        path.exact.field = path_exact
-        
-        #field for searching for nodes descendants of a node with a certain path
-        path.desc.field = path_des
-        
-        #field for searching for nodes children of a node with a certain path
-        path.child.field = path_child
-        
-        #field for searching for nodes parents of a node with a certain path
-        path.parent.field = path_anc
-        
-        #field to be used for searching when no field is defined in the search query (e.g. user entered queries like 'foo bar')
-        catch.all.field = catch_all
-        
-        #number of documents per 'page' to be fetched for each query 
-        rows = 100000
-        
-        #Solr commit policy to be used when indexing nodes as documents in Solr 
-        commit.policy = SOFT
-        
-        #wether the Solr index should be used also for filtering nodes by path restrictions 
-        path.restrictions = false
-        
-        #wether the Solr index should be used also for filtering nodes by property restrictions 
-        property.restrictions = false
-        
-        #wether the Solr index should be used also for filtering nodes by primary type 
-        primarytypes.restrictions = false
-
-2. Oak Solr remote server configuration: Configuration for _RemoteSolrServerProvider_ service with the following parameters:
-       
-        #URL to connect to a single remote Solr instance, including the core name (e.g. http://10.10.1.107:8983/solr/oak)
-        solr.http.url = 
-        
-        #Zookeeper host to connect to when using SolrCloud clusters (e.g. 10.10.1.102:9983)
-        solr.zk.host =
-        
-        #name ot the Solr collection to use when connecting to a SolrCloud cluster
-        solr.collection = oak 
-        
-        #number of shards to be used for the collection to be used with SolrCloud
-        solr.shards.no = 2
-        
-        #Solr replication factor, no. of replicas to be created for each shard (for each collection) with SolrCloud 
-        solr.replication.factor = 2
-        
-        #directory eventually containing the configuration files to be uploaded for creating the SolrCloud collection 
-        solr.conf.dir =  
-        
-3. Oak Solr embedded server configuration: Configuration for _EmbeddedSolrServerProvider_ service with the following parameters:
-
-        #path to the Solr home directory to be used for starting the EmbeddedSolrServer (can be absolute or relative)
-        solr.home.path = solr 
-        
-        #name of the Solr core to be created within the EmbeddedSolrServer
-        solr.core.name = oak
-        
-        #path to the cores config file to be used for starting Solr
-        solr.config.path = solr.xml 
-
-4. Oak Solr server provider: Configuration for _SolrServerProvider_ service with the following parameters:
-
-        #type of Solr server provider to be used, supported types are none, remote (RemoteSolrServerProvider) and embedded (EmbeddedSolrServerProvider)
-        server.type = none
-
 [1]: http://docs.mongodb.org/manual/reference/connection-string/
 [2]: http://jackrabbit.apache.org/api/2.4/org/apache/jackrabbit/core/data/FileDataStore.html
 [OAK-1645]: https://issues.apache.org/jira/browse/OAK-1645

Modified: jackrabbit/oak/trunk/oak-doc/src/site/markdown/query/solr.md
URL: http://svn.apache.org/viewvc/jackrabbit/oak/trunk/oak-doc/src/site/markdown/query/solr.md?rev=1687784&r1=1687783&r2=1687784&view=diff
==============================================================================
--- jackrabbit/oak/trunk/oak-doc/src/site/markdown/query/solr.md (original)
+++ jackrabbit/oak/trunk/oak-doc/src/site/markdown/query/solr.md Fri Jun 26 15:15:04 2015
@@ -15,7 +15,7 @@
    limitations under the License.
   -->
   
-### Solr Index
+## Solr Index
 
 The Solr index is mainly meant for full-text search (the 'contains' type of queries):
 
@@ -29,10 +29,6 @@ Even if it's not just a full-text index,
 because, in most production scenarios, it'll be a 'remote' index and therefore network latency / errors would 
 have less impact on the repository performance.
 
-TODO Node aggregation.
-
-##### Index definition for Solr index
-<a name="solr-index-definition"></a>
 The index definition node for a Solr-based index:
 
  * must be of type `oak:QueryIndexDefinition`
@@ -53,25 +49,138 @@ Example:
         .setProperty("async", "async")
         .setProperty("reindex", true);
     }
+        
+The Oak Solr index creates one document in the Solr index for each node in the repository; each such document 
+usually has at least one field for each property associated with the related node.
+Indexing of properties can be done by name: e.g. property 'jcr:title' of a node is written into a field 'jcr:title' of 
+the corresponding Solr document in the index, or by type: e.g. properties 'jcr:data' and 'binary_content' of type 
+_binary_ are written into a field 'binary_data' that's responsible for the indexing of all fields having that type and 
+thus properly configured for hosting such type of data.
     
-#### Configuring the Solr index
+### Configuring the Solr index
 
-Besides the mandatory index definition parameters (and `reindex`), a number of additional parameters can be defined in 
+Besides the index definition parameters mentioned above, a number of additional parameters can be defined in 
  Oak Solr index configuration.
 Such a configuration is composed by:
 
- - the Solr server configuration (see [SolrServerConfiguration](http://jackrabbit.apache.org/oak/docs/apidocs/org/apache/jackrabbit/oak/plugins/index/solr/configuration/SolrServerConfiguration.html))
  - the search / indexing configuration (see [OakSolrConfiguration](http://jackrabbit.apache.org/oak/docs/apidocs/org/apache/jackrabbit/oak/plugins/index/solr/configuration/OakSolrConfiguration.html))
+ - the Solr server configuration (see [SolrServerConfiguration](http://jackrabbit.apache.org/oak/docs/apidocs/org/apache/jackrabbit/oak/plugins/index/solr/configuration/SolrServerConfiguration.html))
  
-##### Solr server configuration options
+### Search / indexing configuration options
 
-TBD
+Such options define how Oak handles search and indexing requests in order to properly delegate such operations to Solr.
+
+#### Use for property restrictions
+
+If set to _true_ the Solr index will be used also for filtering nodes by property restrictions.
+
+Default is 'false'.
+
+#### Use for path restrictions
+
+If set to _true_ the Solr index will be used also for filtering nodes by path restrictions.
+
+Default is 'false'.
+
+#### Use for primary types
+
+If set to _true_ the Solr index will be used also for filtering nodes by primary type.
+
+Default is 'false'.
+
+#### Path field
+
+The name of the field to be used for searching an exact match of a certain path.
+
+Default is 'path_exact'.
+
+#### Catch all field
+
+The name of the field to be used for searching when no specific field is defined in the search query (e.g. user entered 
+queries like 'foo bar').
+
+Default is 'catch_all'.
+
+The default Solr schema.xml provided with the Oak Solr index contains a copyField from everything to 'catch_all', which causes
+all the properties of a certain node to be indexed into that field (as separate values); therefore a query run against 
+that field matches if any of the properties of the original node would have matched such a query.
+
+#### Descendant path field
+
+The name of the field to be used for searching for nodes descendants of a certain node.
+
+Default is 'path_des'.
+
+E.g. The Solr query to find all the descendant nodes of /a/b would be 'path_des:\/a\/b'.
+
+#### Children path field
+
+The name of the field to be used for searching for child nodes of a certain node.
+
+Default is 'path_child'.
+
+E.g. The Solr query to find all the child nodes of /a/b would be 'path_child:\/a\/b'.
 
-##### Search / indexing configuration options
+#### Parent path field
+
+The name of the field to be used for searching for the parent node of a certain node.
+
+Default is 'path_anc'.
+
+E.g. The Solr query to find the parent node of /a/b would be 'path_anc:\/a\/b'.
+
+#### Property restriction fields
+
+The (optional) mapping of property names into Solr fields, so that if a mapping jcr:title=foo is defined, each node having 
+ the property jcr:title will have its corresponding Solr document having a property foo indexed with the value of the 
+ jcr:title property.
+
+Default is no mapping, therefore the default mechanism of mapping property names to field names is performed.
+
+#### Used properties
+
+A whitelist of properties to be used for indexing / searching by Solr index.
+Such a whitelist, if not empty, would dominate whatever configuration defined for the [Ignored_properties](#Ignored_properties).
+
+Default is an empty list.
+
+E.g. If such a whitelist contains properties _jcr:title_ and _text_ the Solr index will only index such properties for each
+node and it will be possible to use it for searching only on those two properties.
+
+#### Ignored properties
+A blacklist of properties to be ignored while indexing and searching by the Solr index.
+
+Such a blacklist makes sense (it will be taken into account by the Solr index) only if the [Used properties](#Used_properties)
+ option doesn't have any value.
+
+Default is the following array: _{"rep:members", "rep:authorizableId", "jcr:uuid", "rep:principalName", "rep:password"}_.
+
+#### Commit policy
+
+The Solr commit policy to be used when indexing nodes as documents in Solr.
+
+Possible values are 'SOFT', 'HARD', 'AUTO'.
+
+SOFT: perform a Solr soft-commit for each indexed document.
+
+HARD: perform a Solr (hard) commit for each indexed document.
+
+AUTO: doesn't perform any commit and relies on auto commit being configured on plain Solr's configuration (solrconfig.xml).
+
+Default is _SOFT_.
+
+#### Rows
+
+The number of documents per 'page' to be fetched for each query.
+
+Default is _Integer.MAX_VALUE_ (was _50_ in Oak 1.0).
+
+##### Solr server configuration options
 
 TBD
     
 #### Setting up the Solr server
+
 For the Solr index to work Oak needs to be able to communicate with a Solr instance / cluster.
 Apache Solr supports multiple deployment architectures: 
 
@@ -104,10 +213,9 @@ Oak will communicate to such a Solr serv
 Configuring a single remote Solr instance consists of providing the URL to connect to in order to reach the [Solr core]
 (https://wiki.apache.org/solr/SolrTerminology) that will host the Solr index for the Oak repository via the _solr.http.url_
  property which will have to contain such a URL (e.g. _http://10.10.1.101:8983/solr/oak_). 
-All the configuration and tuning of Solr, other than what's described in 'Solr Server Configuration' section of the [OSGi 
-configuration](osgi_config.html) page, will have to be performed on the Solr side; [sample Solr configuration]
- (http://svn.apache.org/viewvc/jackrabbit/oak/trunk/oak-solr-core/src/main/resources/solr/) files (schema.xml, 
- solrconfig.xml, etc.) to start with can be found in _oak-solr-core_ artifact.
+All the configuration and tuning of Solr, other than what's described on this page, will have to be performed on the 
+Solr side; [sample Solr configuration](http://svn.apache.org/viewvc/jackrabbit/oak/trunk/oak-solr-core/src/main/resources/solr/) 
+files (schema.xml, solrconfig.xml, etc.) to start with can be found in _oak-solr-core_ artifact.
 
 ###### SolrCloud cluster
 A [SolrCloud](https://cwiki.apache.org/confluence/display/solr/SolrCloud) cluster is the recommended setup for an Oak 
@@ -117,18 +225,14 @@ to be provided in the _solr.zk.host_ pro
 directly with Zookeeper.
 The [Solr collection](https://wiki.apache.org/solr/SolrTerminology) to be used within Oak is named _oak_, having a replication
  factor of 2 and using 2 shards; this means in the default setup the SolrCloud cluster would have to be composed by at 
- least 4 Solr servers as the index will be split into 2 shards and each shard will have 2 replicas. Such parameters can 
- be changed, look for the 'Oak Solr remote server configuration' item on the [OSGi configuration](osgi_config.html) page.
+ least 4 Solr servers as the index will be split into 2 shards and each shard will have 2 replicas.
 SolrCloud also allows the hot deploy of configuration files to be used for a certain collection so while setting up the 
  collection to be used for Oak with the needed files before starting the cluster, configuration files can also be uploaded 
  from a local directory, this is controlled by the _solr.conf.dir_ property of the 'Oak Solr remote server configuration'.
 For a detailed description of how SolrCloud works see the [Solr reference guide](https://cwiki.apache.org/confluence/display/solr/SolrCloud).
 
 ##### OSGi environment
-All the Solr configuration parameters are described in the 'Solr Server Configuration' section on the 
-[OSGi configuration](osgi_config.html) page.
-
-Create an index definition for the Solr index, as described [above](#solr-index-definition).
+Create an index definition for the Solr index, as described [above](#Solr_index).
 Once the query index definition node has been created, access OSGi ConfigurationAdmin via e.g. Apache Felix WebConsole:
 
  1. find the 'Oak Solr indexing / search configuration' item and eventually change configuration properties as needed
@@ -178,5 +282,4 @@ path, property and primary type restrict
 As of Oak version 1.2.0:
 
  * Solr index doesn't support index time aggregation, but only query time aggregation
- * Lucene and Solr can be both used for full text, property and path restrictions
-
+ * Lucene and Solr can be both used for full text, property and path restrictions
\ No newline at end of file