Posted to commits@jackrabbit.apache.org by Apache Wiki <wi...@apache.org> on 2007/09/10 10:42:02 UTC

[Jackrabbit Wiki] Update of "IndexingConfiguration" by ardschrijvers

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Jackrabbit Wiki" for change notification.

The following page has been changed by ardschrijvers:
http://wiki.apache.org/jackrabbit/IndexingConfiguration

------------------------------------------------------------------------------
  </configuration>
  }}}
  
+ === Index Analyzers ===
+ 
+ This part of the configuration defines how a property is analyzed. If a property has an analyzer configured, that analyzer is used both for indexing and for searching the property. For example:
+ 
+ {{{
+ <?xml version="1.0"?>
+ <!DOCTYPE configuration SYSTEM "http://jackrabbit.apache.org/dtd/indexing-configuration-1.0.dtd">
+ <configuration xmlns:nt="http://www.jcp.org/jcr/nt/1.0">
+   <analyzers> 
+         <analyzer class="org.apache.lucene.analysis.KeywordAnalyzer">
+             <property>mytext</property>
+         </analyzer>
+         <analyzer class="org.apache.lucene.analysis.WhitespaceAnalyzer">
+             <property>mytext2</property>
+         </analyzer>
+   </analyzers> 
+ </configuration>
+ }}}
+ 
+ The configuration above means that the property "mytext" is indexed (and searched) with the Lucene KeywordAnalyzer throughout the entire workspace, and the property "mytext2" with the WhitespaceAnalyzer. Using different analyzers is particularly useful for properties that hold text in different languages.
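+ 
+ As a minimal sketch of the per-language case, the configuration below assigns a German and a French analyzer to two properties. The property names "text_de" and "text_fr" are hypothetical, and the sketch assumes that the Lucene analyzer classes org.apache.lucene.analysis.de.GermanAnalyzer and org.apache.lucene.analysis.fr.FrenchAnalyzer are available on the classpath:
+ 
+ {{{
+ <?xml version="1.0"?>
+ <!DOCTYPE configuration SYSTEM "http://jackrabbit.apache.org/dtd/indexing-configuration-1.0.dtd">
+ <configuration xmlns:nt="http://www.jcp.org/jcr/nt/1.0">
+   <analyzers>
+         <!-- "text_de" is a hypothetical property holding German text -->
+         <analyzer class="org.apache.lucene.analysis.de.GermanAnalyzer">
+             <property>text_de</property>
+         </analyzer>
+         <!-- "text_fr" is a hypothetical property holding French text -->
+         <analyzer class="org.apache.lucene.analysis.fr.FrenchAnalyzer">
+             <property>text_fr</property>
+         </analyzer>
+   </analyzers>
+ </configuration>
+ }}}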
+ 
+ However, when using analyzers you may see unexpected differences between searching within a property and searching within the node scope. Suppose your query is:
+ 
+ {{{
+ xpath = "//*[jcr:contains(mytext,'analyzer')]"
+ }}}
+ 
+ and the property "mytext" contains the text "testing my analyzers".
+ 
+ If no analyzer is configured for the property "mytext", this query does not return the node with the property above, because without stemming the indexed term "analyzers" does not match the search term "analyzer". The query {{{xpath = "//*[jcr:contains(.,'analyzer')]"}}} does not return it either. Note that specific analyzers can only be set on node properties; node scope indexing and analysis is always done with the globally defined analyzer of the SearchIndex element. Now, if I change the analyzer used to index the "mytext" property above to
+ 
+ {{{
+ <analyzer class="org.apache.lucene.analysis.de.GermanAnalyzer">
+      <property>mytext</property>
+ </analyzer>
+ }}}
+ 
+ and do the same search again, then {{{xpath = "//*[jcr:contains(mytext,'analyzer')]"}}} returns a hit because of stemming. The other search, {{{xpath = "//*[jcr:contains(.,'analyzer')]"}}}, still gives no result, since the node scope is indexed with the global analyzer, which in this case does no stemming.
+ 
+ So keep in mind that, when using analyzers for specific properties, a search text may give a hit in a property while the same search text gives no hit in the node scope of the node holding that property.
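+ 
+ For reference, the global analyzer used for the node scope is set on the SearchIndex element of the workspace configuration. The fragment below is only a sketch: it assumes the "analyzer" and "indexingConfiguration" parameters of the Jackrabbit SearchIndex, and the paths and the choice of StandardAnalyzer (which does no stemming) are illustrative values, not taken from this page:
+ 
+ {{{
+ <SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
+   <!-- illustrative index location -->
+   <param name="path" value="${wsp.home}/index"/>
+   <!-- global analyzer, used for node scope indexing and searching -->
+   <param name="analyzer" value="org.apache.lucene.analysis.standard.StandardAnalyzer"/>
+   <!-- illustrative location of the indexing configuration file with the analyzers element -->
+   <param name="indexingConfiguration" value="${wsp.home}/indexing_configuration.xml"/>
+ </SearchIndex>
+ }}}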
+ 
+ 
  '''Important note''': Both index rules and index aggregates influence how content is indexed in Jackrabbit. If you change the configuration, the existing content is not automatically re-indexed according to the new rules. You therefore have to manually re-index the content when you change the configuration!