Posted to dev@jackrabbit.apache.org by "Ard Schrijvers (JIRA)" <ji...@apache.org> on 2007/09/04 11:01:01 UTC
[jira] Updated: (JCR-1079) Extend the IndexingConfiguration to allow configuration of reuseable analyzers
[ https://issues.apache.org/jira/browse/JCR-1079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ard Schrijvers updated JCR-1079:
--------------------------------
Attachment: JCR-1079.patch
Patch against jackrabbit-core, revision 571494.
To configure an analyzer per property, add the following parameter to the <SearchIndex> element in workspace.xml:
<param name="indexingConfiguration" value="applications\indexing_configuration.xml"/>
Then add an <analyzers> block like the following to indexing_configuration.xml:
<analyzers>
<analyzer class="org.apache.lucene.analysis.StopAnalyzer">
<property>mytext</property>
</analyzer>
<analyzer class="org.apache.lucene.analysis.WhitespaceAnalyzer">
<property>mytext2</property>
</analyzer>
</analyzers>
To use language-specific analyzers such as org.apache.lucene.analysis.fr.FrenchAnalyzer or org.apache.lucene.analysis.de.GermanAnalyzer, make sure lucene-analyzers.jar is on the classpath.
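As an illustration only, the following Java sketch reads an <analyzers> block in the format above with the JDK's built-in DOM parser and maps each property name to the analyzer class assigned to it. The class AnalyzerConfigSketch and its parse method are hypothetical and are not code from the JCR-1079 patch. Actually instantiating the analyzer classes (for example via reflection) would need Lucene on the classpath, so that step is left out.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper for illustration; not part of Jackrabbit or the JCR-1079 patch.
public class AnalyzerConfigSketch {

    /** Maps each <property> name to the class attribute of its enclosing <analyzer>. */
    public static Map<String, String> parse(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        Map<String, String> byProperty = new LinkedHashMap<>();
        NodeList analyzers = doc.getElementsByTagName("analyzer");
        for (int i = 0; i < analyzers.getLength(); i++) {
            Element analyzer = (Element) analyzers.item(i);
            String className = analyzer.getAttribute("class");
            // Every <property> child is analyzed with this analyzer class.
            NodeList props = analyzer.getElementsByTagName("property");
            for (int j = 0; j < props.getLength(); j++) {
                byProperty.put(props.item(j).getTextContent().trim(), className);
            }
        }
        return byProperty;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<analyzers>"
                + "<analyzer class=\"org.apache.lucene.analysis.StopAnalyzer\">"
                + "<property>mytext</property></analyzer>"
                + "<analyzer class=\"org.apache.lucene.analysis.WhitespaceAnalyzer\">"
                + "<property>mytext2</property></analyzer>"
                + "</analyzers>";
        // Prints the property-to-analyzer mapping in document order.
        System.out.println(parse(xml));
    }
}
```

A LinkedHashMap is used so the mapping keeps the order in which properties appear in the configuration file, which makes the result easy to compare against the XML.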
> Extend the IndexingConfiguration to allow configuration of reuseable analyzers
> ------------------------------------------------------------------------------
>
> Key: JCR-1079
> URL: https://issues.apache.org/jira/browse/JCR-1079
> Project: Jackrabbit
> Issue Type: New Feature
> Affects Versions: 1.3.1
> Reporter: Ard Schrijvers
> Priority: Minor
> Fix For: 1.4
>
> Attachments: JCR-1079.patch
>
>
> It should be possible to configure a block of analyzers in indexing_configuration.xml. Within each <index-rule>, an analyzer can be assigned to a property, which means that property is analyzed with that specific analyzer. First and foremost, this enables multilingual indexing.
> Documentation needs to be added explaining the difference between searching in the node scope [jcr:contains(.,'foo')] and searching in a specific property [jcr:contains(@myprop,'foo')]. The node scope is always indexed and searched with the default analyzer, which can be configured in the <SearchIndex> element of workspace.xml.
> A possible indexing_configuration.xml snippet is shown below. Also note the possible enhancement of attaching filters to an analyzer (see the "compound" analyzer). I am not sure whether this implementation will include it, because it requires many filter factories and is probably out of scope. Adding custom filters that do not need a factory might be easier.
> <analyzers>
> <analyzer name="fr" class="org.apache.lucene.analysis.fr.FrenchAnalyzer"/>
> <analyzer name="de" class="org.apache.lucene.analysis.de.GermanAnalyzer"/>
> <analyzer name="compound" class="org.apache.lucene.analysis.SimpleAnalyzer">
> <filter class="jr.StopFilterFactory" words="stopwords.txt"/>
> <filter class="jr.EdgeNGramTokenizerFactory" side="front" minGram="1" maxGram="2"/>
> </analyzer>
> </analyzers>
> <index-rule nodeType="nt:unstructured">
> <property analyzer="fr">bode_fr</property>
> <property analyzer="de">bode_de</property>
> </index-rule>
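The lookup rule described in the issue can be modelled in a few lines of plain Java. This is a hypothetical sketch of the proposed semantics, not Jackrabbit's implementation: analyzers are registered by name, properties in an <index-rule> refer to them by name, a property-scoped query such as jcr:contains(@bode_fr,'foo') uses the assigned analyzer, and a node-scope query such as jcr:contains(.,'foo') always uses the default analyzer. The class and method names are invented for illustration; falling back to the default analyzer for properties without an assignment, and using StandardAnalyzer as that default, are assumptions.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the JCR-1079 proposal; names are illustrative only.
public class AnalyzerLookupSketch {

    private final String defaultAnalyzer;                               // from <SearchIndex> in workspace.xml
    private final Map<String, String> classByName = new HashMap<>();    // <analyzer name=".." class="..">
    private final Map<String, String> nameByProperty = new HashMap<>(); // <property analyzer="..">

    public AnalyzerLookupSketch(String defaultAnalyzer) {
        this.defaultAnalyzer = defaultAnalyzer;
    }

    public void defineAnalyzer(String name, String className) {
        classByName.put(name, className);
    }

    /** Assigns a named analyzer to a property; rejects names not defined in <analyzers>. */
    public void assign(String property, String analyzerName) {
        if (!classByName.containsKey(analyzerName)) {
            throw new IllegalArgumentException("unknown analyzer: " + analyzerName);
        }
        nameByProperty.put(property, analyzerName);
    }

    /** jcr:contains(@property, ...): the assigned analyzer, else (assumed) the default. */
    public String forProperty(String property) {
        String name = nameByProperty.get(property);
        return name == null ? defaultAnalyzer : classByName.get(name);
    }

    /** jcr:contains(., ...): the node scope always uses the default analyzer. */
    public String forNodeScope() {
        return defaultAnalyzer;
    }

    public static void main(String[] args) {
        AnalyzerLookupSketch lookup = new AnalyzerLookupSketch(
                "org.apache.lucene.analysis.standard.StandardAnalyzer");
        lookup.defineAnalyzer("fr", "org.apache.lucene.analysis.fr.FrenchAnalyzer");
        lookup.defineAnalyzer("de", "org.apache.lucene.analysis.de.GermanAnalyzer");
        lookup.assign("bode_fr", "fr");
        lookup.assign("bode_de", "de");
        System.out.println(lookup.forProperty("bode_fr")); // assigned French analyzer
        System.out.println(lookup.forProperty("other"));   // no assignment: default
        System.out.println(lookup.forNodeScope());         // node scope: default
    }
}
```

Resolving names eagerly in assign() means a typo in an <index-rule> analyzer attribute fails at configuration time rather than silently falling back to the default analyzer.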
--
This message is automatically generated by JIRA.