You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@jackrabbit.apache.org by "Ard Schrijvers (JIRA)" <ji...@apache.org> on 2007/09/03 09:19:19 UTC
[jira] Commented: (JCR-1079) Extend the IndexingConfiguration to allow configuration of reuseable analyzers

    [ https://issues.apache.org/jira/browse/JCR-1079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12524452 ] 

Ard Schrijvers commented on JCR-1079:
-------------------------------------

>no, I think it's better to have it symmetric. If they can only be used globally then you should only be allowed to configure them globally.

Also IMO this is best, because it might be very confusing. The reason why it can be only globally configured is in the tokenStream part of the analyzer:

public TokenStream tokenStream(String fieldName, Reader reader) {

This one is used for indexing *and* parsing for searching, and the only thing I can distinguish on is the String fieldName (string representation of the QName). There is no way to know which indexing-rule it holds for, hence, the global configuration. 



> Extend the IndexingConfiguration to allow configuration of reuseable analyzers
> ------------------------------------------------------------------------------
>
>                 Key: JCR-1079
>                 URL: https://issues.apache.org/jira/browse/JCR-1079
>             Project: Jackrabbit
>          Issue Type: New Feature
>    Affects Versions: 1.3.1
>            Reporter: Ard Schrijvers
>            Priority: Minor
>             Fix For: 1.4
>
>
> To the indexing_configuration.xml a xml block of analyzers should be configurable. In each <index-rule> to a property an analyzer can be assigned. This means, that property will be analyzed with that specific analyzer. In the first place, it enables multilingual indexing. 
> Documentation needs to be added explaining the difference in searching in the node scope [jcr:contains(.,'foo')] and in some property [jcr:contains(@myprop,'foo')]. The node scope will always be searched and indexed with the default analyzer, which can be configured in the workspace.xml in  the  <SearchIndex> element.
> Below a possible indexing_configuration.xml snippet is shown. Also node the possible enhancement (not sure wether this implementation will have it, because it requires a lot of filter Factories and is probably out of scope). Adding custom filters which do not need a factory might be easier.
> <analyzers>
> 	<analyzer name="fr" class="org.apache.lucene.analysis.fr.FrenchAnalyzer"/>
> 	<analyzer name="de" class="org.apache.lucene.analysis.de.GermanAnalyzer"/>
>         <analyzer name="compound" class="org.apache.lucene.analysis.SimpleAnalyzer">
>              <filter class="jr.StopFilterFactory" words="stopwords.txt"/>
>              <filter class="jr.EdgeNGramTokenizerFactory" side="front" minGram="1" maxGram="2"/>
>         </analyzer>
> </analyzers>
> <index-rule nodeType="nt:unstructured">
>        <property analyzer="fr">bode_fr</property>
>        <property analyzer="de">bode_de</property>
> </index-rule>

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.