Posted to solr-user@lucene.apache.org by Nick Snels <ni...@gmail.com> on 2006/06/22 20:01:18 UTC

Dutch analyzer in combo with custom stopword list

Hi,

I have replaced the English stopwords with Dutch stopwords, and I also
managed to get the Dutch analyzer to work without throwing an error. The
following works:

    <fieldtype name="nametext" class="solr.TextField">
      <analyzer class="org.apache.lucene.analysis.nl.DutchAnalyzer"/>
    </fieldtype>

But why doesn't the following work?

    <fieldtype name="nametext" class="solr.TextField">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StandardFilterFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.StopFilterFactory" words="stopwords.txt"/>
        <analyzer class="org.apache.lucene.analysis.nl.DutchAnalyzer"/>
      </analyzer>
    </fieldtype>

The problem is that the Dutch analyzer doesn't filter all the stopwords, so
I made an extended list. But the above configuration doesn't work. How can I
make it work? I hope somebody can help me out.

Kind regards,

Nick

Re: Dutch analyzer in combo with custom stopword list

Posted by Nick Snels <ni...@gmail.com>.

Hi Yonik,

thanks for the advice. The factory works like a charm!!

Kind regards,

Nick

On 6/22/06, Yonik Seeley <ys...@gmail.com> wrote:
>
> On 6/22/06, Nick Snels <ni...@gmail.com> wrote:
> > But why doesn't the following work
> >
> >     <fieldtype name="nametext" class="solr.TextField">
> >       <analyzer>
> >         <tokenizer class="solr.StandardTokenizerFactory"/>
> >         <filter class="solr.StandardFilterFactory"/>
> >         <filter class="solr.LowerCaseFilterFactory"/>
> >         <filter class="solr.StopFilterFactory" words="stopwords.txt"/>
> >         <analyzer class="org.apache.lucene.analysis.nl.DutchAnalyzer"/>
> >       </analyzer>
> >     </fieldtype>
>
> An analyzer *is* a tokenizer followed by multiple token filters, so
> you can't really put an analyzer in another analyzer.
>
> Probably the right way to handle this is to make a Factory for the
> stemmer filter only.
> One way is by enhancing the existing SnowballPorterFilterFactory in
> Solr to make the language configurable, and to allow it to take an
> exclusion or protected words list.
>
> -Yonik
>

Re: Dutch analyzer in combo with custom stopword list

Posted by Yonik Seeley <ys...@gmail.com>.

On 6/22/06, Nick Snels <ni...@gmail.com> wrote:
> But why doesn't the following work
>
>     <fieldtype name="nametext" class="solr.TextField">
>       <analyzer>
>         <tokenizer class="solr.StandardTokenizerFactory"/>
>         <filter class="solr.StandardFilterFactory"/>
>         <filter class="solr.LowerCaseFilterFactory"/>
>         <filter class="solr.StopFilterFactory" words="stopwords.txt"/>
>         <analyzer class="org.apache.lucene.analysis.nl.DutchAnalyzer"/>
>       </analyzer>
>     </fieldtype>

An analyzer *is* a tokenizer followed by multiple token filters, so
you can't really put an analyzer in another analyzer.

Probably the right way to handle this is to make a Factory for the
stemmer filter only.
One way is by enhancing the existing SnowballPorterFilterFactory in
Solr to make the language configurable, and to allow it to take an
exclusion or protected words list.

-Yonik
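
For illustration, a configuration along the lines suggested above might look
like the sketch below. It is not taken from the thread. It assumes a
SnowballPorterFilterFactory that accepts "language" and "protected"
attributes, which the Solr of this thread did not yet offer (later Solr
releases do). protwords.txt is a hypothetical file of words to exclude from
stemming, and stopwords.txt is the extended Dutch list from the original
question.

    <fieldtype name="nametext" class="solr.TextField">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StandardFilterFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <!-- custom Dutch stopword list from the original question -->
        <filter class="solr.StopFilterFactory" words="stopwords.txt"/>
        <!-- assumption: the factory supports language and protected -->
        <filter class="solr.SnowballPorterFilterFactory"
                language="Dutch" protected="protwords.txt"/>
      </analyzer>
    </fieldtype>

The stemmer sits in the chain as one more token filter instead of a nested
analyzer. Because the stop filter runs after lowercasing and before stemming,
stopwords.txt would need to hold lowercase, unstemmed word forms.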