Posted to java-user@lucene.apache.org by Owen Densmore <ow...@backspaces.net> on 2005/02/03 15:26:30 UTC

Right way to make analyzer

Is this the right way to build a Porter stemming analyzer on top of the
standard tokenizer?  I'm not sure about the order of the filters.

Owen

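     import java.io.Reader;
     import org.apache.lucene.analysis.*;
     import org.apache.lucene.analysis.standard.*;

     class MyAnalyzer extends Analyzer {
       public TokenStream tokenStream(String fieldName, Reader reader) {
         // The innermost stage runs first: tokenize, normalize,
         // lowercase, remove stop words, then stem.
         return new PorterStemFilter(
             new StopFilter(
                 new LowerCaseFilter(
                     new StandardFilter(
                         new StandardTokenizer(reader))),
                 StopAnalyzer.ENGLISH_STOP_WORDS));
       }
     }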



---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Re: Right way to make analyzer

Posted by Erik Hatcher <er...@ehatchersolutions.com>.
On Feb 3, 2005, at 9:26 AM, Owen Densmore wrote:

> Is this the right way to build a Porter stemming analyzer on top of the
> standard tokenizer?  I'm not sure about the order of the filters.
>
> Owen
>
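>     import java.io.Reader;
>     import org.apache.lucene.analysis.*;
>     import org.apache.lucene.analysis.standard.*;
>
>     class MyAnalyzer extends Analyzer {
>       public TokenStream tokenStream(String fieldName, Reader reader) {
>         // The innermost stage runs first: tokenize, normalize,
>         // lowercase, remove stop words, then stem.
>         return new PorterStemFilter(
>             new StopFilter(
>                 new LowerCaseFilter(
>                     new StandardFilter(
>                         new StandardTokenizer(reader))),
>                 StopAnalyzer.ENGLISH_STOP_WORDS));
>       }
>     }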

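Yes, that is correct.

Analysis starts with a tokenizer, and each filter consumes the output 
of the stage it wraps, so the innermost constructor runs first and the 
outermost runs last.  Lowercasing before the stop filter matters because 
the stop word list is lowercase, and stemming last means stop words are 
matched before they are altered.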
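
To make the chaining concrete, here is a minimal sketch in plain Java that 
needs no Lucene jar.  Everything in it (ChainSketch, Stream, tokenizer, 
lowerCase, stop, toyStem, drain, the four-word stop list) is a made-up 
stand-in, not Lucene API, and toyStem only strips a trailing "s"; it is 
not the Porter algorithm.  It only shows how each stage pulls tokens from 
the stage it wraps.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

final class ChainSketch {

    /** Hypothetical stand-in for a token stream: next() returns null when exhausted. */
    public interface Stream {
        String next();
    }

    /** Illustrative stop list only; not Lucene's ENGLISH_STOP_WORDS. */
    public static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("the", "a", "and", "of"));

    /** Source stage: splits on runs of non-letters (far cruder than StandardTokenizer). */
    public static Stream tokenizer(String text) {
        final String[] parts = text.split("[^A-Za-z]+");
        final int[] pos = {0};
        return () -> {
            while (pos[0] < parts.length) {
                String p = parts[pos[0]++];
                if (!p.isEmpty()) return p;   // skip the empty leading piece, if any
            }
            return null;
        };
    }

    /** Filter stage: lowercases each token from the wrapped stream. */
    public static Stream lowerCase(Stream in) {
        return () -> {
            String t = in.next();
            return t == null ? null : t.toLowerCase(Locale.ROOT);
        };
    }

    /** Filter stage: drops tokens that appear in stopWords (exact, case-sensitive match). */
    public static Stream stop(Stream in, Set<String> stopWords) {
        return () -> {
            String t;
            while ((t = in.next()) != null) {
                if (!stopWords.contains(t)) return t;
            }
            return null;
        };
    }

    /** Toy stemmer: strips one trailing "s" from tokens longer than 3 chars. NOT Porter. */
    public static Stream toyStem(Stream in) {
        return () -> {
            String t = in.next();
            if (t == null) return null;
            return (t.length() > 3 && t.endsWith("s")) ? t.substring(0, t.length() - 1) : t;
        };
    }

    /** Pulls every token out of a stream into a list. */
    public static List<String> drain(Stream s) {
        List<String> out = new ArrayList<>();
        for (String t = s.next(); t != null; t = s.next()) out.add(t);
        return out;
    }

    /** Same stage order as the analyzer above: tokenize, lowercase, stop, stem. */
    public static List<String> analyze(String text) {
        return drain(toyStem(stop(lowerCase(tokenizer(text)), STOP_WORDS)));
    }

    public static void main(String[] args) {
        for (String t : analyze("The Quick Cats and the Analyzers")) {
            System.out.println("[" + t + "]");
        }
    }
}
```

Running main prints [quick], [cat] and [analyzer], one per line.  If the 
stop stage is moved inside the lowercase stage, "The" no longer matches 
the lowercase stop list and survives as "the", which is why the order 
matters.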

As you start tinkering with custom analysis, I strongly recommend using 
a little bit of code to see how your analyzer works on some text.  
The Lucene Intro article I wrote for java.net has some code you can 
borrow to do this, as does Lucene in Action's source code.  Luke, a 
tool I also highly recommend, has this capability too.

	Erik

