
Posted to java-user@lucene.apache.org by DECAFFMEYER MATHIEU <MA...@fortis.lu> on 2007/03/05 19:25:47 UTC

Using Stemmers

Hi, 
This is a very simple question, but I just can't find the resources I
need.
I am using the StandardAnalyzer:
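    StandardAnalyzer stdAnalyzer;
    // Use the custom stop word list (a String[]) if one was supplied,
    // otherwise fall back to StandardAnalyzer's default stop words.
    if ((stopWordList != null) && (stopWordList.length != 0)) {
        stdAnalyzer = new StandardAnalyzer(stopWordList);
    } else {
        stdAnalyzer = new StandardAnalyzer();
    }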
What I want to achieve is to use an English stemmer,
but I can't find any method to associate my stemmer with my Analyzer.
I'd appreciate any help, thank you.

__________________________________

   Mathieu Decaffmeyer
   Web Developer
   Fortis Banque Luxembourg
   50, avenue J. F. Kennedy
   L-2951 Luxembourg
   IS Retail Banking - Web Content Management
   Mobile : 0032  479 / 69 . 42 . 96



============================================
Internet communications are not secure and therefore Fortis Banque Luxembourg S.A. does not accept legal responsibility for the contents of this message. The information contained in this e-mail is confidential and may be legally privileged. It is intended solely for the addressee. If you are not the intended recipient, any disclosure, copying, distribution or any action taken or omitted to be taken in reliance on it, is prohibited and may be unlawful. Nothing in the message is capable or intended to create any legally binding obligations on either party and it is not intended to provide legal advice.
============================================

Re: Using Stemmers

Posted by Grant Ingersoll <gs...@apache.org>.

Hi Mathieu,

You can't add TokenFilters to an existing Analyzer. However,
implementing an Analyzer that behaves just like StandardAnalyzer
plus your stemmer is pretty straightforward.
StandardAnalyzer.tokenStream() looks like this:
   /** Constructs a {@link StandardTokenizer} filtered by a {@link StandardFilter},
    * a {@link LowerCaseFilter} and a {@link StopFilter}. */
   public TokenStream tokenStream(String fieldName, Reader reader) {
     TokenStream result = new StandardTokenizer(reader);
     result = new StandardFilter(result);
     result = new LowerCaseFilter(result);
     result = new StopFilter(result, stopSet);
     // ADD your stemming filter here, or just before the StopFilter
     // if your stop word list works off of stemmed words
     return result;
   }
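Why the position of the stemming filter relative to the StopFilter matters can be seen with a small dependency-free Java sketch (it does not use Lucene). It assumes a hypothetical crude stemmer that only drops a trailing "s"; a real chain would use a proper stemming TokenFilter. The names FilterOrderDemo, stopSet, crudeStem, tokenize and analyze are illustrative only. With the stop filter first, "this" is removed as expected; with the stemmer first, "this" becomes "thi" and slips past a stop list of unstemmed words, which is why the stop list has to contain stemmed forms in that ordering.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Dependency-free model of stop-filter vs. stemmer ordering (not Lucene's API).
class FilterOrderDemo {

    // Builds a stop set, similar in purpose to StopFilter.makeStopSet(String[]).
    static Set<String> stopSet(String... words) {
        return new HashSet<String>(Arrays.asList(words));
    }

    // Hypothetical crude stemmer: drops a trailing "s" (but not "ss")
    // from words longer than three characters.
    static String crudeStem(String word) {
        if (word.length() > 3 && word.endsWith("s") && !word.endsWith("ss")) {
            return word.substring(0, word.length() - 1);
        }
        return word;
    }

    // Splits on non-word characters and lowercases each token.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<String>();
        for (String t : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // stemFirst == false: stop filter, then stemmer (stop list holds plain words).
    // stemFirst == true:  stemmer, then stop filter (stop list must hold stems).
    static List<String> analyze(String text, Set<String> stopWords, boolean stemFirst) {
        List<String> out = new ArrayList<String>();
        for (String token : tokenize(text)) {
            String t = stemFirst ? crudeStem(token) : token;
            if (stopWords.contains(t)) {
                continue; // removed by the stop filter
            }
            out.add(stemFirst ? t : crudeStem(t));
        }
        return out;
    }
}
```

With the stop list {this, is, the, of}, analyzing "This is the story of dogs" gives [story, dog] when the stop filter runs first, but [thi, story, dog] when the stemmer runs first.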

So just create a new Analyzer that has these same filters, plus your  
stemming TokenFilter.  Looking at the source of SnowballAnalyzer  
(contrib/snowball) may also be useful.
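In Lucene itself, such an analyzer would be a subclass of org.apache.lucene.analysis.Analyzer whose tokenStream() builds the chain shown above and wraps one more filter around it, for example core Lucene's PorterStemFilter for English. The sketch below models only that wrapping ("decorator") structure with JDK classes so it runs without Lucene. Tokens, WordTokenizer, FilterBase, LowerCasing, StopWords, CrudeStemmer and StemmingAnalyzer are hypothetical stand-ins, the tokenizer is far simpler than StandardTokenizer, and the stemmer is deliberately crude.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical token stream: next() returns null at end of input,
// loosely mirroring TokenStream.next() in Lucene 2.x.
abstract class Tokens {
    abstract String next();
}

// Much simpler stand-in for StandardTokenizer: splits on non-word characters.
class WordTokenizer extends Tokens {
    private final String[] words;
    private int pos = 0;
    WordTokenizer(String text) { this.words = text.split("\\W+"); }
    String next() {
        while (pos < words.length) {
            String w = words[pos++];
            if (!w.isEmpty()) return w;
        }
        return null;
    }
}

// Every filter wraps ("decorates") the stream that comes before it.
abstract class FilterBase extends Tokens {
    protected final Tokens input;
    FilterBase(Tokens input) { this.input = input; }
}

class LowerCasing extends FilterBase {
    LowerCasing(Tokens input) { super(input); }
    String next() {
        String t = input.next();
        return t == null ? null : t.toLowerCase(Locale.ROOT);
    }
}

class StopWords extends FilterBase {
    private final Set<String> stopSet;
    StopWords(Tokens input, Set<String> stopSet) { super(input); this.stopSet = stopSet; }
    String next() {
        // Skip stop words and return the first token that survives.
        for (String t = input.next(); t != null; t = input.next()) {
            if (!stopSet.contains(t)) return t;
        }
        return null;
    }
}

// Hypothetical crude stemmer standing in for a real stemming filter.
class CrudeStemmer extends FilterBase {
    CrudeStemmer(Tokens input) { super(input); }
    String next() {
        String t = input.next();
        if (t != null && t.length() > 3 && t.endsWith("s") && !t.endsWith("ss")) {
            return t.substring(0, t.length() - 1);
        }
        return t;
    }
}

// Same shape as StandardAnalyzer.tokenStream(), plus one stemming wrapper.
class StemmingAnalyzer {
    private final Set<String> stopSet;
    StemmingAnalyzer(String... stopWords) {
        this.stopSet = new HashSet<String>(Arrays.asList(stopWords));
    }
    Tokens tokenStream(String text) {
        Tokens result = new WordTokenizer(text);
        result = new LowerCasing(result);
        result = new StopWords(result, stopSet);
        result = new CrudeStemmer(result); // the only new link in the chain
        return result;
    }
    List<String> analyze(String text) {
        List<String> out = new ArrayList<String>();
        Tokens ts = tokenStream(text);
        for (String t = ts.next(); t != null; t = ts.next()) out.add(t);
        return out;
    }
}
```

For example, new StemmingAnalyzer("the", "of").analyze("The Rise of Search Engines") yields [rise, search, engine]. Each filter only pulls tokens from the stream it wraps, so adding stemming means adding one more wrapper rather than changing the existing filters.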

FWIW, it is not that hard to make a "configurable" analyzer similar  
to what Solr does, if you find you need to change the filters in your  
analyzer a lot.
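A "configurable" analyzer boils down to assembling the chain from configuration instead of hard-coding it in tokenStream(). The following is a minimal JDK-only sketch of that idea, loosely in the spirit of Solr's per-field analyzer definitions but not its actual API. The class ConfigurableAnalyzer and the stage names "lowercase", "stop" and "stem" are hypothetical, and the "stem" stage is a crude illustration rather than a real stemming algorithm. Stages are applied in the listed order, so moving or dropping the stemmer becomes a configuration change.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.function.UnaryOperator;

// Hypothetical configurable analyzer: the chain is assembled from a list of
// stage names at runtime instead of being hard-coded in tokenStream().
class ConfigurableAnalyzer {
    private final List<UnaryOperator<String>> chain = new ArrayList<UnaryOperator<String>>();

    // stageNames is applied left to right, e.g. "lowercase,stop,stem".
    ConfigurableAnalyzer(String stageNames, Set<String> stopWords) {
        for (String name : stageNames.split(",")) {
            chain.add(stage(name.trim(), stopWords));
        }
    }

    // Each stage maps a token to a new token, or to null to drop it.
    private static UnaryOperator<String> stage(String name, Set<String> stopWords) {
        switch (name) {
            case "lowercase":
                return t -> t.toLowerCase(Locale.ROOT);
            case "stop":
                return t -> stopWords.contains(t) ? null : t;
            case "stem": // crude illustrative stemmer, not a real algorithm
                return t -> (t.length() > 3 && t.endsWith("s") && !t.endsWith("ss"))
                        ? t.substring(0, t.length() - 1) : t;
            default:
                throw new IllegalArgumentException("unknown stage: " + name);
        }
    }

    // Splits on non-word characters, then runs every token through the chain.
    List<String> analyze(String text) {
        List<String> out = new ArrayList<String>();
        for (String token : text.split("\\W+")) {
            String t = token.isEmpty() ? null : token;
            for (UnaryOperator<String> s : chain) {
                if (t == null) break; // dropped by an earlier stage
                t = s.apply(t);
            }
            if (t != null) out.add(t);
        }
        return out;
    }
}
```

For brevity this version splits the whole input into a list up front, whereas Lucene's TokenStream pulls tokens one at a time; the configuration idea is the same either way.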

Cheers,
Grant



--------------------------
Grant Ingersoll
Center for Natural Language Processing
http://www.cnlp.org

Read the Lucene Java FAQ at http://wiki.apache.org/jakarta-lucene/LuceneFAQ