Posted to java-user@lucene.apache.org by DECAFFMEYER MATHIEU <MA...@fortis.lu> on 2007/03/05 19:25:47 UTC
Using Stemmers
Hi,
This is a very simple question, but I just can't find the resources I
need.
I am using the StandardAnalyzer:
    StandardAnalyzer stdAnalyzer;
    if ((stopWordList != null) && (stopWordList.length != 0)) {
        stdAnalyzer = new StandardAnalyzer(stopWordList);
    } else {
        stdAnalyzer = new StandardAnalyzer();
    }
What I want to achieve is to be able to use an English stemmer,
but I can't find any method to associate my stemmer with my Analyzer.
I appreciate any help, thank you.
__________________________________
Mathieu Decaffmeyer
Web Developer
Fortis Banque Luxembourg
50, avenue J. F. Kennedy
L-2951 Luxembourg
IS Retail Banking - Web Content Management
Mobile : 0032 479 / 69 . 42 . 96
Re: Using Stemmers
Posted by Grant Ingersoll <gs...@apache.org>.
Hi Mathieu,
You can't add TokenFilters to an existing Analyzer. However,
implementing an Analyzer that acts just like the StandardAnalyzer
plus your Stemmer is pretty straightforward.
StandardAnalyzer.tokenStream() looks like:
    /** Constructs a {@link StandardTokenizer} filtered by a {@link
     *  StandardFilter}, a {@link LowerCaseFilter} and a {@link
     *  StopFilter}. */
    public TokenStream tokenStream(String fieldName, Reader reader) {
        TokenStream result = new StandardTokenizer(reader);
        result = new StandardFilter(result);
        result = new LowerCaseFilter(result);
        result = new StopFilter(result, stopSet);
        // ADD your stemming filter here, or one line above if your stop word
        // list works off of stemmed words
        return result;
    }
So just create a new Analyzer that has these same filters, plus your
stemming TokenFilter. Looking at the source of SnowballAnalyzer
(contrib/snowball) may also be useful.
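
The wrapping pattern described above can be sketched without any Lucene dependency. Everything in the block below is a hypothetical stand-in written for illustration (SimpleTokenStream, LetterSplitter, LowerCase, StopWords, ToySuffixStemmer, StemmingAnalyzerSketch); none of these are Lucene classes, and the "stemmer" is a toy suffix stripper, not a real English stemming algorithm. In real code you would keep the Lucene tokenizer and filters from the snippet above and wrap the result in an actual stemming TokenFilter.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Hypothetical stand-in for a token stream: next() returns null when exhausted. */
interface SimpleTokenStream {
    String next();
}

/** Stand-in tokenizer: splits the input on runs of non-letter characters. */
final class LetterSplitter implements SimpleTokenStream {
    private final String[] parts;
    private int pos = 0;

    LetterSplitter(String text) { this.parts = text.split("[^A-Za-z]+"); }

    public String next() {
        while (pos < parts.length) {
            String p = parts[pos++];
            if (!p.isEmpty()) return p;   // skip the empty piece a leading delimiter produces
        }
        return null;
    }
}

/** Filter that lower-cases every token coming from the wrapped stream. */
final class LowerCase implements SimpleTokenStream {
    private final SimpleTokenStream in;
    LowerCase(SimpleTokenStream in) { this.in = in; }

    public String next() {
        String t = in.next();
        return t == null ? null : t.toLowerCase(Locale.ROOT);
    }
}

/** Filter that drops tokens found in the stop set. */
final class StopWords implements SimpleTokenStream {
    private final SimpleTokenStream in;
    private final Set<String> stop;
    StopWords(SimpleTokenStream in, Set<String> stop) { this.in = in; this.stop = stop; }

    public String next() {
        for (String t = in.next(); t != null; t = in.next()) {
            if (!stop.contains(t)) return t;
        }
        return null;
    }
}

/** Toy stemmer (assumption, NOT a real algorithm): strips "ing" or a trailing "s". */
final class ToySuffixStemmer implements SimpleTokenStream {
    private final SimpleTokenStream in;
    ToySuffixStemmer(SimpleTokenStream in) { this.in = in; }

    public String next() {
        String t = in.next();
        if (t == null) return null;
        if (t.length() > 5 && t.endsWith("ing")) return t.substring(0, t.length() - 3);
        if (t.length() > 3 && t.endsWith("s")) return t.substring(0, t.length() - 1);
        return t;
    }
}

/** A new analyzer that builds the whole chain itself, with the stemmer as the last wrapper. */
final class StemmingAnalyzerSketch {
    private final Set<String> stopSet;

    StemmingAnalyzerSketch(String... stopWords) {
        this.stopSet = new HashSet<>(Arrays.asList(stopWords));
    }

    SimpleTokenStream tokenStream(String text) {
        SimpleTokenStream result = new LetterSplitter(text);
        result = new LowerCase(result);
        result = new StopWords(result, stopSet);
        result = new ToySuffixStemmer(result);   // the extra step added to the chain
        return result;
    }

    /** Convenience helper: drains the stream into a list. */
    List<String> analyze(String text) {
        List<String> out = new ArrayList<>();
        SimpleTokenStream ts = tokenStream(text);
        for (String t = ts.next(); t != null; t = ts.next()) out.add(t);
        return out;
    }
}
```

With the stop words "the" and "are", analyzing "The dogs are running" under this toy chain yields the tokens "dog" and "runn". The point is only the shape: each filter wraps the previous stream, so adding a stemmer means writing an analyzer whose tokenStream() adds one more wrapper.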
FWIW, it is not that hard to make a "configurable" analyzer similar
to what Solr does, if you find you need to change the filters in your
analyzer a lot.
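
One possible shape for such a "configurable" analyzer is to take the filter chain as an ordered list instead of hard-coding it. The sketch below is an assumption about how this could look, not how Solr implements it; ConfigurableAnalyzerSketch and its lowerCase(), stop() and toyStem() helpers are hypothetical names, tokens are modeled as a plain java.util.stream.Stream<String> rather than a Lucene TokenStream, and the stemmer is again a toy that only strips a trailing "s".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.function.UnaryOperator;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical analyzer whose filter chain is supplied as data. */
final class ConfigurableAnalyzerSketch {
    private final List<UnaryOperator<Stream<String>>> filters;

    ConfigurableAnalyzerSketch(List<UnaryOperator<Stream<String>>> filters) {
        this.filters = new ArrayList<>(filters);   // defensive copy; order is significant
    }

    /** Splits on whitespace, then applies each configured filter in order. */
    List<String> analyze(String text) {
        Stream<String> tokens = Arrays.stream(text.split("\\s+")).filter(s -> !s.isEmpty());
        for (UnaryOperator<Stream<String>> f : filters) {
            tokens = f.apply(tokens);
        }
        return tokens.collect(Collectors.toList());
    }

    /** Filter step: lower-case every token. */
    static UnaryOperator<Stream<String>> lowerCase() {
        return s -> s.map(t -> t.toLowerCase(Locale.ROOT));
    }

    /** Filter step: drop tokens contained in the given stop set. */
    static UnaryOperator<Stream<String>> stop(Set<String> words) {
        return s -> s.filter(t -> !words.contains(t));
    }

    /** Filter step: toy stemmer (assumption) that strips a trailing "s" from longer tokens. */
    static UnaryOperator<Stream<String>> toyStem() {
        return s -> s.map(t -> t.length() > 3 && t.endsWith("s")
                ? t.substring(0, t.length() - 1)
                : t);
    }
}
```

Because the order is just list order, the two placements mentioned in the code comment above are easy to compare. With a stop set containing the stemmed form "dog", the chain lowerCase, stop, toyStem keeps "dog" for the input "Dogs bark", while lowerCase, toyStem, stop removes it and leaves only "bark".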
Cheers,
Grant
--------------------------
Grant Ingersoll
Center for Natural Language Processing
http://www.cnlp.org
Read the Lucene Java FAQ at http://wiki.apache.org/jakarta-lucene/LuceneFAQ