Posted to java-user@lucene.apache.org by Rosen Marinov <ro...@sirma.bg> on 2002/04/19 16:44:57 UTC
Re: Some questions - Analyzer
See the official Lucene FAQ, question 17:
17. Can I write my own custom analyzer?
Sure. An analyzer is basically a factory object that creates the TokenStream
object used to tokenize the text. A typical analyzer implementation creates
the TokenStream by creating a standard tokenizer and wrapping it in a
series of filters, each of which performs a different processing step on the
token stream.
Here is a sample customized analyzer contributed by Joanne Proton:
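import java.io.Reader;
import java.util.Hashtable;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.LowerCaseFilter;
import org.apache.lucene.analysis.PorterStemFilter;
import org.apache.lucene.analysis.StopFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardFilter;
import org.apache.lucene.analysis.standard.StandardTokenizer;

public class MyAnalyzer extends Analyzer
{
    /*
     * An array containing some common words that
     * are not usually useful for searching.
     */
    private static final String[] STOP_WORDS =
    {
        "a"     , "and"  , "are"   , "as"    ,
        "at"    , "be"   , "but"   , "by"    ,
        "for"   , "if"   , "in"    , "into"  ,
        "is"    , "it"   , "no"    , "not"   ,
        "of"    , "on"   , "or"    , "s"     ,
        "such"  , "t"    , "that"  , "the"   ,
        "their" , "then" , "there" , "these" ,
        "they"  , "this" , "to"    , "was"   ,
        "will"  ,
        "with"
    };

    /*
     * Stop table: the stop words in the lookup form that StopFilter expects.
     */
    final static private Hashtable stopTable =
        StopFilter.makeStopTable(STOP_WORDS);

    /*
     * Create a token stream for this analyzer. Each filter wraps the
     * previous stage: tokenize, normalize, lower-case, drop stop words, stem.
     */
    public final TokenStream tokenStream(final Reader reader)
    {
        TokenStream result = new StandardTokenizer(reader);
        result = new StandardFilter(result);
        result = new LowerCaseFilter(result);
        result = new StopFilter(result, stopTable);
        result = new PorterStemFilter(result);
        return result;
    }
}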
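
For an Italian analyzer the chain would have the same shape; mostly the stop
list (and the stemmer, which is English-specific above) would change. The
sketch below is plain Java with no Lucene dependency, only to illustrate how
each stage wraps the previous one. The class name AnalyzerChainSketch, its
methods, and the tiny Italian stop list are all made up for this example and
are not Lucene API; a real stop list would be much longer, and stemming is
left out entirely.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Illustration only: none of these types or methods are Lucene classes.
class AnalyzerChainSketch {

    // Hypothetical, deliberately incomplete Italian stop list.
    static final Set<String> ITALIAN_STOP_WORDS = new HashSet<>(Arrays.asList(
        "il", "lo", "la", "i", "gli", "le", "un", "una",
        "e", "di", "a", "da", "in", "con", "su", "per"));

    // Stage 1: the "tokenizer" splits on anything that is not a letter or digit.
    static Iterator<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.split("[^\\p{L}\\p{Nd}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens.iterator();
    }

    // Stage 2: a filter that wraps the upstream stage and lower-cases each token.
    static Iterator<String> lowerCase(final Iterator<String> in) {
        return new Iterator<String>() {
            public boolean hasNext() { return in.hasNext(); }
            public String next() { return in.next().toLowerCase(Locale.ITALIAN); }
        };
    }

    // Stage 3: a filter that drops every token found in the stop set.
    static Iterator<String> removeStopWords(Iterator<String> in, Set<String> stopWords) {
        List<String> kept = new ArrayList<>();
        while (in.hasNext()) {
            String t = in.next();
            if (!stopWords.contains(t)) {
                kept.add(t);
            }
        }
        return kept.iterator();
    }

    // The "analyzer": builds the chain in the same order as tokenStream() above.
    static List<String> analyze(String text) {
        Iterator<String> result = tokenize(text);
        result = lowerCase(result);
        result = removeStopWords(result, ITALIAN_STOP_WORDS);
        List<String> out = new ArrayList<>();
        while (result.hasNext()) {
            out.add(result.next());
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [gatto, casa, marco]
        System.out.println(analyze("Il gatto e la casa di Marco"));
    }
}
```

Lower-casing comes before the stop filter on purpose, so that "Il" and "il"
are both matched by a stop list that only holds lower-case entries.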
----- Original Message -----
From: "Marco Ferrante" <fe...@unige.it>
To: "Lucene Users List" <lu...@jakarta.apache.org>
Sent: Friday, April 19, 2002 5:24 PM
Subject: Re: Some questions
> > ... snip ...
> > I think that there isn't any Italian Analyzer, is there?
> > How can I write one?
> >
> > ... snip ...
> I'm interested in this contribution too. Can I help?
>
> --------------------------------------------------
> Marco Ferrante (ferrante@unige.it)
> CSITA (Centro Servizi Informatici e Telematici d'Ateneo)
> Università degli Studi di Genova - Italy
> Via Brigata Salerno, ponte - 16147 Genova
> tel (+39) 0103532621 (interno tel. 2621)
> --------------------------------------------------
--
To unsubscribe, e-mail: <ma...@jakarta.apache.org>
For additional commands, e-mail: <ma...@jakarta.apache.org>