Posted to java-user@lucene.apache.org by Rosen Marinov <ro...@sirma.bg> on 2002/04/19 16:44:57 UTC

Re: Some questions - Analyzer

See the official Lucene FAQ:

Question 17:

17. Can I write my own custom analyzer?
Sure. An analyzer is basically a factory object that creates a TokenStream
object used to tokenize the text. A typical analyzer implementation creates
the TokenStream by creating a standard tokenizer and combining it with a
series of filters, each of which performs a different processing step on the
token stream.

Here is a sample customized analyzer contributed by Joanne Proton:

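import java.io.Reader;
import java.util.Hashtable;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.LowerCaseFilter;
import org.apache.lucene.analysis.PorterStemFilter;
import org.apache.lucene.analysis.StopFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardFilter;
import org.apache.lucene.analysis.standard.StandardTokenizer;

public class MyAnalyzer extends Analyzer
{
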
  /*
   * An array containing some common words that
   * are not usually useful for searching.
   */
  private static final String[] STOP_WORDS =
  {
     "a"       , "and"     , "are"     , "as"      ,
     "at"      , "be"      , "but"     , "by"      ,
     "for"     , "if"      , "in"      , "into"    ,
     "is"      , "it"      , "no"      , "not"     ,
     "of"      , "on"      , "or"      , "s"       ,
     "such"    , "t"       , "that"    , "the"     ,
     "their"   , "then"    , "there"   , "these"   ,
     "they"    , "this"    , "to"      , "was"     ,
     "will"    ,
     "with"
  };

  /*
   * Stop table
   */
  private static final Hashtable stopTable =
      StopFilter.makeStopTable(STOP_WORDS);

  /*
   * Create a token stream for this analyzer.
   */
  public final TokenStream tokenStream(final Reader reader)
  {
    TokenStream result = new StandardTokenizer(reader);

    result = new StandardFilter(result);
    result = new LowerCaseFilter(result);
    result = new StopFilter(result, stopTable);
    result = new PorterStemFilter(result);

    return result;
  }
}
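
To make the chaining in tokenStream() easier to see, here is a small
stand-alone sketch of the same wrap-one-stream-in-another idea. It does not
use Lucene at all: SimpleTokenStream, WhitespaceSource, LowerCaseStep,
StopStep and ChainSketch are hypothetical names made up for this
illustration, and the three-word stop list is only a placeholder.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.StringTokenizer;

// Hypothetical stand-in types for illustration; none of these are Lucene classes.
class ChainSketch {

  /** Minimal token source: returns the next token, or null at the end. */
  interface SimpleTokenStream {
    String next();
  }

  /** Splits the input on whitespace (much cruder than a real tokenizer). */
  static final class WhitespaceSource implements SimpleTokenStream {
    private final StringTokenizer tokens;

    WhitespaceSource(String text) {
      this.tokens = new StringTokenizer(text);
    }

    public String next() {
      return tokens.hasMoreTokens() ? tokens.nextToken() : null;
    }
  }

  /** Filter that lower-cases every token coming from the wrapped stream. */
  static final class LowerCaseStep implements SimpleTokenStream {
    private final SimpleTokenStream input;

    LowerCaseStep(SimpleTokenStream input) {
      this.input = input;
    }

    public String next() {
      String token = input.next();
      return token == null ? null : token.toLowerCase(Locale.ROOT);
    }
  }

  /** Filter that skips tokens contained in the stop-word set. */
  static final class StopStep implements SimpleTokenStream {
    private final SimpleTokenStream input;
    private final Set<String> stopWords;

    StopStep(SimpleTokenStream input, Set<String> stopWords) {
      this.input = input;
      this.stopWords = stopWords;
    }

    public String next() {
      String token;
      while ((token = input.next()) != null) {
        if (!stopWords.contains(token)) {
          return token;
        }
      }
      return null;
    }
  }

  /** Builds the chain (source, then lower-casing, then stop words) and drains it. */
  static List<String> analyze(String text) {
    Set<String> stopWords = new HashSet<>(Arrays.asList("a", "and", "the"));

    SimpleTokenStream result = new WhitespaceSource(text);
    result = new LowerCaseStep(result);
    result = new StopStep(result, stopWords);

    List<String> tokens = new ArrayList<>();
    for (String t = result.next(); t != null; t = result.next()) {
      tokens.add(t);
    }
    return tokens;
  }

  public static void main(String[] args) {
    System.out.println(analyze("The Quick fox AND the dog"));
  }
}
```

Note that lower-casing comes before the stop step, as in the FAQ sample, so
"The" and "AND" match the lower-case stop words. For an Italian analyzer the
same structure should carry over: you would presumably swap in an Italian
stop list, and since the Porter stemmer was designed for English, the
stemming step would need to be replaced or dropped.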

----- Original Message -----
From: "Marco Ferrante" <fe...@unige.it>
To: "Lucene Users List" <lu...@jakarta.apache.org>
Sent: Friday, April 19, 2002 5:24 PM
Subject: Re: Some questions


> > ... snip ...
> > I think there isn't any Italian Analyzer, is there?
> > How can I write one?
> >
> > ... snip ...
> I'm interested in contributing to this too. Can I help?
>
> --------------------------------------------------
> Marco Ferrante (ferrante@unige.it)
> CSITA (Centro Servizi Informatici e Telematici d'Ateneo)
> Università degli Studi di Genova - Italy
> Via Brigata Salerno, ponte - 16147 Genova
> tel (+39) 0103532621 (interno tel. 2621)
> --------------------------------------------------
>
>
> --
> To unsubscribe, e-mail: <ma...@jakarta.apache.org>
> For additional commands, e-mail: <ma...@jakarta.apache.org>
>

