Posted to java-user@lucene.apache.org by mitu2009 <mu...@gmail.com> on 2009/07/19 07:53:06 UTC

Preserving dots of an acronym while indexing in Lucene

Hi,

If I want Lucene to preserve the dots in acronyms (for example U.K., U.S.A.),
which analyzer do I need to use, and how? I also want to give Lucene a set of
stop words while doing this.

-- 
View this message in context: http://www.nabble.com/Preserving-dots-of-an-acronym-while-indexing-in-Lucene-tp24554342p24554342.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Preserving dots of an acronym while indexing in Lucene

Posted by Shai Erera <se...@gmail.com>.
I think you should write your own Analyzer and use:
* StandardTokenizer for tokenization and ACRONYM detection.
* StopFilter for stop word handling.

The Analyzer you write should override tokenStream() and do something like:

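************************************************************
import java.io.Reader;
import java.util.Set;
import org.apache.lucene.analysis.*;
import org.apache.lucene.analysis.standard.StandardTokenizer;
public class AcronymAnalyzer extends Analyzer {  // Lucene 2.4-era API; class name is an example
  private final Set stopWords;                   // e.g. built with StopFilter.makeStopSet(String[])
  public AcronymAnalyzer(Set stopWords) { this.stopWords = stopWords; }
  public TokenStream tokenStream(String fieldName, Reader reader) {
    TokenStream result = new StandardTokenizer(reader);  // no StandardFilter, so acronym dots stay
    result = new LowerCaseFilter(result);  // only if you also want lower-casing
    result = new StopFilter(result, stopWords);
    return result;
  }
}
************************************************************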
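To see what that chain does to acronyms, here is a plain-Java sketch with no Lucene dependency. It is not Lucene code: the regular expression is an assumed approximation of StandardTokenizer's ACRONYM rule (a letter followed by a dot, repeated at least twice, as in "U.S.A."), and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical plain-Java simulation of the analyzer chain; NOT Lucene code.
final class AcronymChainSketch {
    // Assumed approximation of StandardTokenizer's ACRONYM rule: a letter
    // followed by '.', at least twice (e.g. "U.S.A."). Everything else is
    // split into plain alphanumeric words. Note this pattern only matches
    // acronyms ending in a dot, so "U.K" (no final dot) becomes "u", "k".
    private static final Pattern TOKEN =
        Pattern.compile("[A-Za-z]\\.(?:[A-Za-z]\\.)+|[A-Za-z0-9]+");

    // Tokenize, lower-case, then drop stop words, mirroring
    // StandardTokenizer -> LowerCaseFilter -> StopFilter.
    static List<String> analyze(String text, Set<String> stopWords) {
        List<String> tokens = new ArrayList<String>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            String token = m.group().toLowerCase(Locale.ROOT); // like LowerCaseFilter
            if (!stopWords.contains(token)) {                  // like StopFilter
                tokens.add(token);                             // acronym dots are kept
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        Set<String> stop = new HashSet<String>(Arrays.asList("the", "and", "are"));
        System.out.println(analyze("The U.S.A. and the U.K. are allies", stop));
        // prints [u.s.a., u.k., allies]
    }
}
```

Because lower-casing happens before stop-word removal, the stop set has to be given in lower case; that is also why LowerCaseFilter sits in front of StopFilter in the chain.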

StandardAnalyzer wraps StandardTokenizer with StandardFilter, which strips
the '.' characters from acronyms (U.S.A. becomes USA), so you don't want to
use it here.
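For comparison, the dot stripping described above can be sketched the same way. As far as I know, the real StandardFilter acts on the ACRONYM token type that StandardTokenizer assigns; this standalone sketch re-checks the token's shape with an assumed regex instead, and the names are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical sketch of acronym dot stripping; NOT Lucene's StandardFilter.
final class AcronymDotStripSketch {
    // Assumed acronym shape: a letter followed by '.', at least twice (e.g. "U.S.A.").
    private static final Pattern ACRONYM = Pattern.compile("[A-Za-z]\\.(?:[A-Za-z]\\.)+");

    // For a token that looks like an acronym, drop every '.'; leave other tokens untouched.
    static String stripAcronymDots(String token) {
        if (!ACRONYM.matcher(token).matches()) {
            return token;
        }
        StringBuilder sb = new StringBuilder(token.length());
        for (int i = 0; i < token.length(); i++) {
            char c = token.charAt(i);
            if (c != '.') {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(stripAcronymDots("U.S.A.")); // prints USA
        System.out.println(stripAcronymDots("U.K."));   // prints UK
    }
}
```

This is the step that loses the dots, which is why the custom analyzer leaves StandardFilter out.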

Shai
