Posted to java-user@lucene.apache.org by mitu2009 <mu...@gmail.com> on 2009/07/19 07:53:06 UTC
Preserving dots of an acronym while indexing in Lucene
Hi,
If I want Lucene to preserve the dots in acronyms (for example U.K, U.S.A.),
which analyzer do I need to use, and how? I also want to supply a set of stop
words to Lucene while doing this.
--
View this message in context: http://www.nabble.com/Preserving-dots-of-an-acronym-while-indexing-in-Lucene-tp24554342p24554342.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
Re: Preserving dots of an acronym while indexing in Lucene
Posted by Shai Erera <se...@gmail.com>.
I think you should write your own Analyzer and use:
* StandardTokenizer for tokenization and ACRONYM detection.
* StopFilter for stop-word handling.
The Analyzer you write should override tokenStream() and do something like:
************************************************************
TokenStream result = new StandardTokenizer(reader);
result = new LowerCaseFilter(result); // only if lower-casing is also what you want
result = new StopFilter(result, stopWords);
return result;
************************************************************
StandardAnalyzer wraps StandardTokenizer with StandardFilter, which strips
the dots from acronym tokens, so you don't want to use StandardAnalyzer here.
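To make the effect of that chain concrete, here is a rough plain-Java sketch of what it is meant to produce. It does not use Lucene at all: the class name AcronymChainSketch, the TOKEN regular expression, and the helper methods are all made up for illustration, and the regex is only a crude stand-in for StandardTokenizer's grammar (it assumes an acronym is written with a dot after every letter, including the last one).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustration only: mimics tokenizer -> lower-case -> stop filter without Lucene.
class AcronymChainSketch {
    // Assumed token shapes: an acronym such as "U.S.A." (two or more
    // letter-dot pairs), or a plain run of letters and digits.
    private static final Pattern TOKEN =
            Pattern.compile("(?:\\p{L}\\.){2,}|[\\p{L}\\p{N}]+");

    // Tokenize, lower-case, then drop stop words. Acronym dots are kept.
    static List<String> analyze(String text, Set<String> stopWords) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            String token = m.group().toLowerCase(Locale.ROOT);
            if (!stopWords.contains(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    // The extra step you want to avoid: removing the dots from an acronym.
    static String stripDots(String acronym) {
        return acronym.replace(".", "");
    }

    public static void main(String[] args) {
        Set<String> stopWords = new HashSet<>(Arrays.asList("the", "and", "in"));
        System.out.println(analyze("The U.S.A. and the U.K. in 2009", stopWords));
    }
}
```

With the dot-stripping step left out of the chain, "U.S.A." is indexed as "u.s.a." rather than "usa", so queries need to go through the same chain to match.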
Shai