Posted to dev@lucene.apache.org by Daniel Naber <lu...@danielnaber.de> on 2005/07/15 13:50:07 UTC
getting Analyzer's stop words
Hi,
I'd like to add the following extension to the abstract analyzer class:
public abstract Set getStopwords();
This method returns the stop words in use. Subclasses that don't use stop
words at all will have to return an empty HashSet (or null?).
An interesting question is how PerFieldAnalyzerWrapper could implement this
method. I think it should return the union of all its analyzers' stop words.
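As a rough illustration of the union idea, here is a minimal sketch. It assumes the proposed getStopwords() method exists and returns an empty set rather than null. SketchAnalyzer, FixedStopwordAnalyzer and PerFieldWrapperSketch are hypothetical stand-ins written for this example, not Lucene classes, and the generics are a modern-Java convenience.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for an Analyzer carrying the proposed method.
abstract class SketchAnalyzer {
    // Returns the stop words in use; assumed to be an empty set, never null.
    abstract Set<String> getStopwords();
}

// Hypothetical analyzer with a fixed stop word list.
class FixedStopwordAnalyzer extends SketchAnalyzer {
    private final Set<String> stopwords;

    FixedStopwordAnalyzer(Set<String> stopwords) {
        // Defensive copy so callers cannot change the list afterwards.
        this.stopwords = Collections.unmodifiableSet(new HashSet<>(stopwords));
    }

    @Override
    Set<String> getStopwords() {
        return stopwords;
    }
}

// Hypothetical per-field wrapper: a default analyzer plus per-field overrides.
class PerFieldWrapperSketch extends SketchAnalyzer {
    private final SketchAnalyzer defaultAnalyzer;
    private final Map<String, SketchAnalyzer> perField = new HashMap<>();

    PerFieldWrapperSketch(SketchAnalyzer defaultAnalyzer) {
        this.defaultAnalyzer = defaultAnalyzer;
    }

    void addAnalyzer(String fieldName, SketchAnalyzer analyzer) {
        perField.put(fieldName, analyzer);
    }

    @Override
    Set<String> getStopwords() {
        // Union of the default analyzer's stop words and every field's stop words.
        Set<String> union = new HashSet<>(defaultAnalyzer.getStopwords());
        for (SketchAnalyzer analyzer : perField.values()) {
            union.addAll(analyzer.getStopwords());
        }
        return union;
    }
}
```

Note that the union loses the information about which field a stop word applies to, so a word reported here may still be indexed in some fields.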
Regards
Daniel
---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org
Re: getting Analyzer's stop words
Posted by Daniel Naber <lu...@danielnaber.de>.
On Friday 15 July 2005 14:33, Erik Hatcher wrote:
> > This method returns the stop words in use. Subclasses that don't use stop
> > words at all will have to return an empty HashSet (or null?).
> >
> > An interesting question is how PerFieldAnalyzerWrapper could implement this
> > method. I think it should return the union of all its analyzers' stop words.
>
> What use case do you have in mind for this feature?
I need to do some complicated query-rewriting with an analyzer that doesn't
change the words but uses the stop words from other analyzers. Well, I've
now locally introduced my own analyzer that extends Analyzer and that
looks like the right solution.
Regards
Daniel
--
http://www.danielnaber.de
Re: getting Analyzer's stop words
Posted by Erik Hatcher <er...@ehatchersolutions.com>.
On Jul 15, 2005, at 7:50 AM, Daniel Naber wrote:
> I'd like to add the following extension to the abstract analyzer class:
>
> public abstract Set getStopwords();
>
> This method returns the stop words in use. Subclasses that don't use stop
> words at all will have to return an empty HashSet (or null?).
>
> An interesting question is how PerFieldAnalyzerWrapper could implement this
> method. I think it should return the union of all its analyzers' stop words.
What use case do you have in mind for this feature?
I personally find this an extremely awkward proposal. Stop words may
be field-specific, or may be dynamic. For example, what about a
MinLengthFilter under an analyzer? Would all words that get removed
by an analyzer be considered a "stop word"? The idea of removing
stop words is very questionable, especially in the academic scholarly
domain where I'm applying Lucene. Just the idea of having words
removed from searching causes scholars to scream! :) So I don't see
stop words as a universal analyzer concept at all.
Perhaps there could be a subclass of Analyzer that is designed for
stop word removal, with StopAnalyzer and StandardAnalyzer subclassing
it. If you're handed an Analyzer instance and need to know
whether it removes stop words or not, you could do an "instanceof
StopWordRemovalAnalyzer" check. Perhaps an interface should be used
instead. Either way, I don't see that method being appropriate at
the Analyzer base class level.
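To make the interface variant concrete, here is a minimal sketch. The Analyzer class below is an empty stand-in for Lucene's, StopWordRemovalAnalyzer is only the name floated above, and SimpleStopAnalyzer, KeepEverythingAnalyzer and StopWords are hypothetical names made up for this example. None of this is existing Lucene API.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Empty stand-in for Lucene's Analyzer base class; it stays free of stop word methods.
abstract class Analyzer {
}

// Hypothetical opt-in interface for analyzers that remove a fixed list of stop words.
interface StopWordRemovalAnalyzer {
    Set<String> getStopWords();
}

// Hypothetical analyzer that does remove stop words, so it implements the interface.
class SimpleStopAnalyzer extends Analyzer implements StopWordRemovalAnalyzer {
    private final Set<String> stopWords;

    SimpleStopAnalyzer(String... words) {
        this.stopWords = Collections.unmodifiableSet(new HashSet<>(Arrays.asList(words)));
    }

    @Override
    public Set<String> getStopWords() {
        return stopWords;
    }
}

// Hypothetical analyzer that keeps every token; it has nothing to report.
class KeepEverythingAnalyzer extends Analyzer {
}

// Hypothetical helper showing the instanceof check on an arbitrary Analyzer.
final class StopWords {
    private StopWords() {
    }

    static Set<String> of(Analyzer analyzer) {
        if (analyzer instanceof StopWordRemovalAnalyzer) {
            return ((StopWordRemovalAnalyzer) analyzer).getStopWords();
        }
        // Analyzers that do not opt in are treated as having no stop words.
        return Collections.emptySet();
    }
}
```

With this shape, code that needs the list can call StopWords.of(analyzer) on any instance, and analyzers that filter dynamically (by length, for example) simply don't implement the interface.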
Erik