You are viewing a plain text version of this content. The canonical link for it is here.

Posted to java-user@lucene.apache.org by Samir Satam <ss...@copyright.com> on 2002/07/30 16:57:08 UTC

Using Different Stop Analyzers...

Hi All,

I have a question about Analyzers. Now I know that the documentation states that I need to use the same analyzers for both Indexing and Searching. But this question is only about the Stopanalyzer.

Lets say I construct a StopAnalyzer with certain set of stopwords (stopwords(x)) while indexing. My guess is that Lucene will probably read those stopwords and remove them from all my indexed fields. Thus if I search on those stopwords I will not be able to find them. 

Can I pass a different set of stopwords (stopwords(y)) possibly a superset of stopwords(x)? 

And in this case will lucene think of stop words as 
totalstopwordset = stopwords(x) (these are already removed from the index) + stopwords(y) (these will be removed while searching)

Can somebody please let me know if my thinking is correct on this one?

The advantage of passing a different set while searching is that I can eliminate stopwords without re-indexing all my documents.

thank you in advance,
Samir 

--
To unsubscribe, e-mail:   <ma...@jakarta.apache.org>
For additional commands, e-mail: <ma...@jakarta.apache.org>

RE: Using Different Stop Analyzers...

Posted by "Nader S. Henein" <ns...@bayt.net>.

Just have two different analyzers with two sets of stop words, both
inheriting from the same analyzer (for you other analyzer criteria such as
whitespace analysis), this would work but why would you want to do that

-----Original Message-----
From: Samir Satam [mailto:ssatam@copyright.com]
Sent: Tuesday, July 30, 2002 6:57 PM
To: Lucene Users List
Subject: Using Different Stop Analyzers...

Hi All,

I have a question about Analyzers. Now I know that the documentation states
that I need to use the same analyzers for both Indexing and Searching. But
this question is only about the Stopanalyzer.

Lets say I construct a StopAnalyzer with certain set of stopwords
(stopwords(x)) while indexing. My guess is that Lucene will probably read
those stopwords and remove them from all my indexed fields. Thus if I search
on those stopwords I will not be able to find them.

Can I pass a different set of stopwords (stopwords(y)) possibly a superset
of stopwords(x)?

And in this case will lucene think of stop words as
totalstopwordset = stopwords(x) (these are already removed from the index) +
stopwords(y) (these will be removed while searching)

Can somebody please let me know if my thinking is correct on this one?

The advantage of passing a different set while searching is that I can
eliminate stopwords without re-indexing all my documents.

thank you in advance,
Samir

--
To unsubscribe, e-mail:
<ma...@jakarta.apache.org>
For additional commands, e-mail:
<ma...@jakarta.apache.org>

--
To unsubscribe, e-mail:   <ma...@jakarta.apache.org>
For additional commands, e-mail: <ma...@jakarta.apache.org>