Posted to solr-user@lucene.apache.org by Pooja Verlani <po...@gmail.com> on 2009/09/23 12:15:32 UTC

Phrase stopwords

Hi,
Is it possible to have a phrase as a stopword in Solr? If so, please share
how to do it.

regards,
Pooja

Re: Phrase stopwords

Posted by AHMET ARSLAN <io...@yahoo.com>.
> From: Pooja Verlani <po...@gmail.com>
> Subject: Phrase stopwords
> To: solr-user@lucene.apache.org
> Date: Wednesday, September 23, 2009, 1:15 PM
> Hi,
> Is it possible to have a phrase as a stopword in Solr? If so,
> please share how to do it.
> 
> regards,
> Pooja
> 

I think that can be implemented by combining SynonymFilterFactory and StopFilterFactory.

<filter class="solr.SynonymFilterFactory" synonyms="syn.txt" ignoreCase="true" expand="false"/>
<filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>

syn.txt will contain lines such as:

phrase as a stopword => somestupidtoken
phrase stopword => somestupidtoken
three words stopword => somestupidtoken

stopwords.txt will contain the line:
somestupidtoken

IMO it will work, since SynonymFilterFactory can handle multi-word synonyms such as a b c d => foo. With expand="false", you can use this filter to reduce each multi-word stopword to a single token (one that is unlikely to occur in your documents). StopFilter then removes that single token.
In effect, this combination removes from the token stream every multi-word phrase listed on the left-hand side of a rule in syn.txt.
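
For context, here is a rough sketch of how the two filter lines above might sit inside a field type in schema.xml. The field type name text_phrasestop and the choice of WhitespaceTokenizerFactory and LowerCaseFilterFactory are only my assumptions for illustration; use whatever tokenizer your field already has. The one thing that matters is the order: the synonym filter has to run before the stop filter.

```xml
<!-- Hypothetical field type; the name "text_phrasestop" is made up for this example. -->
<fieldType name="text_phrasestop" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- Assumed tokenizer: splits on whitespace so each word becomes one token. -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Step 1: collapse each multi-word phrase from syn.txt into the placeholder token. -->
    <filter class="solr.SynonymFilterFactory" synonyms="syn.txt" ignoreCase="true" expand="false"/>
    <!-- Step 2: drop the placeholder token listed in stopwords.txt. -->
    <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
  </analyzer>
</fieldType>
```

I have not tested this exact definition, so it is worth checking the result on the analysis page of the Solr admin interface with a sample text containing one of your phrases.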

Hope this helps.