Posted to solr-user@lucene.apache.org by Pooja Verlani <po...@gmail.com> on 2009/09/23 12:15:32 UTC
Phrase stopwords
Hi,
Is it possible to have a phrase as a stopword in Solr? If so, please share
how to do it.
regards,
Pooja
Re: Phrase stopwords
Posted by AHMET ARSLAN <io...@yahoo.com>.
> From: Pooja Verlani <po...@gmail.com>
> Subject: Phrase stopwords
> To: solr-user@lucene.apache.org
> Date: Wednesday, September 23, 2009, 1:15 PM
> Hi,
> Is it possible to have a phrase as a stopword in solr? In
> case, please share
> how to do so?
>
> regards,
> Pooja
>
I think that can be implemented by chaining SynonymFilterFactory and StopFilterFactory, in that order:
<filter class="solr.SynonymFilterFactory" synonyms="syn.txt" ignoreCase="true" expand="false"/>
<filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
syn.txt will contain lines such as:
phrase as a stopword => somestupidtoken
phrase stopword => somestupidtoken
three words stopword => somestupidtoken
stopwords.txt will contain the line:
somestupidtoken
IMO it will work, since SynonymFilterFactory can handle multi-word synonyms like a b c d => foo. With expand="false", you can use this filter to reduce each multi-word stopword to a single token (one that is unlikely to occur in your documents). Then remove this single token with StopFilter.
This combination will remove the multi-word phrases listed in your syn.txt from the token stream.
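For context, here is a rough sketch of how the two filters might sit inside a field type in schema.xml. The field type name text_phrase_stop and the choice of WhitespaceTokenizerFactory are my assumptions for illustration, not something from your setup, so adjust them to match your existing analyzer chain:

```xml
<!-- Hypothetical field type; the name and tokenizer are assumptions. -->
<fieldType name="text_phrase_stop" class="solr.TextField" positionIncrementGap="100">
  <!-- An analyzer without a type attribute is used for both indexing and querying. -->
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- Step 1: collapse each multi-word phrase in syn.txt to the placeholder token. -->
    <filter class="solr.SynonymFilterFactory" synonyms="syn.txt" ignoreCase="true" expand="false"/>
    <!-- Step 2: drop the placeholder token listed in stopwords.txt. -->
    <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
  </analyzer>
</fieldType>
```

The order matters: the synonym filter has to run before the stop filter, otherwise the placeholder token does not exist yet when the stop filter looks for it. If your chain already has a StopFilterFactory for ordinary stopwords such as "a" or "as", it probably needs to come after the synonym filter too, or the phrases may no longer match.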
Hope this helps.