Posted to solr-user@lucene.apache.org by alendo <a....@etcware.it> on 2010/11/16 10:52:08 UTC

stopwords file configuration

I'm using the Lucid Imagination installation kit for Solr (the latest one,
with Solr 1.4).
I would like to use stopwords, so I installed the Italian version of the file
in LucidWorks/lucidworks/solr/conf/stopwords.txt.
The field from which I want to remove stopwords is declared in schema.xml
as

		<field name="content_title" type="textgen" indexed="true" stored="true"/>

where textgen is defined as follows:
		<fieldType name="textgen" class="solr.TextField"
			positionIncrementGap="100">
			<analyzer type="index">
				<tokenizer class="solr.WhitespaceTokenizerFactory"/>
				<filter class="solr.StopFilterFactory" ignoreCase="true"
					words="stopwords.txt" enablePositionIncrements="true"/>
				<filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
					generateNumberParts="1" catenateWords="1" catenateNumbers="1"
					catenateAll="0" splitOnCaseChange="0"/>
				<filter class="solr.LowerCaseFilterFactory"/>
			</analyzer>
			<analyzer type="query">
				<tokenizer class="solr.WhitespaceTokenizerFactory"/>
				<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
					ignoreCase="true" expand="true"/>
				<filter class="solr.StopFilterFactory" ignoreCase="true"
					words="stopwords.txt" enablePositionIncrements="true"/>
				<filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
					generateNumberParts="1" catenateWords="0" catenateNumbers="0"
					catenateAll="0" splitOnCaseChange="0"/>
				<filter class="solr.LowerCaseFilterFactory"/>
			</analyzer>
		</fieldType>

But when I index a document containing 'stopworda' and 'stopwordb', the test
stopwords I added to verify the setup, it doesn't work: I still find these
words in the content_title field. Do I need to declare somewhere else that
I'm using the stopwords.txt file? Do you have any suggestions?
Thanks,
Ale
-- 
View this message in context: http://lucene.472066.n3.nabble.com/stopwords-file-configuration-tp1910032p1910032.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: stopwords file configuration

Posted by alendo <a....@etcware.it>.

I'm replying to myself because I found the mistake. The Italian stopwords
file that I found on the Apache site has a shell-style comment on the same
line as each stopword. The stopwords file parser is probably basic and
doesn't accept comments on the same line as a stopword. I removed them and
now it works. Anyway, the stopwords are still stored, but they are no longer
found by searches.
-- 
View this message in context: http://lucene.472066.n3.nabble.com/stopwords-file-configuration-tp1910032p1910309.html