Posted to solr-user@lucene.apache.org by Dmitry Baranov <sa...@mail.ru> on 2013/04/24 15:43:56 UTC

solr.StopFilterFactory doesn't work with wildcard

Good day!

I have a problem with solr.StopFilterFactory and wildcard text search.
For a query like 'hp* pavilion* series* d4*', where 'series' is a stop
word, I receive the error:
'analyzer returned no terms for multiTerm term: series'
But for a query like 'hp* pavilion* series d4*', I receive the expected
results.

Could you help me?

My field type for search is defined as follows:

<fieldType name="search_string" class="solr.TextField" positionIncrementGap="100">
	<analyzer type="query">
		<tokenizer class="solr.WhitespaceTokenizerFactory" />
		<filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
	</analyzer>
	<analyzer type="multiterm">
		<tokenizer class="solr.WhitespaceTokenizerFactory" />
		<filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
	</analyzer>
</fieldType>

Solr version:

solr-spec	4.0.0.2012.10.06.03.04.33
solr-impl	4.0.0 1394950 - rmuir - 2012-10-06 03:04:33
lucene-spec	4.0.0
lucene-impl	4.0.0 1394950 - rmuir - 2012-10-06 03:00:40



--
View this message in context: http://lucene.472066.n3.nabble.com/solr-StopFilterFactory-doesn-t-work-with-wildcard-tp4058581.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: solr.StopFilterFactory doesn't work with wildcard

Posted by Dmitry Baranov <sa...@mail.ru>.
1) I use StopFilterFactory in the "multiterm" analyzer because without it the
"query" analyzer doesn't work with multi-term queries, in particular terms
with wildcards.
2) I expect that:
<str name="rawquerystring">search_string_ss_i:(hp* pavilion* series*
d4*)</str>
<str name="querystring">search_string_ss_i:(hp* pavilion* series* d4*)</str>
<str name="parsedquery">search_string_ss_i:hp* +search_string_ss_i:pavilion*
+search_string_ss_i:d4*</str>
<str name="parsedquery_toString">+search_string_ss_i:hp*
+search_string_ss_i:pavilion* +search_string_ss_i:d4*</str>
i.e. I expect StopFilterFactory to behave the same way as it does for a query
without wildcards: the stop word clause is simply dropped.
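
To make the expectation concrete, here is a rough client-side sketch of that behaviour. It is not something Solr provides: the function name drop_stopword_wildcards and the hard-coded stop word list are hypothetical, and a real client would need to load the same stopwords.txt that the schema uses.

```python
def drop_stopword_wildcards(query, stop_words):
    """Remove wildcard clauses whose prefix is a stop word (case-insensitive).

    Hypothetical helper: it only understands whitespace-separated clauses
    with a trailing '*', not the full Solr query syntax.
    """
    stop = {word.lower() for word in stop_words}
    kept = []
    for clause in query.split():
        prefix = clause.rstrip("*")
        # Drop 'series*' when 'series' is a stop word; keep everything else.
        if clause.endswith("*") and prefix.lower() in stop:
            continue
        kept.append(clause)
    return " ".join(kept)


# Assumed stop word list, for illustration only.
cleaned = drop_stopword_wildcards("hp* pavilion* series* d4*", ["series"])
print(cleaned)
```

Plain (non-wildcard) stop words are left in place, since the "query" analyzer already removes them.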



Thanks for your answer.




Re: solr.StopFilterFactory doesn't work with wildcard

Posted by Chris Hostetter <ho...@fucit.org>.
: In any case, technically, the stop filter is doing exactly what it is supposed
: to do.

Jack has kind of glossed over some key questions here...

1) why are you using StopFilterFactory in your "multiterm" analyzer like 
this?
2) what do you expect it to do if "series" is in your stopwords and 
someone queries for "series*"

: <fieldType name="search_string" class="solr.TextField"
: positionIncrementGap="100">
: <analyzer type="query">
: <tokenizer class="solr.WhitespaceTokenizerFactory" />
: <filter class="solr.StopFilterFactory" words="stopwords.txt"
: ignoreCase="true"/>
: </analyzer>
: <analyzer type="multiterm">
: <tokenizer class="solr.WhitespaceTokenizerFactory" />
: <filter class="solr.StopFilterFactory" words="stopwords.txt"
: ignoreCase="true"/>
: </analyzer>
: </fieldType>


-Hoss

Re: solr.StopFilterFactory doesn't work with wildcard

Posted by Jack Krupansky <ja...@basetechnology.com>.
Well, what is happening is that the query parser detects a "prefix query" 
("series*") and then does a term analysis on the prefix alone ("series"), 
which you probably have in your stop words list, which causes the analyzer 
to return... nothing, which is what the error is complaining about.
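
The sequence above can be sketched as a small simulation. This is only an illustration of the behaviour described, not Solr's actual code: the names analyze, analyze_multiterm and parse are made up, and the stop word set is an assumption about what stopwords.txt contains.

```python
# Illustrative simulation only; names and stop word list are assumptions.
STOP_WORDS = {"series"}  # assumed content of stopwords.txt


def analyze(text, stop_words=STOP_WORDS):
    """Mimic whitespace tokenization followed by a case-insensitive stop filter."""
    return [tok for tok in text.split() if tok.lower() not in stop_words]


def analyze_multiterm(prefix):
    """Analyze the bare prefix of a wildcard term; a term must survive analysis."""
    terms = analyze(prefix)
    if not terms:
        # The stop filter removed the only token, so nothing is left to search for.
        raise ValueError(f"analyzer returned no terms for multiTerm term: {prefix}")
    return terms[0]


def parse(query):
    """Turn each whitespace-separated clause into a (kind, text) pair."""
    clauses = []
    for clause in query.split():
        if clause.endswith("*"):
            # Prefix query: only the part before '*' is analyzed.
            clauses.append(("prefix", analyze_multiterm(clause[:-1])))
        else:
            # Ordinary term: a stop word silently disappears here.
            clauses.extend(("term", tok) for tok in analyze(clause))
    return clauses
```

With these definitions, parse("hp* pavilion* series d4*") quietly drops "series", while parse("hp* pavilion* series* d4*") raises the error, because a prefix clause has no term left to search on once its only token is filtered out.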

You can work around this by querying for serie* (as long as "serie" is not
also a stop word).

In any case, technically, the stop filter is doing exactly what it is 
supposed to do.

In all honesty, I can't imagine a context in which a noun such as "series" 
would be on a stop word list. What's your thinking on why it is there??

-- Jack Krupansky
