
Posted to solr-user@lucene.apache.org by "easy.angel" <oi...@gmail.com> on 2010/07/01 16:41:03 UTC

How to force wildcard query not to ignore word endings

Hello,
 
I have a problem with querying Solr. I indexed a person with 2 fields:
 
 * firstname - Hans
 * lastname - Mustermann
 
and I have a copy field, 'text', into which both fields are copied. The 'text'
field is used at query time.
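For reference, a setup like the one described might look roughly like this in schema.xml. This is a minimal sketch: only the field names firstname, lastname and text (and the type name "text") come from this thread; the attribute values (indexed, stored, multiValued) are assumptions.

```xml
<!-- Sketch only: field names are from the post, attribute values are assumed. -->
<!-- The <field> elements belong inside <fields>. -->
<field name="firstname" type="text" indexed="true" stored="true" />
<field name="lastname"  type="text" indexed="true" stored="true" />
<!-- Catch-all search field; multiValued because two source fields feed it -->
<field name="text" type="text" indexed="true" stored="false" multiValued="true" />

<!-- Copy both name fields into 'text' at index time (placed after </fields>) -->
<copyField source="firstname" dest="text" />
<copyField source="lastname"  dest="text" />
```

Because copyField copies the raw input, 'text' runs its own analyzer chain, so whatever stemming the "text" fieldType does applies to the copied names too.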
 
Now, when I search:
 
han*
 
Hans Mustermann appears in the results. But if I search:
 
hans*
 
I receive no results! However, the same query without the wildcard returns
the correct results.
 
How can I configure Solr to return Hans Mustermann for the query hans* ?
 
I append this wildcard dynamically at query time, so it is present in
every query.

Thanks in advance,
Oleg

-- 
View this message in context: http://lucene.472066.n3.nabble.com/How-to-force-wildcard-query-not-to-ignore-word-endings-tp936322p936322.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: How to force wildcard query not to ignore word endings

Posted by "easy.angel" <oi...@gmail.com>.

Thanks! I tested it and it works perfectly.

> However, remember that wildcard and prefix queries (*) are not analyzed.
> For example, HAN* won't return anything.

I'm also lowercasing the query dynamically, so it's not a problem for me.

Re: How to force wildcard query not to ignore word endings

Posted by Ahmet Arslan <io...@yahoo.com>.

> I will try removing SnowballPorterFilterFactory (is that right?) and then
> restart Solr and reindex.

Exactly. This will solve your problem.

However, remember that wildcard and prefix queries (*) are not analyzed. For example, HAN* won't return anything, because the indexed terms are lowercased but the wildcard prefix is not.

Re: How to force wildcard query not to ignore word endings

Posted by "easy.angel" <oi...@gmail.com>.

I have the standard configuration for the text field type:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
	<analyzer type="index">
		<tokenizer class="solr.WhitespaceTokenizerFactory" />

		<filter class="solr.StopFilterFactory"
				ignoreCase="true"
				words="stopwords.txt"
				enablePositionIncrements="true" />

		<filter class="solr.WordDelimiterFilterFactory"
				generateWordParts="1" generateNumberParts="1"
				catenateWords="1" catenateNumbers="1" catenateAll="0"
				splitOnCaseChange="1" />
		<filter class="solr.LowerCaseFilterFactory" />
		<filter class="solr.SnowballPorterFilterFactory"
				language="English" protected="protwords.txt" />
	</analyzer>
	<analyzer type="query">
		<tokenizer class="solr.WhitespaceTokenizerFactory" />

		<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
				ignoreCase="true" expand="true" />
		<filter class="solr.StopFilterFactory"
				ignoreCase="true"
				words="stopwords.txt"
				enablePositionIncrements="true" />
		<filter class="solr.WordDelimiterFilterFactory"
				generateWordParts="1" generateNumberParts="1"
				catenateWords="0" catenateNumbers="0" catenateAll="0"
				splitOnCaseChange="1" />
		<filter class="solr.LowerCaseFilterFactory" />
		<filter class="solr.SnowballPorterFilterFactory"
				language="English" protected="protwords.txt" />
	</analyzer>
</fieldType>

I will try removing SnowballPorterFilterFactory (is that right?) and then
restart Solr and reindex.
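For illustration, the change proposed in this message would leave the field type roughly as follows: the same chain with both SnowballPorterFilterFactory lines dropped. With the English stemmer, "hans" loses its trailing "s" and is indexed as "han", which is why han* matched and hans* did not; without the stemmer the indexed term stays "hans". This is a sketch of the edit, not a tested config. The filter has to go from both analyzers: if only the index side were changed, a plain query for hans would be stemmed to "han" and stop matching. Documents already in the index keep their stemmed terms until they are re-indexed.

```xml
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
	<analyzer type="index">
		<tokenizer class="solr.WhitespaceTokenizerFactory" />
		<filter class="solr.StopFilterFactory" ignoreCase="true"
				words="stopwords.txt" enablePositionIncrements="true" />
		<filter class="solr.WordDelimiterFilterFactory"
				generateWordParts="1" generateNumberParts="1"
				catenateWords="1" catenateNumbers="1" catenateAll="0"
				splitOnCaseChange="1" />
		<filter class="solr.LowerCaseFilterFactory" />
		<!-- SnowballPorterFilterFactory removed: "Hans" is now indexed as "hans" -->
	</analyzer>
	<analyzer type="query">
		<tokenizer class="solr.WhitespaceTokenizerFactory" />
		<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
				ignoreCase="true" expand="true" />
		<filter class="solr.StopFilterFactory" ignoreCase="true"
				words="stopwords.txt" enablePositionIncrements="true" />
		<filter class="solr.WordDelimiterFilterFactory"
				generateWordParts="1" generateNumberParts="1"
				catenateWords="0" catenateNumbers="0" catenateAll="0"
				splitOnCaseChange="1" />
		<filter class="solr.LowerCaseFilterFactory" />
		<!-- Removed here too, so non-wildcard queries match the unstemmed index -->
	</analyzer>
</fieldType>
```

The trade-off is that all stemming is lost for this field, so a query for "persons" would no longer match "person".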

Re: How to force wildcard query not to ignore word endings

Posted by Ahmet Arslan <io...@yahoo.com>.

> I'm not sure whether this can be solved in the Solr configuration itself
> (for example, with the query analyzer or the index analyzer for the text
> field).

Do you have StemFilterFactory in your field type? Remove it from the index analyzer (and, for consistency, the query analyzer) of the text field. Restart the core and re-index.

Re: How to force wildcard query not to ignore word endings

Posted by "easy.angel" <oi...@gmail.com>.

Thank you very much for your help and fast answer!

I always add the wildcard because I use Solr for autocomplete: as you type
your query, you see intermediate results. I also found that adding the
wildcard returns better intermediate results; at least it was the easiest
solution in some cases. I'm not sure whether this can be solved in the Solr
configuration itself (for example, with the query analyzer or the index
analyzer for the text field).

I think the problem lies in indexing: instead of "hans", Solr indexes the
value "han" (but I'm not sure).



Re: How to force wildcard query not to ignore word endings

Posted by Ahmet Arslan <io...@yahoo.com>.

> I have a problem with querying Solr. I indexed a person with 2 fields:
>
>  * firstname - Hans
>  * lastname - Mustermann
>
> and I have a copy field, 'text', into which both fields are copied.
> The 'text' field is used at query time.
>
> Now, when I search:
>
> han*
>
> Hans Mustermann appears in the results. But if I search:
>
> hans*
>
> I receive no results! However, the same query without the wildcard
> returns the correct results.
>
> How can I configure Solr to return Hans Mustermann for the query hans* ?
>
> I append this wildcard dynamically at query time, so it is present in
> every query.

The easiest solution is to remove the stem filter from your analysis chain.
Alternatively, add "hans" to the protwords.txt file, so the stemmer won't touch it.
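A sketch of the protwords.txt alternative, assuming the existing SnowballPorterFilterFactory lines stay exactly as they are. Their protected attribute already points at protwords.txt, so the change is to that file, not to schema.xml. The file is plain text rather than XML, so its contents are shown inside an XML comment next to the filter it affects. The comment line in the file is only an example.

```xml
<!-- Unchanged filter line from the existing index and query analyzers -->
<filter class="solr.SnowballPorterFilterFactory"
		language="English" protected="protwords.txt" />

<!--
	protwords.txt (plain text, one word per line, lines starting with # are comments):

	# keep first names intact for prefix/autocomplete queries
	hans

	Listed in lowercase because LowerCaseFilterFactory runs before the stemmer.
-->
```

This only protects the listed words: every other name ending in "s" is still stemmed. Because the change affects index-time tokens, existing documents need to be re-indexed.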

Another solution requires writing custom code: integrate Lucene's AnalyzingQueryParser so that your wildcard queries are analyzed. 

http://lucene.apache.org/java/3_0_2/api/contrib-misc/org/apache/lucene/queryParser/analyzing/AnalyzingQueryParser.html

By the way, why are you appending * to the end of your queries?