Posted to solr-user@lucene.apache.org by cjkadakia <cj...@sonicbids.com> on 2010/03/08 17:05:27 UTC
Wildcard question -- case issue
I'm encountering a potential bug in Solr regarding wildcards. I have two
field types defined thusly:
<!-- A general unstemmed text field - good if one does not know the
     language of the field -->
<fieldType name="textgen" class="solr.TextField"
           positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateWords="1"
            catenateNumbers="1" catenateAll="0" splitOnCaseChange="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateWords="0"
            catenateNumbers="0" catenateAll="0" splitOnCaseChange="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
and
<fieldType name="text" class="solr.TextField"
           positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- in this example, we will only use synonyms at query time
    <filter class="solr.SynonymFilterFactory"
            synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
    -->
    <!-- Case insensitive stop word removal.
         add enablePositionIncrements=true in both the index and query
         analyzers to leave a 'gap' for more accurate phrase queries.
    -->
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateWords="1"
            catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English"
            protected="protwords.txt"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateWords="0"
            catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English"
            protected="protwords.txt"/>
  </analyzer>
</fieldType>
When searching with wildcards, I get the following behavior.
Two documents in the index are named "CMJ foo bar" and "CME foo bar".
The name field has been indexed twice, as "name" and "namesimple".
query:
spell?q=name:(cm*) OR namesimple:(cm*)
returns:
CMJ foo bar
CME foo bar
spell?q=name:(CM*) OR namesimple:(CM*)
returns
No results.
I added an equivalent synonym for "cmj,CMJ" and re-indexed.
spell?q=name:(CM*) OR namesimple:(CM*)
returns
CMJ foo bar
Naturally, I can't see the practical value of adding a synonym for each of
these as users report them. From the documentation I've read (as well as
feedback I received on these forums), I've found that stemming can interfere
with wildcards during query and indexing, which is why the namesimple field
is of type "textgen". This solved other wildcard/case issues, but this one
remains.
Any suggestions would be appreciated. Thanks!
--
View this message in context: http://old.nabble.com/Wildcard-question----case-issue-tp27823332p27823332.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Wildcard question -- case issue
Posted by cjkadakia <cj...@sonicbids.com>.
Understood. My solution was to convert any search terms with an asterisk to
lowercase prior to submitting to Solr, and it seems to be working correctly
now. Thanks for your help.
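
For anyone wanting to try the same workaround, here is a minimal sketch in
Python of lowercasing only the wildcard terms before the query is sent to
Solr. The helper name lowercase_wildcard_terms and the regular expression are
hypothetical, not part of Solr or any client library. It also treats ? as a
wildcard, which goes slightly beyond the asterisk case described above, and it
expects just the value of the q parameter, not the full request URL.

```python
import re

# Hypothetical helper, not part of Solr or any client library.
# Matches a run of characters that contains at least one * or ?,
# stopping at whitespace, parentheses, colons, and double quotes,
# so field names such as "name:" are never part of a match.
_WILDCARD_TERM = re.compile(r'[^\s():"]*[*?][^\s():"]*')


def lowercase_wildcard_terms(query: str) -> str:
    """Lowercase only the terms that contain a wildcard (* or ?)."""
    return _WILDCARD_TERM.sub(lambda m: m.group(0).lower(), query)


print(lowercase_wildcard_terms("name:(CM*) OR namesimple:(CM*)"))
# prints: name:(cm*) OR namesimple:(cm*)
```

Terms without a wildcard and operators such as OR are left untouched. This
only helps when the index analyzer lowercases tokens, as both field types in
the original post do.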
Re: Wildcard question -- case issue
Posted by Ahmet Arslan <io...@yahoo.com>.
> query:
>
> spell?q=name:(cm*) OR namesimple:(cm*)
>
> returns:
> CMJ foo bar
> CME foo bar
>
> spell?q=name:(CM*) OR namesimple:(CM*)
> returns
> No results.
Wildcard queries are not analyzed by Lucene, hence the behavior: the index
holds lowercased tokens, but the term in CM* is never lowercased at query
time. [1]
[1] http://www.search-lucene.com/m?id=4A8CE9B2.2070009@ait.co.at||wildcard%20not%20analyzed
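
A toy illustration in Python of why the case matters, assuming the index
analyzer lowercases every token while the wildcard term is compared exactly as
typed. The functions index_tokens and wildcard_hits are hypothetical stand-ins,
not Lucene or Solr APIs, and fnmatchcase only approximates Lucene's wildcard
matching.

```python
from fnmatch import fnmatchcase


def index_tokens(text):
    # Toy stand-in for a whitespace tokenizer followed by a lowercase filter.
    return [token.lower() for token in text.split()]


def wildcard_hits(pattern, docs):
    # The pattern is used as typed, modelling a wildcard term that
    # bypasses query-time analysis; fnmatchcase is always case-sensitive.
    return [
        doc for doc in docs
        if any(fnmatchcase(token, pattern) for token in index_tokens(doc))
    ]


docs = ["CMJ foo bar", "CME foo bar"]
print(wildcard_hits("cm*", docs))  # ['CMJ foo bar', 'CME foo bar']
print(wildcard_hits("CM*", docs))  # []
```

The uppercase pattern finds nothing because no indexed token starts with an
uppercase "CM", which mirrors the "No results" case quoted above.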