Posted to solr-user@lucene.apache.org by ilay raja <il...@gmail.com> on 2013/03/29 15:28:08 UTC

Solr fuzzy search with WordDelimiterFilter

Hi

  I need to apply fuzzy search in production, because it improves the search
results for spelling mistakes. However, it does not apply the analyzer
filters configured in schema.xml.
I know fuzzy and wildcard searches won't apply the filters, but is there a way
to plug in the filters, or to implement this logic on the client? I am not
getting any results for queries containing numbers and special symbols (-). The
configuration in schema.xml:

      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true"
                words="stopwords.txt" enablePositionIncrements="true"/>
        <filter class="solr.WordDelimiterFilterFactory"
                generateWordParts="1" generateNumberParts="1" catenateWords="1"
                catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.KeywordMarkerFilterFactory"
                protected="protwords.txt"/>
        <filter class="solr.EnglishMinimalStemFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
                ignoreCase="true" expand="true"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true"
                words="stopwords.txt" enablePositionIncrements="true"/>
        <filter class="solr.WordDelimiterFilterFactory"
                generateWordParts="1" generateNumberParts="1" catenateWords="1"
                catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.KeywordMarkerFilterFactory"
                protected="protwords.txt"/>
        <filter class="solr.EnglishMinimalStemFilterFactory"/>
      </analyzer>
    </fieldType>


How can I make sure that the filters used at index time are also applied to
fuzzy searches at query time, given that the configured query filters are not
being applied?

Please help.

Re: Solr fuzzy search with WordDelimiterFilter

Posted by Jack Krupansky <ja...@basetechnology.com>.
The use of the fuzzy query operator will suppress the Word Delimiter Filter 
at query time. That's just the way it works. You can't use both fuzzy query 
and WDF when WDF is splitting apart words, numbers, and case changes, and 
throwing away special characters as well.

To put it simply, at query time the user needs to close their eyes and 
imagine what transformations WDF is doing and then query based on that.

One workaround: copy the content to a separate field that does not use WDF.
Users can then run fuzzy queries on that other field without problems (apart
from fuzzy queries being limited to an edit distance of 2).
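A minimal sketch of that workaround in schema.xml, assuming Solr 4.x syntax. The field type `text_fuzzy`, the field `name_fuzzy`, and the source field `name` are hypothetical placeholders; substitute whichever field currently uses the WDF field type above. The fuzzy field keeps only whitespace tokenization and lowercasing, so a value such as `ABC-123` is indexed as the single token `abc-123` instead of being split into parts.

```xml
<!-- Hypothetical field type for fuzzy matching: whitespace tokenizing and
     lowercasing only, with no WordDelimiterFilter, synonyms or stemming,
     so indexed terms stay close to what users actually type. -->
<fieldType name="text_fuzzy" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- "name" stands in for the existing WDF-analyzed field (assumption);
     "name_fuzzy" is a hypothetical extra field that only serves fuzzy queries. -->
<field name="name_fuzzy" type="text_fuzzy" indexed="true" stored="false"/>
<copyField source="name" dest="name_fuzzy"/>
```

With the standard query parser, a query such as `name_fuzzy:abc-124~1` could then match the indexed token `abc-123`, since the two differ by a single edit. Because the fuzzy term may not go through the field's analyzer (depending on the Solr version), lowercasing it on the client before sending the query is a cheap safeguard. The trade-off is a somewhat larger index, since the same text is indexed twice.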

-- Jack Krupansky

-----Original Message----- 
From: ilay raja
Sent: Friday, March 29, 2013 10:28 AM
To: solr-user@lucene.apache.org ; solr-dev@lucene.apache.org
Subject: Solr fuzzy search with WordDelimiterFilter
