Posted to solr-user@lucene.apache.org by Matteo Diarena <m....@volocom.it> on 2012/07/02 15:39:01 UTC

Fuzzy Search issues using Solr 4.0

Dear Solr Users,

I have been an enthusiastic Solr user since version 1.4. I'm now working on a new
Solr-based application that relies heavily on fuzzy searches for string matching.

Unfortunately, I'm facing a strange problem with fuzzy search, and I hope
someone can help me get more information.

 

I indexed several company names in a field named ENTITY_NAME, using the
following definitions in schema.xml:

 

.

<fieldType name="whitespace_tokenized" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory" />
    <filter class="solr.LowerCaseFilterFactory" />
  </analyzer>
</fieldType>

.

<field name="ENTITY_NAME" type="whitespace_tokenized" indexed="true" stored="true" />

.

 

One of these companies is "TS PUBLISHING INC"
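
As a rough illustration of what this analyzer chain does to the example name, here is a minimal Python sketch. It is not Solr code: the analyze function is a hypothetical stand-in that only approximates WhitespaceTokenizerFactory followed by LowerCaseFilterFactory.

```python
def analyze(text):
    """Approximate the field type above: split on whitespace, then lowercase.

    Hypothetical helper for illustration only; the real Solr analyzer
    may treat some edge cases differently.
    """
    return [token.lower() for token in text.split()]


# Terms that would be indexed for the example company name
print(analyze("TS PUBLISHING INC"))
```

Under this approximation the indexed terms are ts, publishing and inc, which are the terms the fuzzy queries below are expected to match against.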

Below is the list of queries, with the returned and the expected result for each:

1) ENTITY_NAME:(ts AND publising)       => matches, OK

2) ENTITY_NAME:(ts AND publising~1)     => matches, OK

3) ENTITY_NAME:(td~1 AND publishing)    => doesn't match, KO (it was supposed to match)

4) ENTITY_NAME:(ts AND pablisin~3)      => doesn't match, KO (it was supposed to match)

 

Why does td~1 not match ts?

Why does pablisin~3 not match publishing?
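
To make the expectation behind these questions concrete, the edit distances between the query terms and the indexed terms can be checked with a small sketch. This is plain Levenshtein distance in Python, written only for illustration; the levenshtein function is hypothetical and is not how Solr or Lucene evaluates fuzzy queries internally, which may differ in details such as the handling of transpositions.

```python
def levenshtein(a, b):
    """Edit distance counting insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


# Query term vs. the indexed term it was expected to match
for query_term, indexed_term in [
    ("publising", "publishing"),
    ("td", "ts"),
    ("pablisin", "publishing"),
]:
    print(query_term, indexed_term, levenshtein(query_term, indexed_term))
```

By this measure the three pairs are at distance 1, 1 and 3 respectively, which is why queries 3 and 4 were expected to match with ~1 and ~3.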

 

How can I investigate the problem? 

Is there any parameter I can set in solrconfig.xml? 

Is there any tool I can use to see how the automaton is built?

 

Thanks a lot in advance,

Matteo Diarena
Senior KM Developer - VOLO.com S.r.l.
Via Luigi Rizzo, 8/1 - 20151 MILANO
Fax  +39 02 8945 3500

Tel  +39 02 8945 3023
Cell +39 345 2129244
m.diarena@volocom.it
http://www.volocom.it