Posted to solr-user@lucene.apache.org by Matteo Diarena <m....@volocom.it> on 2012/07/02 15:39:01 UTC
Fuzzy Search issues using Solr 4.0
Dear Solr Users,
I have been an enthusiastic Solr user since version 1.4. I'm now working on a
new Solr-based application that relies heavily on fuzzy searches for string
matching. Unfortunately, I'm facing a strange problem with fuzzy search, and I
hope someone can help me get more information.
I indexed several company names in a field named ENTITY_NAME, using the
following definitions in schema.xml:
.
<fieldType name="whitespace_tokenized" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory" />
    <filter class="solr.LowerCaseFilterFactory" />
  </analyzer>
</fieldType>
.
<field name="ENTITY_NAME" type="whitespace_tokenized" indexed="true"
       stored="true" />
.
One of these companies is "TS PUBLISHING INC"
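
To make the indexed form explicit, here is a rough Python emulation of the analyzer chain above. It is only a sketch: the analyze function is a hypothetical stand-in, not Solr code, and it assumes the tokenizer simply splits on whitespace and the filter simply lowercases each token.

```python
def analyze(text):
    # Hypothetical stand-in for WhitespaceTokenizerFactory followed by
    # LowerCaseFilterFactory: split on runs of whitespace, then lowercase.
    return [token.lower() for token in text.split()]


# Terms that would be indexed for the example company name.
print(analyze("TS PUBLISHING INC"))
```

Under that assumption, the terms in the index for this document are ts, publishing and inc, so those are the terms the fuzzy queries have to reach.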
Below is the list of queries, with the actual and the expected result for each:
1) ENTITY_NAME:(ts AND publising) => matches, OK
2) ENTITY_NAME:(ts AND publising~1) => matches, OK
3) ENTITY_NAME:(td~1 AND publishing) => doesn't match, KO (it was
supposed to match)
4) ENTITY_NAME:(ts AND pablisin~3) => doesn't match, KO (it was
supposed to match)
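
For reference, the edit distances between the query terms and the indexed terms can be checked with a small Python sketch. It computes the plain Levenshtein distance with the textbook dynamic-programming method; it is not the automaton-based code Lucene uses, and it ignores transpositions, which do not matter for these particular pairs.

```python
def levenshtein(a, b):
    # Classic dynamic-programming Levenshtein distance, keeping one row at a time.
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


# (query term, indexed term) pairs taken from queries 2, 3 and 4.
pairs = [("publising", "publishing"), ("td", "ts"), ("pablisin", "publishing")]
for query_term, indexed_term in pairs:
    print(query_term, indexed_term, levenshtein(query_term, indexed_term))
```

By this measure publising is 1 edit from publishing, td is 1 edit from ts, and pablisin is 3 edits from publishing, so queries 3 and 4 both look as if they are within the distance I requested.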
Why does td~1 not match ts?
Why does pablisin~3 not match publishing?
How can I investigate the problem?
Is there any parameter I can set in solrconfig.xml?
Is there any tool I can use to see how the automaton is built?
Thanks a lot in advance,
Matteo Diarena
Senior KM Developer - VOLO.com S.r.l.
Via Luigi Rizzo, 8/1 - 20151 MILANO
Fax +39 02 8945 3500
Tel +39 02 8945 3023
Cell +39 345 2129244
m.diarena@volocom.it
http://www.volocom.it