Posted to solr-user@lucene.apache.org by meghana <me...@amultek.com> on 2013/05/14 13:06:51 UTC
Solr4.2 - Fuzzy Search Problems
I am using Solr 4.2 and have a few questions about the new fuzzy search
implementation in Solr 4+.
1) I have learned that Solr 4+ accepts a maximum edit distance of 2 (2
insertions, deletions, or replacements). Is there any way I can configure this
maximum edit distance limit?
2) Although I set the edit distance to 1 in my query (e.g. worde~1), Solr
returns results with an edit distance of 2 (like WORDOES, WORHEE, WORKEE,
etc.).
3) Last and major issue: I had very little data in my Solr core at startup (say
around 1K - 2K records). At that time, when I searched with worde~1, it
returned many records (around 450).
Then I ingested a few more records into my Solr core (say around 1K). They were
ingested successfully, with no errors or warnings in the log. After that, when I
performed the same fuzzy search (worde~1) on the previous records only, not on
the newly ingested records, it no longer returned the previous results (around
450); it returned only 1 record in total, with the highlight WORD!N.
It seems the problem arises somewhere while ingesting the last 1K records,
but I am not able to track it down. Solr does not report any error or
warning in the log, and I don't know how to debug this ingestion issue.
Below is the configuration for my text field type text_en_splitting.
<fieldType name="text_en_splitting" class="solr.TextField"
           positionIncrementGap="100" autoGeneratePhraseQueries="true">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory"
            ignoreCase="true"
            words="stopwords.txt"
            enablePositionIncrements="false"/>
    <filter class="solr.StopFilterFactory"
            ignoreCase="true"
            words="stopwords_en.txt"
            enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1"
            catenateWords="1"
            catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"
            protected="protwords.txt" types="wdfftypes.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory"
            protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory"
            ignoreCase="true"
            words="stopwords_extra_query.txt"
            enablePositionIncrements="false"/>
    <filter class="solr.StopFilterFactory"
            ignoreCase="true"
            words="stopwords_en.txt"
            enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1"
            catenateWords="0"
            catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"
            protected="protwords.txt" types="wdfftypes.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory"
            protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>
I also have a copy field on this text field, with field type
text_general_preserved. Below is its configuration.
<fieldType name="text_general_preserved" class="solr.TextField"
           positionIncrementGap="100">
  <analyzer type="index">
    <charFilter class="solr.MappingCharFilterFactory"
                mapping="mapping-ISOLatin1Accent.txt"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords_ns.txt" enablePositionIncrements="false"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <charFilter class="solr.MappingCharFilterFactory"
                mapping="mapping-ISOLatin1Accent.txt"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords_extra_query.txt" enablePositionIncrements="false"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
I hope I have explained my questions clearly. Please help me with this.
--
View this message in context: http://lucene.472066.n3.nabble.com/Solr4-2-Fuzzy-Search-Problems-tp4063199.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr4.2 - Fuzzy Search Problems
Posted by meghana <me...@amultek.com>.
Thanks Chris,
For my 2nd question (~1 returns words with an edit distance of 2), that may be
the issue.
I am still looking into my last issue. Hopefully the JIRA helps to resolve it.
Chris Hostetter-3 wrote
> :
> : 2) although I set editing distance to 1 in my query (e.g. worde~1), solr
> : returns me results having 2 editing distance (like WORDOES, WORHEE, WORKEE,
> : .. ect. )
>
> fuzzy search works on *terms* in your index -- if you use a stemmer when
> you index your data (your schema shows that you are) then a word in your
> input like "WORDOES" might wind up in your index as a term within the edit
> distance you specified (ie: "wordo" or "word" or something similar)
>
> : 3) Last and major issue, I had very few data at startup in my solr core (say
> : around 1K - 2K ), at that time, when i was searching with worde~1 , it was
> : returning many records (around 450).
> :
> : Then I ingested few more records in my solr core (say around 1K). It was
> : ingested successfully , no errors or warning in Log. After that when I
> : performed the same fuzzy search (worde~1) on previous records only, not in
> : new ingested records , It did not return me previous results(around 450) as
> : well, and return total 1 record only having highlight as WORD!N .
>
> This sounds like the same issue as described in SOLR-4824...
>
> https://issues.apache.org/jira/browse/SOLR-4824
>
> -Hoss
Re: Solr4.2 - Fuzzy Search Problems
Posted by Chris Hostetter <ho...@fucit.org>.
:
: 2) although I set editing distance to 1 in my query (e.g. worde~1), solr
: returns me results having 2 editing distance (like WORDOES, WORHEE, WORKEE,
: .. ect. )
Fuzzy search works on *terms* in your index -- if you use a stemmer when
you index your data (your schema shows that you do), then a word in your
input like "WORDOES" might wind up in your index as a term within the edit
distance you specified (e.g. "wordo" or "word" or something similar).
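To make the point above concrete, here is a minimal Python sketch (not Solr or Lucene code) of plain Levenshtein distance. The function name edit_distance is made up for illustration, and "wordo" and "word" are only the hypothetical index terms mentioned above; what the Porter stemmer actually produces for these words was not checked. Lucene's own fuzzy matching may differ in details, such as how it counts transpositions.

```python
def edit_distance(a: str, b: str) -> int:
    """Plain Levenshtein distance: insertions, deletions, substitutions."""
    # prev[j] holds the distance between the first i-1 chars of a
    # and the first j chars of b.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


query = "worde"
# "wordoes" is the raw input word; "wordo" and "word" are hypothetical
# terms that a stemmer might have written to the index instead.
for term in ["wordoes", "wordo", "word"]:
    print(term, edit_distance(query, term))
```

The raw word "wordoes" is 2 edits away from "worde", while the hypothetical indexed terms "wordo" and "word" are each 1 edit away, so a document can match worde~1 even though its original text looks too far from the query. Comparing against the indexed terms (for example via the Analysis page of the Solr admin UI) rather than the raw input shows what the fuzzy query is really matched against.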
: 3) Last and major issue, I had very few data at startup in my solr core (say
: around 1K - 2K ), at that time, when i was searching with worde~1 , it was
: returning many records (around 450).
:
: Then I ingested few more records in my solr core (say around 1K). It was
: ingested successfully , no errors or warning in Log. After that when I
: performed the same fuzzy search (worde~1) on previous records only, not in
: new ingested records , It did not return me previous results(around 450) as
: well, and return total 1 record only having highlight as WORD!N .
This sounds like the same issue as described in SOLR-4824...
https://issues.apache.org/jira/browse/SOLR-4824
-Hoss