
Posted to solr-user@lucene.apache.org by meghana <me...@amultek.com> on 2013/05/01 08:05:59 UTC

Re: Issue with fuzzy search in Distributed Search

To ensure that all the records exist on a single node, I restricted the query
to a specific duration, so the shards core and the simple core should return
similar results.

As you suggested, I analyzed the debugQuery output for one specific search,
*text:worde~1*. The record that the shards core returns has highlights such as
*word*, *words*, and *word!n*. But the debugQuery output shows the query being
processed only for *word!n*, not for the other highlighted terms (*words*,
*word*), even though they appear in the highlighting for that record. As a
result, the shards core does not return other records whose text contains
*word* or *words* but not *word!n*.
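
To make the ~1 part concrete: as far as I understand, text:worde~1 asks for
terms within one edit of worde. The small Python sketch below is not part of
Solr; edit_distance is a helper written only for this illustration. It computes
the plain Levenshtein distance for the three highlighted strings. It ignores
transpositions, which Lucene may also count as a single edit, and it works on
the raw strings, not on whatever tokens the analyzer chain actually indexed.

```python
def edit_distance(a: str, b: str) -> int:
    """Plain Levenshtein distance: inserts, deletes, substitutions."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute ca -> cb
        prev = curr
    return prev[-1]


query_term = "worde"                      # from text:worde~1
candidates = ["word", "words", "word!n"]  # the highlighted strings

distances = {term: edit_distance(query_term, term) for term in candidates}
for term, d in distances.items():
    print(f"{term!r}: distance {d}, within ~1: {d <= 1}")
```

By this measure the raw string word!n is two edits away from worde, so if that
record matches, it is presumably through a token that the index analyzer
produced from it, not through the literal text.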

The simple core, on the other hand, processes all of *word*, *words*, and
*word!n*, and returns the proper results. This seems like very weird behavior.
Any suggestions?
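
In case it helps anyone reproduce the comparison, here is a minimal Python
sketch that builds one non-distributed debug URL per core, so each core's
debugQuery output can be inspected separately. The three core URLs are the ones
from my original post; debug_urls is just a helper name I made up for this
example, and it only builds the URLs without sending any request.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def debug_urls(cores, query):
    """Return one /select URL per core with debugQuery=true and no shards."""
    params = urlencode({"q": query, "debugQuery": "true"})
    return [f"{core}/select?{params}" for core in cores]


# Core base URLs taken from the distributed query in the original post.
cores = [
    "http://192.168.1.91:8080/core0",
    "http://192.168.1.91:8080/core1",
    "http://192.168.1.91:8080/core2",
]

urls = debug_urls(cores, "text:worde~1")
for url in urls:
    # Decode the query string again to show what each core will receive.
    print(url, parse_qs(urlsplit(url).query))
```

Opening each URL in turn and comparing the explain sections should show which
expanded terms each core actually scored.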



Jack Krupansky-2 wrote
> A fuzzy query itself does not know about distributed search - Lucene simply
> scores the query results based on the local index. Then, Solr merges the
> query results from the different nodes.
> 
> Try the query locally for each node and set debugQuery=true and see how each
> document gets scored.
> 
> I'm actually not sure what the specific "problem" (symptom) is that you are
> seeing. I mean, maybe there is only 1 result on that node - how do you know
> otherwise?? Or maybe one node has more exact matches.
> 
> -- Jack Krupansky
> 
> -----Original Message----- 
> From: meghana
> Sent: Tuesday, April 30, 2013 7:51 AM
> To: solr-user@.apache
> Subject: Issue with fuzzy search in Distributed Search
> 
> I have created 2 versions of a Solr core on different servers. One is a
> simple core that holds all records in one core. The other is a shards core,
> distributed over 3 cores on one server.
> 
> Simple core :
> 
> http://localhost:8080/sorl/core0/select?q=text:hoers~1
> 
> Distributed core :
> 
> http://192.168.1.91:8080/core0/select?shards=http://192.168.1.91:8080/core0,http://192.168.1.91:8080/core1,http://192.168.1.91:8080/core2&q=text:hoers~1
> 
> The data, schema, and other configuration are similar in both setups.
> 
> But for a fuzzy search like hoers~1, one core returns many records (about
> 450), while the other returns only 1 record.
> 
> The issue does not seem to be related to distributed search, though: even
> when I do not use distributed search, it still does not return more rows,
> as with:
> 
> http://192.168.1.91:8080/core0/select?q=text:hoers~1
> 
> Below is the schema definition for my field:
> <fieldType name="text_en_splitting" class="solr.TextField"
>            positionIncrementGap="100" autoGeneratePhraseQueries="true">
>   <analyzer type="index">
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.StopFilterFactory"
>             ignoreCase="true"
>             words="stopwords.txt"
>             enablePositionIncrements="false"/>
>     <filter class="solr.StopFilterFactory"
>             ignoreCase="true"
>             words="stopwords_en.txt"
>             enablePositionIncrements="true"/>
>     <filter class="solr.WordDelimiterFilterFactory"
>             generateWordParts="1" generateNumberParts="1" catenateWords="1"
>             catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"
>             protected="protwords.txt" types="wdfftypes.txt"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
>     <filter class="solr.PorterStemFilterFactory"/>
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
>             ignoreCase="true" expand="true"/>
>     <filter class="solr.StopFilterFactory"
>             ignoreCase="true"
>             words="stopwords_extra_query.txt"
>             enablePositionIncrements="false"/>
>     <filter class="solr.StopFilterFactory"
>             ignoreCase="true"
>             words="stopwords_en.txt"
>             enablePositionIncrements="true"/>
>     <filter class="solr.WordDelimiterFilterFactory"
>             generateWordParts="1" generateNumberParts="1" catenateWords="0"
>             catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"
>             protected="protwords.txt" types="wdfftypes.txt"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
>     <filter class="solr.PorterStemFilterFactory"/>
>   </analyzer>
> </fieldType>
> I am not sure what is wrong with this. Can anybody help me with this??
> 
> 
> 
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Issue-with-fuzzy-search-in-Distributed-Search-tp4060022.html
> Sent from the Solr - User mailing list archive at Nabble.com.





--
View this message in context: http://lucene.472066.n3.nabble.com/Solr-Results-differ-in-2-solr-cores-same-configuration-for-fuzzy-search-tp4060022p4060201.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Issue with fuzzy search in Distributed Search

Posted by meghana <me...@amultek.com>.

Please help me with this!!







--
View this message in context: http://lucene.472066.n3.nabble.com/Solr-Results-differ-in-2-solr-cores-same-configuration-for-fuzzy-search-tp4060022p4061545.html
Sent from the Solr - User mailing list archive at Nabble.com.