
Posted to solr-user@lucene.apache.org by meghana <me...@amultek.com> on 2013/04/30 16:51:21 UTC

Issue with fuzzy search in Distributed Search

I have set up two versions of a Solr core on different servers. One is a
simple core holding all records in a single core. The other is a sharded
core, distributed over 3 cores on the server.

Simple core :

http://localhost:8080/sorl/core0/select?q=text:hoers~1

Distributed core :

http://192.168.1.91:8080/core0/select?shards=http://192.168.1.91:8080/core0,http://192.168.1.91:8080/core1,http://192.168.1.91:8080/core2&q=text:hoers~1

The data, schema, and other configuration are the same in both setups.

But for a fuzzy search like hoers~1, one core returns many records
(about 450), while the other core returns only 1 record.

This does not seem to be related to distributed search, though: even when
I do not use distributed search, the query still does not return more rows,
as with:

http://192.168.1.91:8080/core0/select?q=text:hoers~1

Below is the schema definition for my field.

<fieldType name="text_en_splitting" class="solr.TextField"
           positionIncrementGap="100" autoGeneratePhraseQueries="true">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory"
            ignoreCase="true"
            words="stopwords.txt"
            enablePositionIncrements="false"/>
    <filter class="solr.StopFilterFactory"
            ignoreCase="true"
            words="stopwords_en.txt"
            enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateWords="1"
            catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"
            protected="protwords.txt" types="wdfftypes.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory"
            ignoreCase="true"
            words="stopwords_extra_query.txt"
            enablePositionIncrements="false"/>
    <filter class="solr.StopFilterFactory"
            ignoreCase="true"
            words="stopwords_en.txt"
            enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateWords="0"
            catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"
            protected="protwords.txt" types="wdfftypes.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>

I am not sure what is wrong here. Can anybody help me with this?




--
View this message in context: http://lucene.472066.n3.nabble.com/Issue-with-fuzzy-search-in-Distributed-Search-tp4060022.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Issue with fuzzy search in Distributed Search

Posted by meghana <me...@amultek.com>.

Please help me with this!


meghana wrote
> To ensure that all the records exist on a single node, I queried on a
> specific duration, so the results for the sharded core and the simple
> core should be similar.
>
> As you suggested, I analyzed the debugQuery output for one specific
> search, *text:worde~1*. The record that the sharded core returns has
> highlights such as *word*, *words*, and *word!n*. But in the debugQuery
> output, the query is processed only for *word!n*, and not for the other
> highlighted terms (*words*, *word*), even though they show up in the
> highlighting for that record. As a result, the sharded core does not
> return other records whose text contains *word* or *words* but not
> *word!n*.
>
> The simple core, on the other hand, processes all of *word*, *words*,
> and *word!n*, and returns the proper results. This seems like very weird
> behavior. Any suggestions?
>






Re: Issue with fuzzy search in Distributed Search

Posted by meghana <me...@amultek.com>.

To ensure that all the records exist on a single node, I queried on a
specific duration, so the results for the sharded core and the simple
core should be similar.

As you suggested, I analyzed the debugQuery output for one specific
search, *text:worde~1*. The record that the sharded core returns has
highlights such as *word*, *words*, and *word!n*. But in the debugQuery
output, the query is processed only for *word!n*, and not for the other
highlighted terms (*words*, *word*), even though they show up in the
highlighting for that record. As a result, the sharded core does not
return other records whose text contains *word* or *words* but not
*word!n*.

The simple core, on the other hand, processes all of *word*, *words*,
and *word!n*, and returns the proper results. This seems like very weird
behavior. Any suggestions?
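
For reference, a minimal sketch of how far each of the highlighted strings is from the query term. It assumes that `~1` is interpreted as a maximum edit distance of 1 (the Solr version is not stated in this thread) and uses plain Levenshtein distance; Lucene's fuzzy matching may also count a transposition as a single edit, which this sketch does not model. The function name `edit_distance` is made up for illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Plain Levenshtein distance (insert, delete, substitute), row by row."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


# Compare the query term with the strings seen in the highlights.
for term in ("word", "words", "word!n"):
    print(term, edit_distance("worde", term))
```

Under these assumptions, `word` and `words` are one edit away from `worde`, while the raw string `word!n` is two edits away. A fuzzy query is matched against the indexed terms, so a hit on a record containing `word!n` would have to come from whatever terms the index-time analyzer produced for it, not from the raw text.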



Jack Krupansky-2 wrote
> A fuzzy query itself does not know about distributed search - Lucene
> simply scores the query results based on the local index. Then Solr
> merges the query results from the different nodes.
>
> Try the query locally for each node and set debugQuery=true and see how
> each document gets scored.
>
> I'm actually not sure what the specific "problem" (symptom) is that you
> are seeing. I mean, maybe there is only 1 result on that node - how do
> you know otherwise? Or maybe one node has more exact matches.
>
> -- Jack Krupansky






Re: Issue with fuzzy search in Distributed Search

Posted by Jack Krupansky <ja...@basetechnology.com>.

A fuzzy query itself does not know about distributed search - Lucene simply
scores the query results based on the local index. Then Solr merges the
query results from the different nodes.

Try the query locally for each node and set debugQuery=true and see how each 
document gets scored.
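
A minimal sketch of that per-node check, assuming the three core URLs from the original post. It only builds one `/select` URL per core, leaving out the `shards` parameter so that each query runs against that core alone, and adds `debugQuery=true`. The helper name `per_shard_debug_urls` is made up for illustration.

```python
from urllib.parse import urlencode


def per_shard_debug_urls(shard_bases, query):
    """Build one non-distributed /select URL per core, with debugQuery on."""
    # No "shards" parameter: each request is answered by that core only.
    params = urlencode({"q": query, "debugQuery": "true"})
    return [f"{base}/select?{params}" for base in shard_bases]


# Core base URLs as given in the original post.
shards = [
    "http://192.168.1.91:8080/core0",
    "http://192.168.1.91:8080/core1",
    "http://192.168.1.91:8080/core2",
]

for url in per_shard_debug_urls(shards, "text:hoers~1"):
    print(url)
```

Opening each printed URL and comparing the parsed query and the per-document explanations across the cores, and against the same query on the simple core, should show where the results start to differ.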

I'm actually not sure what the specific "problem" (symptom) is that you are 
seeing. I mean, maybe there is only 1 result on that node - how do you know 
otherwise?? Or maybe one node has more exact matches.

-- Jack Krupansky
