Posted to solr-user@lucene.apache.org by meghana <me...@amultek.com> on 2013/05/01 08:05:59 UTC
Re: Issue with fuzzy search in Distributed Search
To make sure that all matching records exist on a single node, I queried a specific duration, so the results of the shards core query and the simple core query should be similar.
As you suggested, I analyzed the debugQuery output for one specific search, *text:worde~1*. The record returned by the shards core has highlights like *word*, *words* and *word!n*, but debugQuery shows that it is only processing *word!n*, not the other highlighted terms (*words*, *word*), even though they appear in the highlighting for that record. As a result, the shards core does not return the other records that contain *word* or *words* but not *word!n*.
In the other case, the simple core processes all of *word*, *words* and *word!n*, and returns the proper results. This seems like very weird behavior. Any suggestions?
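For reference, a fuzzy term such as worde~1 (in Solr/Lucene 4.x syntax, where ~1 is a maximum edit distance of 1) should accept any indexed term within one edit of "worde". The Python sketch below computes plain Levenshtein distance on the raw strings from this post. It is only an illustration, not Lucene's implementation: as far as I know, Lucene 4.x's FuzzyQuery also counts an adjacent transposition as a single edit by default, and it compares the query term against the terms produced by the index-time analyzer (after WordDelimiterFilter splitting and Porter stemming), not against the original text. The helper names `levenshtein` and `fuzzy_matches` are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Plain Levenshtein distance: each insertion, deletion or substitution costs 1."""
    prev = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute, or match for free
        prev = curr
    return prev[-1]


def fuzzy_matches(query: str, terms, max_edits: int = 1):
    """Return the terms within max_edits of query (what term~1 is meant to accept)."""
    return [t for t in terms if levenshtein(query, t) <= max_edits]


if __name__ == "__main__":
    for term in ["word", "words", "word!n"]:
        print(term, levenshtein("worde", term))  # word 1, words 1, word!n 2
    print(fuzzy_matches("worde", ["word", "words", "word!n"]))  # ['word', 'words']
```

On these raw strings, word and words are one edit away from worde while word!n is two, so a record highlighted on word!n presumably matched through a token produced at index time rather than the whole string; exactly which tokens exist depends on the WordDelimiterFilter settings and wdfftypes.txt in the schema below.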
Jack Krupansky-2 wrote
> A fuzzy query itself does not know about distributed search: Lucene simply
> scores the query results based on the local index. Then Solr merges the
> query results from the different nodes.
>
> Try the query locally on each node, set debugQuery=true, and see how each
> document gets scored.
>
> I'm actually not sure what specific "problem" (symptom) you are seeing. I
> mean, maybe there is only 1 result on that node; how do you know otherwise?
> Or maybe one node has more exact matches.
>
> -- Jack Krupansky
>
> -----Original Message-----
> From: meghana
> Sent: Tuesday, April 30, 2013 7:51 AM
> To: solr-user@.apache
> Subject: Issue with fuzzy search in Distributed Search
>
> I have created 2 versions of a Solr core on different servers. One is a
> simple core that has all records in one core. The other is a shards core,
> distributed over 3 cores on the server.
>
> Simple core :
>
> http://localhost:8080/sorl/core0/select?q=text:hoers~1
>
> Distributed core :
>
> http://192.168.1.91:8080/core0/select?shards=http://192.168.1.91:8080/core0,http://192.168.1.91:8080/core1,http://192.168.1.91:8080/core2&q=text:hoers~1
>
> Data, schema and other configuration are similar in both cores.
>
> But when doing a fuzzy search like hoers~1, one core returns many
> records (about 450), while the other core returns only 1 record.
>
> This issue does not seem related to distributed search, though: even when I
> do not use distributed search, it still does not return more rows, as in
>
> http://192.168.1.91:8080/core0/select?q=text:hoers~1
>
> Below is the schema definition for my field.
> <fieldType name="text_en_splitting" class="solr.TextField"
> positionIncrementGap="100" autoGeneratePhraseQueries="true">
>
> <analyzer type="index">
>
> <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>
> <filter class="solr.StopFilterFactory"
> ignoreCase="true"
> words="stopwords.txt"
> enablePositionIncrements="false"
> />
>
> <filter class="solr.StopFilterFactory"
> ignoreCase="true"
> words="stopwords_en.txt"
> enablePositionIncrements="true"
> />
>
> <filter class="solr.WordDelimiterFilterFactory"
> generateWordParts="1" generateNumberParts="1" catenateWords="1"
> catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"
> protected="protwords.txt" types="wdfftypes.txt" />
>
> <filter class="solr.LowerCaseFilterFactory"/>
>
> <filter class="solr.KeywordMarkerFilterFactory"
> protected="protwords.txt"/>
>
> <filter class="solr.PorterStemFilterFactory"/>
>
> </analyzer>
>
> <analyzer type="query">
>
> <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>
> <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
> ignoreCase="true" expand="true"/>
>
> <filter class="solr.StopFilterFactory"
> ignoreCase="true"
> words="stopwords_extra_query.txt"
> enablePositionIncrements="false"
> />
>
> <filter class="solr.StopFilterFactory"
> ignoreCase="true"
> words="stopwords_en.txt"
> enablePositionIncrements="true"
> />
>
> <filter class="solr.WordDelimiterFilterFactory"
> generateWordParts="1" generateNumberParts="1" catenateWords="0"
> catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"
> protected="protwords.txt" types="wdfftypes.txt" />
>
> <filter class="solr.LowerCaseFilterFactory"/>
>
> <filter class="solr.KeywordMarkerFilterFactory"
> protected="protwords.txt"/>
>
> <filter class="solr.PorterStemFilterFactory"/>
>
> </analyzer>
>
> </fieldType>
> I'm not sure what is wrong with this. Can anybody help me with this?
>
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Issue-with-fuzzy-search-in-Distributed-Search-tp4060022.html
> Sent from the Solr - User mailing list archive at Nabble.com.
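Jack's description of distributed search (each shard scores hits against its own local index, then Solr merges the per-shard results) can be modeled in a few lines. This is a simplified Python sketch, assuming each shard returns its hits already sorted by descending score; it ignores the rest of Solr's distributed request handling (such as retrieving stored fields), and `merge_shard_results`, `_tag_with_shard` and the sample scores are hypothetical.

```python
import heapq
from itertools import islice
from typing import Dict, Iterator, List, Tuple

Hit = Tuple[float, str]  # (score computed on one shard's local index, document id)


def _tag_with_shard(shard: str, hits: List[Hit]) -> Iterator[Tuple[float, str, str]]:
    """Yield (score, doc_id, shard) so the merged list remembers where each hit came from."""
    for score, doc_id in hits:
        yield score, doc_id, shard


def merge_shard_results(shard_hits: Dict[str, List[Hit]], rows: int) -> List[Tuple[str, float, str]]:
    """Merge per-shard hit lists (each sorted by descending score) into the top `rows` hits.

    Scores are used as-is, so each one reflects only the statistics of the
    shard that produced it; the merge step never re-runs the query.
    """
    streams = [_tag_with_shard(shard, hits) for shard, hits in shard_hits.items()]
    merged = heapq.merge(*streams, key=lambda hit: hit[0], reverse=True)
    return [(doc_id, score, shard) for score, doc_id, shard in islice(merged, rows)]


if __name__ == "__main__":
    # Hypothetical per-core results; the scores are made up for illustration.
    hits = {
        "core0": [(2.5, "doc-a"), (0.4, "doc-b")],
        "core1": [(1.9, "doc-c")],
        "core2": [(3.1, "doc-d"), (1.0, "doc-e")],
    }
    for doc_id, score, shard in merge_shard_results(hits, rows=3):
        print(doc_id, score, shard)
```

Because the merge only reorders what each shard returned, a document that no shard matched cannot show up in the merged list, which is why comparing each shard's local results and debugQuery output is the useful next step.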
--
View this message in context: http://lucene.472066.n3.nabble.com/Solr-Results-differ-in-2-solr-cores-same-configuration-for-fuzzy-search-tp4060022p4060201.html
Sent from the Solr - User mailing list archive at Nabble.com.
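To follow Jack's suggestion of running the query on each node separately with debugQuery=true, the per-core URLs can be generated from the shard list in the distributed query above. This is a minimal Python sketch that only builds the URLs (it does not send any requests). It assumes a setup like this one, where distribution is requested explicitly through the shards parameter, so a request sent directly to one core without that parameter is answered from that core's local index; rows=10 and the name `local_debug_urls` are assumptions.

```python
from urllib.parse import urlencode

# Core base URLs copied from the distributed query in this thread.
SHARDS = [
    "http://192.168.1.91:8080/core0",
    "http://192.168.1.91:8080/core1",
    "http://192.168.1.91:8080/core2",
]


def local_debug_urls(shards, query, rows=10):
    """Build one /select URL per core, without a shards parameter, with debugQuery on."""
    # urlencode percent-escapes characters such as ':' in the query string.
    params = urlencode({"q": query, "debugQuery": "true", "rows": rows})
    return [f"{base}/select?{params}" for base in shards]


if __name__ == "__main__":
    for url in local_debug_urls(SHARDS, "text:hoers~1"):
        print(url)
```

Comparing the debug "explain" section from each of these three responses with the one from the simple core should show which expanded terms each core actually used for the fuzzy query.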
Re: Issue with fuzzy search in Distributed Search
Posted by meghana <me...@amultek.com>.
Please help me with this!