Posted to solr-user@lucene.apache.org by Viresh Modi <vi...@highq.com> on 2013/09/26 13:18:59 UTC

Exact Word Match Search comes in first come In Solr4.3

I want results ordered by exact match: a search for “EMIR” should return the exact word “Emir” first, not “United Arab Emirates”.

For example, when you search for “EMIR”, the first result has nothing to do with it and is all about “United Arab Emirates”, which contains “Emir” only as part of “Emirates”. This is clearly less relevant than an exact match on “EMIR”.

*My Solr search results (in the order returned):*

<doc>
  <str name="content">Weight  United Arab Emirates</str>
</doc>
<doc>
  <str name="content">Emir My Search Content</str>
</doc>

*Debug output for the query:*

<str name="OnlineR3_6_4_10_22">
0.4016216 = (MATCH) weight(text:emir in 0) [DefaultSimilarity], result of:
  0.4016216 = fieldWeight in 0, product of:
    1.0 = tf(freq=1.0), with freq of:
      1.0 = termFreq=1.0
    3.2129729 = idf(docFreq=48, maxDocs=448)
    0.125 = fieldNorm(doc=0)</str>
<str name="OnlineR3_6_4_10_23">
0.4016216 = (MATCH) weight(text:emir in 0) [DefaultSimilarity], result of:
  0.4016216 = fieldWeight in 0, product of:
    1.0 = tf(freq=1.0), with freq of:
      1.0 = termFreq=1.0
    3.2129729 = idf(docFreq=48, maxDocs=448)
    0.125 = fieldNorm(doc=0)</str>
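Both explanations are the same product. Under Lucene 4.x's DefaultSimilarity (classic TF-IDF), a single-term match contributes tf × idf × fieldNorm, where tf = √freq, idf = 1 + ln(maxDocs / (docFreq + 1)), and fieldNorm is the index-time length norm (roughly 1/√(number of terms in the field), stored lossily in a single byte). Plugging in the numbers from the debug output:

```latex
\[
\begin{aligned}
\mathrm{tf}    &= \sqrt{\mathrm{freq}} = \sqrt{1} = 1.0 \\
\mathrm{idf}   &= 1 + \ln\frac{\mathrm{maxDocs}}{\mathrm{docFreq} + 1}
                = 1 + \ln\frac{448}{49} \approx 1 + 2.2130 = 3.2130 \\
\mathrm{score} &= \mathrm{tf} \times \mathrm{idf} \times \mathrm{fieldNorm}
                = 1.0 \times 3.2129729 \times 0.125 \approx 0.4016216
\end{aligned}
\]
```

Every factor is identical for the two documents, so they tie at 0.4016216 and nothing in the score favors the exact word “Emir” over “Emirates”. On a tie, Lucene generally falls back to internal document order, which is why the “United Arab Emirates” document can come first.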

*My schema.xml looks like this:*

<field name="content" type="text_en_splitting" indexed="true" stored="true"
       termVectors="true" termPositions="true" termOffsets="true" />

<fieldType name="text_en_splitting" class="solr.TextField"
           positionIncrementGap="100" autoGeneratePhraseQueries="true">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="lang/stopwords_en.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="1" catenateNumbers="1"
            catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="lang/stopwords_en.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="1" catenateNumbers="1"
            catenateAll="0" splitOnCaseChange="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>

Re: Exact Word Match Search comes in first come In Solr4.3

Posted by Otis Gospodnetic <ot...@gmail.com>.
Hello there.

Use two fields, one unanalyzed and one analyzed, and boost the former.
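The two documents tie because both analyzers end with PorterStemFilterFactory: Porter stemming reduces “Emirates” to “emirate” and then to “emir”, the same term the query “EMIR” produces (the Analysis page of the Solr admin UI should show this). A minimal sketch of the two-field idea follows, assuming Solr 4.x schema.xml and solrconfig.xml conventions; the names `text_exact` and `content_exact` and the boost of 10 are hypothetical. Note that a truly unanalyzed `solr.StrField` only matches when the entire field value equals the query, so for content like “Emir My Search Content” a field that is tokenized and lowercased but not stemmed is probably closer to the intent.

```xml
<!-- schema.xml, inside <types>: an unstemmed companion type (hypothetical name) -->
<fieldType name="text_exact" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- split into words and lowercase, but do NOT stem,
         so "Emirates" is indexed as "emirates" and never matches "emir" -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- schema.xml, inside <fields>: a hypothetical copy of "content" -->
<field name="content_exact" type="text_exact" indexed="true" stored="false"/>

<!-- schema.xml, top level: fill the copy at index time -->
<copyField source="content" dest="content_exact"/>

<!-- solrconfig.xml: query both fields, weighting the unstemmed one higher -->
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <str name="qf">content_exact^10 content</str>
  </lst>
</requestHandler>
```

With this setup “Emir My Search Content” matches in both fields while “Weight United Arab Emirates” matches only the stemmed `content` field, so the exact match should score higher. A reindex is needed after adding the copyField.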

Otis
Solr & ElasticSearch Support
http://sematext.com/
On Sep 26, 2013 7:19 AM, "Viresh Modi" <vi...@highq.com> wrote:

> I want to get ORDER As Per Exact Search match:
>
> Search with "EMIR" comes First exact match  “Emir”  not “United Arab
> Emirates”.
>
>  For example, when you search for “EMIR” the first result has nothing to do
> with that and is all about “United Arab Emirates”, which obviously contains
> “Emir” as part of “Emirates”. This is obviously less relevant than an exact
> match on “EMIR”.
>
> *MY SOLR INDEX RESULT:*
>
> <doc>
>
> <str name="content">Weight  United Arab Emirates</str>
>
> </doc>
> <doc>
>
> <str name="content">Emir My Search Content</str>
>
> </doc>
>
> *Debug for Query :*
>
>  <str name="OnlineR3_6_4_10_22">
> 0.4016216 = (MATCH) weight(text:emir in 0) [DefaultSimilarity], result of:
>   0.4016216 = fieldWeight in 0, product of:
>     1.0 = tf(freq=1.0), with freq of:
>       1.0 = termFreq=1.0
>     3.2129729 = idf(docFreq=48, maxDocs=448)
>     0.125 = fieldNorm(doc=0)</str>
>     <str name="OnlineR3_6_4_10_23">
> 0.4016216 = (MATCH) weight(text:emir in 0) [DefaultSimilarity], result of:
>   0.4016216 = fieldWeight in 0, product of:
>     1.0 = tf(freq=1.0), with freq of:
>       1.0 = termFreq=1.0
>     3.2129729 = idf(docFreq=48, maxDocs=448)
>     0.125 = fieldNorm(doc=0)</str>
>
> *MY Schema.xml Looks like :*
>
> <field name="content" type="text_en_splitting" indexed="true" stored="true"
> termVectors="true" termPositions="true" termOffsets="true" />
>
>
> <fieldType name="text_en_splitting" class="solr.TextField"
> positionIncrementGap="100" autoGeneratePhraseQueries="true">
>       <analyzer type="index">
>         <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>                <filter class="solr.StopFilterFactory"
>                 ignoreCase="true"
>                 words="lang/stopwords_en.txt"
>                 enablePositionIncrements="true"
>                 />
> <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
> generateNumberParts="1" catenateWords="1" catenateNumbers="1"
> catenateAll="0" splitOnCaseChange="1"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>         <filter class="solr.KeywordMarkerFilterFactory"
> protected="protwords.txt"/>
>         <filter class="solr.PorterStemFilterFactory"/>
>       </analyzer>
>       <analyzer type="query">
>         <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>         <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
> ignoreCase="true" expand="true"/>
>         <filter class="solr.StopFilterFactory" ignoreCase="true"
> words="lang/stopwords_en.txt" enablePositionIncrements="true"/>
>         <filter class="solr.WordDelimiterFilterFactory"
> generateWordParts="1" generateNumberParts="1" catenateWords="1"
> catenateNumbers="1" catenateAll="0" splitOnCaseChange="0"/>
>         <filter class="solr.LowerCaseFilterFactory"/>
>         <filter class="solr.KeywordMarkerFilterFactory"
> protected="protwords.txt"/>
>         <filter class="solr.PorterStemFilterFactory"/>
>       </analyzer>
>     </fieldType>
>

RE: Exact Word Match Search comes in first come In Solr4.3

Posted by Markus Jelsma <ma...@openindex.io>.
That won't boost by position, but Lucene's SpanFirstQuery does: it only matches when the term occurs near the start of the field. You do have to write a custom query parser plugin for it, but that's trivial.
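The plugin itself is Java: its parser would return something like `new SpanFirstQuery(new SpanTermQuery(new Term(field, term)), end)`, which matches only spans ending within the first `end` positions. Once written, wiring it in is configuration. A sketch of that wiring follows, where the class `com.example.SpanFirstQParserPlugin`, the parser name `spanfirst`, and its `f` and `end` local params are all hypothetical and would be defined by your own plugin.

```xml
<!-- solrconfig.xml: register the hypothetical custom parser -->
<queryParser name="spanfirst" class="com.example.SpanFirstQParserPlugin"/>

<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <str name="qf">content</str>
    <!-- additive boost for documents whose first token matches the user's query;
         v=$q passes the main query text to the parser,
         f and end are parameters the hypothetical plugin would read -->
    <str name="bq">{!spanfirst f=content end=1 v=$q}</str>
  </lst>
</requestHandler>
```

With `end=1`, only a match at the first position is boosted, which in this example would lift “Emir My Search Content” above “Weight United Arab Emirates” (assuming the plugin analyzes the query text with the `content` field's analyzer); a larger `end` widens the window.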
 
-----Original message-----
> From:Otis Gospodnetic <ot...@gmail.com>
> Sent: Thursday 26th September 2013 13:24
> To: solr-user@lucene.apache.org
> Subject: Re: Exact Word Match Search comes in first come In Solr4.3