Posted to solr-user@lucene.apache.org by Kissue Kissue <ki...@gmail.com> on 2012/04/12 13:18:46 UTC

Solr Scoring

Hi,

I have a field in my index called itemDesc to which I am applying
EnglishMinimalStemFilterFactory. So if I index a value containing "Edges"
in this field, the EnglishMinimalStemFilterFactory applies stemming and
"Edges" becomes "Edge". Now when I search for "Edges", documents with
"Edge" score better than documents with the actual search word, "Edges".
Is there a way I can make documents with the actual search word (in this
case "Edges") score better than documents with "Edge"?

I am using Solr 3.5. My field definition is shown below:

<fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt"
            ignoreCase="true" expand="false"/>
    <filter class="solr.StopFilterFactory"
            ignoreCase="true"
            words="stopwords_en.txt"
            enablePositionIncrements="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPossessiveFilterFactory"/>
    <filter class="solr.EnglishMinimalStemFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory"
            ignoreCase="true"
            words="stopwords_en.txt"
            enablePositionIncrements="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPossessiveFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.EnglishMinimalStemFilterFactory"/>
  </analyzer>
</fieldType>

Thanks.

Re: Solr Scoring

Posted by Li Li <fa...@gmail.com>.

Another way is to use payloads: http://wiki.apache.org/solr/Payloads
The advantage of payloads is that you only need one field, which keeps the
.frq file smaller than it would be with two fields. The disadvantage is
that payloads are stored in the .prx file, so I am not sure which approach
is faster. Maybe you can try them both.
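
For reference, here is a sketch of what a payload-carrying field type can
look like. It assumes the stock DelimitedPayloadTokenFilterFactory and input
text that the indexing client has already annotated as term|weight (for
example "edges|2.0 edge|1.0"). The type name text_payload is made up for
illustration. Scoring on the payload also needs a custom Similarity and a
payload-aware query, which are not shown here.

```xml
<fieldType name="text_payload" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- Whitespace tokenizer so that "term|weight" stays a single token -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- Splits each token at "|" and stores the right-hand side as a float payload -->
    <filter class="solr.DelimitedPayloadTokenFilterFactory"
            delimiter="|" encoder="float"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```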

On Fri, Apr 13, 2012 at 8:04 AM, Erick Erickson <er...@gmail.com> wrote:

> GAH! I had my head in "make this happen in one field" when I wrote my
> response, without being explicit. Of course Walter's solution is pretty
> much the standard way to deal with this.
>
> Best
> Erick
>
> On Thu, Apr 12, 2012 at 5:38 PM, Walter Underwood <wu...@wunderwood.org> wrote:
> > It is easy. Create two fields, text_exact and text_stem. Don't use the
> > stemmer in the first chain, do use the stemmer in the second. Give the
> > text_exact a bigger weight than text_stem.
> >
> > wunder
> >
> > On Apr 12, 2012, at 4:34 PM, Erick Erickson wrote:
> >
> >> No, I don't think there's an OOB way to make this happen. It's
> >> a recurring theme, "make exact matches score higher than
> >> stemmed matches".
> >>
> >> Best
> >> Erick
>

Re: Solr Scoring

Posted by Lance Norskog <go...@gmail.com>.

This was a common one when I was matching movie and song names. If
that is your project, also try boosting matches on the first word and
matches in shorter titles. Also try bigrams of stopwords: "Call of the
Wild" becomes "call", "of-the", "wild".

The bigram trick is also good if you have people copying and pasting
large chunks of boilerplate to find official documents.
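
One stock way to get stopword bigrams is CommonGramsFilterFactory. This is a
related technique, not necessarily the one used above. The sketch below
assumes a hypothetical type name, text_commongrams, and reuses the
stopwords_en.txt file from the original question. Its output differs from
the example above: it joins words with "_" and pairs each stopword with its
neighbours, so "Call of the Wild" would yield call_of, of_the and the_wild,
with the single words kept alongside at index time.

```xml
<fieldType name="text_commongrams" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Index side: emits unigrams plus a bigram for every pair containing a listed word -->
    <filter class="solr.CommonGramsFilterFactory"
            words="stopwords_en.txt" ignoreCase="true"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Query side: prefers the bigrams and drops the unigrams they cover -->
    <filter class="solr.CommonGramsQueryFilterFactory"
            words="stopwords_en.txt" ignoreCase="true"/>
  </analyzer>
</fieldType>
```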

On Fri, Apr 13, 2012 at 2:04 AM, Kissue Kissue <ki...@gmail.com> wrote:
> Thanks a lot. I had already implemented Walter's solution and was wondering
> if this was the right way to deal with it. This has now given me the
> confidence to go with the solution.
>
> Many thanks.



-- 
Lance Norskog
goksron@gmail.com

Re: Solr Scoring

Posted by Kissue Kissue <ki...@gmail.com>.

Thanks a lot. I had already implemented Walter's solution and was wondering
if this was the right way to deal with it. This has now given me the
confidence to go with the solution.

Many thanks.


Re: Solr Scoring

Posted by Erick Erickson <er...@gmail.com>.

GAH! I had my head in "make this happen in one field" when I wrote my
response, without being explicit. Of course Walter's solution is pretty
much the standard way to deal with this.

Best
Erick


Re: Solr Scoring

Posted by Walter Underwood <wu...@wunderwood.org>.

It is easy. Create two fields, text_exact and text_stem. Don't use the stemmer in the first analyzer chain; do use it in the second. Give text_exact a bigger weight than text_stem.
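
A minimal sketch of this setup in schema.xml and solrconfig.xml follows. The type name text_en_exact, the handler name /exactfirst, and the boost of 2.0 are assumptions for illustration. text_en and itemDesc come from the original question, and the copyField lines assume itemDesc is the field being indexed.

```xml
<!-- schema.xml (sketch): the text_en chain without synonyms or stemming.
     text_en_exact is a hypothetical type name. -->
<fieldType name="text_en_exact" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords_en.txt" enablePositionIncrements="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPossessiveFilterFactory"/>
  </analyzer>
</fieldType>

<!-- Two search fields fed from the same source field -->
<field name="text_exact" type="text_en_exact" indexed="true" stored="false"/>
<field name="text_stem"  type="text_en"       indexed="true" stored="false"/>

<copyField source="itemDesc" dest="text_exact"/>
<copyField source="itemDesc" dest="text_stem"/>

<!-- solrconfig.xml (sketch): search both fields, weighting the exact one higher -->
<requestHandler name="/exactfirst" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <str name="qf">text_exact^2.0 text_stem</str>
  </lst>
</requestHandler>
```

With this, a search for "Edges" can match a document containing "Edges" in both fields, but a document containing only "Edge" in text_stem alone, so the exact document should tend to score higher. The index has to be rebuilt after the schema change.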

wunder


Re: Solr Scoring

Posted by Erick Erickson <er...@gmail.com>.

No, I don't think there's an OOB way to make this happen. It's
a recurring theme, "make exact matches score higher than
stemmed matches".

Best
Erick
