Posted to solr-user@lucene.apache.org by PAVAN <pa...@gmail.com> on 2013/09/18 08:50:07 UTC

SORTING RESULTS BASED ON RELAVANCY

Hi,

       I am using fuzzy matching, and it returns the right results, but I need to
sort them by relevancy, meaning that closer matches should come first.

Can anyone help with this?


Regards,
Pavan.



--
View this message in context: http://lucene.472066.n3.nabble.com/SORTING-RESULTS-BASED-ON-RELAVANCY-tp4090789.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: SORTING RESULTS BASED ON RELAVANCY

Posted by Gora Mohanty <go...@mimirtech.com>.

On 18 September 2013 12:39, Alexandre Rafalovitch <ar...@gmail.com> wrote:
> The default sort is by relevancy. So, if you are getting results in the wrong
> order, Solr thinks they are relevant in a different way. Depending on the
> algorithm you use, there are different boosting functions.
[...]

Also, you can get an explanation of the scoring by adding
&debugQuery=on to the Solr search URL. Please see
http://wiki.apache.org/solr/CommonQueryParameters#debugQuery
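
As a minimal sketch of the suggestion above, the following Python adds debugQuery=on to a search URL. The host, port, handler path, and query are assumptions about a local example install, and the helper name with_debug is made up for illustration.

```python
from urllib.parse import urlencode

def with_debug(base_url, params):
    """Return a search URL for base_url with debugQuery=on added to params."""
    query = dict(params)          # copy so the caller's dict is not modified
    query["debugQuery"] = "on"    # ask Solr to include the scoring explanation
    return base_url + "?" + urlencode(query)

# Assumed local Solr instance and an example fuzzy query.
url = with_debug("http://localhost:8983/solr/select",
                 {"q": "features:blak~2", "fl": "score,id"})
print(url)
```

The scoring explanation then appears in the debug section of the response, one entry per returned document.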

Regards,
Gora

Re: SORTING RESULTS BASED ON RELAVANCY

Posted by Chris Hostetter <ho...@fucit.org>.

: Unless i'm missing something: FuzzyQuery defaults to using the 
: "TopTermsScoringBooleanQueryRewrite" method based on the terms found in 
: the index that match the fuzzy expression.  So the results of a simple 
: fuzzy query should already come back based on the tf/idf scores of the 
: terms.

to give a concrete example...

using 4.4, with the example configs & sample data, this query...

http://localhost:8983/solr/select?defType=edismax&qf=features&q=blak~2&fl=score,id,features&debugQuery=true

...matches two documents with different scores.  The resulting scores are 
based on both the edit distance of the word that matches the fuzzy term 
(which is used as a term boost during query rewriting) and the tf/idf of those terms...

A doc that contains "black" (edit distance 1 => boost * 0.75)...

0.39237294 = (MATCH) sum of:
  0.39237294 = (MATCH) weight(features:black^0.75 in 26) [DefaultSimilarity], result of:
    0.39237294 = score(doc=26,freq=1.0 = termFreq=1.0), product of:
      0.83205026 = queryWeight, product of:
        0.75 = boost
        3.7725887 = idf(docFreq=1, maxDocs=32)
        0.29406872 = queryNorm
      0.4715736 = fieldWeight in 26, product of:
        1.0 = tf(freq=1.0), with freq of:
          1.0 = termFreq=1.0
        3.7725887 = idf(docFreq=1, maxDocs=32)
        0.125 = fieldNorm(doc=26)

...compared to a doc that contains "book" (edit distance 2 => boost * 0.5)...

0.22888422 = (MATCH) sum of:
  0.22888422 = (MATCH) weight(features:book^0.5 in 5) [DefaultSimilarity], result of:
    0.22888422 = score(doc=5,freq=1.0 = termFreq=1.0), product of:
      0.5547002 = queryWeight, product of:
        0.5 = boost
        3.7725887 = idf(docFreq=1, maxDocs=32)
        0.29406872 = queryNorm
      0.4126269 = fieldWeight in 5, product of:
        1.0 = tf(freq=1.0), with freq of:
          1.0 = termFreq=1.0
        3.7725887 = idf(docFreq=1, maxDocs=32)
        0.109375 = fieldNorm(doc=5)
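
The two explain outputs above can be reproduced by hand. The sketch below multiplies the factors exactly as the explain tree nests them (score = queryWeight * fieldWeight); all numbers are copied from the output above, and the function name explain_score is just for illustration.

```python
def explain_score(boost, idf, query_norm, tf, field_norm):
    """Recompute a score from the factors shown in the explain output."""
    query_weight = boost * idf * query_norm   # "queryWeight, product of"
    field_weight = tf * idf * field_norm      # "fieldWeight, product of"
    return query_weight * field_weight

IDF = 3.7725887          # same for both terms (docFreq=1, maxDocs=32)
QUERY_NORM = 0.29406872  # same for both documents

# "black": edit distance 1, boost 0.75, fieldNorm 0.125
black = explain_score(0.75, IDF, QUERY_NORM, tf=1.0, field_norm=0.125)
# "book": edit distance 2, boost 0.5, fieldNorm 0.109375
book = explain_score(0.5, IDF, QUERY_NORM, tf=1.0, field_norm=0.109375)

print(black, book)
```

Since idf, queryNorm, and tf are identical for the two documents, the difference in score comes only from the fuzzy boost and the fieldNorm.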



-Hoss

Re: SORTING RESULTS BASED ON RELAVANCY

Posted by Chris Hostetter <ho...@fucit.org>.

:      Thanks for your reply. Can you please check the following details and
: give me suggestions on how I can do it? That would be very helpful to me.

you need to show us some examples of your documents and the 
debugQuery=true output for those documents for us to better understand the 
behavior you are seeing.

Unless i'm missing something: FuzzyQuery defaults to using the 
"TopTermsScoringBooleanQueryRewrite" method based on the terms found in 
the index that match the fuzzy expression.  So the results of a simple 
fuzzy query should already come back based on the tf/idf scores of the 
terms.
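
To illustrate the idea, here is a rough conceptual sketch in Python of what such a rewrite does: expand the fuzzy term into the index terms within the allowed edit distance, and give each one a boost that shrinks as the distance grows. This is not Lucene's code. The boost formula 1 - distance / min(len(query), len(term)) is an assumption (it happens to agree with the 0.75 and 0.5 boosts shown for "black" and "book" in this thread), and the term list is made up.

```python
def edit_distance(a, b):
    """Plain Levenshtein distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]

def rewrite_fuzzy(query_term, index_terms, max_edits=2):
    """Return (term, boost) pairs for index terms within max_edits, best first."""
    expanded = []
    for term in index_terms:
        d = edit_distance(query_term, term)
        if d <= max_edits:
            # Assumed boost formula, not taken from the Lucene source.
            boost = 1.0 - d / min(len(query_term), len(term))
            expanded.append((term, boost))
    return sorted(expanded, key=lambda tb: tb[1], reverse=True)

print(rewrite_fuzzy("blak", ["black", "book", "cable", "white"]))
```

Each surviving term is then scored with the usual tf/idf, multiplied by its boost, which is why closer matches tend to rank higher when everything else is equal.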

: If a user searches for "iphne 4", it first has to check for an exact match. If
: none is found, I split the string into two strings, s1 and s2, and add
: ~0.5 to both s1 and s2.

: I need the "iphone 4" result first

Ok, but you haven't given us any indication of what you are *actually* 
getting as your first result, so we really can't even begin to guess 
why you aren't getting the results you expect.

If you are seeing identical scores for all documents, then it's possibly 
because of some of the other ways you have combined custom params to 
build a complex query.


In particular: nowhere in your example URL, or the configured defaults 
you pasted below, do you show us how you are ultimately building up the 
"q" param from the various custom params you have defined...


: and i did the configuration in the following way...
: 
: 
: <str name="1.q.op">AND</str>
:    <str name="1.df">fsw_title</str>
:    <str name="1.rows">15</str>
:    <str name="1.q">{!edismax v=$s}</str>
:    
:    <str name="1.fq">city:All OR _query_:"{!field f=city v=$c}"</str>
:    <bool name="1.group">true</bool>
:    <str name="1.group.field">mpId</str>
:    <str name="1.group.limit">3</str>
:    <bool name="1.group.main">true</bool>
:    
:    <str name="1.defType">edismax</str>
:    <bool name="1.lowercaseOperators">false</bool>
:    <str name="1.qf">tsw_title^15.0 tf_title^10.0 tsw_keywords^1
: keywords^0.5</str>
:    <str name="1.pf2">fsw_title~1^50.0</str>
:    <str name="1.pf3">fsw_title~1^25.0</str>
:    <str name="1.bf">sum(product(typeId,100),weightage)</str>
:    
: <str name="2.q.op">OR</str>
:    <str name="2.df">fsw_title</str>
:    <str name="2.rows">20</str>
:    <str name="2.q">_query_:"{!edismax qf=$qfs1 pf=$pfx pf2=$pf2x v=$s1}" AND
: _query_:"{!edismax qf=$qfs2 v=$s2}"</str>
:    
:    <str name="2.fq">city:All OR _query_:"{!field f=city v=$c}"</str>
:    <bool name="2.group">true</bool>
:    <str name="2.group.field">mpId</str>
:    <str name="2.group.limit">5</str>
:    <bool name="2.group.main">true</bool>
:    
:    <bool name="2.lowercaseOperators">false</bool>
:    <str name="2.qfs1">fsw_title^30 tsw_title^20 tf_title^15.0
: keywords^1.0</str>
:    <str name="2.qfs2">tsw_title^15.0 tf_title^10.0 tsw_keywords^1
: keywords^0.5</str>
:    <str name="2.pfx">fsw_title~1^100.0</str>
:    <str name="2.pf2x">fsw_title~1^50.0</str>
:    <str name="2.pf3x">fsw_title~1^25.0</str>
:    <str name="2.bf">product(typeId,100)</str>


-Hoss

Re: SORTING RESULTS BASED ON RELAVANCY

Posted by PAVAN <pa...@gmail.com>.

Hi alex,

     Thanks for your reply. Can you please check the following details and
give me suggestions on how I can do it? That would be very helpful to me.


I am passing query parameters like this:

http://localhost:8080/solr/core/c=cityname&s=iphne+4&s1=iphne~0.5&s2=4~0.5

Here "s" is the main string, which is split into s1 and s2 for fuzzy matching.


If a user searches for "iphne 4", it first has to check for an exact match. If
none is found, I split the string into two strings, s1 and s2, and add
~0.5 to both s1 and s2.
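
The splitting step described above could look roughly like the sketch below. It only covers building the custom s, s1, s2 and c parameters, not the exact-match check, and it handles just the first two words, as in the "iphne 4" example. The function name fuzzy_params is hypothetical.

```python
from urllib.parse import urlencode

def fuzzy_params(search, city, similarity="0.5"):
    """Build the custom params: s is the raw string, s1/s2 are its fuzzy words."""
    params = {"c": city, "s": search}
    # Only the first two words are turned into s1 and s2, as in the example.
    for i, word in enumerate(search.split()[:2], start=1):
        params[f"s{i}"] = f"{word}~{similarity}"
    return params

params = fuzzy_params("iphne 4", "cityname")
print(urlencode(params))  # query string to append to the request URL
```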


I need the "iphone 4" result first.


I did the configuration in the following way:


<str name="1.q.op">AND</str>
   <str name="1.df">fsw_title</str>
   <str name="1.rows">15</str>
   <str name="1.q">{!edismax v=$s}</str>
   
   <str name="1.fq">city:All OR _query_:"{!field f=city v=$c}"</str>
   <bool name="1.group">true</bool>
   <str name="1.group.field">mpId</str>
   <str name="1.group.limit">3</str>
   <bool name="1.group.main">true</bool>
   
   <str name="1.defType">edismax</str>
   <bool name="1.lowercaseOperators">false</bool>
   <str name="1.qf">tsw_title^15.0 tf_title^10.0 tsw_keywords^1
keywords^0.5</str>
   <str name="1.pf2">fsw_title~1^50.0</str>
   <str name="1.pf3">fsw_title~1^25.0</str>
   <str name="1.bf">sum(product(typeId,100),weightage)</str>
   
<str name="2.q.op">OR</str>
   <str name="2.df">fsw_title</str>
   <str name="2.rows">20</str>
   <str name="2.q">_query_:"{!edismax qf=$qfs1 pf=$pfx pf2=$pf2x v=$s1}" AND
_query_:"{!edismax qf=$qfs2 v=$s2}"</str>
   
   <str name="2.fq">city:All OR _query_:"{!field f=city v=$c}"</str>
   <bool name="2.group">true</bool>
   <str name="2.group.field">mpId</str>
   <str name="2.group.limit">5</str>
   <bool name="2.group.main">true</bool>
   
   <bool name="2.lowercaseOperators">false</bool>
   <str name="2.qfs1">fsw_title^30 tsw_title^20 tf_title^15.0
keywords^1.0</str>
   <str name="2.qfs2">tsw_title^15.0 tf_title^10.0 tsw_keywords^1
keywords^0.5</str>
   <str name="2.pfx">fsw_title~1^100.0</str>
   <str name="2.pf2x">fsw_title~1^50.0</str>
   <str name="2.pf3x">fsw_title~1^25.0</str>
   <str name="2.bf">product(typeId,100)</str>







Re: SORTING RESULTS BASED ON RELAVANCY

Posted by Alexandre Rafalovitch <ar...@gmail.com>.

The default sort is by relevancy. So, if you are getting results in the wrong
order, Solr thinks they are relevant in a different way. Depending on the
algorithm you use, there are different boosting functions.

You may need to give more details: the algorithm you use, how you would know
whether relevance sorting is working, etc.

Regards,
   Alex.

Personal website: http://www.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at
once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)
