Posted to solr-user@lucene.apache.org by sivaprasad <si...@echidnainc.com> on 2010/11/19 13:36:08 UTC

Issue with relevancy

Hi,

I configured the search request handler as shown below.

<requestHandler name="standard" class="solr.SearchHandler" default="true">
    <!-- default values for query parameters -->
    <lst name="defaults">
      <str name="echoParams">explicit</str>
    </lst>
</requestHandler>

I am submitting the query below.

http://localhost:8680/solr/catalogSearch/select?facet=true&spellcheck=true&indent=on&omitHeader=true&stats.field=sal_amt&stats.field=volume&stats=true&wt=xml&q=prod_n%3ADesktop+Computer+OR+manf_mdl_nbr%3ADesktop+Computer+OR+upc%3ADesktop+Computer+OR+Brands%3ADesktop+Computer&start=0&rows=60&fl=*,score&debugQuery=on

I am getting the results below, but the first doc scores higher than the
second, even though its prod_n matches only the word "Computer".

1)
<doc>
<float name="score">1.6884389</float>
<str name="Brands">GN Netcom</str>
<str name="category">Headsets & Microphones</str>
<str name="media_id">79634</str>
<str name="media_url">
http://image.shopzilla.com/resize?sq=60&uid=1640146921
</str>
<int name="prnt_ctgy_i">482</int>
<int name="prod_i">1640146921</int>
<str name="prod_n">Computer Headset</str>
<double name="sal_amt">17.95</double>
<str name="source_type">EXTERNAL</str>
</doc>

2)
<doc>
<float name="score">1.4326878</float>
<str name="category">Desktop Computers</str>
<str name="media_id">1565338</str>
<str name="media_url">
http://image.shopzilla.com/resize?sq=60&uid=1983384776
</str>
<int name="prnt_ctgy_i">461</int>
<int name="prod_i">1983384776</int>
<str name="prod_n">Rain Computers ION 6-Core DAW Computer</str>
<double name="sal_amt">1799.0</double>
<str name="source_type">EXTERNAL</str>
</doc>

I want to push the first doc down to second place.


The debug query analysis is given below.

<lst name="debug">

<str name="rawquerystring">
prod_n:Desktop Computer OR manf_mdl_nbr:Desktop Computer OR upc:Desktop
Computer OR Brands:Desktop Computer
</str>

<str name="querystring">
prod_n:Desktop Computer OR manf_mdl_nbr:Desktop Computer OR upc:Desktop
Computer OR Brands:Desktop Computer
</str>

<str name="parsedquery">
+prod_n:comput prod_n:comput manf_mdl_nbr:comput prod_n:comput upc:Desktop
prod_n:comput Brands:Desktop +prod_n:comput
</str>

<str name="parsedquery_toString">
+prod_n:comput prod_n:comput manf_mdl_nbr:comput prod_n:comput upc:Desktop
prod_n:comput Brands:Desktop +prod_n:comput
</str>

<lst name="explain">

<str name="1640146921">

1.6884388 = (MATCH) product of:
  2.701502 = (MATCH) sum of:
    0.5403004 = (MATCH) weight(prod_n:comput in 35844), product of:
      0.18838727 = queryWeight(prod_n:comput), product of:
        4.5888486 = idf(docFreq=3518, maxDocs=127361)
        0.041053277 = queryNorm
      2.8680303 = (MATCH) fieldWeight(prod_n:comput in 35844), product of:
        1.0 = tf(termFreq(prod_n:comput)=1)
        4.5888486 = idf(docFreq=3518, maxDocs=127361)
        0.625 = fieldNorm(field=prod_n, doc=35844)
    0.5403004 = (MATCH) weight(prod_n:comput in 35844), product of:
      0.18838727 = queryWeight(prod_n:comput), product of:
        4.5888486 = idf(docFreq=3518, maxDocs=127361)
        0.041053277 = queryNorm
      2.8680303 = (MATCH) fieldWeight(prod_n:comput in 35844), product of:
        1.0 = tf(termFreq(prod_n:comput)=1)
        4.5888486 = idf(docFreq=3518, maxDocs=127361)
        0.625 = fieldNorm(field=prod_n, doc=35844)
    0.5403004 = (MATCH) weight(prod_n:comput in 35844), product of:
      0.18838727 = queryWeight(prod_n:comput), product of:
        4.5888486 = idf(docFreq=3518, maxDocs=127361)
        0.041053277 = queryNorm
      2.8680303 = (MATCH) fieldWeight(prod_n:comput in 35844), product of:
        1.0 = tf(termFreq(prod_n:comput)=1)
        4.5888486 = idf(docFreq=3518, maxDocs=127361)
        0.625 = fieldNorm(field=prod_n, doc=35844)
    0.5403004 = (MATCH) weight(prod_n:comput in 35844), product of:
      0.18838727 = queryWeight(prod_n:comput), product of:
        4.5888486 = idf(docFreq=3518, maxDocs=127361)
        0.041053277 = queryNorm
      2.8680303 = (MATCH) fieldWeight(prod_n:comput in 35844), product of:
        1.0 = tf(termFreq(prod_n:comput)=1)
        4.5888486 = idf(docFreq=3518, maxDocs=127361)
        0.625 = fieldNorm(field=prod_n, doc=35844)
    0.5403004 = (MATCH) weight(prod_n:comput in 35844), product of:
      0.18838727 = queryWeight(prod_n:comput), product of:
        4.5888486 = idf(docFreq=3518, maxDocs=127361)
        0.041053277 = queryNorm
      2.8680303 = (MATCH) fieldWeight(prod_n:comput in 35844), product of:
        1.0 = tf(termFreq(prod_n:comput)=1)
        4.5888486 = idf(docFreq=3518, maxDocs=127361)
        0.625 = fieldNorm(field=prod_n, doc=35844)
  0.625 = coord(5/8)
</str>

<str name="1983384776">

1.4326876 = (MATCH) product of:
  2.2923002 = (MATCH) sum of:
    0.45846006 = (MATCH) weight(prod_n:comput in 57069), product of:
      0.18838727 = queryWeight(prod_n:comput), product of:
        4.5888486 = idf(docFreq=3518, maxDocs=127361)
        0.041053277 = queryNorm
      2.4336042 = (MATCH) fieldWeight(prod_n:comput in 57069), product of:
        1.4142135 = tf(termFreq(prod_n:comput)=2)
        4.5888486 = idf(docFreq=3518, maxDocs=127361)
        0.375 = fieldNorm(field=prod_n, doc=57069)
    0.45846006 = (MATCH) weight(prod_n:comput in 57069), product of:
      0.18838727 = queryWeight(prod_n:comput), product of:
        4.5888486 = idf(docFreq=3518, maxDocs=127361)
        0.041053277 = queryNorm
      2.4336042 = (MATCH) fieldWeight(prod_n:comput in 57069), product of:
        1.4142135 = tf(termFreq(prod_n:comput)=2)
        4.5888486 = idf(docFreq=3518, maxDocs=127361)
        0.375 = fieldNorm(field=prod_n, doc=57069)
    0.45846006 = (MATCH) weight(prod_n:comput in 57069), product of:
      0.18838727 = queryWeight(prod_n:comput), product of:
        4.5888486 = idf(docFreq=3518, maxDocs=127361)
        0.041053277 = queryNorm
      2.4336042 = (MATCH) fieldWeight(prod_n:comput in 57069), product of:
        1.4142135 = tf(termFreq(prod_n:comput)=2)
        4.5888486 = idf(docFreq=3518, maxDocs=127361)
        0.375 = fieldNorm(field=prod_n, doc=57069)
    0.45846006 = (MATCH) weight(prod_n:comput in 57069), product of:
      0.18838727 = queryWeight(prod_n:comput), product of:
        4.5888486 = idf(docFreq=3518, maxDocs=127361)
        0.041053277 = queryNorm
      2.4336042 = (MATCH) fieldWeight(prod_n:comput in 57069), product of:
        1.4142135 = tf(termFreq(prod_n:comput)=2)
        4.5888486 = idf(docFreq=3518, maxDocs=127361)
        0.375 = fieldNorm(field=prod_n, doc=57069)
    0.45846006 = (MATCH) weight(prod_n:comput in 57069), product of:
      0.18838727 = queryWeight(prod_n:comput), product of:
        4.5888486 = idf(docFreq=3518, maxDocs=127361)
        0.041053277 = queryNorm
      2.4336042 = (MATCH) fieldWeight(prod_n:comput in 57069), product of:
        1.4142135 = tf(termFreq(prod_n:comput)=2)
        4.5888486 = idf(docFreq=3518, maxDocs=127361)
        0.375 = fieldNorm(field=prod_n, doc=57069)
  0.625 = coord(5/8)
</str>

One more question: how can I improve the relevancy of the docs?

If you want, I will provide more information.

Thanks in advance.

Regards,
Siva
-- 
View this message in context: http://lucene.472066.n3.nabble.com/Issue-with-relevancy-tp1930292p1930292.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Issue with relevancy

Posted by Grant Ingersoll <gs...@apache.org>.
On Nov 19, 2010, at 7:36 AM, sivaprasad wrote:

> 
> Hi,
> 
> I configured the search request handler as shown below.
> 
> <requestHandler name="standard" class="solr.SearchHandler" default="true">
>    <!-- default values for query parameters -->
>    <lst name="defaults">
>      <str name="echoParams">explicit</str>
>    </lst>
> </requestHandler>
> 
> I am submitting the below query for search.
> 
> http://localhost:8680/solr/catalogSearch/select?facet=true&spellcheck=true&indent=on&omitHeader=true&stats.field=sal_amt&stats.field=volume&stats=true&wt=xml&q=prod_n%3ADesktop+Computer+OR+manf_mdl_nbr%3ADesktop+Computer+OR+upc%3ADesktop+Computer+OR+Brands%3ADesktop+Computer&start=0&rows=60&fl=*,score&debugQuery=on
> 
> I am getting the results below, but the first doc scores higher than the
> second, even though its prod_n matches only the word "Computer".

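It looks like the prod_n of the first result is shorter, so its length norm is higher, which is why it appears higher in the result list.  This value is captured in the fieldNorm in the explains below: 0.625 for the first doc versus 0.375 for the second, which outweighs the second doc's higher term frequency.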
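
To make the effect of the length norm concrete, here is a small sketch that recomputes both scores from the numbers in the explain output quoted below. It only mirrors the arithmetic shown in that output (tf is the square root of termFreq, each clause is queryWeight times fieldWeight, and the five identical clauses are summed and multiplied by coord). The function name and the "equal norms" comparison are illustrative assumptions, not something Solr exposes.

```python
import math

# Values copied from the explain output in this thread.
IDF = 4.5888486           # idf(docFreq=3518, maxDocs=127361)
QUERY_NORM = 0.041053277  # queryNorm
COORD = 5 / 8             # coord(5/8)
MATCHING_CLAUSES = 5      # prod_n:comput matched in five clauses per doc


def explain_score(term_freq: int, field_norm: float) -> float:
    """Recompute one document's score the way the explain tree lays it out."""
    tf = math.sqrt(term_freq)              # tf(termFreq=2) is 1.4142135
    query_weight = IDF * QUERY_NORM        # queryWeight(prod_n:comput)
    field_weight = tf * IDF * field_norm   # fieldWeight(prod_n:comput)
    clause_weight = query_weight * field_weight
    return COORD * MATCHING_CLAUSES * clause_weight


# "Computer Headset": one occurrence, short field, fieldNorm 0.625
headset = explain_score(term_freq=1, field_norm=0.625)
# "Rain Computers ION 6-Core DAW Computer": two occurrences, fieldNorm 0.375
desktop = explain_score(term_freq=2, field_norm=0.375)

# Hypothetical comparison: if both fields had the same norm, the higher
# term frequency of the second doc would put it on top.
headset_equal_norm = explain_score(term_freq=1, field_norm=1.0)
desktop_equal_norm = explain_score(term_freq=2, field_norm=1.0)

print(f"headset={headset:.4f} desktop={desktop:.4f}")
print(f"equal norms: headset={headset_equal_norm:.4f} desktop={desktop_equal_norm:.4f}")
```

With the reported norms the headset wins. With equal norms the order flips, which is consistent with the fieldNorm being the deciding factor here.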

More below...


> 
> 1)
> <doc>
> <float name="score">1.6884389</float>
> <str name="Brands">GN Netcom</str>
> <str name="category">Headsets & Microphones</str>
> <str name="media_id">79634</str>
> <str name="media_url">
> http://image.shopzilla.com/resize?sq=60&uid=1640146921
> </str>
> <int name="prnt_ctgy_i">482</int>
> <int name="prod_i">1640146921</int>
> <str name="prod_n">Computer Headset</str>
> <double name="sal_amt">17.95</double>
> <str name="source_type">EXTERNAL</str>
> </doc>
> 
> 2)
> <doc>
> <float name="score">1.4326878</float>
> <str name="category">Desktop Computers</str>
> <str name="media_id">1565338</str>
> <str name="media_url">
> http://image.shopzilla.com/resize?sq=60&uid=1983384776
> </str>
> <int name="prnt_ctgy_i">461</int>
> <int name="prod_i">1983384776</int>
> <str name="prod_n">Rain Computers ION 6-Core DAW Computer</str>
> <double name="sal_amt">1799.0</double>
> <str name="source_type">EXTERNAL</str>
> </doc>
> 
> I want to push the first doc down to second place.
> 
> 
> The debug query analysis is given below.
> 
> <lst name="debug">
> 
> <str name="rawquerystring">
> prod_n:Desktop Computer OR manf_mdl_nbr:Desktop Computer OR upc:Desktop
> Computer OR Brands:Desktop Computer
> </str>
> 
> <str name="querystring">
> prod_n:Desktop Computer OR manf_mdl_nbr:Desktop Computer OR upc:Desktop
> Computer OR Brands:Desktop Computer
> </str>
> 
> <str name="parsedquery">
> +prod_n:comput prod_n:comput manf_mdl_nbr:comput prod_n:comput upc:Desktop
> prod_n:comput Brands:Desktop +prod_n:comput
> </str>
> 
> <str name="parsedquery_toString">
> +prod_n:comput prod_n:comput manf_mdl_nbr:comput prod_n:comput upc:Desktop
> prod_n:comput Brands:Desktop +prod_n:comput
> </str>
> 
> <lst name="explain">
> 
> <str name="1640146921">
> 
> 1.6884388 = (MATCH) product of:
>  2.701502 = (MATCH) sum of:
>    0.5403004 = (MATCH) weight(prod_n:comput in 35844), product of:
>      0.18838727 = queryWeight(prod_n:comput), product of:
>        4.5888486 = idf(docFreq=3518, maxDocs=127361)
>        0.041053277 = queryNorm
>      2.8680303 = (MATCH) fieldWeight(prod_n:comput in 35844), product of:
>        1.0 = tf(termFreq(prod_n:comput)=1)
>        4.5888486 = idf(docFreq=3518, maxDocs=127361)
>        0.625 = fieldNorm(field=prod_n, doc=35844)
>    0.5403004 = (MATCH) weight(prod_n:comput in 35844), product of:
>      0.18838727 = queryWeight(prod_n:comput), product of:
>        4.5888486 = idf(docFreq=3518, maxDocs=127361)
>        0.041053277 = queryNorm
>      2.8680303 = (MATCH) fieldWeight(prod_n:comput in 35844), product of:
>        1.0 = tf(termFreq(prod_n:comput)=1)
>        4.5888486 = idf(docFreq=3518, maxDocs=127361)
>        0.625 = fieldNorm(field=prod_n, doc=35844)
>    0.5403004 = (MATCH) weight(prod_n:comput in 35844), product of:
>      0.18838727 = queryWeight(prod_n:comput), product of:
>        4.5888486 = idf(docFreq=3518, maxDocs=127361)
>        0.041053277 = queryNorm
>      2.8680303 = (MATCH) fieldWeight(prod_n:comput in 35844), product of:
>        1.0 = tf(termFreq(prod_n:comput)=1)
>        4.5888486 = idf(docFreq=3518, maxDocs=127361)
>        0.625 = fieldNorm(field=prod_n, doc=35844)
>    0.5403004 = (MATCH) weight(prod_n:comput in 35844), product of:
>      0.18838727 = queryWeight(prod_n:comput), product of:
>        4.5888486 = idf(docFreq=3518, maxDocs=127361)
>        0.041053277 = queryNorm
>      2.8680303 = (MATCH) fieldWeight(prod_n:comput in 35844), product of:
>        1.0 = tf(termFreq(prod_n:comput)=1)
>        4.5888486 = idf(docFreq=3518, maxDocs=127361)
>        0.625 = fieldNorm(field=prod_n, doc=35844)
>    0.5403004 = (MATCH) weight(prod_n:comput in 35844), product of:
>      0.18838727 = queryWeight(prod_n:comput), product of:
>        4.5888486 = idf(docFreq=3518, maxDocs=127361)
>        0.041053277 = queryNorm
>      2.8680303 = (MATCH) fieldWeight(prod_n:comput in 35844), product of:
>        1.0 = tf(termFreq(prod_n:comput)=1)
>        4.5888486 = idf(docFreq=3518, maxDocs=127361)
>        0.625 = fieldNorm(field=prod_n, doc=35844)
>  0.625 = coord(5/8)
> </str>
> 
> <str name="1983384776">
> 
> 1.4326876 = (MATCH) product of:
>  2.2923002 = (MATCH) sum of:
>    0.45846006 = (MATCH) weight(prod_n:comput in 57069), product of:
>      0.18838727 = queryWeight(prod_n:comput), product of:
>        4.5888486 = idf(docFreq=3518, maxDocs=127361)
>        0.041053277 = queryNorm
>      2.4336042 = (MATCH) fieldWeight(prod_n:comput in 57069), product of:
>        1.4142135 = tf(termFreq(prod_n:comput)=2)
>        4.5888486 = idf(docFreq=3518, maxDocs=127361)
>        0.375 = fieldNorm(field=prod_n, doc=57069)
>    0.45846006 = (MATCH) weight(prod_n:comput in 57069), product of:
>      0.18838727 = queryWeight(prod_n:comput), product of:
>        4.5888486 = idf(docFreq=3518, maxDocs=127361)
>        0.041053277 = queryNorm
>      2.4336042 = (MATCH) fieldWeight(prod_n:comput in 57069), product of:
>        1.4142135 = tf(termFreq(prod_n:comput)=2)
>        4.5888486 = idf(docFreq=3518, maxDocs=127361)
>        0.375 = fieldNorm(field=prod_n, doc=57069)
>    0.45846006 = (MATCH) weight(prod_n:comput in 57069), product of:
>      0.18838727 = queryWeight(prod_n:comput), product of:
>        4.5888486 = idf(docFreq=3518, maxDocs=127361)
>        0.041053277 = queryNorm
>      2.4336042 = (MATCH) fieldWeight(prod_n:comput in 57069), product of:
>        1.4142135 = tf(termFreq(prod_n:comput)=2)
>        4.5888486 = idf(docFreq=3518, maxDocs=127361)
>        0.375 = fieldNorm(field=prod_n, doc=57069)
>    0.45846006 = (MATCH) weight(prod_n:comput in 57069), product of:
>      0.18838727 = queryWeight(prod_n:comput), product of:
>        4.5888486 = idf(docFreq=3518, maxDocs=127361)
>        0.041053277 = queryNorm
>      2.4336042 = (MATCH) fieldWeight(prod_n:comput in 57069), product of:
>        1.4142135 = tf(termFreq(prod_n:comput)=2)
>        4.5888486 = idf(docFreq=3518, maxDocs=127361)
>        0.375 = fieldNorm(field=prod_n, doc=57069)
>    0.45846006 = (MATCH) weight(prod_n:comput in 57069), product of:
>      0.18838727 = queryWeight(prod_n:comput), product of:
>        4.5888486 = idf(docFreq=3518, maxDocs=127361)
>        0.041053277 = queryNorm
>      2.4336042 = (MATCH) fieldWeight(prod_n:comput in 57069), product of:
>        1.4142135 = tf(termFreq(prod_n:comput)=2)
>        4.5888486 = idf(docFreq=3518, maxDocs=127361)
>        0.375 = fieldNorm(field=prod_n, doc=57069)
>  0.625 = coord(5/8)
> </str>
> 
> One more question: how can I improve the relevancy of the docs?

I usually don't worry too much about 1 vs. 2, but if you are so inclined there are a few things that likely would help:

1. Automatically induce phrase queries using the pf parameter of the (extended) dismax query parser, assuming you are using it. Otherwise, create your own phrase queries for specific fields whenever the user query has multiple terms.
2. Use the QueryElevationComponent to hardcode the ordering.
3. Add your category field to the list of fields you search.
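
As a rough sketch of options 1 and 3, the snippet below builds two request URLs: one that quotes the user input as a phrase for every field under the standard query parser, and one that hands the raw input to dismax with qf and pf. The base URL and field names come from the original post. The helper names, the decision to include category in qf and pf, and the ^2 boost are assumptions for illustration and would need tuning against the real schema. Note that in the posted parsedquery the unquoted second word is searched as prod_n:comput rather than against the named field, which explicit phrase quoting avoids.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

BASE_URL = "http://localhost:8680/solr/catalogSearch/select"  # from the original post
SEARCH_FIELDS = ["prod_n", "manf_mdl_nbr", "upc", "Brands"]    # fields in the original query


def explicit_phrase_query(user_query: str, fields=SEARCH_FIELDS) -> str:
    """Standard parser: quote the whole input so both words apply to each field."""
    escaped = user_query.replace("\\", "\\\\").replace('"', '\\"')
    return " OR ".join(f'{field}:"{escaped}"' for field in fields)


def dismax_params(user_query: str) -> dict:
    """Dismax: qf lists the fields to search, pf boosts docs matching the phrase."""
    return {
        "defType": "dismax",
        "q": user_query,
        "qf": "prod_n manf_mdl_nbr upc Brands category",  # category added (option 3)
        "pf": "prod_n^2 category",                        # hypothetical boost value
        "fl": "*,score",
        "debugQuery": "on",
    }


phrase_url = BASE_URL + "?" + urlencode(
    {"q": explicit_phrase_query("Desktop Computer"), "fl": "*,score"}
)
dismax_url = BASE_URL + "?" + urlencode(dismax_params("Desktop Computer"))

print(phrase_url)
print(dismax_url)
```

Comparing the debugQuery output of both URLs against the original request should show whether the phrase match now separates the two documents.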



--------------------------
Grant Ingersoll
http://www.lucidimagination.com


Re: Issue with relevancy

Posted by sivaprasad <si...@echidnainc.com>.
Thanks guys, I will try the mentioned options

Re: Issue with relevancy

Posted by Ahmet Arslan <io...@yahoo.com>.
> I am getting the results below, but the first doc scores higher than the
> second, even though its prod_n matches only the word "Computer".
> I want to push the first doc down to second place.

You can use Jan's magic solution, which uses the map function, for that.
http://search-lucene.com/m/nK6t9j1fuc2/