Posted to solr-user@lucene.apache.org by Michael Kuhlmann <ku...@solarier.de> on 2011/11/15 09:19:41 UTC

Re: two word phrase search using dismax

On 14.11.2011 21:50, alxsss@aim.com wrote:
> Hello,
>
> I use Solr 3.4 and Nutch 1.3. In the request handler we have
> <str name="mm">2&lt;-1 5&lt;-2 6&lt;90%</str>
>
> As far as I know, this means that for a two-word phrase search the match must be 100%.
> However, I noticed that in most cases documents with both words are ranked around 20th place.
> The top places go to documents with only one of the words in the phrase.
>
> Any ideas why this is happening, and is it possible to fix it?

Hi,

Are you sure that only one of the words matched in the documents found?
Have you checked all fields that are listed in the qf parameter? And did
you check for stemmed versions of your search terms?

If all this is true, you may want to give an example.

And AFAIK the mm parameter does not affect the ranking.
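
For readers unfamiliar with the mm syntax, the value quoted above can be read clause by clause. The fragment below is only an annotated sketch: the comments are an editorial reading based on the documented "minimum should match" syntax, not part of the original post.

```xml
<!-- Annotated copy of the mm value from the question.
     1 or 2 optional clauses:  all of them must match.
     3 to 5 clauses:           all but one must match.
     6 clauses:                all but two must match.
     more than 6 clauses:      90% must match (rounded down). -->
<str name="mm">2&lt;-1 5&lt;-2 6&lt;90%</str>
```

Read this way, a two-word query requires both words, but each word may match in a different qf field (or in a stemmed form), and mm decides only which documents match, not how they are ordered.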


Re: two word phrase search using dismax

Posted by Erick Erickson <er...@gmail.com>.
OK, why not just bump the boost on the site field way higher than you
already have?

A note of caution. You'll drive yourself crazy trying to get *exact*
ordering based on some arbitrary (and usually changing) set of
requirements. Put what you have working in front of product management
and see if it's "good enough" to let you go on to other higher-value
enhancements...

Best
Erick
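
As a rough illustration of the suggestion above, the qf line from the handler quoted in this thread could be changed along these lines. The value 10 is an arbitrary placeholder chosen for this sketch, not a recommendation made in the thread; the other boosts are copied from the posted configuration.

```xml
<!-- Sketch only: same fields as the posted handler, with the site boost
     raised. The value 10 is a hypothetical starting point to tune. -->
<str name="qf">site^10 content^0.5 title^1.2</str>
```

A boost changes ordering only among documents that already satisfy mm, so it cannot bring in a document that is missing from the result set.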

On Mon, Dec 5, 2011 at 6:15 PM,  <al...@aim.com> wrote:
> Hi Erick,
>
> After reading more about the pf param I increased the boosts a few times, and this solved cases 2, 3 and 4 but not 1. For example, for the phrase "newspaper latimes", latimes.com is not even in the results, so it cannot be boosted to first place. Changing the mm param to <str name="mm">1&lt;-1 5&lt;-2 6&lt;90%</str> solves only 1 and 4, but not 2 and 3.
>
> Thanks.
> Alex.

Re: two word phrase search using dismax

Posted by al...@aim.com.
Hi Erick,

After reading more about the pf param I increased the boosts a few times, and this solved cases 2, 3 and 4 but not 1. For example, for the phrase "newspaper latimes", latimes.com is not even in the results, so it cannot be boosted to first place. Changing the mm param to <str name="mm">1&lt;-1 5&lt;-2 6&lt;90%</str> solves only 1 and 4, but not 2 and 3.

Thanks.
Alex.

 

 

 

-----Original Message-----
From: Erick Erickson <er...@gmail.com>
To: solr-user <so...@lucene.apache.org>
Sent: Mon, Dec 5, 2011 5:52 am
Subject: Re: two word phrase search using dismax


Have you looked at the "pf" (phrase fields)
parameter of edismax?

http://wiki.apache.org/solr/DisMaxQParserPlugin#pf_.28Phrase_Fields.29

Best
Erick


Re: two word phrase search using dismax

Posted by Erick Erickson <er...@gmail.com>.
Have you looked at the "pf" (phrase fields)
parameter of edismax?

http://wiki.apache.org/solr/DisMaxQParserPlugin#pf_.28Phrase_Fields.29

Best
Erick
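
To make the pf suggestion concrete, here is a hedged sketch using the field names from the handler quoted below. The boost values and the ps value are hypothetical, picked only to show the shape of the configuration. Note that ps is the phrase slop applied to pf matches, so a large value such as the 300 in the posted handler lets words that are far apart in a field still earn the phrase bonus.

```xml
<!-- Sketch only: pf boosts raised above the qf boosts so that documents
     containing the query words close together get a larger bonus.
     All numeric values here are assumptions to be tuned. -->
<str name="qf">site^1.5 content^0.5 title^1.2</str>
<str name="pf">site^15 content^5 title^12</str>
<!-- A small slop keeps the phrase bonus for near-adjacent words only. -->
<int name="ps">2</int>
```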

On Sat, Dec 3, 2011 at 7:04 PM,  <al...@aim.com> wrote:
> Hello,
>
> Here is my request handler
>
> <requestHandler name="search" class="solr.SearchHandler" >
> <lst name="defaults">
> <str name="defType">edismax</str>
> <str name="echoParams">explicit</str>
> <float name="tie">0.01</float>
> <str name="qf">site^1.5 content^0.5 title^1.2</str>
> <str name="pf">site^1.5 content^0.5 title^1.2</str>
> <str name="fl">id,title, site</str>
> <str name="mm">2&lt;-1 5&lt;-2 6&lt;90%</str>
> <int name="ps">300</int>
> <bool name="hl">true</bool>
> <str name="q.alt">*:*</str>
> <str name="hl.fl">content</str>
> <str name="f.title.hl.fragsize">0</str>
> <str name="hl.fragsize">165</str>
> <str name="f.title.hl.alternateField">title</str>
> <str name="f.url.hl.fragsize">0</str>
> <str name="f.url.hl.alternateField">url</str>
> <str name="f.content.hl.fragmenter">regex</str>
> </lst>
> </requestHandler>
>
> I ran a few tests with debugQuery and realised that for two-word phrases, Solr scores the first word according to the qf param, then scores the second word the same way, and so on, but never scores the phrase as a whole. That is why a doc with one of the words in the title and the other in the content gets a higher score than a doc that has both words in the content but none in the title.
>
> Ideally, I want to achieve the following order:
> 1. If one (or both) of the words is in the site field, the doc must be given a higher score.
> 2. Then come docs with both words in the title.
> 3. Next, docs with both words in the content.
> 4. And finally, docs having either of the words in the title and content.
>
> I tried changing the mm param to <str name="mm">1&lt;-1 5&lt;-2 6&lt;90%</str>
> This achieves 1 and 4 but not 2 and 3.
>
> Thanks.
> Alex.

Re: two word phrase search using dismax

Posted by al...@aim.com.
Hello,

Here is my request handler

<requestHandler name="search" class="solr.SearchHandler" >
<lst name="defaults">
<str name="defType">edismax</str>
<str name="echoParams">explicit</str>
<float name="tie">0.01</float>
<str name="qf">site^1.5 content^0.5 title^1.2</str>
<str name="pf">site^1.5 content^0.5 title^1.2</str>
<str name="fl">id,title, site</str>
<str name="mm">2&lt;-1 5&lt;-2 6&lt;90%</str>
<int name="ps">300</int>
<bool name="hl">true</bool>
<str name="q.alt">*:*</str>
<str name="hl.fl">content</str>
<str name="f.title.hl.fragsize">0</str>
<str name="hl.fragsize">165</str>
<str name="f.title.hl.alternateField">title</str>
<str name="f.url.hl.fragsize">0</str>
<str name="f.url.hl.alternateField">url</str>
<str name="f.content.hl.fragmenter">regex</str>
</lst>
</requestHandler>

I ran a few tests with debugQuery and realised that for two-word phrases, Solr scores the first word according to the qf param, then scores the second word the same way, and so on, but never scores the phrase as a whole. That is why a doc with one of the words in the title and the other in the content gets a higher score than a doc that has both words in the content but none in the title.

Ideally, I want to achieve the following order:
1. If one (or both) of the words is in the site field, the doc must be given a higher score.
2. Then come docs with both words in the title.
3. Next, docs with both words in the content.
4. And finally, docs having either of the words in the title and content.

I tried changing the mm param to <str name="mm">1&lt;-1 5&lt;-2 6&lt;90%</str>
This achieves 1 and 4 but not 2 and 3.

Thanks.
Alex.
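
The effect of the changed mm value can be read off clause by clause. The fragment below is an annotated sketch; the comments are an editorial reading of the documented mm syntax and are not part of the original message.

```xml
<!-- Annotated copy of the alternative mm value.
     1 clause:             it must match.
     2 to 5 clauses:       all but one must match.
     6 clauses:            all but two must match.
     more than 6 clauses:  90% must match (rounded down). -->
<str name="mm">1&lt;-1 5&lt;-2 6&lt;90%</str>
```

With this value a two-word query needs only one of its words to match, which is consistent with single-word matches (cases 1 and 4) now showing up while the both-words cases (2 and 3) are no longer favoured by the match requirement.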






 

 

 

-----Original Message-----
From: Chris Hostetter <ho...@fucit.org>
To: solr-user <so...@lucene.apache.org>
Sent: Thu, Nov 17, 2011 2:17 pm
Subject: Re: two word phrase search using dismax




: After putting the same score for title and content in qf field, docs
: with both words in content moved to fifth place. The doc in the first,
: third and fourth places still have only one of the words in content and
: title. The doc in the second place has one of the words in title and
: both words in the content but in different places not together.

Details matter -- if you send further followup mails, the full details of
your dismax options and the score explanations from debugQuery are
necessary to be sure people understand what you are describing (a
snapshot of reality is far more valuable than a vague description of
reality).

Offhand, what you are describing sounds correct -- this is what the
dismax parser is really designed to do.

Even if you have given both title and content equal boosts, your title
field is probably shorter than your content field, so words matching once
in title are likely to score higher than the same word matching once in
content due to length normalization -- and unless you set the "tie" param
to something really high, the score contribution from the highest scoring
field (in this case title) will be the dominant factor in the score (it's
disjunction *max* by default ... if you make tie=1 then it's disjunction
*sum*).

You haven't mentioned anything about the "pf" param at all, which I can
only assume means you aren't using it -- the pf param is how you configure
that scores should be increased if/when all of the words in the query
string appear together. I would suggest putting all of the fields in your
"qf" param in your "pf" param as well.

-Hoss


 

Re: two word phrase search using dismax

Posted by Chris Hostetter <ho...@fucit.org>.
: After putting the same score for title and content in qf field, docs
: with both words in content moved to fifth place. The doc in the first,
: third and fourth places still have only one of the words in content and
: title. The doc in the second place has one of the words in title and
: both words in the content but in different places not together.

Details matter -- if you send further followup mails, the full details of
your dismax options and the score explanations from debugQuery are
necessary to be sure people understand what you are describing (a
snapshot of reality is far more valuable than a vague description of
reality).

Offhand, what you are describing sounds correct -- this is what the
dismax parser is really designed to do.

Even if you have given both title and content equal boosts, your title
field is probably shorter than your content field, so words matching once
in title are likely to score higher than the same word matching once in
content due to length normalization -- and unless you set the "tie" param
to something really high, the score contribution from the highest scoring
field (in this case title) will be the dominant factor in the score (it's
disjunction *max* by default ... if you make tie=1 then it's disjunction
*sum*).

You haven't mentioned anything about the "pf" param at all, which I can
only assume means you aren't using it -- the pf param is how you configure
that scores should be increased if/when all of the words in the query
string appear together. I would suggest putting all of the fields in your
"qf" param in your "pf" param as well.


-Hoss
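
The remark about "tie" can be written out as a formula. This is a sketch of how a disjunction-max clause combines the per-field scores of a single query word; the symbols and the numbers in the comments are made up for illustration, and the per-field scores themselves come from the usual term scoring, which is not shown.

```latex
% s_f = score of one query word in field f, f^* = the best-scoring field
\[
  \mathrm{score}_{\text{word}} \;=\; s_{f^*} \;+\; \mathit{tie} \cdot \sum_{f \neq f^*} s_f ,
  \qquad s_{f^*} = \max_f s_f
\]
% Hypothetical numbers: s_title = 0.8, s_content = 0.3
%   tie = 0.01:  0.8 + 0.01 * 0.3 = 0.803
%   tie = 1:     0.8 + 1 * 0.3    = 1.1
```

With tie = 0.01, as in the posted handler, the best field contributes almost the entire score, which is why a single title match can outrank a document whose matches are all in content.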

Re: two word phrase search using dismax

Posted by al...@aim.com.
Hello,

Thanks for your reply. I investigated further and found that title is boosted higher than content in the qf field, and the docs in the top places have one of the words in the title but not both.
The doc in first place has only one of the words in the content.
Docs with both words in the content come after them, at around 20th place.

After giving title and content the same boost in the qf field, docs with both words in the content moved to fifth place. The docs in first, third and fourth places still have only one of the words in content and title.
The doc in second place has one of the words in the title and both words in the content, but in different places, not together.

Thanks.
Alex.
 

-----Original Message-----
From: Michael Kuhlmann <ku...@solarier.de>
To: solr-user <so...@lucene.apache.org>
Sent: Tue, Nov 15, 2011 12:20 am
Subject: Re: two word phrase search using dismax


Hi,

Are you sure that only one of the words matched in the documents found?
Have you checked all fields that are listed in the qf parameter? And did
you check for stemmed versions of your search terms?

If all this is true, you may want to give an example.

And AFAIK the mm parameter does not affect the ranking.