Posted to solr-user@lucene.apache.org by Ravi Solr <ra...@gmail.com> on 2014/05/05 23:41:58 UTC

Relevancy help

Hello,
        I have a weird relevancy requirement. We search news content, so
chronology is very important and so is relevancy, although the two are
mutually exclusive. For example, if the search terms are "malaysia airline
crash blackbox", my requirements are as follows.

Docs containing all the words should be on top, but editorial also wants
them sorted in reverse chronological order without losing relevancy. Why?
If on day 1 there is an article about the search for the blackbox, on day 2
the blackbox is found, and on day 3 there is an article about the blackbox
being unusable, then from the user's standpoint it makes sense to show the
most recent content on top.

I already boost recency of docs with
boost=recip(ms(NOW/HOUR,displaydatetime),7.889e-10,1,1) i.e. increments of
3 months
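
A quick arithmetic check of the constant may help here. Solr documents recip(x,m,a,b) as a/(m*x+b), so with a=b=1 the boost falls to half its maximum when x = 1/m. The sketch below only evaluates that formula; the helper names and the 365.25-day year are assumptions made for illustration. By this arithmetic 7.889e-10 corresponds to roughly two weeks, while a three-month scale would be about 1.27e-10 (three months is about 7.889e9 ms).

```python
# recip(x, m, a, b) in Solr evaluates a / (m*x + b).
MS_PER_DAY = 24 * 60 * 60 * 1000

def recip(x, m, a, b):
    """Same arithmetic as Solr's recip function query."""
    return a / (m * x + b)

def half_life_days(m, b=1.0):
    """Age (in days) at which a/(m*x+b) is half its value at x=0, i.e. m*x = b."""
    return (b / m) / MS_PER_DAY

# Assumption: "3 months" means a quarter of a 365.25-day year.
three_months_ms = 365.25 / 4 * MS_PER_DAY

m_posted = 7.889e-10                # constant used in the post
m_three_months = 1 / three_months_ms  # constant that halves the boost at 3 months

print(half_life_days(m_posted))        # about two weeks
print(half_life_days(m_three_months))  # about 91 days
print(recip(three_months_ms, m_posted, 1, 1))        # boost of a 3-month-old doc, posted constant
print(recip(three_months_ms, m_three_months, 1, 1))  # boost of a 3-month-old doc, 3-month constant
```

With the posted constant, a three-month-old document keeps only about 14% of the maximum boost, which may be part of why date dominates the ordering.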

However, when I apply the boost the chronology is messed up. I know relevancy
and sorting are mutually exclusive concepts. Is there any magic we can do in
Solr that achieves both?


Thanks,

Ravi Kiran Bhaskar

Re: Relevancy help

Posted by Jack Krupansky <ja...@basetechnology.com>.
The recip function query is the proper way to boost by reverse chronological 
order, but you may have to play around with the boost factor so that date 
does not completely overwhelm the natural relevancy.

Use the debugQuery=true parameter and look at the "explain" section to see 
what the document scores look like.
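
As a minimal sketch of such a debug request, the snippet below only builds the URL and does not contact a server. The host, the core name "news", the "id" field, and defType=edismax are assumptions; the boost expression and the displaydatetime field are taken from the original post.

```python
from urllib.parse import urlencode

# Hypothetical Solr endpoint: host and core name are placeholders.
SOLR_SELECT = "http://localhost:8983/solr/news/select"

def debug_url(terms):
    """Build a select URL that asks Solr to include score explanations."""
    params = {
        "q": terms,
        "defType": "edismax",  # assumption: the multiplicative boost param is an edismax feature
        "boost": "recip(ms(NOW/HOUR,displaydatetime),7.889e-10,1,1)",
        "fl": "id,score,displaydatetime",  # assumption: "id" is the unique key field
        "debugQuery": "true",  # adds the "explain" section to the response
        "wt": "json",
    }
    return SOLR_SELECT + "?" + urlencode(params)

url = debug_url("malaysia airline crash blackbox")
print(url)
```

Comparing the "explain" entries of two documents that appear in the wrong order shows how much of each score comes from the text match and how much from the date boost.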

-- Jack Krupansky

Re: Relevancy help

Posted by Ahmet Arslan <io...@yahoo.com>.

Hi,

If you can create a function query that assigns a constant score of, let's say, 100, then you can sort on multiple criteria: sort=score desc, recency_date desc
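
The effect of that two-level sort can be simulated outside Solr. The sketch below is plain Python, not Solr code: the documents, the whitespace tokenization, and the helper name tiered_sort are made up for illustration. Every document containing all terms gets the same constant score, so the date becomes the tie-breaker within that tier.

```python
from datetime import date

def tiered_sort(docs, terms):
    """Order docs like sort=score desc, recency_date desc with a constant score."""
    def tier(doc):
        words = set(doc["text"].lower().split())
        # Constant score of 100 when every term is present, otherwise 0.
        return 100 if all(t in words for t in terms) else 0
    return sorted(docs, key=lambda d: (tier(d), d["recency_date"]), reverse=True)

# Made-up sample documents.
docs = [
    {"id": "day1", "text": "search for malaysia airline crash blackbox",
     "recency_date": date(2014, 5, 1)},
    {"id": "day3", "text": "malaysia airline crash blackbox unusable",
     "recency_date": date(2014, 5, 3)},
    {"id": "partial", "text": "airline crash report",
     "recency_date": date(2014, 5, 4)},
    {"id": "day2", "text": "malaysia airline crash blackbox found",
     "recency_date": date(2014, 5, 2)},
]

ranked = [d["id"] for d in tiered_sort(docs, ["malaysia", "airline", "crash", "blackbox"])]
print(ranked)  # ['day3', 'day2', 'day1', 'partial']
```

The newer "partial" document still lands last because its tier is lower, which is the behaviour editorial asked for. The trade-off is that all relevancy differences inside a tier are discarded.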



Re: Relevancy help

Posted by Ravi Solr <ra...@gmail.com>.
Thank you very much for your responses.

Jack, even if I were to tweak the boost factor it might not work in all
cases. So I was looking at a more generic way via Function Queries to
achieve my goal.

Ahmet, I did see Jan Høydahl's response on all-terms boosting, as follows:
q=a fox&defType=dismax&qf=allfields&bf=map(query($qq),0,0,0,100.0)&qq=allfields:(quick AND brown AND fence)
This is what I am looking for; however, instead of a constant boost I am
thinking the '100.0' could be replaced with some mathematical function
of score and publish date. I ran into trouble because score cannot be used
directly in a function query. Is query(x) the right way to get the score?
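
For readers unfamiliar with the bf expression, here is a small Python sketch of its arithmetic. It assumes the documented behaviour of the two Solr functions: map(x,min,max,target,default) returns target when x lies in [min,max] and default otherwise, and query($qq) returns the document's score for qq, or 0 when the document does not match. The last function is only an illustration of the "replace 100.0 with a date function" idea; its name, the use of recip, and the three-month constant are assumptions, and whether Solr accepts a nested function in that position is not confirmed here.

```python
def solr_map(x, lo, hi, target, default=None):
    """Arithmetic of Solr's map(): target inside [lo, hi], else default (or x itself)."""
    if lo <= x <= hi:
        return target
    return x if default is None else default

def all_terms_bf(qq_score):
    """map(query($qq),0,0,0,100.0): 0 for non-matching docs, 100 for matching ones."""
    return solr_map(qq_score, 0, 0, 0, 100.0)

def all_terms_recency_bf(qq_score, age_ms, m=1.2675e-10):
    """Hypothetical variant: scale the constant by a recip-style date decay."""
    if qq_score == 0:
        return 0.0
    return 100.0 / (m * age_ms + 1)

print(all_terms_bf(0.0))   # doc does not match all terms
print(all_terms_bf(2.37))  # doc matches all terms, whatever its score
print(all_terms_recency_bf(2.37, age_ms=0))
print(all_terms_recency_bf(2.37, age_ms=7.889e9))
```

Note that the original expression throws away the actual value of the qq score: any non-zero score maps to the same 100.0.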

Alexandre, I couldn't find any documentation on the QueryRescorer API. If you
know of any, could you kindly point it out?



Ravi Kiran Bhaskar



Re: Relevancy help

Posted by Alexandre Rafalovitch <ar...@gmail.com>.
Can you sort by score, then date? That assumes similar articles will get the
same score (you may need to discount frequency/length).

There is also the QueryRescorer API introduced in Lucene 4.8 that might be
relevant, though I have no idea how that would get exposed in Solr.

Regards,
   Alex.
Personal website: http://www.outerthoughts.com/
Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency



Re: Relevancy help

Posted by Ahmet Arslan <io...@yahoo.com>.
Hi Ravi,

Regarding recency, please see: http://www.slideshare.net/lucenerevolution/potter-timothy-boosting-documents-in-solr

Regarding "docs containing all words", there is a function query that elevates those docs to the top. Search the mailing list archives for past posts.

Ahmet

