Posted to solr-user@lucene.apache.org by Alexandre Rocco <al...@gmail.com> on 2012/01/11 14:29:20 UTC

Relevancy and random sorting

Hello all,

Recently I've been trying to tweak some aspects of relevancy in a listing
project.
I need to give a higher score to newer documents and also boost
documents based on a boolean field that indicates the listing has pictures.
On top of that, in some situations we need to sort the records randomly
while still preserving the ranking.

I tried to combine some techniques described in the Solr Relevancy FAQ
wiki, but when I add the random sorting, the ranking gets messy (as
expected).

This works well:
http://localhost:18979/solr/select/?start=0&rows=15&q={!boost%20b=recip(ms(NOW/HOUR,date_updated),3.16e-11,1,1)}active%3a%22true%22+AND+featured%3a%22false%22+_val_:%haspicture%22&fl=*,score

This does not work; it randomizes the order of what was already ranked:
http://localhost:18979/solr/select/?start=0&rows=15&q={!boost%20b=recip(ms(NOW/HOUR,date_updated),3.16e-11,1,1)}active%3a%22true%22+AND+featured%3a%22false%22+_val_:%haspicture%22&fl=*,score&sort=random_1+desc
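
Percent-encoding requests like these by hand is error-prone. As a minimal
sketch, assuming a Python client, the same two requests could be assembled
with the standard library. The host, port, and field names are copied from
the URLs above; the helper name build_select_url is made up for
illustration, and the _val_ clause is assumed to be the quoted field name
"haspicture".

```python
from urllib.parse import urlencode

# Base URL and field names are taken from the requests above.
BASE = "http://localhost:18979/solr/select/"

# Recency boost wrapped around the filter clauses. The quoted _val_ clause
# is an assumption about what the original URL was meant to contain.
QUERY = (
    "{!boost b=recip(ms(NOW/HOUR,date_updated),3.16e-11,1,1)}"
    'active:"true" AND featured:"false" _val_:"haspicture"'
)

def build_select_url(sort=None):
    """Hypothetical helper: build the select URL, optionally with a sort."""
    params = [("start", 0), ("rows", 15), ("q", QUERY), ("fl", "*,score")]
    if sort is not None:
        params.append(("sort", sort))
    # urlencode percent-encodes every value, so nothing is escaped by hand.
    return BASE + "?" + urlencode(params)

ranked_url = build_select_url()                      # ordered by score
random_url = build_select_url(sort="random_1 desc")  # score order is replaced
```

The only difference between the two URLs is the trailing sort parameter,
which is what replaces the score ordering in the second request.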

The only way I see is to create another field in the schema containing a
random value and use it to boost the document the same way that was done for
the boolean field.
Has anyone tried something like this before and knows a way to get it
working?

Thanks,
Alexandre

Re: Relevancy and random sorting

Posted by Chris Hostetter <ho...@fucit.org>.
: We have a listing aggregator that gets product listings from a lot of
: different sites and since they are added in batches, sometimes you see a
: lot of pages from the same source (site). We are working on some changes to
: shift things around and reduce this "blocking" effect, so we can present
: mixed sources on the result pages.

if the problem you are seeing is strings of docs all in a clump because 
they have the same *score* then just add a secondary sort on your random 
field - in the example you posted, you completely replace the sort by 
score with sort by random...

	sort = score desc, random_1 desc

but that will only help differentiate when the scores are identical.
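
The difference between the two sort specifications can be illustrated
outside Solr. The following is only a sketch with made-up documents and
made-up random values; it mimics the ordering logic, not Solr itself.

```python
# Hypothetical rows; only the sort behaviour matters here.
docs = [
    {"id": "a", "score": 2.0, "random_1": 0.10},
    {"id": "b", "score": 1.0, "random_1": 0.90},
    {"id": "c", "score": 2.0, "random_1": 0.70},
    {"id": "d", "score": 1.0, "random_1": 0.20},
]

def sort_random_only(rows):
    """Like sort=random_1 desc: the score is ignored entirely."""
    return sorted(rows, key=lambda d: -d["random_1"])

def sort_score_then_random(rows):
    """Like sort=score desc, random_1 desc: random only breaks score ties."""
    return sorted(rows, key=lambda d: (-d["score"], -d["random_1"]))

print([d["id"] for d in sort_random_only(docs)])        # ['b', 'c', 'd', 'a']
print([d["id"] for d in sort_score_then_random(docs)])  # ['c', 'a', 'b', 'd']
```

In the second ordering the higher-scored documents stay on top, and the
random value only decides the order within each group of equal scores.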

alternatively: you could probably use a random field in your biasing 
function, although you should probably use something like the "map" or 
"scale" functions to keep it from having too profound an impact on 
the final score.

maybe something like...

q={!boost b=product(scale(random_1,1,5),recip(ms(NOW/HOUR,date_updated),3.16e-11,1,1))}
  active:true AND featured:false +_val_:haspicture

-Hoss
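
To get a feel for how much weight the random factor carries in a boost like
the one above, the arithmetic can be worked through by hand. The sketch
below assumes Solr's documented definition recip(x,m,a,b) = a/(m*x+b) and
treats scale as a linear rescale of a random value that is already in
[0, 1]; the function and variable names are illustrative only.

```python
MS_PER_YEAR = 365.25 * 24 * 3600 * 1000  # about 3.16e10 ms, so 3.16e-11 is roughly 1/year

def recip(x, m, a, b):
    """Solr's recip(x,m,a,b) function: a / (m*x + b)."""
    return a / (m * x + b)

def scale(value, lo, hi, target_min, target_max):
    """Linear rescale of value from [lo, hi] to [target_min, target_max]."""
    return target_min + (value - lo) * (target_max - target_min) / (hi - lo)

def boost(age_ms, random_value, target_max=5.0):
    """product(scale(random, 1, target_max), recip(age, 3.16e-11, 1, 1)),
    assuming random_value is already in [0, 1]."""
    random_factor = scale(random_value, 0.0, 1.0, 1.0, target_max)
    return random_factor * recip(age_ms, 3.16e-11, 1, 1)

fresh_unlucky = boost(0, 0.0)             # brand-new document, lowest random value
year_old_lucky = boost(MS_PER_YEAR, 1.0)  # one-year-old document, highest random value
year_old_narrow = boost(MS_PER_YEAR, 1.0, target_max=1.2)
```

With a target range of 1 to 5, the one-year-old document with the highest
random value ends up with a boost of about 2.5 against 1.0 for the new
document, so the random factor can outweigh a full year of age. Narrowing
the range (for example 1 to 1.2) keeps recency dominant, with a boost of
about 0.6 for the old document.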

Re: Relevancy and random sorting

Posted by Alexandre Rocco <al...@gmail.com>.
Michael,

We are using the random sorting in combination with date and other fields
but I am trying to change this to affect the ranking instead of sorting
directly.
That way we can also use other useful tweaks on the rank itself.

Alexandre

Re: Relevancy and random sorting

Posted by Michael Kuhlmann <ku...@solarier.de>.
Does the random sort function help you here?

http://lucene.apache.org/solr/api/org/apache/solr/schema/RandomSortField.html

However, you will then also get some very old listings, if that's okay for you.
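
As a rough sketch of how such a field is typically used, assuming the schema
declares a dynamic field pattern random_* backed by RandomSortField (the
random_1 in the original post points that way, but it is an assumption),
the number after the underscore acts as a seed, and changing it changes the
ordering. The helper name is hypothetical.

```python
from urllib.parse import urlencode

def random_sort_params(seed, keep_score_first=True):
    """Build query parameters that sort on a hypothetical random_<seed> field."""
    random_clause = f"random_{seed} desc"
    # With keep_score_first the random field only breaks score ties;
    # without it the result order is purely random, old listings included.
    sort = f"score desc, {random_clause}" if keep_score_first else random_clause
    return urlencode([("q", "*:*"), ("sort", sort)])

print(random_sort_params(42))
print(random_sort_params(7, keep_score_first=False))
```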

-Kuli


Re: Relevancy and random sorting

Posted by Ahmet Arslan <io...@yahoo.com>.
> This document already has a field that indicates the source (site).
> The issue we are trying to solve is when we list all documents without any
> specific criteria. Since we bring the most recent ones and the ones that
> contains images, we end up having a lot of listings from a single site,
> since the documents are indexed in batches from the same site. At some
> point we have several documents from the same site in the same date/time
> and having images. I'm trying to give some random aspect to this search so
> other documents can also appear in between that big dataset from the same
> source.
> Does the grouping help to achieve this?

Yes, see http://wiki.apache.org/solr/FieldCollapsing
With it you display at most 3 documents from a single site, and add a link saying "there are xxx more documents from site yyy, click here to see all of them".
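
A request along these lines might look like the sketch below. The group,
group.field and group.limit parameters are Solr's result grouping
parameters; the field name site, the query string, and the helper
more_link_text are assumptions made for illustration.

```python
from urllib.parse import urlencode

# "site" is a hypothetical name for the field holding the listing's source.
params = [
    ("q", "active:true AND featured:false"),
    ("group", "true"),        # turn on result grouping (field collapsing)
    ("group.field", "site"),  # one group per source site
    ("group.limit", 3),       # at most 3 documents per group
    ("fl", "*,score"),
]
url = "http://localhost:18979/solr/select/?" + urlencode(params)

def more_link_text(site, num_found, shown=3):
    """Text for the 'more from this site' link, given the group's total hit count."""
    remaining = max(num_found - shown, 0)
    return f"There are {remaining} more documents from {site}, click here to see all of them."

print(url)
print(more_link_text("example.com", 10))
```

The total per site used for the link text would come from each group's hit
count in the grouped response.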

Re: Relevancy and random sorting

Posted by Alexandre Rocco <al...@gmail.com>.
Erick,

This document already has a field that indicates the source (site).
The issue we are trying to solve arises when we list all documents without any
specific criteria. Since we bring back the most recent ones and the ones that
contain images, we end up with a lot of listings from a single site,
because the documents are indexed in batches from the same site. At some
point we have several documents from the same site with the same date/time
and with images. I'm trying to add some randomness to this search so
that other documents can also appear in between that big block from the same
source.
Would grouping help to achieve this?

Alexandre

Re: Relevancy and random sorting

Posted by Ted Dunning <te...@gmail.com>.
I think the OP meant to use random order in the case of score ties.

Re: Relevancy and random sorting

Posted by Erick Erickson <er...@gmail.com>.
Alexandre:

Have you thought about grouping? If you can analyze the incoming
documents and include a field such that "similar" documents map
to the same value, then group on that value, you'll get output that
isn't dominated by repeated copies of the "similar" documents. It
depends, though, on being able to do a suitable mapping.

In your case, could the mapping just be the site from which you
got the data?
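
Conceptually, grouping on the source site turns a ranked list into one
bucket per site, each capped at a few documents. The sketch below mimics
that effect on an in-memory list; it is not Solr code, and the site names,
ids, and function name are invented for illustration.

```python
from collections import OrderedDict

def group_by_site(ranked_docs, limit=3):
    """Group an already-ranked list by site, keeping at most `limit` ids per site.
    Groups appear in the order of their best-ranked document."""
    groups = OrderedDict()
    for doc in ranked_docs:
        bucket = groups.setdefault(doc["site"], [])
        if len(bucket) < limit:
            bucket.append(doc["id"])
    return groups

# Hypothetical ranked result list dominated by one batch-indexed site.
ranked = [
    {"id": 1, "site": "a.example"},
    {"id": 2, "site": "a.example"},
    {"id": 3, "site": "a.example"},
    {"id": 4, "site": "a.example"},
    {"id": 5, "site": "b.example"},
    {"id": 6, "site": "c.example"},
]
grouped = group_by_site(ranked, limit=2)
print(list(grouped.items()))
# [('a.example', [1, 2]), ('b.example', [5]), ('c.example', [6])]
```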

Best
Erick

Re: Relevancy and random sorting

Posted by Alexandre Rocco <al...@gmail.com>.
Erick,

I probably wrote something silly. You are right that it is either sorting
by field or ranking.
I just need to change the ranking to shift things around as you said.

To clarify the use case:
We have a listing aggregator that gets product listings from a lot of
different sites and since they are added in batches, sometimes you see a
lot of pages from the same source (site). We are working on some changes to
shift things around and reduce this "blocking" effect, so we can present
mixed sources on the result pages.

I guess I will start with the document random field and later try to
develop a custom plugin to make things better.

Thanks for the pointers.

Regards,
Alexandre

Re: Relevancy and random sorting

Posted by Erick Erickson <er...@gmail.com>.
I really don't understand what this means:
"random sorting for the records but also preserving the ranking"

Either you're sorting on rank or you're not. If you mean you're
trying to shift things around just a little bit, *mostly* respecting
relevance then I guess you can do what you're thinking.

You could create your own function query to do the boosting, see:
http://wiki.apache.org/solr/SolrPlugins#ValueSourceParser

which would keep you from having to re-index your data to get
a different "randomness".
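
The idea of getting a different "randomness" without re-indexing can be
sketched as a value computed at query time from the document id and a seed.
The following is a language-neutral illustration in Python, not a Solr
plugin (a real ValueSourceParser would be written in Java); the function
name and the choice of SHA-256 are assumptions.

```python
import hashlib

def pseudo_random(doc_id, seed):
    """Deterministic value in [0, 1) derived from the document id and a seed."""
    digest = hashlib.sha256(f"{seed}:{doc_id}".encode("utf-8")).digest()
    # Keep 53 bits so the division is exact and the result stays below 1.0.
    return (int.from_bytes(digest[:8], "big") >> 11) / 2**53

doc_ids = ["doc-1", "doc-2", "doc-3", "doc-4"]

# The same seed always yields the same order, so paging stays stable.
order_a = sorted(doc_ids, key=lambda d: pseudo_random(d, seed=1))
order_a_again = sorted(doc_ids, key=lambda d: pseudo_random(d, seed=1))

# A different seed reshuffles the values without touching the index.
values_seed_2 = [pseudo_random(d, seed=2) for d in doc_ids]
```

Because nothing is stored per document, switching the seed (per day or per
session, for example) changes the shuffle while the index stays untouched.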

You could also consider external file fields, but I think your
own function query would be cleaner. I don't think math.random
is a supported function out of the box.

Best
Erick

