Posted to solr-user@lucene.apache.org by Avlesh Singh <av...@gmail.com> on 2009/09/20 12:58:19 UTC

Questions on RandomSortField

I am using Solr 1.3
I have a "solr.RandomSortField" type dynamic field which I use to randomize
my results.

I am in a tricky situation. I need to randomize only "certain results" in my
Hits.
To elaborate, I have an integer field called "category_id". When performing a
query, I need to get results from all categories and place the ones from
SOME_CAT_ID at the top. I achieved this by populating a separate dynamic
field while indexing data, i.e., when a doc is added to the index, a field
called "dynamic_cat_id_SOME_CAT_ID" is populated with its category id. While
querying, I know the value of SOME_CAT_ID, so adding
"sort=dynamic_cat_id_SOME_CAT_ID asc, score desc" to my query works
absolutely fine.

So far so good. I am now supposed to randomize the results for
category_id=SOME_CAT_ID, i.e., the results at the top. My understanding is that
adding "sort=dynamic_cat_id_SOME_CAT_ID asc, *my_dynamic_random_field_SOME_SEED
asc*, score desc" to the query would randomize all the results. This is not
desired. I only want to randomize the ones at the top
(category_id=SOME_CAT_ID); the rest should be ordered by relevance score.

Two simple questions:

   1. Is there a way to achieve this without writing any custom code?
   2. If the answer to #1 is "no", then where should I start? I glanced at the
   RandomSortField class but could not figure out how to proceed. Do I need to
   create a custom FieldType? Can I extend RandomSortField and override the
   sorting behaviour?

Any help would be appreciated.

Cheers
Avlesh

Re: Questions on RandomSortField

Posted by Avlesh Singh <av...@gmail.com>.
Thanks Hoss!
The approach that I explained in my subsequent email works like a charm.

Cheers
Avlesh

On Wed, Sep 30, 2009 at 3:45 AM, Chris Hostetter
<ho...@fucit.org> wrote:

>
> : The question was either non-trivial or heavily uninteresting! No replies yet
>
> it's pretty non-trivial, and pretty interesting, but i'm also pretty
> behind on my solr-user email.
>
> I don't think there's any way to do what you wanted without a custom
> plugin, so your efforts weren't in vain ... if we add the ability to sort
> by a ValueSource (aka function ... there's a Jira issue for this
> somewhere) then you could also do it with a combination of functions so that
> anything in your category gets flattened to an extremely high constant,
> and everything else has a real score -- then a secondary sort on a random
> field would effectively only randomize the things in your category ... but
> we're not there yet.
>
> : Hoss, I have a small question (RandomSortField bears your signature) - Any
> : reason as to why RandomSortField#hash() and RandomSortField#getSeed()
> : methods are private? Having them public would have saved me from
> : "owning" a copy in my class as well.
>
> just a general principle of API future-proofing: keep internals private
> unless you explicitly think through how subclasses will use them.
>
> I haven't thought it through all the way, but do you really need to copy
> everything?  couldn't you get the SortField/Comparator from super and
> only delegate to it if the categories both match your specific categoryId?
>
>
>
> -Hoss
>
>

Re: Questions on RandomSortField

Posted by Chris Hostetter <ho...@fucit.org>.
: The question was either non-trivial or heavily uninteresting! No replies yet

it's pretty non-trivial, and pretty interesting, but i'm also pretty 
behind on my solr-user email.

I don't think there's any way to do what you wanted without a custom
plugin, so your efforts weren't in vain ... if we add the ability to sort
by a ValueSource (aka function ... there's a Jira issue for this
somewhere) then you could also do it with a combination of functions so that
anything in your category gets flattened to an extremely high constant,
and everything else has a real score -- then a secondary sort on a random
field would effectively only randomize the things in your category ... but
we're not there yet.
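
The two-key ordering described above can be modelled in plain Java without any
Solr or Lucene classes. This is only a sketch under stated assumptions: the
class FlattenedScoreSortSketch, the helper names, and the integer mixer in
randomKey() are hypothetical stand-ins, and Float.MAX_VALUE plays the role of
the "extremely high constant". It is not a Solr function query.

```java
import java.util.Arrays;
import java.util.Comparator;

/** Plain-Java model of "flattened score desc, then random asc" (hypothetical names). */
final class FlattenedScoreSortSketch {

    /** Stand-in for the per-document value a random sort field would supply. */
    static int randomKey(int docId, int seed) {
        int key = docId + seed;
        key ^= key >>> 16;
        key *= 0x45d9f3b;
        key ^= key >>> 16;
        return key >>> 1; // drop the sign bit so the value is non-negative
    }

    /** Primary key: a very high constant inside the category, the real score elsewhere. */
    static float flattenedScore(int categoryId, float score, int targetCategory) {
        return categoryId == targetCategory ? Float.MAX_VALUE : score;
    }

    /** Returns document positions ordered by flattened score desc, then random key asc. */
    static Integer[] order(final int[] categoryIds, final float[] scores,
                           final int targetCategory, final int seed) {
        Integer[] docs = new Integer[scores.length];
        for (int d = 0; d < docs.length; d++) {
            docs[d] = d;
        }
        Arrays.sort(docs, new Comparator<Integer>() {
            @Override
            public int compare(Integer a, Integer b) {
                int byScore = Float.compare(
                        flattenedScore(categoryIds[b], scores[b], targetCategory),
                        flattenedScore(categoryIds[a], scores[a], targetCategory));
                if (byScore != 0) {
                    return byScore;
                }
                // Ties: every document in the target category lands here.
                return Integer.compare(randomKey(a, seed), randomKey(b, seed));
            }
        });
        return docs;
    }

    public static void main(String[] args) {
        int[] categoryIds = {7, 3, 7, 5, 7};
        float[] scores = {0.9f, 0.5f, 0.1f, 0.8f, 0.4f};
        System.out.println(Arrays.toString(order(categoryIds, scores, 7, 42)));
    }
}
```

Because every document in the target category ties on the primary key, the
random key decides their order; documents outside the category only reach the
random key when their scores are exactly equal.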

: Hoss, I have a small question (RandomSortField bears your signature) - Any
: reason as to why RandomSortField#hash() and RandomSortField#getSeed()
: methods are private? Having them public would have saved me from
: "owning" a copy in my class as well.

just a general principle of API future-proofing: keep internals private 
unless you explicitly think through how subclasses will use them.

I haven't thought it through all the way, but do you really need to copy 
everything?  couldn't you get the SortField/Comparator from super and 
only delegate to it if the categories both match your specific categoryId? 
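
The delegation idea can be sketched with java.util.Comparator standing in for
the Lucene comparator. Everything here is an assumption made for illustration:
DelegatingCategoryComparator is a hypothetical class, randomDelegate stands in
for whatever comparator the superclass would hand back, and the -1/1/0 results
for the non-matching cases are borrowed from the solution posted elsewhere in
this thread rather than from the suggestion itself.

```java
import java.util.Comparator;
import java.util.function.ToIntFunction;

/** Delegates to an existing comparator only when both items are in the target category. */
final class DelegatingCategoryComparator<T> implements Comparator<T> {
    private final Comparator<T> randomDelegate; // stand-in for the comparator obtained from super
    private final ToIntFunction<T> categoryOf;  // hypothetical lookup of an item's category id
    private final int targetCategory;

    DelegatingCategoryComparator(Comparator<T> randomDelegate,
                                 ToIntFunction<T> categoryOf,
                                 int targetCategory) {
        this.randomDelegate = randomDelegate;
        this.categoryOf = categoryOf;
        this.targetCategory = targetCategory;
    }

    @Override
    public int compare(T a, T b) {
        boolean aIn = categoryOf.applyAsInt(a) == targetCategory;
        boolean bIn = categoryOf.applyAsInt(b) == targetCategory;
        if (aIn && bIn) {
            return randomDelegate.compare(a, b); // both match: reuse the existing ordering
        }
        if (aIn) {
            return -1; // only a matches: a goes first
        }
        if (bIn) {
            return 1;  // only b matches: b goes first
        }
        return 0;      // neither matches: leave it to the next sort clause
    }
}
```

With this shape nothing from the delegate needs to be copied; the wrapper only
decides when the delegate is consulted.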



-Hoss


Re: Questions on RandomSortField

Posted by Avlesh Singh <av...@gmail.com>.
The question was either non-trivial or heavily uninteresting! No replies yet
:)
Thankfully, I figured out a solution for the problem at hand. For people who
might be looking for a solution, here it goes -

   1. Extend RandomSortField to create your own YourCustomRandomField.
   2. Override the RandomSortField#getSortField method to return
   YourSortField.
   3. Return YourSortComparatorSource from YourSortField#getFactory().
   4. Most of the rules related to the problem statement are handled in
   YourSortComparatorSource#newComparator().
   5. In your schema, create a dynamic field of YourFieldType. Pass in the
   "id" (look at the problem statement in the trailing post) as a part of the
   dynamic field name in your sort query.
   6. Inside YourSortComparatorSource#newComparator(), get the above
   mentioned "id" from the fieldName parameter and then fetch the values
   indexed in this field using Lucene's FieldCache.
   7. In your ScoreDocComparator#compare(), first check the values in
   the "id" field and return -1, 1, 0 or "hash(i.doc + seed) - hash(j.doc +
   seed)" based on the values in this field. The idea is to randomize
   results only for a particular "id" value.
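
The comparison rule from step 7 can be modelled in plain Java as a rough
sketch. No Lucene or Solr classes are used, so this is not the plugin itself:
CategoryRandomSortSketch, Doc, and the helper names are hypothetical, the
category ids are held in memory instead of being read from the FieldCache, and
hash() is a stand-in mixer rather than a copy of RandomSortField's private
method. The trailing "score desc" clause of the sort is modelled with
thenComparing.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Plain-Java model of the step 7 ordering (hypothetical names throughout). */
final class CategoryRandomSortSketch {

    /** Hypothetical stand-in for a matched document. */
    static final class Doc {
        final int docId;      // plays the role of ScoreDoc.doc
        final int categoryId; // the value step 6 would fetch for this doc
        final float score;    // relevance score

        Doc(int docId, int categoryId, float score) {
            this.docId = docId;
            this.categoryId = categoryId;
            this.score = score;
        }
    }

    /** Stand-in integer mixer; the result is non-negative. */
    static int hash(int key) {
        key ^= key >>> 16;
        key *= 0x45d9f3b;
        key ^= key >>> 16;
        return key >>> 1;
    }

    /** Returns -1, 1, 0, or a hash comparison, depending on category membership. */
    static Comparator<Doc> categoryThenRandom(final int targetCategory, final int seed) {
        return (i, j) -> {
            boolean iTop = i.categoryId == targetCategory;
            boolean jTop = j.categoryId == targetCategory;
            if (iTop && jTop) {
                // Both are in the target category: order them pseudo-randomly.
                return Integer.compare(hash(i.docId + seed), hash(j.docId + seed));
            }
            if (iTop) {
                return -1;
            }
            if (jTop) {
                return 1;
            }
            return 0; // neither is in the category: the next sort clause decides
        };
    }

    /** Sorts a copy of docs and returns the doc ids in result order. */
    static List<Integer> sortedDocIds(List<Doc> docs, int targetCategory, int seed) {
        Comparator<Doc> scoreDesc = (i, j) -> Float.compare(j.score, i.score);
        List<Doc> copy = new ArrayList<Doc>(docs);
        copy.sort(categoryThenRandom(targetCategory, seed).thenComparing(scoreDesc));
        List<Integer> ids = new ArrayList<Integer>();
        for (Doc d : copy) {
            ids.add(d.docId);
        }
        return ids;
    }

    public static void main(String[] args) {
        List<Doc> docs = new ArrayList<Doc>();
        docs.add(new Doc(1, 7, 0.9f));
        docs.add(new Doc(2, 3, 0.5f));
        docs.add(new Doc(3, 7, 0.1f));
        docs.add(new Doc(4, 5, 0.8f));
        System.out.println(sortedDocIds(docs, 7, 42));
    }
}
```

A fixed seed gives a repeatable order for the category at the top, while a
different seed may reshuffle it; documents outside the category always stay in
score order.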

Hoss, I have a small question (RandomSortField bears your signature) - Any
reason as to why RandomSortField#hash() and RandomSortField#getSeed()
methods are private? Having them public would have saved me from
"owning" a copy in my class as well.

My solution applies to Solr 1.3. It might not hold true for later versions
as the underlying Lucene APIs might have changed.

Cheers
Avlesh

On Sun, Sep 20, 2009 at 4:28 PM, Avlesh Singh <av...@gmail.com> wrote:

> I am using Solr 1.3
> I have a "solr.RandomSortField" type dynamic field which I use to randomize
> my results.
>
> I am in a tricky situation. I need to randomize only "certain results" in
> my Hits.
> To elaborate, I have an integer field called "category_id". When performing
> a query, I need to get results from all categories and place the ones from
> SOME_CAT_ID at the top. I achieved this by populating a separate dynamic
> field while indexing data, i.e., when a doc is added to the index, a field
> called "dynamic_cat_id_SOME_CAT_ID" is populated with its category id. While
> querying, I know the value of SOME_CAT_ID, so adding
> "sort=dynamic_cat_id_SOME_CAT_ID asc, score desc" to my query works
> absolutely fine.
>
> So far so good. I am now supposed to randomize the results for
> category_id=SOME_CAT_ID, i.e., the results at the top. My understanding is that
> adding "sort=dynamic_cat_id_SOME_CAT_ID asc, *my_dynamic_random_field_SOME_SEED
> asc*, score desc" to the query would randomize all the results. This is
> not desired. I only want to randomize the ones at the top
> (category_id=SOME_CAT_ID); the rest should be ordered by relevance score.
>
> Two simple questions:
>
>    1. Is there a way to achieve this without writing any custom code?
>    2. If the answer to #1 is "no", then where should I start? I glanced at the
>    RandomSortField class but could not figure out how to proceed. Do I need to
>    create a custom FieldType? Can I extend RandomSortField and override the
>    sorting behaviour?
>
> Any help would be appreciated.
>
> Cheers
> Avlesh
>