Posted to solr-user@lucene.apache.org by Michael Alcorn <ma...@redhat.com> on 2017/08/04 14:18:52 UTC

Per Text Field Similarity Measures for Learning to Rank

Hi all,

I recently prototyped a learning-to-rank system in Python that produced
promising results, so I'm now looking into how to replicate that process in
our Solr setup. My Python implementation used a number of features that
were per-field text comparisons, e.g.:

   1. tfidf_case_title_solution_title
   2. tfidf_case_description_solution_title
   3. ...
   4. bm25_case_title_solution_description
   5. bm25_case_description_solution_description

where each solution field had its own independent index. Do any of you have
recommendations on how to do this kind of thing in Solr? It looks like the
SolrFeature class might be the way to go, but my colleagues, who are more
familiar with Solr than I am, weren't sure it was possible.

Thanks,
Michael A. Alcorn

Re: Per Text Field Similarity Measures for Learning to Rank

Posted by Michael Nilsson <mn...@gmail.com>.

Hi Michael,

Using your example, if you have 5 different fields, you could create 5
individual SolrFeatures against those fields.  The one tricky thing here is
that you want to use different similarity scoring mechanisms against your
fields.  By default, Solr uses a single Similarity class
<https://lucene.apache.org/core/6_1_0/core/org/apache/lucene/search/similarities/Similarity.html>
against your fields to rank all your documents.  However, you could define
new types for your special title & description fields that use different
Similarity classes
<https://cwiki.apache.org/confluence/display/solr/Other+Schema+Elements>.
A SolrFeature that queries one of those fields should then be scored with
that field's own Similarity, which would give you separate TF-IDF and BM25
features for the same text.
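
As a rough sketch of what that could look like (in Python, since that is
what your prototype used), the snippet below builds one SolrFeature
definition per similarity / case field / solution field combination, ready
to be serialized as JSON for the LTR feature store. Everything named here
is an assumption for illustration: the field names title_tfidf, title_bm25,
description_tfidf and description_bm25 (copies of the solution fields whose
types would declare different Similarity classes in the schema), the
external feature inputs case_title and case_description (passed at query
time as efi.case_title=... and efi.case_description=...), and the choice of
the dismax parser. The build_features helper is hypothetical, not part of
Solr.

```python
import json
from itertools import product

# Class name of the Solr LTR feature type discussed in this thread.
SOLR_FEATURE_CLASS = "org.apache.solr.ltr.feature.SolrFeature"


def build_features(similarities, query_inputs, doc_fields):
    """Build one feature definition per (similarity, query input, field).

    Assumes the schema has a field named "<doc_field>_<similarity>" for each
    combination (e.g. "title_bm25"), whose field type declares the matching
    Similarity. These field names are hypothetical.
    """
    features = []
    for sim, query_input, doc_field in product(similarities, query_inputs, doc_fields):
        solr_field = doc_field + "_" + sim
        # ${name} is filled in from the efi.name request parameter.
        query = "{!dismax qf=" + solr_field + "}${" + query_input + "}"
        features.append({
            "name": sim + "_" + query_input + "_solution_" + doc_field,
            "class": SOLR_FEATURE_CLASS,
            "params": {"q": query},
        })
    return features


if __name__ == "__main__":
    feature_defs = build_features(
        similarities=["tfidf", "bm25"],
        query_inputs=["case_title", "case_description"],
        doc_fields=["title", "description"],
    )
    # The printed JSON array is what would be uploaded to the feature store.
    print(json.dumps(feature_defs, indent=2))
```

The resulting JSON would be uploaded to the collection's feature store.
For the per-field similarities to take effect, the schema's global
similarity would also need to be one that defers to the field types
(solr.SchemaSimilarityFactory).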

Hope that helps
-Mike