Posted to solr-user@lucene.apache.org by "Diego Ceccarelli (BLOOMBERG/ LONDON)" <dc...@bloomberg.net> on 2017/06/02 18:36:33 UTC

Re: Learn To Rank Questions

Hi, 
Sorry for the delay, here are my replies: 

1. I'm not yet a Spark user (but I'm working on that :))

2. I'm not sure I understand how you would use a non-float feature in a model.
In my experience, learning-to-rank methods always train and predict from a list of
floats. Could you provide more details on how you would use TF-IDF vectors?

3. I've never played with payloads, but if you can access them from the IndexReader
then you can write a Java class that extends Feature and returns them. If you want to
boost certain documents at query time, you can use the efi parameter, which lets you
inject parameters at query time.

4. Thanks, that's a good point, we must provide an example. I'll work on that. 

Best,
Diego


From: solr-user@lucene.apache.org At: 05/15/17 13:30:06
To: solr-user@lucene.apache.org
Subject: Re: Learn To Rank Questions

1.
So I think it is a Spark problem first
(https://issues.apache.org/jira/browse/SPARK-10413). What we can do is create
our own model (cf.
https://github.com/apache/lucene-solr/tree/master/solr/contrib/ltr/src/java/org/apache/solr/ltr/model)
that applies the prediction; it should be easy to do for a simple model, like
logistic regression.
For PMML, the idea would also be to implement a Model that reuses a Java lib
able to apply PMML.

2.
This function query gives you the TF-IDF score of textField against userQuery
for the doc:

 {!edismax qf='textField' mm=100% v=${userQuery} tie=0.1}

Also, it seems to me LTR only allows float features, which is a limitation.


3.
If the boost value is an index-time boost, I don't think it is possible. You
could put the feature you want in a field at index time and then use
FieldValueFeature to extract it.
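
On point 1, here is a minimal sketch of what "applying the prediction" could
look like for logistic regression. It is plain Java and deliberately not tied
to the Solr LTR Model API: the class name LogisticScorer and its methods are
hypothetical, and the weights are assumed to have been exported from whatever
trained the model. Wiring this into a class under org.apache.solr.ltr.model is
left out.

```java
/**
 * Hypothetical stand-alone scorer: applies a trained logistic regression
 * to a list of float features. Not part of Solr; for illustration only.
 */
final class LogisticScorer {
    private final float[] weights; // one weight per feature, in feature order
    private final float bias;      // intercept term

    LogisticScorer(float[] weights, float bias) {
        this.weights = weights.clone();
        this.bias = bias;
    }

    /** Returns sigmoid(bias + weights . features), a value between 0 and 1. */
    float score(float[] features) {
        if (features.length != weights.length) {
            throw new IllegalArgumentException(
                "expected " + weights.length + " features, got " + features.length);
        }
        double z = bias;
        for (int i = 0; i < weights.length; i++) {
            z += weights[i] * features[i];
        }
        return (float) (1.0 / (1.0 + Math.exp(-z)));
    }
}
```

The feature values passed to score() would be the same floats LTR already
extracts per document, so only the weights and bias need to come from the
external training tool.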

On Thu, May 11, 2017 at 8:16 PM, Grant Ingersoll <gs...@apache.org>
wrote:

> Hi,
>
> Just getting up to speed on LTR and have a few questions (most of which are
> speculative at this point and exploratory, as I have a couple of talks
> coming up on this and other relevance features):
>
> 1. Has anyone looked at what's involved with supporting SparkML or other
> models (e.g. PMML)?
>
> 2. Has anyone looked at features for text?  i.e. returning TF-IDF vectors
> or similar.  FieldValueFeature is kind of like this, but I might want
> weights for the terms, not just the actual values.  I could get this via
> term vectors, but then it doesn't fit the framework.
>
> 3. How about payloads and/or things like boost values for documents as
> features?
>
> 4. Are there example docs of training and using the
> MultipleAdditiveTreesModel?  I see unit tests for them, but looking for
> something similar to the python script in the example dir.
>
> On 2 and 3, I imagine some of this can be done creatively via the
> SolrFeature and function queries.
>
> Thanks,
> Grant
>