Posted to solr-user@lucene.apache.org by Grant Ingersoll <gs...@apache.org> on 2017/05/11 18:16:38 UTC

Learn To Rank Questions

Hi,

Just getting up to speed on LTR and have a few questions (most of which are
speculative at this point and exploratory, as I have a couple of talks
coming up on this and other relevance features):

1. Has anyone looked at what's involved with supporting SparkML or other
models (e.g. PMML)?

2. Has anyone looked at features for text?  i.e. returning TF-IDF vectors
or similar.  FieldValueFeature is kind of like this, but I might want
weights for the terms, not just the actual values.  I could get this via
term vectors, but then it doesn't fit the framework.

3. How about payloads and/or things like boost values for documents as
features?

4. Are there example docs of training and using the
MultipleAdditiveTreesModel?  I see unit tests for them, but looking for
something similar to the python script in the example dir.

On 2 and 3, I imagine some of this can be done creatively via the
SolrFeature and function queries.

Thanks,
Grant

Re: Learn To Rank Questions

Posted by Joël Trigalo <jo...@trovit.com>.

1.
I think this is a Spark problem first
(https://issues.apache.org/jira/browse/SPARK-10413). What we can do is create
our own model (cf.
https://github.com/apache/lucene-solr/tree/master/solr/contrib/ltr/src/java/org/apache/solr/ltr/model)
that applies the prediction. That should be easy to do for a simple model such
as logistic regression.
For PMML, the idea would likewise be to implement a Model that reuses a Java
library able to apply PMML.
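
To make "applies the prediction" concrete for logistic regression, here is a
rough sketch in plain Python. It is not the Solr model API (a real LTR model
would be a Java class in the package linked above); the weights, bias and
feature values are made up for illustration.

```python
import math


def logistic_score(weights, bias, features):
    """Return the logistic regression score for one document's feature vector."""
    if len(weights) != len(features):
        raise ValueError("weights and features must have the same length")
    z = bias + sum(w * x for w, x in zip(weights, features))
    # Numerically stable sigmoid: avoids overflow for large negative z.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


# Hypothetical trained parameters and one document's feature values.
weights = [0.8, -0.3, 1.5]
bias = -0.2
doc_features = [1.0, 2.0, 0.5]

score = logistic_score(weights, bias, doc_features)
print(score)
```

The weights would come from whatever trained the model (Spark or otherwise);
the Solr side only has to do the dot product and the sigmoid.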

2.
This query gives you the TF-IDF score of textField against userQuery for the
document:

 {!edismax qf='textField' mm=100% v=${userQuery} tie=0.1}

Also, it seems to me that LTR only allows float features, which is a limitation.
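
As a sketch, the query above could be wrapped in a SolrFeature definition
along these lines. The feature name "textFieldMatch" is hypothetical, and I am
assuming userQuery is passed as external feature information
(efi.userQuery=...) on the rerank query. The Python here only builds the JSON.

```python
import json

# Hypothetical feature name; the "q" value is the query from this message.
text_match_feature = {
    "name": "textFieldMatch",
    "class": "org.apache.solr.ltr.feature.SolrFeature",
    "params": {
        "q": "{!edismax qf='textField' mm=100% v=${userQuery} tie=0.1}"
    },
}

# Feature definitions are a JSON array of such objects.
feature_store_json = json.dumps([text_match_feature], indent=2)
print(feature_store_json)
```

The score of that query for each document then becomes a single float
feature value.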

3.
If the boost value is an index-time boost, I don't think it is possible. You
could put the feature you want in a field at index time and then use
FieldValueFeature to extract it.
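
A minimal sketch of that FieldValueFeature, assuming the value was written to
a numeric field at index time. The field name "doc_boost" and the feature
name "docBoost" are made up for the example.

```python
import json

# Hypothetical names: "doc_boost" is whatever field you populated at index time.
boost_feature = {
    "name": "docBoost",
    "class": "org.apache.solr.ltr.feature.FieldValueFeature",
    "params": {"field": "doc_boost"},
}

boost_feature_json = json.dumps([boost_feature], indent=2)
print(boost_feature_json)
```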
