Posted to dev@lucene.apache.org by "Ahmet Anil Pala (JIRA)" <ji...@apache.org> on 2016/04/27 15:19:13 UTC

[jira] [Issue Comment Deleted] (SOLR-8542) Integrate Learning to Rank into Solr

     [ https://issues.apache.org/jira/browse/SOLR-8542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ahmet Anil Pala updated SOLR-8542:
----------------------------------
    Comment: was deleted

(was: Hi, thanks for the answer.


Well, not in particular. I have experimented with NNs and SVMs with RBF kernels, and they are promising, especially in cases where the target attribute is the result of a complex interaction of the inputs, which is likely to be the case if you are modelling some customer behavior. What is different about the SVM with polynomial kernels is that, although training can be done in a pairwise fashion (constraint training), in the 'live phase' the distance of an example from the separating hyperplane can be used to score the documents. This is possible because we can 'distribute' the W over the kernel, as you did above:


W * K(V(D_1), V(D_2)) > 0
W * (V(D_1) - V(D_2)) > 0    where K(A, B) = A - B
W * V(D_1) - W * V(D_2) > 0
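
To make the 'live phase' point concrete, here is a minimal Java sketch (the feature values and weights are made up, and this is not the patch's code) of how a W learned from pairwise constraints can still score each document on its own under a linear kernel:

// Minimal sketch: a W trained from pairwise constraints W * (V(D_1) - V(D_2)) > 0
// can still score each document independently at query time, because the
// per-document score W * V(D) preserves the learned pairwise ordering.
public class LinearPairwiseScorer {
    private final double[] w; // learned weight vector

    public LinearPairwiseScorer(double[] w) {
        this.w = w;
    }

    /** Score a single document from its feature vector V(D). */
    public double score(double[] features) {
        double s = 0.0;
        for (int i = 0; i < w.length; i++) {
            s += w[i] * features[i];
        }
        return s;
    }

    public static void main(String[] args) {
        double[] w = {0.4, -0.2, 1.1}; // hypothetical learned weights
        LinearPairwiseScorer scorer = new LinearPairwiseScorer(w);
        double s1 = scorer.score(new double[] {1.0, 0.5, 2.0}); // V(D_1)
        double s2 = scorer.score(new double[] {0.3, 0.7, 1.5}); // V(D_2)
        // s1 > s2 exactly when W * (V(D_1) - V(D_2)) > 0, i.e. the pairwise constraint.
        System.out.println("rank D_1 above D_2? " + (s1 > s2));
    }
}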


However, some kernels do not allow this. For example, the RBF kernel: RBF(D_1, D_2) = e^(-gamma * ||D_1 - D_2||^2). This is also an example of the 'kernel trick', where the non-linear feature mapping the kernel performs is implicit. In this case, we cannot use the SVM as a scorer, since our learned model is supposed to be evaluated against the kernel value of a document pair in the 'live phase' to make predictions. Therefore, in his paper Joachims didn't use an SVM with kernels. He explains it as follows:

"If Kernels are not used, this property makes the application of the learned retrieval function very efficient. Fast algorithms exists for computing rankings based on linear functions by means of inverted indices"


As you said, LambdaMART is a promising model. I like it especially because it is a hierarchical model, so the LTR plugin can treat different search cases differently (e.g. different hours of the day, a different ranking formula). However, I'd love to be able to at least use my pairwise NN model (trained with the fann library) in Solr using LTR. But then the 'reordering' of the products will be based on a classifier, and some near-optimal algorithm for using a classifier for reordering must be used. Solutions for this do exist, although I don't know their performance implications. The following paper covers some of them: http://arxiv.org/pdf/1105.5464.pdf
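
As one illustration of reordering with a pairwise classifier (a deliberately simple win-count aggregation, not necessarily one of the algorithms analysed in the paper above), the classifier can be run over all candidate pairs and the documents sorted by how many comparisons each one wins:

import java.util.Arrays;
import java.util.Comparator;

// Simple illustration: turn a pairwise preference model into an ordering by
// counting, for each candidate, how many pairwise comparisons it wins.
// This costs O(n^2) classifier calls over the top-n documents being reordered.
public class PairwiseReorderer {
    public interface PairwiseModel {
        boolean prefers(double[] d1, double[] d2); // true if d1 should rank above d2
    }

    public static Integer[] rerank(double[][] docFeatures, PairwiseModel model) {
        int n = docFeatures.length;
        int[] wins = new int[n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                if (i != j && model.prefers(docFeatures[i], docFeatures[j])) {
                    wins[i]++;
                }
            }
        }
        Integer[] order = new Integer[n];
        for (int i = 0; i < n; i++) {
            order[i] = i;
        }
        // Best-first: the document with the most pairwise wins comes first.
        Arrays.sort(order, Comparator.comparingInt((Integer i) -> wins[i]).reversed());
        return order; // indices into docFeatures, best first
    }
}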


)

> Integrate Learning to Rank into Solr
> ------------------------------------
>
>                 Key: SOLR-8542
>                 URL: https://issues.apache.org/jira/browse/SOLR-8542
>             Project: Solr
>          Issue Type: New Feature
>            Reporter: Joshua Pantony
>            Assignee: Christine Poerschke
>            Priority: Minor
>         Attachments: README.md, README.md, SOLR-8542-branch_5x.patch, SOLR-8542-trunk.patch
>
>
> This is a ticket to integrate learning to rank machine learning models into Solr. Solr Learning to Rank (LTR) provides a way for you to extract features directly inside Solr for use in training a machine learned model. You can then deploy that model to Solr and use it to rerank your top X search results. This concept was previously presented by the authors at Lucene/Solr Revolution 2015 ( http://www.slideshare.net/lucidworks/learning-to-rank-in-solr-presented-by-michael-nilsson-diego-ceccarelli-bloomberg-lp ).
> The attached code was jointly worked on by Joshua Pantony, Michael Nilsson, David Grohmann and Diego Ceccarelli.
> Any chance this could make it into a 5x release? We've also attached documentation as a github MD file, but are happy to convert to a desired format.
> h3. Test the plugin with solr/example/techproducts in 6 steps
> Solr provides some simple example indices. In order to test the plugin with 
> the techproducts example, please follow these steps:
> h4. 1. compile solr and the examples 
> cd solr
> ant dist
> ant example
> h4. 2. run the example
> ./bin/solr -e techproducts 
> h4. 3. stop it and install the plugin:
>    
> ./bin/solr stop
> mkdir example/techproducts/solr/techproducts/lib
> cp build/contrib/ltr/lucene-ltr-6.0.0-SNAPSHOT.jar example/techproducts/solr/techproducts/lib/
> cp contrib/ltr/example/solrconfig.xml example/techproducts/solr/techproducts/conf/
> h4. 4. run the example again
>     
> ./bin/solr -e techproducts
> h4. 5. index some features and a model
> curl -XPUT 'http://localhost:8983/solr/techproducts/schema/fstore'  --data-binary "@./contrib/ltr/example/techproducts-features.json"  -H 'Content-type:application/json'
> curl -XPUT 'http://localhost:8983/solr/techproducts/schema/mstore'  --data-binary "@./contrib/ltr/example/techproducts-model.json"  -H 'Content-type:application/json'
> h4. 6. have fun!
> *access to the default feature store*
> http://localhost:8983/solr/techproducts/schema/fstore/_DEFAULT_ 
> *access to the model store*
> http://localhost:8983/solr/techproducts/schema/mstore
> *perform a query using the model, and retrieve the features*
> http://localhost:8983/solr/techproducts/query?indent=on&q=test&wt=json&rq={!ltr%20model=svm%20reRankDocs=25%20efi.query=%27test%27}&fl=*,[features],price,score,name&fv=true



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org