Posted to dev@lucene.apache.org by mnilsson23 <gi...@git.apache.org> on 2016/05/27 14:35:39 UTC

[GitHub] lucene-solr pull request: SOLR-8542: Integrate Learning to Rank in...

GitHub user mnilsson23 opened a pull request:

    https://github.com/apache/lucene-solr/pull/40

    SOLR-8542: Integrate Learning to Rank into Solr

    Solr Learning to Rank (LTR) provides a way for you to extract features
    directly inside Solr for use in training a machine learned model. You
    can then deploy that model to Solr and use it to rerank your top X
    search results. This concept was previously presented by the authors at
    Lucene/Solr Revolution 2015.
    
    See the [README](https://github.com/bloomberg/lucene-solr/tree/master-ltr-plugin-release/solr/contrib/ltr) for more information on how to get started.
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/bloomberg/lucene-solr master-ltr-plugin-release

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/lucene-solr/pull/40.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #40
    
----
commit 073de9b2719abe91e106119b23b977e521e8b32f
Author: Diego Ceccarelli <dc...@bloomberg.net>
Date:   2016-01-13T22:29:17Z

    SOLR-8542: Integrate Learning to Rank into Solr
    
    Solr Learning to Rank (LTR) provides a way for you to extract features
    directly inside Solr for use in training a machine learned model. You
    can then deploy that model to Solr and use it to rerank your top X
    search results. This concept was previously presented by the authors at
    Lucene/Solr Revolution 2015

commit b2bbe8c13122280ee5a76149bfb55fd1b7324279
Author: Michael Nilsson <mn...@bloomberg.net>
Date:   2016-05-25T22:13:05Z

    Learning to Rank plugin updates
    
    - Updated our documentation about the training phase and how to train a real model for those who are not familiar with this process. We provided a step-by-step example of building a rankSVM model externally, and supplied a sample script that does this using liblinear.
    - Formatted the code based on the lucene eclipse style
    - Updated the hashCode and equals functions of the ModelQuery as [~Alessandro.Benedetti] pointed out
    - Renamed ModelMetadata, the class you would subclass to add a new model for scoring docs, to LTRScoringAlgorithm
    - Cleaned up the LTRScoringAlgorithm to no longer have a type parameter
    - Added IntelliJ support.  Thank you [~Alessandro.Benedetti] for adding it
    - Renamed mstore and fstore endpoints to feature-store and model-store as per [~Upayavira]'s suggestion
    - Added support for default efi parameters using the same Solr standard as in solrconfig. When defining a feature in the config, put ${isFromManchester:0} to get 0 as a default, and you won't have to specify it in the request's efi params. Thanks for the enhancement suggestion [~Alessandro.Benedetti]
    - Removed the fv=true param requirement for extracting features.
    - You do not have to provide a "dummy model" first for extracting features, so you can request the transformer without the need of an rq ranking query.  Inside the transformer you can provide a store=myFeatureStore param, and it will extract all features from that feature store directly.  You can also provide local efi params if needed when extracting without an rq.

----
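
The default efi behaviour described in the commit message above can be illustrated with a small standalone sketch. It is not the plugin's code: the class name `EfiDefaults`, the method `resolve`, and the regular expression are assumptions made for illustration. It only shows the `${name:default}` convention, where a value from the request wins and the default is used otherwise.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EfiDefaults {
  // Matches ${name} or ${name:default}; hypothetical pattern, not taken from Solr.
  private static final Pattern PLACEHOLDER =
      Pattern.compile("\\$\\{([^:}]+)(?::([^}]*))?\\}");

  /** Replaces each placeholder with the request's efi value, else its default. */
  static String resolve(String template, Map<String, String> efi) {
    Matcher m = PLACEHOLDER.matcher(template);
    StringBuffer out = new StringBuffer();
    while (m.find()) {
      String name = m.group(1);
      String fallback = m.group(2); // null when no ":default" part was given
      String value = efi.containsKey(name) ? efi.get(name) : fallback;
      if (value == null) {
        throw new IllegalArgumentException("Missing efi parameter: " + name);
      }
      m.appendReplacement(out, Matcher.quoteReplacement(value));
    }
    m.appendTail(out);
    return out.toString();
  }

  public static void main(String[] args) {
    // No efi value in the request: the default after the colon is used.
    System.out.println(resolve("${isFromManchester:0}", Map.of()));
    // An efi value in the request overrides the default.
    System.out.println(resolve("${isFromManchester:0}", Map.of("isFromManchester", "1")));
  }
}
```

A placeholder without a default and without a request value is treated as an error in this sketch; how the plugin itself handles that case is not stated in the commit message.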


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[GitHub] lucene-solr pull request: Removal of Scorer.weight

Posted by mkhludnev <gi...@git.apache.org>.
Github user mkhludnev commented on the pull request:

    https://github.com/apache/lucene-solr/pull/40#issuecomment-37287237
  
    Terry, 
    So far, the cleanup in the Boolean* classes seems good, but I have to mention that a bunch of my custom queries need to distinguish scorers by obtaining the Query through the Weight. I feel like Scorer.getWeight is a useful thing in general.
    Thanks




[GitHub] lucene-solr pull request: Removal of Scorer.weight

Posted by shebiki <gi...@git.apache.org>.
Github user shebiki closed the pull request at:

    https://github.com/apache/lucene-solr/pull/40




[GitHub] lucene-solr pull request: Removal of Scorer.weight

Posted by mkhludnev <gi...@git.apache.org>.
Github user mkhludnev commented on the pull request:

    https://github.com/apache/lucene-solr/pull/40#issuecomment-37409000
  
    Terry,
    Yep, passing Weight everywhere might be overwhelming. My case for scorer.weight.query usage is my own drill-sideways facet collector. I run a standard BooleanQuery like +Brand:DG Color:Red Size:XL with minShouldMatch=1. When collector.collect(int doc) is called, it checks the child scorers' positions to determine whether it was a Red or an XL hit.
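
The lookup described in the comment above can be sketched as follows. This is a minimal illustration under stated assumptions, not the commenter's collector: `Query`, `Weight`, and `Scorer` here are simplified stand-in classes defined in the sketch itself, not the Lucene classes, and `matchingClauses` is a hypothetical helper. It only shows the chain scorer -> weight -> query being used to tell which optional clause produced a hit.

```java
import java.util.ArrayList;
import java.util.List;

public class ClauseHitSketch {
  /** Stand-in for a clause query such as Color:Red (not Lucene's Query). */
  static final class Query {
    final String text;
    Query(String text) { this.text = text; }
  }

  /** Stand-in for Weight: remembers the Query it was created from. */
  static final class Weight {
    final Query query;
    Weight(Query query) { this.query = query; }
  }

  /** Stand-in for a child Scorer: keeps its parent Weight and current doc. */
  static final class Scorer {
    final Weight weight;
    int docID = -1; // -1 means "not positioned yet"
    Scorer(Weight weight) { this.weight = weight; }
  }

  /** Lists the clauses whose child scorer is positioned on the collected doc. */
  static List<String> matchingClauses(List<Scorer> children, int doc) {
    List<String> hits = new ArrayList<>();
    for (Scorer child : children) {
      if (child.docID == doc) {
        hits.add(child.weight.query.text); // scorer -> weight -> query
      }
    }
    return hits;
  }

  public static void main(String[] args) {
    Scorer red = new Scorer(new Weight(new Query("Color:Red")));
    Scorer xl = new Scorer(new Weight(new Query("Size:XL")));
    red.docID = 7;  // the Red clause matched doc 7
    xl.docID = 12;  // the XL clause has moved on to a later doc
    System.out.println(matchingClauses(List.of(red, xl), 7)); // prints [Color:Red]
  }
}
```

Without the weight reference on the scorer, a collector like this would need some other way to associate each child scorer with its clause.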




[GitHub] lucene-solr pull request: Removal of Scorer.weight

Posted by shebiki <gi...@git.apache.org>.
Github user shebiki commented on the pull request:

    https://github.com/apache/lucene-solr/pull/40#issuecomment-38919593
  
    Mikhail,
    
    I understand your sentiment about having vision. When I started this pull request I thought I had a nice clear direction in mind; to decouple Scorer from Weight and thus allow Scorer classes to be more reusable. After taking some time to think about your concerns I'm coming round to realize that there is nothing to fix here. I nearly pushed for changing the weight reference to a scorer to help the decoupling but realize that it'd still be helpful if someone wants to stick a breakpoint in the code to diagnose a hairy issue.
    
    Can I point this pull request back to its first commit (db57c80) and see if the coord cleanup is something that'd be helpful?
    
    --Terry




[GitHub] lucene-solr pull request: Removal of Scorer.weight

Posted by mkhludnev <gi...@git.apache.org>.
Github user mkhludnev commented on the pull request:

    https://github.com/apache/lucene-solr/pull/40#issuecomment-38966314
  
    Terry,
    I think Robert seconds the coord cleanup; let's wait till he reviews it. If you want to know my opinion, I already supported it above.
    Good shot! Thank you.




[GitHub] lucene-solr pull request: Removal of Scorer.weight

Posted by shebiki <gi...@git.apache.org>.
Github user shebiki commented on the pull request:

    https://github.com/apache/lucene-solr/pull/40#issuecomment-38562659
  
    Mikhail,
    
    Fair enough.
    
    I'd like to suggest two different options and am happy to supply patches.
    
    1) Remove the TODO comment next to Scorer.weight. 
    
    https://github.com/apache/lucene-solr/blob/branch_4x/lucene/core/src/java/org/apache/lucene/search/Scorer.java#L45
    
    ```java
    public abstract class Scorer extends DocsEnum {
      /** the Scorer's parent Weight. in some cases this may be null */
      // TODO can we clean this up?
      protected final Weight weight;
    ```
    
    2) More controversial: replace this weight with a query reference. This would help decouple Scorers from Weights and should allow for more Scorer reuse.
    
    What do you think?
    
    --Terry





[GitHub] lucene-solr pull request: Removal of Scorer.weight

Posted by shebiki <gi...@git.apache.org>.
Github user shebiki commented on the pull request:

    https://github.com/apache/lucene-solr/pull/40#issuecomment-48182618
  
    I'm closing this pull request as BooleanWeight.scorer() has changed quite a bit since this patch was generated.





[GitHub] lucene-solr pull request: Removal of Scorer.weight

Posted by shebiki <gi...@git.apache.org>.
Github user shebiki commented on the pull request:

    https://github.com/apache/lucene-solr/pull/40#issuecomment-37405412
  
    Robert: Great, is there anything I can do to help better prep the commit (db57c80) for that?
    
    mkhludnev (your first name is not on your GitHub profile page): Interesting. There is only one Scorer and 3 tests in Lucene/Solr that took advantage of Scorer.getWeight(), and bunches of code that had to pass through a weight or null to Scorer's constructor. I assume the custom queries that you mention cannot just use the same approach I used when tweaking ToParentBlockJoinQuery (00740d6)? This works because the Scorer is trying to get to the Query from the Weight that created it. Do your custom queries try to access Query objects from Scorers that they didn't create? If so, would you mind sharing a little more information about how you wire that up? I'd love to understand your use case better.





[GitHub] lucene-solr pull request: Removal of Scorer.weight

Posted by mkhludnev <gi...@git.apache.org>.
Github user mkhludnev commented on the pull request:

    https://github.com/apache/lucene-solr/pull/40#issuecomment-38673813
  
    Terry,
    IMHO 1). 
    But I would rather the person who established this API judge this evolution, because he has a plan or vision in mind.
    Coming back to your original concern: AFAIK you are bothered by passing the Weight reference around. I find it's not so annoying when Scorers are inner classes of the Weight. It doesn't allow scorers to be unleashed and made standalone, though, but I don't see an issue here.




[GitHub] lucene-solr pull request: Removal of Scorer.weight

Posted by rmuir <gi...@git.apache.org>.
Github user rmuir commented on the pull request:

    https://github.com/apache/lucene-solr/pull/40#issuecomment-37408333
  
    I can take care of it; I just want to do a proper review first, and I ran out of time yesterday.
    
    As far as Scorer.getWeight, the tests may not expose this so much, but the idea is that you can connect Scorers to e.g. the Query objects that own them. This can be useful in custom Collectors, for example.




[GitHub] lucene-solr pull request: Removal of Scorer.weight

Posted by m-khl <gi...@git.apache.org>.
Github user m-khl commented on the pull request:

    https://github.com/apache/lucene-solr/pull/40#issuecomment-38325164
  
    Terry,
    I agree, wrappers are boilerplate. We started by creating a Scorers' Mixings framework, which bored us a lot. Now we are happy with accessing the query that creates the scorers. One more note: something that seems more or less efficient might not be measurable at all.
    Calling score() works fine.




[GitHub] lucene-solr pull request: Removal of Scorer.weight

Posted by rmuir <gi...@git.apache.org>.
Github user rmuir commented on the pull request:

    https://github.com/apache/lucene-solr/pull/40#issuecomment-37290855
  
    No matter what happens here, we should at least do the coord cleanup.




[GitHub] lucene-solr pull request: Removal of Scorer.weight

Posted by shebiki <gi...@git.apache.org>.
Github user shebiki commented on the pull request:

    https://github.com/apache/lucene-solr/pull/40#issuecomment-38165595
  
    Mikhail,
    
    I have a similar use case and opted for creating the `BooleanScorer2` directly instead of trying to associate each child `Scorer` with the drilldown id. I chose not to use the [QueryWrapper](https://github.com/apache/lucene-solr/blob/lucene_solr_4_6/lucene/facet/src/java/org/apache/lucene/facet/search/DrillSideways.java#L352) pattern from `DrillSideways` in 4.6.0 because I felt it would prevent future optimizations and it was no longer in use in 4.7. I didn't consider the idea of just comparing `scorer.getWeight().getQuery()` but it's essentially the same work flow.
    
    The reason I felt it would prevent further optimization is that it stops a `Weight` instance from returning an already created child `Scorer`. For example:
    
    * A `BooleanQuery` consisting of just `SHOULD` clauses with `disableCoord` set to `true`. If a segment only has one non-null scorer then `BooleanWeight.scorer()` should be able to return just that child scorer instead of having to wrap it with another.
    * Introduction of extra scoring metadata (imagine decorating each score with an additional `boolean`). In this case a composing query (a variant of `BooleanQuery`, `DisjunctionMaxQuery`, etc.) would want to aggregate this extra metadata at scoring time. If the metadata has a decent default value then only some of the child `Scorer`s will be able to provide it. If none of the child `Scorer`s provide this metadata then its calculation can probably be short-circuited and it can just return a `BooleanScorer`, `ConjunctionScorer`, or `DisjunctionScorer` as needed. This would be more efficient than making it wrap unconditionally.
    
    Quick question about your particular drillsideways query. Do you call `score()`, `freq()`, or something else to ensure the `SHOULD` `Scorer`s are correctly positioned? Do you optimize for when `BooleanQuery` returns a `DisjunctionScorer` and the child `Scorer`s are already positioned?
    
    --Terry
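
The first optimization in the list above (returning the only non-null child scorer unwrapped) can be sketched like this. It is a simplified illustration, not `BooleanWeight.scorer()`: the `Scorer` interface, `SumScorer`, and `scorerFor` are hypothetical stand-ins defined in the sketch, and it assumes SHOULD-only clauses with coord disabled, so that a single child's score needs no adjustment.

```java
import java.util.List;

public class SingleChildShortcut {
  /** Stand-in for a scorer; only the score matters for this sketch. */
  interface Scorer {
    float score();
  }

  /** Wrapper that sums its children, standing in for a disjunction scorer. */
  static final class SumScorer implements Scorer {
    private final List<Scorer> children;
    SumScorer(List<Scorer> children) { this.children = children; }

    @Override
    public float score() {
      float sum = 0f;
      for (Scorer child : children) {
        sum += child.score();
      }
      return sum;
    }
  }

  /** Picks a scorer for the non-null children of SHOULD-only clauses (no coord). */
  static Scorer scorerFor(List<Scorer> nonNullChildren) {
    if (nonNullChildren.isEmpty()) {
      return null; // nothing can match in this segment
    }
    if (nonNullChildren.size() == 1) {
      return nonNullChildren.get(0); // hand back the child itself, no wrapper
    }
    return new SumScorer(nonNullChildren);
  }

  public static void main(String[] args) {
    Scorer red = () -> 1.5f;
    Scorer xl = () -> 2.0f;
    System.out.println(scorerFor(List.of(red)) == red);      // prints true
    System.out.println(scorerFor(List.of(red, xl)).score()); // prints 3.5
  }
}
```

The shortcut only works if callers do not depend on the returned scorer being a particular wrapper type, which is the concern raised about the wrapper pattern.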





[GitHub] lucene-solr pull request: Removal of Scorer.weight

Posted by mkhludnev <gi...@git.apache.org>.
Github user mkhludnev commented on the pull request:

    https://github.com/apache/lucene-solr/pull/40#issuecomment-38898977
  
    Off-topic: a colleague of mine just sent me this picture.
    ![47797672](https://cloud.githubusercontent.com/assets/807522/2547791/c2d5b862-b656-11e3-8fab-c2b3237aa897.jpg)
    


