Posted to dev@lucene.apache.org by "Brian (JIRA)" <ji...@apache.org> on 2014/08/04 20:12:15 UTC

[jira] [Commented] (SOLR-2304) MoreLikeThis: Apply field level boosts before query terms are selected

    [ https://issues.apache.org/jira/browse/SOLR-2304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14085012#comment-14085012 ] 

Brian commented on SOLR-2304:
-----------------------------

I'm not sure this should be changed; I think the current behavior is what users expect. Because "qf" comes from dismax, it implies query-time boosting only, so making it also change which terms are selected would, I think, be more surprising than its current behavior. Instead, I think another parameter should be added that gives the option of also applying the field boosts before the query is built.

For example, I think the following use case could be common. We want to select the interesting terms for the MoreLikeThis query from across the whole document (across multiple fields), without terms that appear in specific fields being weighted more heavily than others. We then use these interesting terms to build a query. At query time, however, we do want to weight some fields more heavily, which is what the qf parameter is used for in dismax, while still using the same set of terms. (I realize this case would also require changing how MoreLikeThis builds its query, since it does not currently support cross-field queries, but I wouldn't want this change to rule out that possibility.)

I think it would be better to allow keeping the old behavior, either by:
- Adding a single boolean parameter that specifies whether the "qf" field boosts should also be applied before selecting terms, or
- Creating a new parameter specifically for the field boosts used to select interesting terms.
-- This is arguably easier to understand and provides the most flexibility, because we could then use different boosts for generating the terms and for the query that uses them. However, it introduces more complexity.
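A minimal Java sketch of how the two options above could decide which per-field boosts are used when ranking interesting terms, alongside the current behavior. The class `SelectionBoosts`, the method `resolve`, its flag and map parameters, and the field names `description`/`tags` are all hypothetical; none of them is an existing Solr parameter or API. In every case the "qf" boosts would still be applied to the query itself.

```java
import java.util.HashMap;
import java.util.Map;

public class SelectionBoosts {
    /**
     * Hypothetical helper: returns the field boosts used to rank interesting terms.
     *
     * @param qfBoosts           field boosts parsed from "qf" (always used at query time)
     * @param applyQfToSelection hypothetical boolean parameter (first option)
     * @param selectionBoosts    hypothetical dedicated selection boosts (second option), or null
     */
    static Map<String, Float> resolve(Map<String, Float> qfBoosts,
                                      boolean applyQfToSelection,
                                      Map<String, Float> selectionBoosts) {
        if (selectionBoosts != null) {
            return selectionBoosts;   // second option: boosts independent of "qf"
        }
        if (applyQfToSelection) {
            return qfBoosts;          // first option: reuse the "qf" boosts before selection
        }
        Map<String, Float> uniform = new HashMap<>();
        for (String field : qfBoosts.keySet()) {
            uniform.put(field, 1.0f); // current behavior: selection ignores field boosts
        }
        return uniform;
    }

    public static void main(String[] args) {
        Map<String, Float> qf = Map.of("description", 0.5f, "tags", 2.0f);
        System.out.println(resolve(qf, false, null).get("tags"));                 // 1.0
        System.out.println(resolve(qf, true, null).get("tags"));                  // 2.0
        System.out.println(resolve(qf, false, Map.of("tags", 1.5f)).get("tags")); // 1.5
    }
}
```

With neither option set, term selection stays unweighted, which matches the use case described above; the boolean covers the proposal in the issue, and the separate map allows selection and query boosts to differ.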


> MoreLikeThis: Apply field level boosts before query terms are selected
> ----------------------------------------------------------------------
>
>                 Key: SOLR-2304
>                 URL: https://issues.apache.org/jira/browse/SOLR-2304
>             Project: Solr
>          Issue Type: Improvement
>          Components: MoreLikeThis
>    Affects Versions: 1.4.2
>            Reporter: Mike Mattozzi
>            Priority: Minor
>             Fix For: 4.9, 5.0
>
>         Attachments: SOLR-2304.patch
>
>
> MoreLikeThis provides the ability to set field-level boosts to weight the importance of fields when selecting similar documents. Currently, in trunk, these field-level boosts are applied after the query terms have been selected from MoreLikeThis's priority queue of interesting terms. This can give unexpected results when combined with mlt.maxqt to limit the number of query terms. For example, if you use fields fieldA and fieldB, boost them with "fieldA^0.5 fieldB^2.0", and set maxqt to 20, then if the terms in fieldA have relatively higher tf-idf scores than those in fieldB, only 20 fieldA terms will be selected as the basis for the MoreLikeThis query, even if some fieldB terms would have a higher overall score after boosting.
> I encountered this while using document descriptive text and document tags (comedy, action, etc.) as the basis for MoreLikeThis. I wanted to boost the tags higher; however, the less common descriptive-text terms were always selected as query terms, while the more common tag terms were eliminated by the maxqt parameter before their scores were boosted.
> I believe the code was originally written this way so that the bulk of the work could be done in the MoreLikeThisHandler without modifying the MoreLikeThis class in the Lucene project. Now that the projects are merged, I think this modification makes sense. I will attach a simple patch against trunk.
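To make the ordering problem in the issue concrete, here is a simplified Java model (not Lucene's actual MoreLikeThis code) of keeping at most maxqt terms from a priority queue of tf-idf scores, with field boosts applied either after or before the cut-off. The class `MltBoostOrder`, its methods, and the terms and scores are made up for illustration; only the fieldA/fieldB boosts mirror the issue's example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

public class MltBoostOrder {
    // One candidate "interesting term" with its score.
    static final class Term {
        final String field;
        final String text;
        final float score;

        Term(String field, String text, float score) {
            this.field = field;
            this.text = text;
            this.score = score;
        }

        @Override
        public String toString() {
            return field + ":" + text;
        }
    }

    // Returns the n highest-scoring terms, best first, using a max-heap.
    static List<Term> topN(List<Term> terms, int n) {
        PriorityQueue<Term> queue = new PriorityQueue<>((a, b) -> Float.compare(b.score, a.score));
        queue.addAll(terms);
        List<Term> selected = new ArrayList<>();
        while (selected.size() < n && !queue.isEmpty()) {
            selected.add(queue.poll());
        }
        return selected;
    }

    // Multiplies each term's score by its field boost (1.0 if the field has none).
    static List<Term> applyBoosts(List<Term> terms, Map<String, Float> boosts) {
        List<Term> boosted = new ArrayList<>();
        for (Term t : terms) {
            boosted.add(new Term(t.field, t.text, t.score * boosts.getOrDefault(t.field, 1.0f)));
        }
        return boosted;
    }

    // Behavior described in the issue: cut to maxQt on raw scores, then boost.
    static List<Term> boostAfterSelection(List<Term> terms, Map<String, Float> boosts, int maxQt) {
        return applyBoosts(topN(terms, maxQt), boosts);
    }

    // Proposed behavior: boost every candidate first, then cut to maxQt.
    static List<Term> boostBeforeSelection(List<Term> terms, Map<String, Float> boosts, int maxQt) {
        return topN(applyBoosts(terms, boosts), maxQt);
    }

    public static void main(String[] args) {
        List<Term> terms = List.of(
                new Term("fieldA", "rare1", 3.0f),
                new Term("fieldA", "rare2", 2.5f),
                new Term("fieldB", "tag", 2.0f));
        Map<String, Float> boosts = Map.of("fieldA", 0.5f, "fieldB", 2.0f);
        System.out.println(boostAfterSelection(terms, boosts, 2));  // [fieldA:rare1, fieldA:rare2]
        System.out.println(boostBeforeSelection(terms, boosts, 2)); // [fieldB:tag, fieldA:rare1]
    }
}
```

With maxqt = 2, boosting after selection keeps only the two fieldA terms, while boosting before selection lets the fieldB term (2.0 × 2.0 = 4.0) outrank them, which is the difference the patch targets.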



--
This message was sent by Atlassian JIRA
(v6.2#6252)
