Posted to dev@lucene.apache.org by "David Smiley (JIRA)" <ji...@apache.org> on 2014/03/16 05:59:39 UTC

[jira] [Updated] (SOLR-2304) MoreLikeThis: Apply field level boosts before query terms are selected

     [ https://issues.apache.org/jira/browse/SOLR-2304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Smiley updated SOLR-2304:
-------------------------------

    Fix Version/s:     (was: 4.7)
                   4.8

> MoreLikeThis: Apply field level boosts before query terms are selected
> ----------------------------------------------------------------------
>
>                 Key: SOLR-2304
>                 URL: https://issues.apache.org/jira/browse/SOLR-2304
>             Project: Solr
>          Issue Type: Improvement
>          Components: MoreLikeThis
>    Affects Versions: 1.4.2
>            Reporter: Mike Mattozzi
>            Priority: Minor
>             Fix For: 4.8
>
>         Attachments: SOLR-2304.patch
>
>
> MoreLikeThis provides the ability to set field level boosts to weight the importance of fields in selecting similar documents. Currently, in trunk, these field level boosts are applied after the query terms have been selected from the priority queue of interesting terms in MoreLikeThis. This can give unexpected results when used in combination with mlt.maxqt to limit the number of query terms. For example, suppose you use fields fieldA and fieldB, boost them "fieldA^0.5 fieldB^2.0", and set maxqt to 20. If the terms in fieldA have relatively higher tf-idf scores than those in fieldB, only 20 fieldA terms will be selected as the basis for the MoreLikeThis query, even if after boosting there are terms in fieldB with a higher overall score.
> I encountered this while using document descriptive text and document tags (comedy, action, etc.) as the basis for MoreLikeThis. I wanted to boost the tags higher. However, the less common document text terms were always selected as the query terms, while the more common tag terms were eliminated by the maxqt parameter before their scores were boosted.
> I believe the code was originally written this way so that the bulk of the work could be done in the MoreLikeThisHandler without modifying the MoreLikeThis class in the Lucene project. Now that the projects are merged, I think this modification makes sense. I will be attaching a simple patch to trunk.
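
The ordering problem described above can be illustrated with a small standalone sketch. This is not the attached patch and it does not use the Lucene MoreLikeThis API; the class MltBoostOrderSketch, the Term type, the term texts, and the scores are all hypothetical, chosen only to mirror the fieldA^0.5 / fieldB^2.0 example with a term limit of 2 instead of 20.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only: not the SOLR-2304 patch, not the Lucene API.
final class MltBoostOrderSketch {

    /** A candidate query term: its field, text, raw tf-idf score, and field boost. */
    static final class Term {
        final String field;
        final String text;
        final double tfIdf;
        final double boost;

        Term(String field, String text, double tfIdf, double boost) {
            this.field = field;
            this.text = text;
            this.tfIdf = tfIdf;
            this.boost = boost;
        }

        double boosted() {
            return tfIdf * boost;
        }

        @Override
        public String toString() {
            return field + ":" + text;
        }
    }

    /** Made-up scores: fieldA terms have the higher raw tf-idf, fieldB the higher boost. */
    static List<Term> sampleTerms() {
        return Arrays.asList(
                new Term("fieldA", "rare1", 8.0, 0.5),
                new Term("fieldA", "rare2", 7.0, 0.5),
                new Term("fieldB", "comedy", 5.0, 2.0),
                new Term("fieldB", "action", 4.0, 2.0));
    }

    /** Behavior described in the issue: keep the top maxQt terms by raw tf-idf; boosts come too late to matter. */
    static List<Term> selectThenBoost(List<Term> terms, int maxQt) {
        List<Term> sorted = new ArrayList<>(terms);
        sorted.sort((a, b) -> Double.compare(b.tfIdf, a.tfIdf));
        return new ArrayList<>(sorted.subList(0, Math.min(maxQt, sorted.size())));
    }

    /** Proposed ordering: rank by boosted score first, then keep the top maxQt terms. */
    static List<Term> boostThenSelect(List<Term> terms, int maxQt) {
        List<Term> sorted = new ArrayList<>(terms);
        sorted.sort((a, b) -> Double.compare(b.boosted(), a.boosted()));
        return new ArrayList<>(sorted.subList(0, Math.min(maxQt, sorted.size())));
    }

    public static void main(String[] args) {
        List<Term> terms = sampleTerms();
        System.out.println("select, then boost: " + selectThenBoost(terms, 2));
        System.out.println("boost, then select: " + boostThenSelect(terms, 2));
    }
}
```

With these made-up numbers, selecting before boosting keeps only the two fieldA terms, while boosting before selecting keeps the two fieldB terms, which is the outcome the reporter expected when boosting tags.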


