Posted to dev@lucene.apache.org by "David Smiley (JIRA)" <ji...@apache.org> on 2016/07/29 13:59:20 UTC

[jira] [Closed] (SOLR-2499) Index-time boosts for multivalue fields are consolidated

     [ https://issues.apache.org/jira/browse/SOLR-2499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Smiley closed SOLR-2499.
------------------------------
    Resolution: Not A Problem

Sorry but Solr is working as-designed here -- boosts are consolidated.  Lucene, which underlies Solr, works in this way.  Index time boosts go into the "norm" for a field which is fundamentally a single value per document (per-field).
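
To make the consequence concrete, here is a minimal sketch using the photo example from the issue quoted below. It is not Lucene code: the function names are hypothetical, and summing the boosts simply follows the totals given in the issue (an assumption; Lucene's actual boost combination and norm encoding differ). It only shows that once a field holds a single number per document, a query for "cat feline" can no longer tell which keywords that number came from.

```python
# Per-value boosts taken from the photo example in the quoted issue.
photos = {
    "photo1": {"cat": 600, "feline": 100},
    "photo2": {"cat": 100, "feline": 90, "animal": 80, "truck": 1000},
}

def consolidated_boost(keywords):
    """One number per document for the field, as a norm would hold.
    Summing mirrors the issue's totals; it is an assumption, not Lucene's formula."""
    return sum(keywords.values())

def per_term_boost(keywords, query_terms):
    """What the reporter asks for: count only the boosts of the queried terms."""
    return sum(keywords.get(term, 0) for term in query_terms)

query = ["cat", "feline"]
consolidated = {name: consolidated_boost(kw) for name, kw in photos.items()}
per_term = {name: per_term_boost(kw, query) for name, kw in photos.items()}

print(consolidated)  # {'photo1': 700, 'photo2': 1270}
print(per_term)      # {'photo1': 700, 'photo2': 190}
```

With the consolidated value, photo2 outranks photo1 because of the unrelated "truck" boost; with per-term values the order flips.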

That said, I understand what you're trying to do, and you will have to get there a different way.  

One way would be to "fake" the term frequency so that it in effect becomes your boost.  You could do this perhaps with a custom analysis chain that emits the same term a number of times in proportion to its boost. You would then likely want to customize the Similarity for this field... although even if you don't, you should see the effect of this, as greater TF means a greater score.  You might also want to disable norms for this field.  Personally I think this is the best route to take.
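
A rough sketch of the term-frequency idea, again outside Lucene and with hypothetical names. It assumes each keyword is emitted roughly boost / scale times (the `scale` parameter is invented here to keep token counts small) and that the score is just the sum of raw term frequencies for the query terms, which is a simplification of what any real Similarity does.

```python
from collections import Counter

photos = {
    "photo1": {"cat": 600, "feline": 100},
    "photo2": {"cat": 100, "feline": 90, "animal": 80, "truck": 1000},
}

def expand_by_boost(keywords, scale=100):
    """Emit each keyword max(1, round(boost / scale)) times.
    `scale` is a hypothetical knob, not a Solr or Lucene setting."""
    tokens = []
    for term, boost in keywords.items():
        tokens.extend([term] * max(1, round(boost / scale)))
    return tokens

def tf_score(tokens, query_terms):
    """Simplified score: sum of raw term frequencies of the query terms."""
    tf = Counter(tokens)
    return sum(tf[term] for term in query_terms)

query = ["cat", "feline"]
scores = {name: tf_score(expand_by_boost(kw), query) for name, kw in photos.items()}
print(scores)  # {'photo1': 7, 'photo2': 2}
```

In this toy scoring the ten "truck" tokens in photo2 do not affect a "cat feline" query at all. In a real index they would lengthen the field, which is one reason disabling norms for the field may matter, and a stock Similarity would typically dampen the raw frequency rather than use it directly.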

Another route is to encode the boosts as payloads.  There is already a TokenFilter that will do that for you, so this is easy.  But then you need to customize the Similarity to decode the payload and use it in the score.  I am not certain, but you probably also need to generate your query using a custom SpanQuery instead of using one of Solr's existing QueryParsers.  Those latter two things -- a custom Similarity and a custom QParser -- might not both be necessary... maybe just one.  Not sure without digging in further.  This might be less work than the previous option I listed, but I like that the previous one uses "frequency", an existing stat, vs payloads.
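
The payload idea can be sketched the same way. This assumes the input is written as "term|boost" tokens (the "|" delimiter and both function names are assumptions for illustration) and that scoring simply sums the decoded payloads of the matching terms. A real setup would do the decoding inside a Similarity and the matching inside a payload-aware query, neither of which is shown here.

```python
def parse_payload_tokens(text, delimiter="|"):
    """Split 'term|boost' tokens into (term, payload) pairs.
    A token without a payload gets a neutral payload of 1.0."""
    pairs = []
    for token in text.split():
        term, sep, payload = token.partition(delimiter)
        pairs.append((term, float(payload) if sep and payload else 1.0))
    return pairs

def payload_score(pairs, query_terms):
    """Simplified score: sum of the payloads attached to matching terms."""
    wanted = set(query_terms)
    return sum(payload for term, payload in pairs if term in wanted)

photo1 = parse_payload_tokens("cat|600 feline|100")
photo2 = parse_payload_tokens("cat|100 feline|90 animal|80 truck|1000")

query = ["cat", "feline"]
print(payload_score(photo1, query))  # 700.0
print(payload_score(photo2, query))  # 190.0
```

Unlike the repeated-term approach, the boost is stored exactly rather than approximated by a token count, at the cost of needing scoring code that knows how to read it.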

> Index-time boosts for multivalue fields are consolidated
> --------------------------------------------------------
>
>                 Key: SOLR-2499
>                 URL: https://issues.apache.org/jira/browse/SOLR-2499
>             Project: Solr
>          Issue Type: Improvement
>    Affects Versions: 3.1, 3.2, 4.0-ALPHA
>            Reporter: Neil Hooey
>              Labels: boost, multivalue, multivalued
>
> Currently, if you boost a value in a multivalue field during index time, the boosts are consolidated for every field, and the individual values are lost.
> So, for example, given a list of photos with a multivalue field "keywords", and a boost for a keyword assigned to a photo corresponds to the number of times that photo was downloaded after searching for that particular keyword, we have documents like this:
> {code}
> photo1: Photo of a cat by itself
> keywords: [ cat:600 feline:100 ]
> => boost total = 700
> photo2: Photo of a cat driving a truck
> keywords: [ cat:100 feline:90 animal:80 truck:1000 ]
> => boost total = 1270
> {code}
> If you search for "cat feline", photo2 will rank higher, since the boost of "cat-like" words was consolidated with the "truck" boost anomaly. Whereas photo1, which has more downloads for "cat" and "feline", ranks lower with a lower consolidated boost, even though its total boost for the relevant keywords (700) is higher than photo2's (190).
> *Intuitively, the boosts should be separate, so only the boosts for the terms searched will be counted.*
> Given the current behaviour, you are forced to do one of the following:
> 1. Assemble all of the multi-values into a string, and use payloads in place of boosts.
> 2. Use dynamic fields, such as keyword_*, and boost them independently.
> Neither of these solutions is ideal, as using payloads requires writing your own BoostingTermQuery, and defining a new dynamic field per multi-value makes searching more difficult than with multivalue fields.
> There's a blog entry that describes the current behaviour:
> http://blog.kapilchhabra.com/2008/01/solr-index-time-boost-facts-2



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)