Posted to solr-user@lucene.apache.org by Neil Hooey <nh...@gmail.com> on 2011/05/05 03:36:24 UTC

Do boosts on values in multivalued fields still get consolidated?

Kapil Chhabra indicates on his blog that if you boost a value in a
multivalued field at index time, the boosts are consolidated into a
single boost for the whole field, and the individual values are lost.
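
As a rough illustration of that consolidation, here is a minimal sketch. It
assumes the Lucene 3.x behaviour in which boosts on field instances sharing a
name are multiplied together into one per-field value; the function name is
hypothetical, and length normalization and the lossy norm encoding are ignored.

```python
import math

def consolidated_field_boost(value_boosts):
    """Collapse per-value boosts into the single boost kept for the field.

    Assumption: boosts of same-named field instances are multiplied together.
    """
    return math.prod(value_boosts)

# Per-keyword boosts supplied at index time (values from the example below).
keyword_boosts = {"monkey": 50.0, "ape": 50.0, "chimp": 50.0, "garage": 0.1}

# Only this one number survives; which keyword had which boost is lost.
field_boost = consolidated_field_boost(keyword_boosts.values())
print(field_boost)
```

Under this assumption, a search for "monkey" and a search for "garage" would
both see the same field-level boost, which is why per-keyword weighting does
not work this way.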

Here's the link:
http://blog.kapilchhabra.com/2008/01/solr-index-time-boost-facts-2

This post is from 2008-01-20, but it still seems to be true in Solr 3.1.

Has this behaviour been fixed in future versions of Solr, or are there
plans to fix it?

In general, when a user searches for a document, I'd like to
arbitrarily weight each keyword for that document during index time.

For example, if they searched for "q=keywords:monkey" and got these documents:
keywords: [ monkey, ape, chimp, garage ]
keywords: [ monkey, cloud, food, door ]

I'd like to have boosts recorded like this, at index time, based on
keyword co-relevance:
keywords: [ monkey:50, ape:50, chimp:50, garage:0.1 ]
keywords: [ monkey:1, cloud:1, food:1, door:1 ]

In the first document, the word "monkey" is clearly related to "ape"
and "chimp", but not to "garage". In the second document, none of the
keywords are really related to each other at all.

I see a couple of potential solutions to this problem, in the absence
of boosts for multivalued fields:
1. Turn keyword lists into strings, and use payloads: "monkey|50,
ape|50, chimp|50, garage|0.1"
2. Use dynamic fields of the form: keyword_*: keyword_monkey,
keyword_ape, ... and boost those fields.
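
To make the two options above concrete, here is a minimal sketch of the data
shapes involved. It only models the parsing, not Solr itself; the function
names and the default weight of 1.0 for a keyword without a payload are
assumptions for illustration.

```python
def parse_payload_string(text, pair_sep=",", payload_sep="|"):
    """Split 'term|weight, term|weight' into a list of (term, weight) pairs."""
    pairs = []
    for chunk in text.split(pair_sep):
        chunk = chunk.strip()
        if not chunk:
            continue
        if payload_sep in chunk:
            term, weight = chunk.rsplit(payload_sep, 1)
        else:
            # Assumption: a keyword without a payload gets a neutral weight.
            term, weight = chunk, "1.0"
        pairs.append((term.strip(), float(weight)))
    return pairs

def to_dynamic_fields(pairs, prefix="keyword_"):
    """Option 2: one dynamic field name per keyword, mapped to its boost."""
    return {prefix + term: weight for term, weight in pairs}

# Option 1: the whole keyword list as one payload-delimited string.
pairs = parse_payload_string("monkey|50, ape|50, chimp|50, garage|0.1")

# Option 2: the same weights expressed as per-keyword dynamic fields.
fields = to_dynamic_fields(pairs)
```

Option 1 keeps everything in one field, while option 2 creates a new field
name for every distinct keyword, which may grow large for open vocabularies.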

Are those solutions feasible, or are there better solutions to this problem?

- Neil

Re: Do boosts on values in multivalued fields still get consolidated?

Posted by Neil Hooey <nh...@gmail.com>.
If I have a document with:
{ id: 1, sentences: "hello world|5.0_goodbye|2.3_this is a sentence|2.8" }

How would I get those payloads to take effect on the tokens separated by
"_"?

How do you write a query to use those payloads?
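
For reference, the behaviour being asked about can be modelled outside Solr
as follows. This is only a sketch of the data flow: split the value on "_",
strip the "|weight" suffix from each token, and let the stored weight drive
the score of a match. In Solr this would presumably map to a
PatternTokenizerFactory splitting on "_" followed by a
DelimitedPayloadTokenFilterFactory with a float encoder, plus a payload-aware
query and Similarity at query time, but that mapping is an assumption and is
not confirmed in this thread. The function names are hypothetical.

```python
def tokenize_with_payloads(value, token_sep="_", payload_sep="|"):
    """Split a field value into (token, payload) pairs."""
    tokens = []
    for chunk in value.split(token_sep):
        if payload_sep in chunk:
            text, payload = chunk.rsplit(payload_sep, 1)
        else:
            # Assumption: tokens without a payload get a neutral weight.
            text, payload = chunk, "1.0"
        tokens.append((text, float(payload)))
    return tokens

def payload_score(tokens, query):
    """Return the largest payload among tokens equal to the query, else 0.0."""
    wanted = query.strip().lower()
    matches = [payload for text, payload in tokens if text.lower() == wanted]
    return max(matches, default=0.0)

doc = {"id": 1, "sentences": "hello world|5.0_goodbye|2.3_this is a sentence|2.8"}
tokens = tokenize_with_payloads(doc["sentences"])

# A match on "goodbye" is scored by that token's own payload.
score = payload_score(tokens, "goodbye")
```

The point of the sketch is that each token carries its own weight, unlike an
index-time field boost, which is shared by every value in the field.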

On Wed, May 4, 2011 at 22:26, Otis Gospodnetic <ot...@yahoo.com> wrote:

> Hi Neil,
>
> I think payloads is the way to go.  Index-time boosting is not per term.
>
> Otis
> ----
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> Lucene ecosystem search :: http://search-lucene.com/

Re: Do boosts on values in multivalued fields still get consolidated?

Posted by Otis Gospodnetic <ot...@yahoo.com>.
Hi Neil,

I think payloads is the way to go.  Index-time boosting is not per term.

Otis
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/


