Posted to solr-user@lucene.apache.org by Vinicius Carvalho <vi...@gmail.com> on 2012/07/11 01:20:06 UTC

Boosting tips

Hi there! I'm starting with Solr (I have previous experience with
Lucene).

I'm using the Apache Solr 3 Enterprise Search Server book as a reference,
and the MusicBrainz sample data.

I deviate from the examples by creating multiple cores (artists, tracks,
albums).

My boost is:
<str name="qf">song^4 artist^4 album</str>
<str name="pf">song artist^4</str>
<str name="pf2">artist^8</str>


One thing that is proving tough is coming up with the right boosting. When
the user enters free text, I search three fields (track, album, artist).
So let's say:

Adele someone like you.

My problem is that I have some songs (ringtones) whose song name includes
Adele, and those are being ranked higher than a song by Adele.
For instance:

<song>Adele - Someone Like You ringtone</song>
<artist>Ringtone of Adele song</artist>

ranks higher than:

<song>someone like you</song>
<artist>Adele</artist>

I can see why using debug: the first result gets more "points" in its
score for having adele, someone and "adele someone" in the song field. I
was wondering if there's a way out of this type of situation.

Regards

-- 
The intuitive mind is a sacred gift and the
rational mind is a faithful servant. We have
created a society that honors the servant and
has forgotten the gift.

Re: Boosting tips

Posted by Chris Hostetter <ho...@fucit.org>.
: Thanks Ahmet, I did that and it kind of worked (not as well as expected):
: the document with ringtone was the 1st match and it moved to the 2nd
: position, but I was expecting it to be at the very bottom. I tried other
: boost factors, up to 10E6, but with no success.

"bq" is an additive boost -- it basically just adds another clause to 
the outermost BooleanQuery produced by the parser -- as the total 
boost values of all clauses increase, the queryNorm increases, and the 
overall effect diminishes.

Using the "boost" param of edismax, or wrapping your whole query in a 
{!boost} query, is a much saner way to go (it's a multiplicative boost).
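
A minimal sketch of what the multiplicative approach could look like, written as request parameters built in Python. The field names come from the original post. The "penalty" parameter name, the 0.1 factor, and the use of the if/exists/query function queries are assumptions for illustration; those functions are not available in every Solr release, so check your version before relying on them.

```python
from urllib.parse import urlencode


def build_boost_params(user_query, penalty_query="song:ringtone", factor=0.1):
    """Return edismax parameters that scale the score of penalized documents."""
    return {
        "defType": "edismax",
        "q": user_query,
        "qf": "song^4 artist^4 album",
        # "penalty" is a hypothetical parameter name, dereferenced as $penalty.
        "penalty": penalty_query,
        # Multiply the score by `factor` when the document matches the penalty
        # query, and by 1 (no change) otherwise.
        "boost": f"if(exists(query($penalty)),{factor},1)",
    }


params = build_boost_params("Adele someone like you")
print(urlencode(params))
```

Because the factor multiplies the relevance score instead of adding a clause to it, its effect does not depend on how large the other clause boosts are.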


-Hoss

Re: Boosting tips

Posted by Ahmet Arslan <io...@yahoo.com>.

--- On Wed, 7/11/12, Vinicius Carvalho <vi...@gmail.com> wrote:

> From: Vinicius Carvalho <vi...@gmail.com>
> Subject: Re: Boosting tips
> To: solr-user@lucene.apache.org
> Date: Wednesday, July 11, 2012, 4:24 PM
> Thanks Ahmet, I did that and it kind of worked (not as well as
> expected): the document with ringtone was the 1st match and it moved
> to the 2nd position, but I was expecting it to be at the very bottom.
> I tried other boost factors, up to 10E6, but with no success.
> 
> Another issue is that I have some bad words I would really like to
> penalize, like ringtone, instrumental, cover, tribute, and they appear
> in multiple fields. I was wondering if there's a better way of doing
> this instead of creating a big bq string like:
> 
> (*:* -song:ringtone)^100 (*:* -album:ringtone)^100 ... and so on.
I haven't tried it myself, but you can also use dismax inside the bq parameter.

&bq=(+*:* -_query_:"{!dismax qf='song album'}ringtone cover")^100

wiki.apache.org/solr/LocalParams
http://www.lucidimagination.com/blog/2009/03/31/nested-queries-in-solr/
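
As a rough sketch of how the nested-dismax bq above could be generated for several fields and bad words at once, here is a small Python helper. The helper name and its arguments are hypothetical. The mm=1 local param is an assumption added so that a document containing any one of the bad words is excluded from the boosted clause; without it, dismax may require all of the words to match.

```python
def build_penalty_bq(fields, bad_words, boost=100):
    """Build one bq clause that rewards documents matching none of bad_words."""
    if not fields or not bad_words:
        raise ValueError("fields and bad_words must be non-empty")
    qf = " ".join(fields)
    terms = " ".join(bad_words)
    # Local param values that contain spaces are quoted with single quotes.
    nested = f"{{!dismax qf='{qf}' mm=1}}{terms}"
    return f'(*:* -_query_:"{nested}")^{boost}'


bq = build_penalty_bq(
    ["song", "album", "artist"],
    ["ringtone", "instrumental", "cover", "tribute"],
)
print(bq)
```

This keeps the list of penalized words in one place instead of repeating a (*:* -field:word)^100 clause per field and per word.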

Re: Boosting tips

Posted by Vinicius Carvalho <vi...@gmail.com>.
Thanks Ahmet, I did that and it kind of worked (not as well as expected):
the document with ringtone was the 1st match and it moved to the 2nd
position, but I was expecting it to be at the very bottom. I tried other
boost factors, up to 10E6, but with no success.

Another issue is that I have some bad words I would really like to
penalize, like ringtone, instrumental, cover, tribute, and they appear in
multiple fields. I was wondering if there's a better way of doing this
instead of creating a big bq string like:

(*:* -song:ringtone)^100 (*:* -album:ringtone)^100 ... and so on.

Cheers

On Wed, Jul 11, 2012 at 7:15 AM, Ahmet Arslan <io...@yahoo.com> wrote:

>
> I think this is expected because the first result contains the whole
> query in the same order, so the pf parameter kicks in. I would penalize
> ringtone-type documents using the bq parameter. Something like:
> &bq=(*:* -song:ringtone)^100
>
> If you can create an additional field like songType={ringtone, ...} at
> index time, you can use faceting to filter results.
>




Re: Boosting tips

Posted by Ahmet Arslan <io...@yahoo.com>.

I think this is expected because the first result contains the whole query in the same order, so the pf parameter kicks in. I would penalize ringtone-type documents using the bq parameter. Something like:
&bq=(*:* -song:ringtone)^100

If you can create an additional field like songType={ringtone, ...} at index time, you can use faceting to filter results.
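
One way to read the songType suggestion is as a filter query (fq) that drops unwanted types entirely instead of re-ranking them. The sketch below builds such a request in Python. It assumes a songType field has been populated at index time, as proposed above; the helper name and the type values are hypothetical.

```python
from urllib.parse import urlencode


def build_filtered_params(user_query, excluded_types):
    """Return query parameters that filter out documents of the given types."""
    params = [
        ("defType", "edismax"),
        ("q", user_query),
        ("qf", "song^4 artist^4 album"),
    ]
    # One negative fq per excluded type; fq restricts the result set and
    # does not influence the relevance score.
    for song_type in excluded_types:
        params.append(("fq", f"-songType:{song_type}"))
    return params


params = build_filtered_params("Adele someone like you", ["ringtone", "tribute"])
print(urlencode(params))
```

Filtering removes the documents from the results altogether, so it only fits if ringtones should never be shown for this kind of search; otherwise a boost is the softer option.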