Posted to solr-user@lucene.apache.org by Vinicius Carvalho <vi...@gmail.com> on 2012/07/11 01:20:06 UTC
Boosting tips
Hi there! I'm starting out with Solr (I have previous experience with
Lucene).
I'm using the Apache Solr 3 Enterprise Search Server book as a reference, and
the musicbrainz sample data.
I deviate from the examples by creating multiple cores (artists, tracks,
albums).
My boost configuration is:
<str name="qf">song^4 artist^4 album</str>
<str name="pf">song artist^4</str>
<str name="pf2">artist^8</str>
One thing I'm finding tough is coming up with the right boosting. When
the user enters free text, I search across 3 fields
(track, album, artist). So let's say the query is:
Adele someone like you.
My problem is that I have some songs (ring tones) whose song name
contains Adele, and those are being ranked higher than a song by Adele.
For instance:
<song>Adele - Someone Like You ringtone</song>
<artist>Ringtone of Adele song</artist>
ranks higher than:
<song>someone like you</song>
<artist>Adele</artist>
I can see why using debug: the first result gets more "points" on its
score for having adele, someone and "adele someone" in the song field. I was
wondering if there's a way out of this type of situation.
Regards
--
The intuitive mind is a sacred gift and the
rational mind is a faithful servant. We have
created a society that honors the servant and
has forgotten the gift.
Re: Boosting tips
Posted by Chris Hostetter <ho...@fucit.org>.
: Thanks Ahmet, I did that and it kind of worked, but not as well as expected:
: the document with ringtone was the 1st match and only moved to the 2nd
: position; I was expecting it to drop to the very bottom. I tried other boost
: factors, up to 10E6, but with no success.
"bq" is an additive boost -- it basically just adds another clause to
the outermost BooleanQuery produced by the parser. As the total
boost of all clauses increases, queryNorm scales every clause down to
compensate, so the overall effect diminishes.
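
A toy calculation can illustrate this saturation. This is only a minimal sketch, not Lucene's actual scoring code: it assumes a two-clause query (a main clause with boost 1 plus one bq clause), approximates queryNorm as 1 / sqrt(sum of squared clause boosts), and uses made-up main scores. All function names are hypothetical.

```python
import math


def query_norm(clause_boosts):
    """Simplified queryNorm: 1 / sqrt(sum of squared clause boosts)."""
    return 1.0 / math.sqrt(sum(b * b for b in clause_boosts))


def additive_scores(main_scores, matches_bq, bq_boost):
    """Score each doc as norm * (main score + bq bonus if the doc matches bq)."""
    norm = query_norm([1.0, bq_boost])
    return [
        norm * (score + (bq_boost if matched else 0.0))
        for score, matched in zip(main_scores, matches_bq)
    ]


def normalized_bonus(bq_boost):
    """Contribution of the bq clause after normalization."""
    return query_norm([1.0, bq_boost]) * bq_boost


def ranking(scores):
    """Document indexes ordered from best to worst score."""
    return sorted(range(len(scores)), key=lambda i: -scores[i])


# Hypothetical main scores: doc 0 is the ringtone (does not match bq),
# doc 1 is the real song (matches bq).
main_scores = [5.0, 3.0]
matches_bq = [False, True]

for boost in (100.0, 1e6):
    scores = additive_scores(main_scores, matches_bq, boost)
    print(boost, ranking(scores), normalized_bonus(boost))
```

In this model the normalized bonus is already just under 1 at ^100 and can never exceed 1, and the ordering is the same at ^100 and at ^10E6, so raising the factor further changes nothing.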
using the "boost" param of edismax, or wrapping your whole query in a
{!boost} query, is a much saner way to go (it's a multiplicative boost)
-Hoss
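
One way the multiplicative approach might be wired up is sketched below. This is an assumption-laden example rather than a tested recipe: the helper name build_boosted_params, the badq parameter name, the mm=1 setting and the 0.1 penalty are all made up for illustration. It relies on the query() and five-argument map() function queries, where query($badq) is expected to yield 0 for documents that do not match the bad-word query, so map(...,0,0,1,penalty) gives 1 for clean documents and the penalty for the rest.

```python
from urllib.parse import urlencode


def build_boosted_params(user_text, bad_words, fields, penalty=0.1):
    """Build edismax request parameters with a multiplicative penalty.

    Assumes bad_words and fields are plain tokens that need no escaping.
    """
    # Nested dismax query over all penalized fields; mm=1 so that any
    # single bad word is enough to match (an assumption, see lead-in).
    bad_query = "{!dismax qf='%s' mm=1}%s" % (" ".join(fields), " ".join(bad_words))
    return {
        "defType": "edismax",
        "q": user_text,
        "qf": "song^4 artist^4 album",
        # Hypothetical parameter name, dereferenced below as $badq.
        "badq": bad_query,
        # 1 for documents that do not match badq, `penalty` for those that do.
        "boost": "map(query($badq),0,0,1,%s)" % penalty,
    }


params = build_boosted_params(
    "Adele someone like you",
    ["ringtone", "instrumental", "cover", "tribute"],
    ["song", "artist", "album"],
)
print(urlencode(params))
```

Because the factor multiplies the whole relevance score, a penalty of 0.1 cuts a matching document's score to a tenth regardless of how many other clauses the query has.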
Re: Boosting tips
Posted by Ahmet Arslan <io...@yahoo.com>.
--- On Wed, 7/11/12, Vinicius Carvalho <vi...@gmail.com> wrote:
> From: Vinicius Carvalho <vi...@gmail.com>
> Subject: Re: Boosting tips
> To: solr-user@lucene.apache.org
> Date: Wednesday, July 11, 2012, 4:24 PM
> Thanks Ahmet, I did that and it kind of worked, but not as well as expected:
> the document with ringtone was the 1st match and only moved to the 2nd
> position; I was expecting it to drop to the very bottom. I tried other boost
> factors, up to 10E6, but with no success.
>
> Another issue is that I have some bad words I would really like to penalize,
> such as ringtone, instrumental, cover, and tribute, and they appear in
> multiple fields. I was wondering if there's a better way of doing this than
> building one big bq string like:
>
> (*:* -song:ringtone)^100 (*:* -album:ringtone)^100 ... and so on.
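I haven't tried this myself, but you can also use a nested dismax query inside the bq parameter. Note that a multi-field qf value has to be quoted inside the local params:
&bq=(+*:* -_query_:"{!dismax qf='song album'}ringtone cover")^100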
wiki.apache.org/solr/LocalParams
http://www.lucidimagination.com/blog/2009/03/31/nested-queries-in-solr/
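
The nested-query bq above could be generated from a word list and a field list instead of being written by hand. The following is a minimal sketch under a few assumptions: the helper name build_bad_word_bq is hypothetical, the words and field names are plain alphanumeric tokens (anything else is rejected rather than escaped), and mm=1 is added so that any one of the words matches, which is not part of the original suggestion.

```python
import re

_TOKEN = re.compile(r"^[A-Za-z0-9_]+$")


def build_bad_word_bq(bad_words, fields, boost=100):
    """Return a bq value that boosts documents containing none of bad_words."""
    for token in list(bad_words) + list(fields):
        if not _TOKEN.match(token):
            # Escaping is out of scope for this sketch.
            raise ValueError("unsupported token: %r" % (token,))
    nested = "{!dismax qf='%s' mm=1}%s" % (" ".join(fields), " ".join(bad_words))
    # Match everything, minus documents that hit the nested bad-word query.
    return '(+*:* -_query_:"%s")^%d' % (nested, boost)


print(build_bad_word_bq(
    ["ringtone", "instrumental", "cover", "tribute"],
    ["song", "album", "artist"],
))
```

This keeps the list of penalized words and fields in one place, so adding a word does not require another hand-written clause per field.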
Re: Boosting tips
Posted by Vinicius Carvalho <vi...@gmail.com>.
Thanks Ahmet, I did that and it kind of worked, but not as well as expected:
the document with ringtone was the 1st match and only moved to the 2nd
position; I was expecting it to drop to the very bottom. I tried other boost
factors, up to 10E6, but with no success.
Another issue is that I have some bad words I would really like to penalize,
such as ringtone, instrumental, cover, and tribute, and they appear in
multiple fields. I was wondering if there's a better way of doing this than
building one big bq string like:
(*:* -song:ringtone)^100 (*:* -album:ringtone)^100 ... and so on.
Cheers
Re: Boosting tips
Posted by Ahmet Arslan <io...@yahoo.com>.
I think this is expected, because the first result contains the whole query in the same order, so the pf parameter kicks in. I would penalize ringtone-type documents using the bq parameter. Something like this:
&bq=(*:* -song:ringtone)^100
If you can create an additional field like songType={ringtone, ...} at index time, you can use faceting to filter the results.
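
The index-time songType idea could look roughly like the sketch below. It is only an illustration: the classification rule (first bad word found in any of the three fields), the "original" value for everything else, and the helper names are assumptions, and the fq string shows one possible way to exclude the tagged documents with a filter query.

```python
import re

# Words taken from the thread; the ordering decides which type wins.
BAD_WORDS = ("ringtone", "instrumental", "cover", "tribute")


def classify_song_type(doc, fields=("song", "artist", "album")):
    """Return a songType value for a document before it is indexed."""
    text = " ".join(str(doc.get(field, "")) for field in fields).lower()
    # Whole-token match, so "discover" does not count as "cover".
    tokens = set(re.findall(r"[a-z0-9]+", text))
    for word in BAD_WORDS:
        if word in tokens:
            return word
    return "original"  # hypothetical value for untagged documents


def exclude_types_fq(types):
    """Build a filter query that drops documents with the given songType values."""
    return "-songType:(%s)" % " OR ".join(types)


ringtone_doc = {
    "song": "Adele - Someone Like You ringtone",
    "artist": "Ringtone of Adele song",
}
real_doc = {"song": "someone like you", "artist": "Adele"}

print(classify_song_type(ringtone_doc))
print(classify_song_type(real_doc))
print(exclude_types_fq(["ringtone", "cover"]))
```

Tagging at index time moves the bad-word logic out of the query, so the same field can be used either to filter results or to drive a boost.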