Posted to solr-user@lucene.apache.org by Amit Nithian <an...@gmail.com> on 2014/01/27 19:36:03 UTC

Re: Boosting documents by categorical preferences

Hi Chris (and others interested in this),

Sorry for dropping off.. I got sidetracked with other work and came back to
this and finally got a V1 of this implemented.

The final process is as follows:
1) Pre-compute the global categorical num_ratings/average/std-dev (so for
Action the average rating may be 3.49 with stdDev of .99)
2) For a given user, retrieve the last X ratings (X is 10 for me) and
compute the user's categorical affinities: take the user's average rating
for all movies in a particular category (say, Action), subtract the global
category average, and divide by the category std-dev. Then multiply this
z-score by the fraction of the user's ratings that fall in that category.
   -> For example, if a user's last 10 ratings consisted of 9/10 Drama and
1/10 Thriller, the z-score of the Thriller should be discounted relative to
that of the Drama, so that the user's preference (either positive or
negative) for Drama is the more prominent one.
3) Sort by the absolute value of the z-score (Thanks Hossman.. great
thought).
4) Return the top 3 (arbitrary number)
5) Modify the query to look like the following:

qq=tom hanks&q={!boost b=$b defType=edismax
v=$qq}&cat1=category:Children&cat2=category:Fantasy&cat3=category:Animation&b=sum(1,sum(product(query($cat1),0.22267872),product(query($cat2),0.21630952),product(query($cat3),0.21120241)))

basically b = 1+(pref1*query(category:something1) +
pref2*query(category:something2) + pref3*query(category:something3))
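
A minimal Python sketch of steps 2-5 above, under stated assumptions. The
names category_affinities, boost_params and GLOBAL_STATS are hypothetical,
the global stats are made-up illustration values, and the boost is written
as a flat sum(1,...), which is arithmetically the same as the nested
sum(1,sum(...)) in the query above. It is not the implementation described
in this message, only one way the steps could be wired together.

```python
from collections import defaultdict

# Hypothetical global per-category stats: category -> (mean rating, std-dev).
# In practice these would be pre-computed over all ratings (step 1).
GLOBAL_STATS = {
    "Drama": (3.60, 1.00),
    "Thriller": (3.50, 1.00),
}


def category_affinities(ratings, global_stats, top_n=3):
    """ratings: list of (category, rating) pairs for the user's last X ratings.

    Returns up to top_n (category, weighted z-score) pairs, ordered by the
    absolute value of the weighted z-score (steps 2-4).
    """
    by_cat = defaultdict(list)
    for category, rating in ratings:
        by_cat[category].append(rating)

    total = len(ratings)
    scores = {}
    for category, values in by_cat.items():
        mean, std = global_stats[category]
        if std == 0:
            continue  # z-score is undefined when the category has no spread
        user_avg = sum(values) / len(values)
        z = (user_avg - mean) / std
        # Discount by the fraction of the user's ratings in this category.
        scores[category] = z * (len(values) / total)

    ranked = sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return ranked[:top_n]


def boost_params(affinities):
    """Build the cat1..catN and b request parameters (step 5)."""
    params = {}
    terms = []
    for i, (category, weight) in enumerate(affinities, start=1):
        # Assumes category values need no quoting or escaping.
        params[f"cat{i}"] = f"category:{category}"
        terms.append(f"product(query($cat{i}),{weight:.8f})")
    # With no affinities the boost is the neutral value 1.
    params["b"] = "sum(1," + ",".join(terms) + ")" if terms else "1"
    return params
```

With nine 5-star Drama ratings and one 1-star Thriller rating, the sketch
ranks Drama first (weighted z-score about 1.26) and Thriller second (about
-0.25), which matches the discounting described in step 2.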

The initial results seem to be kinda promising... of course there are many
more optimizations I could do like decay user ratings over time to indicate
that preferences decay over time so a 5 rating a year ago doesn't count as
much as a 5 rating today.
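
One possible way to do that decay, sketched in Python. This is an
assumption, not something implemented in V1: the function name
decayed_average, the exponential form, and the 180-day half-life are all
hypothetical choices for illustration. The decayed average would replace
the plain per-category average before the z-score is computed.

```python
def decayed_average(ratings, half_life_days=180.0):
    """ratings: iterable of (rating, age_days) pairs.

    Returns the average rating where each rating's weight halves every
    half_life_days, so older ratings count for less.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for rating, age_days in ratings:
        weight = 0.5 ** (age_days / half_life_days)
        weighted_sum += weight * rating
        weight_total += weight
    if weight_total == 0:
        raise ValueError("need at least one rating")
    return weighted_sum / weight_total
```

A 5 rating from a year ago combined with a 1 rating from today then
averages below 3, because the older rating carries less weight.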

Hope this helps others. I'll open source what I have soon and post back. If
there is feedback or other thoughts let me know!

Cheers
Amit


On Fri, Nov 22, 2013 at 11:38 AM, Chris Hostetter
<ho...@fucit.org> wrote:

>
> : I thought about that but my concern/question was how. If I used the pow
> : function then I'm still boosting the bad categories by a small
> : amount..alternatively I could multiply by a negative number but does that
> : work as expected?
>
> I'm not sure i understand your concern: negative powers would give you
> values less than 1, positive powers would give you values greater than 1,
> and then you'd use those values as multiplicative boosts -- so the values
> less than 1 would penalize the scores of existing matching docs in the
> categories the user dislikes.
>
> Oh wait ... i see, in your original email (and in my subsequent suggested
> tweak to use pow()) you were talking about sum()ing up these 3 category
> boosts (and i cut/pasted sum() in my example as well) ... yeah,
> using multiplication there would make more sense if you wanted to do the
> "negative preferences" as well, because then the score of any matching doc
> will be reduced if it matches on an "undesired" category -- and the
> amount it will be reduced will be determined by how strongly it
> matches on that category (ie: the base score returned by the nested
> query() func) and "how negative" the undesired preference value (ie:
> the pow() exponent) is.
>
>
> qq=...
> q={!boost b=$b v=$qq}
>
> b=product(pow(query($cat1),$cat1z),pow(query($cat2),$cat2z),pow(query($cat3),$cat3z))
> cat1=...action...
> cat1z=1.48
> cat2=...comedy...
> cat2z=1.33
> cat3=...kids...
> cat3z=-1.7
>
>
> -Hoss
>
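
To make the arithmetic of the multiplicative variant concrete, here is a
small Python sketch. It only mimics the math of product(pow(...)) outside
of Solr; the function name multiplicative_boost and the sample scores are
hypothetical. It assumes every category score is greater than 0. Note that
the direction described above (negative exponent penalizes) holds only when
the base score is above 1; a score below 1 raised to a negative exponent
gives a value above 1.

```python
import math


def multiplicative_boost(category_scores, exponents):
    """Multiply pow(score, z) over all categories.

    category_scores: per-category query() scores for one document (> 0).
    exponents: the matching z-score exponents (cat1z, cat2z, ...).
    """
    boost = 1.0
    for score, z in zip(category_scores, exponents):
        # math.pow raises ValueError for a 0 base with a negative exponent,
        # so non-matching categories need a non-zero stand-in score.
        boost *= math.pow(score, z)
    return boost
```

In Solr itself, query() returns 0 for documents that do not match the
subquery unless a default is supplied as a second argument, so using a
default of 1 (for example query($cat1,1)) would keep non-matching
categories neutral in a product like this.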

Re: Boosting documents by categorical preferences

Posted by Amit Nithian <an...@gmail.com>.
Chris,

Sounds good! Thanks for the tips.. I'll be glad to submit my talk to this
as I have a writeup pretty much ready to go.

Cheers
Amit


On Tue, Jan 28, 2014 at 11:24 AM, Chris Hostetter
<ho...@fucit.org> wrote:

>
> : The initial results seem to be kinda promising... of course there are many
> : more optimizations I could do like decay user ratings over time to indicate
> : that preferences decay over time so a 5 rating a year ago doesn't count as
> : much as a 5 rating today.
> :
> : Hope this helps others. I'll open source what I have soon and post back. If
> : there is feedback or other thoughts let me know!
>
> Hey Amit,
>
> Glad to hear your user based boosting experiments are paying off.  I would
> definitely love to see a more detailed writeup down the road showing off
> how it affects your final user metrics -- or perhaps even give a session
> on your technique at ApacheCon?
>
>
> http://events.linuxfoundation.org/events/apachecon-north-america/program/cfp
>
>
> -Hoss
> http://www.lucidworks.com/
>

Re: Boosting documents by categorical preferences

Posted by Chris Hostetter <ho...@fucit.org>.
: The initial results seem to be kinda promising... of course there are many
: more optimizations I could do like decay user ratings over time to indicate
: that preferences decay over time so a 5 rating a year ago doesn't count as
: much as a 5 rating today.
: 
: Hope this helps others. I'll open source what I have soon and post back. If
: there is feedback or other thoughts let me know!

Hey Amit,

Glad to hear your user based boosting experiments are paying off.  I would 
definitely love to see a more detailed writeup down the road showing off 
how it affects your final user metrics -- or perhaps even give a session 
on your technique at ApacheCon?

http://events.linuxfoundation.org/events/apachecon-north-america/program/cfp


-Hoss
http://www.lucidworks.com/