Posted to user@mahout.apache.org by Matthew Bryan <go...@gmail.com> on 2010/02/06 17:07:55 UTC

Turning Preference Files Into Vectors

Is there a straightforward way to take a preference file that's used
for a recommender (user_id, item_id, preference) and turn it into a
vector that can be used for clustering? As part of my evaluation of
Mahout I'd also like to cluster items and see how those simple
clusters perform.

Thanks!

Matthew Bryan

Re: Turning Preference Files Into Vectors

Posted by Ted Dunning <te...@gmail.com>.
Fantastic.  That will be great.

On Sat, Feb 6, 2010 at 12:05 PM, Matthew Bryan <go...@gmail.com> wrote:

> hopefully I'll soon
> get to the point that I'm comfortable contributing code as you
> suggest.
>



-- 
Ted Dunning, CTO
DeepDyve

Re: Turning Preference Files Into Vectors

Posted by Matthew Bryan <go...@gmail.com>.
Worked perfectly!

Thanks, and thanks for the active user community. Hopefully I'll soon
get to the point where I'm comfortable contributing code, as you
suggest.

Matt


Re: Turning Preference Files Into Vectors

Posted by Sean <sr...@gmail.com>.
I can point you to 90% of what you need in the existing code. Look at
package org.apache.mahout.cf.taste.hadoop.item first.

RecommenderJob runs several MapReduce jobs to make recommendations, and
along the way does what you want -- almost. It outputs user vectors: for
each user, a vector with item IDs as indices and preference values as
coordinates. You want the transpose of that: for each item, a vector
with user IDs as indices and preference values as coordinates.

We can't use IDs in the recommender as indices directly, since IDs are
longs and vector dimensions are ints. So there's a first stage where we
create a mapping from the real IDs to hashed int indices. This is what
ItemIDIndexMapper and ItemIDIndexReducer do. You would just copy and
tweak them to deal with user IDs.
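
To make the long-to-int step concrete, here is a minimal plain-Java sketch of that kind of mapping. It is not the Mahout code itself: the class name IdIndexSketch, the exact hash, and the rule of keeping the smallest ID when two IDs collide on one index are all assumptions for illustration.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper for illustration; not a Mahout class. */
class IdIndexSketch {

  /**
   * Folds a 64-bit ID into a non-negative 32-bit index by XOR-ing the
   * high and low halves and clearing the sign bit (assumed hash).
   */
  static int idToIndex(long id) {
    return 0x7FFFFFFF & ((int) id ^ (int) (id >>> 32));
  }

  /**
   * Builds the reverse lookup (index -> original ID). Different IDs can
   * hash to the same index; keeping the smallest ID in that case is an
   * illustrative choice, not something stated in this thread.
   */
  static Map<Integer, Long> buildIndex(long[] ids) {
    Map<Integer, Long> indexToId = new TreeMap<>();
    for (long id : ids) {
      indexToId.merge(idToIndex(id), id, Long::min);
    }
    return indexToId;
  }
}
```

Because the hash is lossy, the reverse map is only approximate: two distinct user IDs that collide end up sharing one vector dimension.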

Then ToItemPrefsMapper and ToUserVectorReducer team up to write out the
vectors. Same thing -- it's just an exercise in swapping user IDs and
item IDs.
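
As a rough picture of what the swapped mapper/reducer pair would compute, here is a single-process Java sketch with no Hadoop involved. The class name ItemVectorSketch, the comma-separated "user_id,item_id,preference" line format, and the hash in idToIndex are assumptions for illustration; the real classes work on Hadoop writables and Mahout vectors.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical stand-in for the swapped mapper/reducer pair; not Mahout code. */
class ItemVectorSketch {

  /** Assumed hash from a long ID to a non-negative int index. */
  static int idToIndex(long id) {
    return 0x7FFFFFFF & ((int) id ^ (int) (id >>> 32));
  }

  /**
   * Parses "user_id,item_id,preference" lines (the "map" step) and groups
   * them by item ID (the "reduce" step). Each item gets a sparse vector,
   * modelled here as a map from hashed user index to preference value.
   */
  static Map<Long, Map<Integer, Float>> itemVectors(List<String> lines) {
    Map<Long, Map<Integer, Float>> vectors = new TreeMap<>();
    for (String line : lines) {
      String[] fields = line.split(",");
      long userId = Long.parseLong(fields[0].trim());
      long itemId = Long.parseLong(fields[1].trim());
      float pref = Float.parseFloat(fields[2].trim());
      vectors.computeIfAbsent(itemId, k -> new TreeMap<>())
             .put(idToIndex(userId), pref);
    }
    return vectors;
  }
}
```

Keying the outer map by user ID and the inner one by hashed item index would give the user vectors instead, which is the whole of the swap.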

The rest of the MapReduce jobs don't matter to you. You could even copy
RecommenderJob, cut out the other bits it runs, and have a ready-made
driver.

It's easier than it may sound -- these are all quite small classes.


If it works and you care to think through and contribute a clean
refactoring that allows for generating item vectors as well as user
vectors, I'd commit that. But feel free to just hack for your own
purpose too.


Sean




Re: Turning Preference Files Into Vectors

Posted by Ted Dunning <te...@gmail.com>.
It should be pretty easy to use any of the vectorizing code that is coming
out just now to build vectors out of this data.  Some minor reformatting is
likely necessary, and that is probably best done by augmenting the
vectorizer's input parsing.

Take a look at the active JIRA issues for more info.  See, for instance,
https://issues.apache.org/jira/browse/MAHOUT-237




-- 
Ted Dunning, CTO
DeepDyve