Posted to user@mahout.apache.org by "Martin, Nick" <Ni...@pssd.com> on 2013/09/17 22:41:17 UTC

Preference to vectors for clustering

Hi all,

I'm looking for the best way to get user clusters from my recommendation output. The idea is that I have recommended items for users (user, item, score) based on their preferences, but I want to see how the users were clustered together (and how similar they are) so I can run some other analytics on those clusters. I found some discussion of this here (http://lucene.472066.n3.nabble.com/Turning-Preference-Files-Into-Vectors-td640035.html), but I'm not sure whether any updates since that thread have made this easier. If not, is the approach discussed there still my best option?

Hope that makes sense...

Thanks,
Nick

Re: Preference to vectors for clustering

Posted by Pat Ferrel <pa...@gmail.com>.

A less simple but better way to cluster would be to run the vectors in the DRM through SSVD (stochastic singular value decomposition) and cluster the factorized vectors. This turns vectors that are sometimes very sparse into dense, dimensionally reduced vectors, which can improve the clusters. The same applies to the item vectors. I've also been told that streaming-kmeans works better for very sparse vectors. I'm planning to try this for clustering items shortly.
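
As a rough illustration of "reduce first, then cluster", here is a minimal Python sketch. It is not Mahout's SSVD: it swaps in plain power iteration with deflation to approximate the top-k right singular vectors of a tiny in-memory matrix, then projects each user row onto them. The function names, the toy preference values, and the choice k=2 are all assumptions made for the example.

```python
import math
import random


def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def top_right_singular_vectors(a, k, iterations=200, seed=0):
    """Approximate the top-k right singular vectors of `a`.

    Uses power iteration on A^T A with deflation. This is a stand-in for
    SSVD, not Mahout's algorithm, and assumes k <= rank(a).
    """
    rng = random.Random(seed)
    at = [list(col) for col in zip(*a)]  # transpose of a
    basis = []
    for _ in range(k):
        v = [rng.uniform(-1.0, 1.0) for _ in at]
        for _ in range(iterations):
            w = matvec(at, matvec(a, v))  # w = A^T A v
            # Remove the components along vectors already found.
            for b in basis:
                dot = sum(x * y for x, y in zip(w, b))
                w = [x - dot * y for x, y in zip(w, b)]
            norm = math.sqrt(sum(x * x for x in w))
            v = [x / norm for x in w]
        basis.append(v)
    return basis


def reduce_rows(a, basis):
    """Project each row of `a` onto the basis: one dense k-vector per row."""
    return [[sum(x * y for x, y in zip(row, b)) for b in basis] for row in a]


# Hypothetical user-by-item preference matrix (rows: alice, bob, carol, dave).
prefs = [
    [5.0, 4.0, 0.0, 0.0, 0.0, 0.0],
    [4.0, 5.0, 3.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 4.0, 0.0, 5.0],
    [0.0, 0.0, 0.0, 4.0, 3.0, 4.0],
]

basis = top_right_singular_vectors(prefs, k=2)
reduced = reduce_rows(prefs, basis)
for name, vec in zip(["alice", "bob", "carol", "dave"], reduced):
    print(name, [round(x, 3) for x in vec])
```

The reduced 2-dimensional vectors are what you would hand to the clustering step in place of the original sparse rows. Running the same reduction on the transposed matrix gives dense item vectors.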

On Sep 18, 2013, at 6:15 PM, Pat Ferrel <pa...@occamsmachete.com> wrote:

The simplest way to cluster users would be to take the output of PreparePreferenceMatrixJob, which creates a DistributedRowMatrix (DRM) of all user prefs. The rows are users, the columns are items, and the values are preference values. Cluster the rows. Transpose that matrix, and clustering its rows will give you item clusters. Nifty.

Re: Preference to vectors for clustering

Posted by Pat Ferrel <pa...@occamsmachete.com>.

The simplest way to cluster users would be to take the output of PreparePreferenceMatrixJob, which creates a DistributedRowMatrix (DRM) of all user prefs. The rows are users, the columns are items, and the values are preference values. Cluster the rows. Transpose that matrix, and clustering its rows will give you item clusters. Nifty.
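
To make the row/column idea concrete, here is a minimal Python sketch that does not use Mahout at all. It builds a small in-memory user-by-item matrix from (user, item, value) triples, clusters its rows with a tiny k-means, then transposes the matrix and clusters again to get item clusters. The triples, the helper names, and the farthest-point seeding are assumptions made for the example, not what PreparePreferenceMatrixJob or Mahout's k-means actually do.

```python
def build_matrix(prefs):
    """Turn (user, item, value) triples into dense rows: users x items."""
    users = sorted({u for u, _, _ in prefs})
    items = sorted({i for _, i, _ in prefs})
    row = {user: r for r, user in enumerate(users)}
    col = {item: c for c, item in enumerate(items)}
    matrix = [[0.0] * len(items) for _ in users]
    for u, i, v in prefs:
        matrix[row[u]][col[i]] = float(v)
    return users, items, matrix


def transpose(matrix):
    """Swap rows and columns, so items become the rows."""
    return [list(c) for c in zip(*matrix)]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(rows, k, iterations=20):
    """Very small k-means; returns one cluster index per row."""
    # Deterministic seeding: start at the first row, then repeatedly
    # add the row farthest from the centroids chosen so far.
    centroids = [list(rows[0])]
    while len(centroids) < k:
        far = max(rows, key=lambda r: min(sq_dist(r, c) for c in centroids))
        centroids.append(list(far))
    assign = [0] * len(rows)
    for _ in range(iterations):
        assign = [
            min(range(k), key=lambda c: sq_dist(r, centroids[c])) for r in rows
        ]
        for c in range(k):
            members = [r for r, a in zip(rows, assign) if a == c]
            if members:  # keep the old centroid if a cluster went empty
                centroids[c] = [sum(x) / len(members) for x in zip(*members)]
    return assign


# Hypothetical preference triples: (user, item, preference value).
triples = [
    ("alice", "book_a", 5), ("alice", "book_b", 4),
    ("bob", "book_a", 4), ("bob", "book_b", 5),
    ("carol", "film_x", 5), ("carol", "film_y", 4),
    ("dave", "film_x", 4), ("dave", "film_y", 5),
]

users, items, matrix = build_matrix(triples)
user_clusters = kmeans(matrix, k=2)              # cluster the rows: users
item_clusters = kmeans(transpose(matrix), k=2)   # transpose, then cluster: items
print(dict(zip(users, user_clusters)))
print(dict(zip(items, item_clusters)))
```

With this toy data, alice and bob end up in one cluster and carol and dave in the other, and the transposed matrix groups the two books and the two films the same way.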
 