Posted to user@mahout.apache.org by mail2abin <ma...@gmail.com> on 2011/05/09 21:38:13 UTC

Clustering boolean vectors

Hi,


I was trying to run ItemBasedRecommender on the GroupLens movie sample data,
which requires ratings (user preferences) as input. But suppose I do not have
ratings (user preferences); rather, I have a boolean attribute vector for
each item [like Godfather - 0|1|0|0|0|0|1], where the two 1's might
indicate Crime and Drama.

ItemBasedRecommender requires a DataModel, which I do not have. Instead, as
I understand it, I should use some clustering technique based on the item
boolean attribute vectors, and later retrieve the items that belong to a
given cluster.

Please give pointers to the right clustering API. I have seen
TanimotoCluster etc., but I am not sure whether they are good for boolean
vectors.

Abin
Software Developer
NY

--
View this message in context: http://lucene.472066.n3.nabble.com/Clustering-boolean-vectors-tp2920165p2920165.html
Sent from the Mahout User List mailing list archive at Nabble.com.

Re: Clustering boolean vectors

Posted by Sean Owen <sr...@gmail.com>.
GroupLens doesn't *require* a rating per se -- you are free to ignore
it if you want!

In Mahout, boolean data is all 1s; there are no 0 ratings. If you just
mean that the non-existent preferences are "0", OK. But having two
ratings, 0 and 1, along with the possibility of a preference not existing,
makes three states, not two.
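
To make the "present or absent" reading concrete, here is a minimal sketch in
plain Java. It is not Mahout API: the class and method names are hypothetical,
and the second vector is made up for illustration. It treats a vector such as
0|1|0|0|0|0|1 as the set of positions holding a 1 and compares two such sets
with the Tanimoto (Jaccard) coefficient, so positions where both items have 0
play no part in the score.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper class, not part of Mahout.
class BooleanTanimoto {

    // Parse a vector like "0|1|0|0|0|0|1" into the set of positions holding a 1.
    public static Set<Integer> presentAttributes(String vector) {
        Set<Integer> present = new HashSet<>();
        String[] flags = vector.split("\\|");
        for (int i = 0; i < flags.length; i++) {
            if (flags[i].trim().equals("1")) {
                present.add(i);
            }
        }
        return present;
    }

    // Tanimoto coefficient: size of the intersection divided by size of the union.
    // Two empty sets are given similarity 0.0 here; that is a choice of this sketch.
    public static double tanimoto(Set<Integer> a, Set<Integer> b) {
        if (a.isEmpty() && b.isEmpty()) {
            return 0.0;
        }
        Set<Integer> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        int unionSize = a.size() + b.size() - intersection.size();
        return (double) intersection.size() / unionSize;
    }

    public static void main(String[] args) {
        Set<Integer> godfather = presentAttributes("0|1|0|0|0|0|1");
        // Made-up second item that shares one attribute with the first.
        Set<Integer> other = presentAttributes("0|1|0|0|1|0|0");
        System.out.println(tanimoto(godfather, other));
    }
}
```

Whether a similarity like this should feed a clustering job or a recommender
depends on the goal, which is the open question in this thread.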

You can easily have a DataModel, if you have the GroupLens data.
Convert it to CSV, or just use the GroupLensDataModel in examples/.
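
As a rough sketch of the "convert it to CSV" route, the plain Java below drops
the rating and keeps only the user and item IDs. It assumes the input lines
look like userID::movieID::rating::timestamp (check your copy of the data set),
and the class and method names are hypothetical. My understanding is that a
file-based DataModel can read userID,itemID lines with no preference value as
boolean data, but verify that against the javadoc of the Mahout version you
are using.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical converter, not part of Mahout.
class GroupLensToCsv {

    // Turn "userID::movieID::rating::timestamp" into "userID,movieID".
    // The rating and timestamp are discarded on purpose.
    public static String toBooleanCsvLine(String line) {
        String[] fields = line.trim().split("::");
        if (fields.length < 2) {
            throw new IllegalArgumentException("Expected userID::movieID..., got: " + line);
        }
        return fields[0] + "," + fields[1];
    }

    // Convert every non-blank input line, preserving order.
    public static List<String> convert(List<String> lines) {
        List<String> out = new ArrayList<>();
        for (String line : lines) {
            if (!line.trim().isEmpty()) {
                out.add(toBooleanCsvLine(line));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Illustrative lines only, not taken from the real data set.
        List<String> sample = Arrays.asList("1::10::5::0", "2::20::3::0");
        for (String csvLine : convert(sample)) {
            System.out.println(csvLine);
        }
    }
}
```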

But, to really answer your question: first you should define what you
are trying to do. Then we can help decide how to do it. I don't know
if you need clustering or not so far.

Sean
