Posted to user@mahout.apache.org by tianwild <ti...@hotmail.com> on 2012/03/30 09:02:06 UTC

How to customize A->B Similarity, not default A<->B similarity?

Hi all,

I have built a new correlation table based on the raw item-item similarity.

Take this for example:
ITEM1: A
ITEM2: B
loglikelihood: 0.9

The default recommender treats A->B=0.9 and B->A=0.9 as equivalent.
But now I have weighted the correlation with another method. The result is
A->B=0.9, but B->A=0.7 or B->A=0 (in which case it will not appear in the table).

My question: How can I implement this customized item-item similarity?


Best regards & thanks

--
View this message in context: http://lucene.472066.n3.nabble.com/How-to-customize-A-B-Similarity-not-default-A-B-similarity-tp3870105p3870105.html
Sent from the Mahout User List mailing list archive at Nabble.com.

Re: How to customize A->B Similarity, not default A<->B similarity?

Posted by Sean Owen <sr...@gmail.com>.

You don't want to do this. Similarity only makes sense if it's symmetric.
Instead, you probably want to weight at the point that the similarity is
used. Compute it normally, then weight depending on which item is what.
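
A minimal sketch of this idea, written in plain Python rather than against
Mahout's Java API. The names pair_key, direction_weight, and
weighted_similarity are hypothetical, and the numbers are the ones from the
question. The similarity itself stays symmetric; the asymmetry lives in a
separate weight that is applied only when the similarity is used.

```python
# Hypothetical sketch, not Mahout code: a symmetric similarity table plus a
# directional weight that is applied at the point of use.

def pair_key(a, b):
    """Order-independent key, so the stored similarity is symmetric."""
    return (a, b) if a <= b else (b, a)

# Symmetric similarity, stored once per unordered pair (value from the question).
similarity = {pair_key("A", "B"): 0.9}

# Directional weights keyed by (source, target). Pairs that are absent keep
# the plain symmetric value. 0.7 / 0.9 reproduces B->A=0.7 from the question.
direction_weight = {("B", "A"): 0.7 / 0.9}

def weighted_similarity(source, target, default_weight=1.0):
    """Return the symmetric similarity scaled by a direction-dependent weight."""
    base = similarity.get(pair_key(source, target), 0.0)
    return base * direction_weight.get((source, target), default_weight)

print(weighted_similarity("A", "B"))
print(weighted_similarity("B", "A"))
```

Setting a weight to 0 for one direction suppresses that direction without
touching the underlying symmetric value.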

On Fri, Mar 30, 2012 at 8:02 AM, tianwild <ti...@hotmail.com> wrote:

> Hi all,
>
> I have built a new correlation table based on the raw item-item similarity.
>
> Take this for example:
> ITEM1: A
> ITEM2: B
> loglikelihood: 0.9
>
> The default recommender treats A->B=0.9 and B->A=0.9 as equivalent.
> But now I have weighted the correlation with another method. The result is
> A->B=0.9, but B->A=0.7 or B->A=0 (in which case it will not appear in the table).
>
> My question: How can I implement this customized item-item similarity?
>
>
> Best regards & thanks

Re: Cluster hierarchy with RowSimilarityJob

Posted by Pat Ferrel <pa...@occamsmachete.com>.

Ah, but reading about top down I found ClusterOutputPostProcessorDriver. 
It looks like this will extract the centroid vectors. Maybe all I need 
is top down and I can calculate distances with CosineDistanceMeasure 
directly since this should never require a mapreduce implementation. The 
sub-clusters are never huge in number.
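
A small in-memory sketch of that calculation, assuming the centroids have
already been extracted into sparse {termid: weight} dictionaries. It is plain
Python, not Mahout's CosineDistanceMeasure, and the function names and the
zero-vector convention are assumptions made for illustration.

```python
import math
from itertools import combinations

def cosine_distance(u, v):
    """1 - cosine similarity for sparse vectors stored as {termid: weight}."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 1.0  # convention chosen for this sketch only
    return 1.0 - dot / (norm_u * norm_v)

def pairwise_distances(centroids):
    """Map each unordered pair of cluster ids to its cosine distance."""
    return {
        (a, b): cosine_distance(centroids[a], centroids[b])
        for a, b in combinations(sorted(centroids), 2)
    }

# Made-up centroids keyed by cluster id, for illustration only.
centroids = {
    0: {1: 1.0, 2: 0.0},
    1: {1: 0.0, 2: 1.0},
    2: {1: 2.0},
}
distances = pairwise_distances(centroids)
```

With n centroids this does n*(n-1)/2 comparisons, which is fine in memory as
long as the number of sub-clusters stays small.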

On 3/31/12 10:53 AM, Pat Ferrel wrote:
> Yes, I understand but I'm trying something different and in any case 
> need cluster to cluster distances.
>
> On 3/31/12 10:37 AM, Paritosh Ranjan wrote:
>> You can also try Top Down Clustering if this suits your use case. 
>> Find out bigger clusters first, and then, find out smaller clusters 
>> in bigger clusters and so on.
>> https://cwiki.apache.org/MAHOUT/top-down-clustering.html
>>
>> On 31-03-2012 23:00, Pat Ferrel wrote:
>>> I need to calculate similar clusters and get cluster to cluster 
>>> distances for several reasons.
>>>
>>> The most likely tool for this is the RowSimilarityJob. I imagine it 
>>> would take a list of vectors (clusterid, list of the centroid's 
>>> termid->weights) and calculate the list of vectors (clusterid, list 
>>> of clusterid->distance)
>>>
>>> The clusters file is of type Key class: class 
>>> org.apache.hadoop.io.Text (named vectors) Value Class: class 
>>> org.apache.mahout.clustering.kmeans.Cluster and does not work as 
>>> input to the RowID job. Looking at the actual values in the file I 
>>> suspect the algorithm would work but since the classname is Cluster, 
>>> RowID dies asking for org.apache.mahout.math.VectorWritable
>>>
>>> What is the easiest way to get RowID and RowSimilarity to work in 
>>> this case?
>>>
>>> If I need to mod one of these, which do you recommend? Maybe a new 
>>> job that takes the Clusters and outputs the "center" as an 
>>> IntWritable (clusterID) VectorWritable (centroid from the Cluster 
>>> class)?

Re: Cluster hierarchy with RowSimilarityJob

Posted by Pat Ferrel <pa...@occamsmachete.com>.

Yes, I understand but I'm trying something different and in any case 
need cluster to cluster distances.

On 3/31/12 10:37 AM, Paritosh Ranjan wrote:
> You can also try Top Down Clustering if this suits your use case. Find 
> out bigger clusters first, and then, find out smaller clusters in 
> bigger clusters and so on.
> https://cwiki.apache.org/MAHOUT/top-down-clustering.html
>
> On 31-03-2012 23:00, Pat Ferrel wrote:
>> I need to calculate similar clusters and get cluster to cluster 
>> distances for several reasons.
>>
>> The most likely tool for this is the RowSimilarityJob. I imagine it 
>> would take a list of vectors (clusterid, list of the centroid's 
>> termid->weights) and calculate the list of vectors (clusterid, list 
>> of clusterid->distance)
>>
>> The clusters file is of type Key class: class 
>> org.apache.hadoop.io.Text (named vectors) Value Class: class 
>> org.apache.mahout.clustering.kmeans.Cluster and does not work as 
>> input to the RowID job. Looking at the actual values in the file I 
>> suspect the algorithm would work but since the classname is Cluster, 
>> RowID dies asking for org.apache.mahout.math.VectorWritable
>>
>> What is the easiest way to get RowID and RowSimilarity to work in 
>> this case?
>>
>> If I need to mod one of these, which do you recommend? Maybe a new 
>> job that takes the Clusters and outputs the "center" as an 
>> IntWritable (clusterID) VectorWritable (centroid from the Cluster 
>> class)?

Re: Cluster hierarchy with RowSimilarityJob

Posted by Paritosh Ranjan <pr...@xebia.com>.

You can also try Top Down Clustering if this suits your use case. Find 
out bigger clusters first, and then, find out smaller clusters in bigger 
clusters and so on.
https://cwiki.apache.org/MAHOUT/top-down-clustering.html

On 31-03-2012 23:00, Pat Ferrel wrote:
> I need to calculate similar clusters and get cluster to cluster 
> distances for several reasons.
>
> The most likely tool for this is the RowSimilarityJob. I imagine it 
> would take a list of vectors (clusterid, list of the centroid's 
> termid->weights) and calculate the list of vectors (clusterid, list of 
> clusterid->distance)
>
> The clusters file is of type Key class: class 
> org.apache.hadoop.io.Text (named vectors) Value Class: class 
> org.apache.mahout.clustering.kmeans.Cluster and does not work as input 
> to the RowID job. Looking at the actual values in the file I suspect 
> the algorithm would work but since the classname is Cluster, RowID 
> dies asking for org.apache.mahout.math.VectorWritable
>
> What is the easiest way to get RowID and RowSimilarity to work in this 
> case?
>
> If I need to mod one of these, which do you recommend? Maybe a new job 
> that takes the Clusters and outputs the "center" as an IntWritable 
> (clusterID) VectorWritable (centroid from the Cluster class)?

Cluster hierarchy with RowSimilarityJob

Posted by Pat Ferrel <pa...@occamsmachete.com>.

I need to calculate similar clusters and get cluster to cluster 
distances for several reasons.

The most likely tool for this is the RowSimilarityJob. I imagine it 
would take a list of vectors (clusterid, list of the centroid's 
termid->weights) and calculate the list of vectors (clusterid, list of 
clusterid->distance)

The clusters file has key class org.apache.hadoop.io.Text (named vectors) 
and value class org.apache.mahout.clustering.kmeans.Cluster, so it does 
not work as input to the RowID job. Looking at the actual values in the 
file, I suspect the algorithm would work, but since the value class is 
Cluster, RowID dies asking for org.apache.mahout.math.VectorWritable.

What is the easiest way to get RowID and RowSimilarity to work in this 
case?

If I need to mod one of these, which do you recommend? Maybe a new job 
that takes the Clusters and outputs the "center" as an IntWritable 
(clusterID) VectorWritable (centroid from the Cluster class)?
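
The reshaping proposed above can be sketched as follows. This is plain Python
over in-memory records, not a Hadoop job; ClusterRecord and to_rows are
hypothetical stand-ins, and the real Cluster class and sequence-file reading
are not modeled.

```python
from dataclasses import dataclass

@dataclass
class ClusterRecord:
    """Hypothetical stand-in for one clusters-file value."""
    cluster_id: int
    center: dict  # sparse centroid as {termid: weight}

def to_rows(records):
    """Turn (text_key, ClusterRecord) pairs into (cluster_id, centroid) rows.

    The text key is dropped; the integer cluster id becomes the row key and
    the centroid becomes the row vector.
    """
    return [(value.cluster_id, dict(value.center)) for _key, value in records]

# Made-up records for illustration only.
records = [
    ("CL-0", ClusterRecord(0, {1: 0.5, 7: 1.0})),
    ("CL-1", ClusterRecord(1, {2: 1.5})),
]
rows = to_rows(records)
```

Rows in this (integer id, vector) shape are what a row-by-row similarity
computation would then consume.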

Re: How to customize A->B Similarity, not default A<->B similarity?

Posted by tianwild <ti...@hotmail.com>.

I have this problem:

The default: A<->B=0.9

The weighted result: A->B=0.9, B->A=0 (does not appear in my MySQL table)

The recommendation result: the recommendation "A is recommended to B" still appears.



--
View this message in context: http://lucene.472066.n3.nabble.com/How-to-customize-A-B-Similarity-not-default-A-B-similarity-tp3870105p3870115.html
Sent from the Mahout User List mailing list archive at Nabble.com.