Posted to user@mahout.apache.org by tianwild <ti...@hotmail.com> on 2012/03/30 09:02:06 UTC
How to customize A->B Similarity, not default A<->B similarity?
Hi all,
I have built a new correlation table based on the raw item-item similarity.
Take this for example:
ITEM1: A
ITEM2: B
loglikelihood: 0.9
The default recommender treats A->B=0.9 and B->A=0.9 as equivalent.
But I have now weighted the correlation with another method. The result is
A->B=0.9, but B->A=0.7 or B->A=0 (in which case it does not appear in the table).
My question: how can I implement this customized item-item similarity?
Best regards & thanks
--
View this message in context: http://lucene.472066.n3.nabble.com/How-to-customize-A-B-Similarity-not-default-A-B-similarity-tp3870105p3870105.html
Sent from the Mahout User List mailing list archive at Nabble.com.
Re: How to customize A->B Similarity, not default A<->B similarity?
Posted by Sean Owen <sr...@gmail.com>.
You don't want to do this. Similarity only makes sense if it's symmetric.
Instead, you probably want to apply the weight at the point where the
similarity is used. Compute the similarity normally, then weight it
depending on which role each item plays.
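To make the suggestion concrete, here is a minimal sketch in plain Java that keeps the similarity symmetric and applies a separate directed weight only when a score is needed. It does not use any Mahout API: the class DirectionalWeighting, its method names, and the choice to treat a missing weight as 1.0 are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper: symmetric similarity plus a per-direction weight applied at use time. */
final class DirectionalWeighting {
    // Symmetric similarities, stored once per unordered pair.
    private final Map<String, Double> similarity = new HashMap<>();
    // Directed weights, keyed as "from->to".
    private final Map<String, Double> weight = new HashMap<>();

    // Order the two IDs so that (A, B) and (B, A) share one key.
    private static String pairKey(String a, String b) {
        return a.compareTo(b) <= 0 ? a + "|" + b : b + "|" + a;
    }

    void putSimilarity(String a, String b, double value) {
        similarity.put(pairKey(a, b), value);
    }

    void putWeight(String from, String to, double value) {
        weight.put(from + "->" + to, value);
    }

    /** Symmetric lookup; unknown pairs count as 0.0. */
    double similarity(String a, String b) {
        return similarity.getOrDefault(pairKey(a, b), 0.0);
    }

    /**
     * Score for the direction from -> to.
     * Assumption: a missing weight means "unweighted", i.e. 1.0.
     */
    double directedScore(String from, String to) {
        double w = weight.getOrDefault(from + "->" + to, 1.0);
        return similarity(from, to) * w;
    }
}
```

With the numbers from the question, one would store the similarity 0.9 once for the pair (A, B) and put a weight of 0.0 on B->A. The A->B score then stays at 0.9 and the B->A score drops to 0, while the underlying similarity is still symmetric.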
Re: Cluster hierarchy with RowSimilarityJob
Posted by Pat Ferrel <pa...@occamsmachete.com>.
Ah, but reading about top down I found ClusterOutputPostProcessorDriver.
It looks like this will extract the centroid vectors. Maybe all I need
is top down and I can calculate distances with CosineDistanceMeasure
directly since this should never require a mapreduce implementation. The
sub-clusters are never huge in number.
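As a rough illustration of computing cluster-to-cluster distances in memory, the sketch below builds a pairwise cosine distance matrix over a small set of centroids in plain Java. It does not call Mahout's CosineDistanceMeasure. The class CentroidDistances, the dense double[] representation of the centroids, and the handling of zero vectors are assumptions made for this example.

```java
/** Hypothetical helper: pairwise cosine distances between a small number of centroids. */
final class CentroidDistances {

    /** Cosine distance, defined here as 1 - cos(angle between a and b). */
    static double cosineDistance(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same length");
        }
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        // Assumption: a zero vector is treated as maximally distant from everything.
        if (normA == 0.0 || normB == 0.0) {
            return 1.0;
        }
        return 1.0 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Symmetric n x n matrix of distances; the diagonal stays at 0.0. */
    static double[][] pairwise(double[][] centroids) {
        int n = centroids.length;
        double[][] distances = new double[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                double d = cosineDistance(centroids[i], centroids[j]);
                distances[i][j] = d;
                distances[j][i] = d;
            }
        }
        return distances;
    }
}
```

The loop is quadratic in the number of centroids, which is only reasonable because the sub-clusters are never huge in number. Real centroids would normally be sparse term vectors, so the dense arrays here are a simplification.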
Re: Cluster hierarchy with RowSimilarityJob
Posted by Pat Ferrel <pa...@occamsmachete.com>.
Yes, I understand but I'm trying something different and in any case
need cluster to cluster distances.
Re: Cluster hierarchy with RowSimilarityJob
Posted by Paritosh Ranjan <pr...@xebia.com>.
You can also try Top Down Clustering if this suits your use case. Find
out bigger clusters first, and then, find out smaller clusters in bigger
clusters and so on.
https://cwiki.apache.org/MAHOUT/top-down-clustering.html
Cluster hierarchy with RowSimilarityJob
Posted by Pat Ferrel <pa...@occamsmachete.com>.
I need to calculate similar clusters and get cluster to cluster
distances for several reasons.
The most likely tool for this is the RowSimilarityJob. I imagine it
would take a list of vectors (clusterid, list of the centroid's
termid->weights) and calculate the list of vectors (clusterid, list of
clusterid->distance)
The clusters file has key class org.apache.hadoop.io.Text (named vectors)
and value class org.apache.mahout.clustering.kmeans.Cluster, and it does
not work as input to the RowID job. Looking at the actual values in the
file, I suspect the algorithm would work, but since the class name is
Cluster, RowID dies asking for org.apache.mahout.math.VectorWritable.
What is the easiest way to get RowID and RowSimilarity to work in this
case?
If I need to modify one of these, which do you recommend? Maybe a new job
that takes the Clusters and outputs the "center" as an IntWritable
(clusterID) and a VectorWritable (centroid from the Cluster class)?
Re: How to customize A->B Similarity, not default A<->B similarity?
Posted by tianwild <ti...@hotmail.com>.
I ran into this problem:
The default: A<->B=0.9
The weighted result: A->B=0.9, B->A=0 (it does not appear in my MySQL table)
The recommendation result: the recommendation "A is recommended to B" still appears.