Posted to dev@mahout.apache.org by "Sean Owen (JIRA)" <ji...@apache.org> on 2010/08/13 21:18:17 UTC
[jira] Resolved: (MAHOUT-478) Do we need normalize SimilarityMatrixEntryKey?
[ https://issues.apache.org/jira/browse/MAHOUT-478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sean Owen resolved MAHOUT-478.
------------------------------
Fix Version/s: (was: 0.4)
Resolution: Not A Problem
Am I right that we think this is "not a problem", then?
> Do we need normalize SimilarityMatrixEntryKey?
> -----------------------------------------------
>
> Key: MAHOUT-478
> URL: https://issues.apache.org/jira/browse/MAHOUT-478
> Project: Mahout
> Issue Type: Question
> Components: Collaborative Filtering
> Affects Versions: 0.4
> Reporter: Han Hui Wen
>
> In org.apache.mahout.math.hadoop.similarity.SimilarityMatrixEntryKey
> {code}
> public static class SimilarityMatrixEntryKeyComparator extends WritableComparator {
>   protected SimilarityMatrixEntryKeyComparator() {
>     super(SimilarityMatrixEntryKey.class, true);
>   }
>   @Override
>   public int compare(WritableComparable a, WritableComparable b) {
>     SimilarityMatrixEntryKey key1 = (SimilarityMatrixEntryKey) a;
>     SimilarityMatrixEntryKey key2 = (SimilarityMatrixEntryKey) b;
>     // Order by row ascending first...
>     int result = compare(key1.row, key2.row);
>     if (result == 0) {
>       // ...then by similarity value descending within the same row.
>       result = -1 * compare(key1.value, key2.value);
>     }
>     return result;
>   }
>   protected static int compare(long a, long b) {
>     return (a == b) ? 0 : (a < b) ? -1 : 1;
>   }
>   protected static int compare(double a, double b) {
>     return (a == b) ? 0 : (a < b) ? -1 : 1;
>   }
> }
> {code}
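The ordering this comparator defines can be reproduced without any Hadoop types. The sketch below is minimal and makes some assumptions. It uses a hypothetical SimilarityKey class that holds only the row and value fields. SimilarityKeyOrder is also a hypothetical name. It uses Long.compare and Double.compare in place of the hand-written helpers. Those agree with the quoted helpers except for NaN and the relative order of -0.0 and 0.0.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical plain-Java stand-in for SimilarityMatrixEntryKey (no Hadoop types).
final class SimilarityKey {
    final long row;
    final double value;

    SimilarityKey(long row, double value) {
        this.row = row;
        this.value = value;
    }
}

// Hypothetical helper mirroring the quoted comparator's ordering.
final class SimilarityKeyOrder {
    // Rows ascending, then similarity values descending within a row.
    static int compare(SimilarityKey a, SimilarityKey b) {
        int result = Long.compare(a.row, b.row);
        if (result == 0) {
            // Arguments swapped (b before a) to get descending order without negating.
            result = Double.compare(b.value, a.value);
        }
        return result;
    }

    // Returns a sorted copy; the input list is left unchanged.
    static List<SimilarityKey> sort(List<SimilarityKey> keys) {
        List<SimilarityKey> copy = new ArrayList<>(keys);
        copy.sort(SimilarityKeyOrder::compare);
        return copy;
    }
}
```

For example, sorting (2, 0.5), (1, 0.1), (1, 0.9) gives (1, 0.9), (1, 0.1), (2, 0.5). Each item's most similar entries come first.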
> We use a double as one part of the key.
> Because a double has so many possible values, pairwiseSimilarity may end up with many groups,
> and the number of groups is out of our control.
> For example, (ItemA, 0.1), (ItemA, 0.11), (ItemA, 0.01), (ItemA, 0.1), (ItemA, 0.001), (ItemA, 0.0011) form different groups.
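Whether such keys form separate reduce groups depends on the grouping comparator, not only on the sort comparator. Hadoop builds a reduce group from consecutive sorted keys that the grouping comparator treats as equal. If no grouping comparator is set, for example via Job.setGroupingComparatorClass, Hadoop falls back to the sort comparator. This thread does not say which setup the Mahout job uses.

The sketch below is minimal and uses hypothetical names (PairKey, GroupingSketch). It counts groups both ways for the keys in the example.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical key: an item ID plus a similarity value.
final class PairKey {
    final String row;
    final double value;

    PairKey(String row, double value) {
        this.row = row;
        this.value = value;
    }
}

final class GroupingSketch {
    // Sort order: row ascending, then value descending (as in the quoted comparator).
    static final Comparator<PairKey> SORT =
        Comparator.comparing((PairKey k) -> k.row)
            .thenComparing(Comparator.comparingDouble((PairKey k) -> k.value).reversed());

    // Grouping on the full key: every distinct (row, value) pair is its own group.
    static final Comparator<PairKey> GROUP_BY_FULL_KEY = SORT;

    // Grouping on the row alone: all values for one row share a single group.
    static final Comparator<PairKey> GROUP_BY_ROW = Comparator.comparing((PairKey k) -> k.row);

    // Sorts the keys, then starts a new group whenever the grouping comparator
    // says the current key differs from the previous one.
    static int countGroups(List<PairKey> keys, Comparator<PairKey> grouping) {
        List<PairKey> sorted = new ArrayList<>(keys);
        sorted.sort(SORT);
        int groups = 0;
        for (int i = 0; i < sorted.size(); i++) {
            if (i == 0 || grouping.compare(sorted.get(i - 1), sorted.get(i)) != 0) {
                groups++;
            }
        }
        return groups;
    }
}
```

The example has six keys. Full-key grouping yields five groups, because the two (ItemA, 0.1) keys are adjacent after sorting and merge. Row-only grouping yields one group.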
> Also, doubles are imprecise, so it is hard to compare two doubles for equality.
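The equality problem is easy to reproduce. Neither 0.1 nor 0.2 has an exact binary representation, so their sum is not exactly 0.3. The sketch below is minimal, and the class and method names are hypothetical.

```java
// Hypothetical helper contrasting exact and tolerance-based double comparison.
final class DoubleEqualitySketch {
    // Exact comparison, as the quoted compare(double, double) does with ==.
    static boolean exactlyEqual(double a, double b) {
        return a == b;
    }

    // Tolerance-based comparison: values within epsilon count as equal.
    static boolean approximatelyEqual(double a, double b, double epsilon) {
        return Math.abs(a - b) <= epsilon;
    }
}
```

Here exactlyEqual(0.1 + 0.2, 0.3) is false, while approximatelyEqual(0.1 + 0.2, 0.3, 1e-9) is true. However, a tolerance check is not transitive, so it cannot safely replace the exact comparison inside a sort comparator. That is one argument for turning the values into integers, as proposed next.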
> So can we normalize the similarityValue?
> For example, multiply every similarityValue by 100, 1000, or some other number and convert it to an integer.
> If necessary, we can convert the values back to doubles at the end.
>
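Below is a minimal sketch of the proposed normalization. It makes two assumptions: values are rounded to the nearest integer (the issue does not say whether to round or truncate), and the caller chooses the scale, such as 1000. QuantizeSketch is a hypothetical name.

```java
// Hypothetical helper: maps similarity values to integer buckets and back.
final class QuantizeSketch {
    // Scales the similarity and rounds to the nearest integer bucket.
    static long quantize(double similarity, long scale) {
        return Math.round(similarity * scale);
    }

    // Converts a bucket back to an approximate similarity.
    // Precision lost to rounding is not recovered.
    static double dequantize(long bucket, long scale) {
        return (double) bucket / scale;
    }
}
```

At scale 1000, 0.001 and 0.0011 both map to bucket 1, so they would no longer be distinct keys. Meanwhile 0.11 maps to 110 and converts back to 0.11. The trade-off is that values closer together than 1/scale become indistinguishable in the key.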
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.